Details
Type: Improvement
Status: In Progress
Priority: Major
Resolution: Unresolved
Sprint 5
Description
Our models are currently overfitting heavily on the training dataset. Further evaluation suggests that this is not the usual overfitting caused by an over-expressive model: we already freeze most of the network, in the most extreme case unfreezing only the final softmax classifier of a pretrained ResNet50. I therefore believe the overfitting is likely due to batch effects in the data. An examination of the original slide distribution in the sample images dataset shows a severe imbalance, with per-slide image counts ranging from 1 to 20,723. Note that this is the distribution over the slide from which each image originated. It is distinct from the class distribution, which is much more evenly dispersed.
     slide_num  count
  0        436      1
  1        116      1
  2        468      2
  3         38      3
  4        195      4
  5        173      5
  6         13      7
  7        481      8
  8         83      9
  9        349     11
 10        490     15
 11        292     17
 12        281     22
 13        387     26
 14        326     32
 15        286     32
 16         88     39
 17        477     48
 18        205     57
 19        135     58
 20        127     58
 21         16     61
 22        245     66
 23          5     81
 24        306     83
 25        284     91
 26        263    100
 27         15    120
 28        345    124
 29        380    128
 30         24    137
 31        382    150
 32          1    154
 33        421    164
 34        163    169
 35        278    171
 36        235    197
 37        332    197
 38        343    207
 39         43    237
 40        249    246
 41        113    256
 42        496    262
 43        482    264
 44         86    269
 45        415    269
 46        472    326
 47        422    329
 48        450    340
 49        108    348
 50          3    390
 51        191    402
 52        272    474
 53         85    483
 54         97    484
 55        210    508
 56        293    544
 57         41    595
 58        452    613
 59        220    613
 60        406    651
 61         67    665
 62        260    666
 63        361    673
 64        269    684
 65         50    684
 66        304    753
 67        101    769
 68        433    868
 69          4    898
 70        499    915
 71        145    917
 72        357    918
 73        365    940
 74         82    951
 75        126    965
 76        185    965
 77        164   1077
 78        221   1086
 79        165   1111
 80        316   1129
 81        350   1132
 82         89   1162
 83         19   1169
 84         74   1206
 85        132   1248
 86         47   1278
 87        188   1297
 88        459   1312
 89        368   1337
 90        335   1368
 91        225   1373
 92        234   1378
 93        487   1385
 94        247   1464
 95        427   1476
 96         65   1492
 97        402   1500
 98        315   1557
 99        201   1604
100        344   1607
101        273   1616
102        146   1623
103        341   1636
104        425   1640
105        182   1681
106        403   1682
107        275   1690
108        457   1717
109        448   1724
110        277   1729
111         70   1740
112        141   1747
113        264   1777
114        122   1880
115        319   1915
116        449   1951
117        104   1988
118        377   1993
119        285   2008
120        107   2084
121        410   2141
122         11   2148
123        367   2153
124        416   2162
125        311   2183
126        338   2206
127         51   2233
128        153   2255
129        144   2285
130        497   2358
131        218   2364
132        330   2376
133        308   2392
134        213   2480
135        454   2512
136        103   2567
137        446   2569
138         40   2622
139        251   2629
140        149   2632
141        455   2633
142        430   2669
143        262   2715
144         76   2737
145         18   2748
146        178   2763
147        383   2864
148         54   2871
149        223   2908
150        207   2931
151        486   3043
152        391   3099
153        342   3104
154        390   3116
155        276   3136
156         75   3141
157        181   3171
158        142   3213
159        414   3255
160        137   3276
161        295   3285
162        358   3315
163          7   3322
164        323   3327
165         71   3334
166        243   3344
167        120   3359
168         48   3371
169        434   3387
170        206   3404
171          9   3460
172        476   3467
173         32   3472
174        491   3496
175        444   3502
176        279   3530
177         59   3546
178        174   3556
179        464   3595
180        392   3633
181         99   3677
182         72   3682
183        347   3779
184         28   3804
185        314   3807
186        322   3809
187        492   3823
188        258   3824
189        230   3831
190        354   3887
191        346   3951
192        445   3963
193        209   3969
194          8   3986
195        443   3988
196        290   3993
197        118   4025
198        152   4026
199         56   4078
200        170   4131
201         84   4146
202        413   4150
203        447   4171
204        417   4193
205         60   4210
206         92   4265
207        374   4281
208         94   4307
209        161   4360
210        320   4408
211        114   4451
212        219   4480
213         90   4518
214        233   4528
215        396   4596
216        157   4661
217        117   4696
218        337   4724
219        202   4819
220         34   4827
221        105   4840
222        155   4841
223        176   4895
224        166   4966
225        456   5031
226        254   5085
227        475   5184
228         42   5221
229        172   5330
230        299   5358
231        473   5364
232        131   5369
233         61   5382
234        379   5470
235        355   5488
236        372   5496
237         53   5503
238         17   5523
239        495   5529
240        190   5536
241        451   5583
242        177   5630
243        123   5649
244        231   5686
245        217   5692
246         33   5742
247         55   5767
248        388   5786
249        318   5819
250         81   5838
251         62   5846
252        255   5854
253        485   5890
254        375   5928
255        156   5938
256        224   5945
257        267   5970
258        412   5987
259        136   6038
260        160   6055
261        240   6084
262         39   6093
263        469   6100
264        300   6167
265        183   6178
266        250   6195
267         49   6231
268        471   6251
269        334   6283
270        265   6422
271        407   6468
272        252   6472
273        466   6478
274        227   6528
275        102   6550
276        458   6653
277        140   6667
278        133   6668
279        493   6716
280        465   6729
281        370   6751
282        244   6772
283        216   6772
284        488   6773
285         95   6777
286         52   6788
287         57   6821
288        289   6846
289        362   6939
290        180   6944
291        324   6961
292        211   7012
293         73   7034
294        301   7094
295         23   7106
296         64   7169
297        420   7182
298         36   7219
299        376   7257
300        484   7265
301        253   7275
302        470   7312
303        460   7405
304         98   7425
305        302   7427
306        393   7435
307        159   7554
308        237   7564
309        274   7701
310        359   7769
311         68   7779
312        483   7829
313        151   7910
314        186   7948
315        442   7952
316        259   8049
317        246   8128
318         96   8129
319        271   8176
320        438   8190
321         87   8197
322        162   8226
323        489   8260
324        418   8312
325         31   8504
326        179   8532
327         79   8578
328        226   8600
329         27   8719
330        479   8862
331        268   8883
332        404   8908
333         46   8913
334        437   8961
335        147   9047
336        189   9164
337         20   9242
338        386   9356
339        435   9376
340        432   9495
341        408   9505
342        248   9509
343        462   9619
344        229   9774
345        193   9835
346        167   9871
347         69   9894
348        130   9954
349        327  10072
350        369  10078
351        106  10180
352        194  10212
353        325  10306
354        312  10344
355        303  10502
356        184  10655
357        463  10916
358        426  11055
359        283  11334
360        328  11450
361        129  11467
362        288  11806
363        124  12010
364        171  12250
365        121  12257
366         22  12276
367        423  12310
368        192  12313
369        378  12358
370        307  12366
371        143  12678
372         80  12899
373         66  12920
374        208  12970
375        158  13131
376        148  13423
377        119  13723
378        317  13830
379        395  13834
380        187  14003
381         25  14856
382        399  14905
383        478  16145
384         93  20009
385        215  20723
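For reference, a per-slide tally like the one above can be reproduced and summarized with a few lines of Python. This is only a minimal sketch: the function names and the toy `toy_slide_ids` list are hypothetical, and it assumes one slide id is available per sampled image.

```python
from collections import Counter


def slide_counts(slide_ids):
    """Return (slide_num, count) pairs sorted by ascending count."""
    return sorted(Counter(slide_ids).items(), key=lambda kv: (kv[1], kv[0]))


def imbalance_ratio(counts):
    """Ratio of the largest to the smallest per-slide image count."""
    values = [count for _, count in counts]
    return max(values) / min(values)


# Toy input: one slide id per sampled image (hypothetical values).
toy_slide_ids = [7, 7, 7, 7, 3, 3, 9]
counts = slide_counts(toy_slide_ids)
print(counts)                   # [(9, 1), (3, 2), (7, 4)]
print(imbalance_ratio(counts))  # 4.0
```

Applied to the counts listed above, the same ratio would be 20723 / 1, since the smallest slides contribute a single image and the largest contributes 20723.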
This task aims to improve the sampling policy so that the final image dataset has a more even slide distribution. We hope this will reduce the batch effects and, in turn, improve model metrics.
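One possible policy, sketched here purely as an illustration, is to cap the number of images kept per slide. The function name `cap_per_slide`, the `(image_id, slide_num)` sample layout, and the cap value are all assumptions; the ticket does not prescribe a specific approach.

```python
import random
from collections import defaultdict


def cap_per_slide(samples, max_per_slide, seed=0):
    """Keep at most max_per_slide images per slide, chosen uniformly at random.

    samples: iterable of (image_id, slide_num) pairs (hypothetical layout).
    """
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    by_slide = defaultdict(list)
    for image_id, slide_num in samples:
        by_slide[slide_num].append(image_id)

    balanced = []
    for slide_num in sorted(by_slide):
        images = by_slide[slide_num]
        if len(images) > max_per_slide:
            images = rng.sample(images, max_per_slide)
        balanced.extend((image_id, slide_num) for image_id in images)
    return balanced


# Toy data: slide 0 has 100 images, slide 1 has only 3.
toy_samples = [(i, 0) for i in range(100)] + [(100 + i, 1) for i in range(3)]
balanced = cap_per_slide(toy_samples, max_per_slide=10)
print(len(balanced))  # 13
```

A cap only trims over-represented slides; it does nothing for slides that contribute very few images, and it should be checked against the class distribution so that balancing slides does not unbalance classes. Weighting samples by inverse slide frequency at training time is an alternative that discards no data.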