SystemDS / SYSTEMDS-1814

Improve the slide distribution of the image dataset via a better sampling policy


Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved

    Description

      Currently, our models are heavily overfitting to the training dataset. However, further evaluation has shown that this is not the usual overfitting caused by an overly expressive model: we are employing heavy model freezing, to the point of unfreezing only the final softmax classifier of a pretrained ResNet50. My evaluation therefore suggests that the overfitting is likely due to batch effects in the data, and an examination of the original slide distribution in the sample images dataset has revealed a severe imbalance. Note that this is the distribution over the slides from which the images originated, which is distinct from the class distribution; the latter is much more evenly dispersed. The table below lists, for each of the 386 slides, the number of images drawn from it (the "count" column), sorted in ascending order. Counts range from a single image (slides 436 and 116) to 20,723 images (slide 215).

           slide_num  count
      0          436      1
      1          116      1
      2          468      2
      3           38      3
      4          195      4
      5          173      5
      6           13      7
      7          481      8
      8           83      9
      9          349     11
      10         490     15
      11         292     17
      12         281     22
      13         387     26
      14         326     32
      15         286     32
      16          88     39
      17         477     48
      18         205     57
      19         135     58
      20         127     58
      21          16     61
      22         245     66
      23           5     81
      24         306     83
      25         284     91
      26         263    100
      27          15    120
      28         345    124
      29         380    128
      30          24    137
      31         382    150
      32           1    154
      33         421    164
      34         163    169
      35         278    171
      36         235    197
      37         332    197
      38         343    207
      39          43    237
      40         249    246
      41         113    256
      42         496    262
      43         482    264
      44          86    269
      45         415    269
      46         472    326
      47         422    329
      48         450    340
      49         108    348
      50           3    390
      51         191    402
      52         272    474
      53          85    483
      54          97    484
      55         210    508
      56         293    544
      57          41    595
      58         452    613
      59         220    613
      60         406    651
      61          67    665
      62         260    666
      63         361    673
      64         269    684
      65          50    684
      66         304    753
      67         101    769
      68         433    868
      69           4    898
      70         499    915
      71         145    917
      72         357    918
      73         365    940
      74          82    951
      75         126    965
      76         185    965
      77         164   1077
      78         221   1086
      79         165   1111
      80         316   1129
      81         350   1132
      82          89   1162
      83          19   1169
      84          74   1206
      85         132   1248
      86          47   1278
      87         188   1297
      88         459   1312
      89         368   1337
      90         335   1368
      91         225   1373
      92         234   1378
      93         487   1385
      94         247   1464
      95         427   1476
      96          65   1492
      97         402   1500
      98         315   1557
      99         201   1604
      100        344   1607
      101        273   1616
      102        146   1623
      103        341   1636
      104        425   1640
      105        182   1681
      106        403   1682
      107        275   1690
      108        457   1717
      109        448   1724
      110        277   1729
      111         70   1740
      112        141   1747
      113        264   1777
      114        122   1880
      115        319   1915
      116        449   1951
      117        104   1988
      118        377   1993
      119        285   2008
      120        107   2084
      121        410   2141
      122         11   2148
      123        367   2153
      124        416   2162
      125        311   2183
      126        338   2206
      127         51   2233
      128        153   2255
      129        144   2285
      130        497   2358
      131        218   2364
      132        330   2376
      133        308   2392
      134        213   2480
      135        454   2512
      136        103   2567
      137        446   2569
      138         40   2622
      139        251   2629
      140        149   2632
      141        455   2633
      142        430   2669
      143        262   2715
      144         76   2737
      145         18   2748
      146        178   2763
      147        383   2864
      148         54   2871
      149        223   2908
      150        207   2931
      151        486   3043
      152        391   3099
      153        342   3104
      154        390   3116
      155        276   3136
      156         75   3141
      157        181   3171
      158        142   3213
      159        414   3255
      160        137   3276
      161        295   3285
      162        358   3315
      163          7   3322
      164        323   3327
      165         71   3334
      166        243   3344
      167        120   3359
      168         48   3371
      169        434   3387
      170        206   3404
      171          9   3460
      172        476   3467
      173         32   3472
      174        491   3496
      175        444   3502
      176        279   3530
      177         59   3546
      178        174   3556
      179        464   3595
      180        392   3633
      181         99   3677
      182         72   3682
      183        347   3779
      184         28   3804
      185        314   3807
      186        322   3809
      187        492   3823
      188        258   3824
      189        230   3831
      190        354   3887
      191        346   3951
      192        445   3963
      193        209   3969
      194          8   3986
      195        443   3988
      196        290   3993
      197        118   4025
      198        152   4026
      199         56   4078
      200        170   4131
      201         84   4146
      202        413   4150
      203        447   4171
      204        417   4193
      205         60   4210
      206         92   4265
      207        374   4281
      208         94   4307
      209        161   4360
      210        320   4408
      211        114   4451
      212        219   4480
      213         90   4518
      214        233   4528
      215        396   4596
      216        157   4661
      217        117   4696
      218        337   4724
      219        202   4819
      220         34   4827
      221        105   4840
      222        155   4841
      223        176   4895
      224        166   4966
      225        456   5031
      226        254   5085
      227        475   5184
      228         42   5221
      229        172   5330
      230        299   5358
      231        473   5364
      232        131   5369
      233         61   5382
      234        379   5470
      235        355   5488
      236        372   5496
      237         53   5503
      238         17   5523
      239        495   5529
      240        190   5536
      241        451   5583
      242        177   5630
      243        123   5649
      244        231   5686
      245        217   5692
      246         33   5742
      247         55   5767
      248        388   5786
      249        318   5819
      250         81   5838
      251         62   5846
      252        255   5854
      253        485   5890
      254        375   5928
      255        156   5938
      256        224   5945
      257        267   5970
      258        412   5987
      259        136   6038
      260        160   6055
      261        240   6084
      262         39   6093
      263        469   6100
      264        300   6167
      265        183   6178
      266        250   6195
      267         49   6231
      268        471   6251
      269        334   6283
      270        265   6422
      271        407   6468
      272        252   6472
      273        466   6478
      274        227   6528
      275        102   6550
      276        458   6653
      277        140   6667
      278        133   6668
      279        493   6716
      280        465   6729
      281        370   6751
      282        244   6772
      283        216   6772
      284        488   6773
      285         95   6777
      286         52   6788
      287         57   6821
      288        289   6846
      289        362   6939
      290        180   6944
      291        324   6961
      292        211   7012
      293         73   7034
      294        301   7094
      295         23   7106
      296         64   7169
      297        420   7182
      298         36   7219
      299        376   7257
      300        484   7265
      301        253   7275
      302        470   7312
      303        460   7405
      304         98   7425
      305        302   7427
      306        393   7435
      307        159   7554
      308        237   7564
      309        274   7701
      310        359   7769
      311         68   7779
      312        483   7829
      313        151   7910
      314        186   7948
      315        442   7952
      316        259   8049
      317        246   8128
      318         96   8129
      319        271   8176
      320        438   8190
      321         87   8197
      322        162   8226
      323        489   8260
      324        418   8312
      325         31   8504
      326        179   8532
      327         79   8578
      328        226   8600
      329         27   8719
      330        479   8862
      331        268   8883
      332        404   8908
      333         46   8913
      334        437   8961
      335        147   9047
      336        189   9164
      337         20   9242
      338        386   9356
      339        435   9376
      340        432   9495
      341        408   9505
      342        248   9509
      343        462   9619
      344        229   9774
      345        193   9835
      346        167   9871
      347         69   9894
      348        130   9954
      349        327  10072
      350        369  10078
      351        106  10180
      352        194  10212
      353        325  10306
      354        312  10344
      355        303  10502
      356        184  10655
      357        463  10916
      358        426  11055
      359        283  11334
      360        328  11450
      361        129  11467
      362        288  11806
      363        124  12010
      364        171  12250
      365        121  12257
      366         22  12276
      367        423  12310
      368        192  12313
      369        378  12358
      370        307  12366
      371        143  12678
      372         80  12899
      373         66  12920
      374        208  12970
      375        158  13131
      376        148  13423
      377        119  13723
      378        317  13830
      379        395  13834
      380        187  14003
      381         25  14856
      382        399  14905
      383        478  16145
      384         93  20009
      385        215  20723
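
      As a hedged sketch, the per-slide counts in the table above, plus a few summary numbers for checking whether a new sampling policy actually flattens the distribution, could be computed as follows. It assumes a hypothetical input of one slide number per image; the function names slide_distribution and imbalance_summary are illustrative and not part of SystemDS.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def slide_distribution(slide_nums: Iterable[int]) -> List[Tuple[int, int]]:
    """Return (slide_num, count) pairs sorted by ascending count.

    `slide_nums` is a hypothetical input: one slide number per image.
    """
    counts = Counter(slide_nums)
    # Sort by count, breaking ties by slide number for a stable ordering.
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))


def imbalance_summary(dist: List[Tuple[int, int]], top_k: int = 10) -> Dict[str, float]:
    """Summarize how unevenly the images are spread over slides."""
    counts = [c for _, c in dist]
    total = sum(counts)
    # Images contributed by the `top_k` most frequent slides.
    top = sorted(counts, reverse=True)[:top_k]
    return {
        "num_slides": len(counts),
        "total_images": total,
        "min_count": min(counts),
        "max_count": max(counts),
        "max_to_min_ratio": max(counts) / min(counts),
        f"top_{top_k}_share": sum(top) / total,
    }
```

      Applied to the counts in the table above, such a summary would report 386 slides with a minimum of 1 and a maximum of 20,723 images per slide. Re-running it on a resampled dataset gives a quick check of whether the new policy helps.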
      

      This task aims to improve the sampling policy so that the final image dataset has a more even slide distribution, which should reduce the batch effects and, ideally, improve the model's performance metrics.
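
      One possible policy, sketched here under the assumption that tiles have already been extracted and labeled with their slide number, is to cap the number of tiles drawn from any single slide. The helper name sample_capped_per_slide and the cap value are hypothetical; the actual pipeline might instead need to enforce the limit at tile-extraction time, and the cap would have to be tuned against the resulting dataset size and model metrics.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def sample_capped_per_slide(
    tiles: Sequence[Tuple[Hashable, int]],
    max_per_slide: int,
    seed: int = 42,
) -> List[Tuple[Hashable, int]]:
    """Keep at most `max_per_slide` randomly chosen tiles from each slide.

    `tiles` holds hypothetical (tile_id, slide_num) pairs. Slides with at most
    `max_per_slide` tiles keep all of them, so only the heavy tail is trimmed.
    """
    if max_per_slide < 1:
        raise ValueError("max_per_slide must be at least 1")
    rng = random.Random(seed)  # seeded so the sample is reproducible

    # Group tiles by the slide they came from.
    by_slide: Dict[int, List[Tuple[Hashable, int]]] = defaultdict(list)
    for tile in tiles:
        by_slide[tile[1]].append(tile)

    sample: List[Tuple[Hashable, int]] = []
    for slide_num in sorted(by_slide):  # fixed order keeps results deterministic
        group = by_slide[slide_num]
        if len(group) <= max_per_slide:
            sample.extend(group)
        else:
            # Sample without replacement so no tile is duplicated.
            sample.extend(rng.sample(group, max_per_slide))
    return sample
```

      A hard cap keeps every tile from sparse slides while trimming dominant ones such as slide 215, at the cost of discarding data. A softer alternative is to keep all tiles but draw each with probability proportional to 1 / (number of tiles on its slide), which balances slides in expectation without discarding anything up front.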


          People

            Assignee: Mike Dusenberry (dusenberrymw)
            Reporter: Mike Dusenberry (dusenberrymw)
