Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
SystemDS 2.1, SystemDS 3.1
None
None
Description
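The number of runs is estimated very conservatively, meaning it is heavily overestimated. As a result, the size estimate is inaccurate for data that contains very long runs or is extremely sparse. In the logs below, the sample-based estimate predicts 100930 runs, while the full estimate over the entire matrix finds only 67.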
Sample-based estimate:
21/04/21 12:50:53 ERROR estim.CompressedSizeInfoColGroup: Best Type: RLE facts:
rows:1000000 cols:[0] num Offsets:109891 LargestOffset:899070 num Singles:0 num Runs:100930 num Unique Vals:10 Cardinality: 1.0E-5 Sizes:
Full estimate over the entire matrix:
21/04/21 12:50:53 ERROR estim.CompressedSizeInfoColGroup: Best Type: RLE facts:
rows:1000000 cols:[0] num Offsets:100003 LargestOffset:899997 num Singles:0 num Runs:67 num Unique Vals:10 Cardinality: 1.0E-5 Sizes:
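The gap between the two logs can be illustrated with a minimal, self-contained Java sketch. It is not SystemDS code. It compares the exact run count of a column with a worst-case bound that treats every non-zero offset as its own run. The sample-based numbers above resemble this bound: the estimated run count (100930) is close to the number of offsets (109891). The issue does not say whether SystemDS's estimator actually works this way. The class name, the helper methods, and the synthetic column (blocks of 15,000 equal non-zero values every 150,000 rows) are all hypothetical. The sketch also assumes that zeros are not stored, so only runs of non-zero values are counted.

```java
/**
 * Hypothetical sketch (not SystemDS code): exact run count of a column versus
 * a worst-case bound that assumes every non-zero offset starts its own run.
 * Assumption: zeros are not stored, so only runs of non-zero values count.
 */
final class RunCountSketch {

    /** Exact count: a run starts wherever a non-zero value differs from its predecessor. */
    static long exactRuns(double[] col) {
        long runs = 0;
        for (int i = 0; i < col.length; i++) {
            if (col[i] != 0 && (i == 0 || col[i - 1] != col[i])) {
                runs++;
            }
        }
        return runs;
    }

    /** Number of non-zero offsets, which is also the worst-case run count (one run per offset). */
    static long nonZeroOffsets(double[] col) {
        long nnz = 0;
        for (double v : col) {
            if (v != 0) {
                nnz++;
            }
        }
        return nnz;
    }

    /** Hypothetical data: a block of 15,000 equal non-zero values at the start of every 150,000 rows. */
    static double[] syntheticColumn(int rows) {
        double[] col = new double[rows];
        for (int i = 0; i < rows; i++) {
            if (i % 150_000 < 15_000) {
                col[i] = (i / 150_000) % 10 + 1; // value 1, 2, 3, ... per block
            }
        }
        return col;
    }

    public static void main(String[] args) {
        double[] col = syntheticColumn(1_000_000);
        long exact = exactRuns(col);
        long worstCase = nonZeroOffsets(col);
        // prints "exact runs: 7, worst-case runs: 105000"
        System.out.println("exact runs: " + exact + ", worst-case runs: " + worstCase);
    }
}
```

With 1,000,000 rows there are 7 blocks, so the exact count is 7 runs over 105,000 non-zero offsets. The worst-case bound is therefore 15,000 times too large. This mirrors, in exaggerated form, the 100930 vs. 67 discrepancy reported above.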
There are already tests on master that verify whether this behaviour is fixed, but they are currently disabled.
src/test/java/org/apache/sysds/test/component/compress/colgroup/JolEstimateRLETest.java currently overrides the test that covers this part of the code.