  1. SystemDS
  2. SYSTEMDS-2948

CLA Improved Run estimation

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: SystemDS 2.1, SystemDS 3.1
    • Fix Version/s: SystemDS 2.1, SystemDS 3.1
    • Component/s: None
    • Labels: None

    Description

      The number of runs is estimated very conservatively, which makes the estimated size far too large for data that has very long runs or is extremely sparse.

      Sample Based Estimate:

      21/04/21 12:50:53 ERROR estim.CompressedSizeInfoColGroup: Best Type: RLE facts:
      rows:1000000 cols:[0] num Offsets:109891 LargestOffset:899070 num Singles:0 num Runs:100930 num Unique Vals:10 Cardinality: 1.0E-5 Sizes:

      {UNCOMPRESSED=8000216, RLE=404016}

      Full estimate of entire matrix:
      21/04/21 12:50:53 ERROR estim.CompressedSizeInfoColGroup: Best Type: RLE facts:
      rows:1000000 cols:[0] num Offsets:100003 LargestOffset:899997 num Singles:0 num Runs:67 num Unique Vals:10 Cardinality: 1.0E-5 Sizes:

      {UNCOMPRESSED=8000216, RLE=560}
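
      In the two logs above, the sample-based estimate reports 100930 runs and an RLE size of 404016, while the full estimate reports 67 runs and an RLE size of 560, so the sampled size is roughly 721 times too large. The following is a minimal Java sketch of why a conservative run count drifts so far from the true one on data with long runs. It is not the SystemDS estimator: the class RunCountSketch, its methods, and the "every non-zero is its own run" bound are all hypothetical, chosen only to illustrate the gap.

```java
// Hypothetical illustration only; not code from SystemDS.
final class RunCountSketch {

    /** Counts maximal runs of consecutive, equal, non-zero values in a column. */
    static int countRuns(double[] col) {
        int runs = 0;
        double prev = 0.0; // a zero before the first row means "no open run"
        for (double v : col) {
            // A new run starts at a non-zero value that differs from its predecessor.
            if (v != 0.0 && v != prev) {
                runs++;
            }
            prev = v;
        }
        return runs;
    }

    /** Counts non-zero cells (the number of offsets). */
    static int countNonZeros(double[] col) {
        int nnz = 0;
        for (double v : col) {
            if (v != 0.0) {
                nnz++;
            }
        }
        return nnz;
    }

    /**
     * Assumed conservative bound: treat every non-zero cell as its own run.
     * This is always >= the true run count, and very loose for long runs.
     */
    static int conservativeRunBound(double[] col) {
        return countNonZeros(col);
    }

    /** Builds a mostly empty column containing two long runs. */
    static double[] exampleColumn() {
        double[] col = new double[1000];
        for (int i = 200; i < 300; i++) {
            col[i] = 3.0; // run of length 100
        }
        for (int i = 600; i < 650; i++) {
            col[i] = 7.0; // run of length 50
        }
        return col;
    }

    public static void main(String[] args) {
        double[] col = exampleColumn();
        System.out.println("non-zeros:          " + countNonZeros(col));
        System.out.println("true runs:          " + countRuns(col));
        System.out.println("conservative bound: " + conservativeRunBound(col));
    }
}
```

      On the example column the true run count is 2, while the conservative bound is 150, one per non-zero cell. Any size estimate that scales with the run count inherits that factor, which is the same kind of gap the logs show between 100930 and 67 runs.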

      Tests that verify whether this behaviour is fixed already exist on master, but they are currently disabled.

      src/test/java/org/apache/sysds/test/component/compress/colgroup/JolEstimateRLETest.java currently overrides the test that covers this part of the code.

          People

            Assignee: Unassigned
            Reporter: Sebastian Baunsgaard (baunsgaard)