ORC-143

DELTA encoding may exaggerate number of bits required

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.4.0
    • Component/s: Java
    • Labels: None

    Description

      Consider the following code:

      RunLengthIntegerWriterV2.java, determineEncoding()
          this.min = literals[0];
          long max = literals[0];
          final long initialDelta = literals[1] - literals[0];
          long currDelta = initialDelta;
          long deltaMax = initialDelta;
          this.adjDeltas[0] = initialDelta;
      

      Given the following sequence of longs:

      {0, 10000, 10001, 10002, 10003, 10004, 10005}

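      For this sequence, deltaMax would be 10000. deltaMax is used to determine the bit width of the encoded delta array, but the bit-packed output does not include the first delta; instead, the first delta is encoded in the Delta Base field as a varint. The remaining deltas are all 1, so the width is sized for a value that is never packed.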
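      To make the cost concrete, here is a minimal Java sketch that models only the deltaMax bookkeeping quoted above, using the original initialization. It is not the ORC source: the class DeltaWidthBeforeFix and the helpers deltaMaxOriginal, largestPackedDelta, and bitsRequired are hypothetical. bitsRequired just counts significant bits; the real writer also rounds up to one of its supported fixed bit widths, but every width from 1 to 24 is supported, so the numbers here are unaffected.

```java
// Hypothetical model of the deltaMax logic before the fix (not the ORC source).
public class DeltaWidthBeforeFix {

    // Hypothetical helper: significant bits of a non-negative value, minimum 1.
    static int bitsRequired(long value) {
        if (value == 0) {
            return 1; // every packed value occupies at least one bit
        }
        return 64 - Long.numberOfLeadingZeros(value);
    }

    // Mirrors the quoted initialization: deltaMax starts at the first delta.
    // Assumes literals.length >= 2.
    static long deltaMaxOriginal(long[] literals) {
        final long initialDelta = literals[1] - literals[0];
        long deltaMax = initialDelta;
        for (int i = 1; i < literals.length; i++) {
            long currDelta = literals[i] - literals[i - 1];
            if (i > 1) {
                deltaMax = Math.max(deltaMax, Math.abs(currDelta));
            }
        }
        return deltaMax;
    }

    // Largest delta that is actually bit-packed: every delta except the first.
    static long largestPackedDelta(long[] literals) {
        long max = 0;
        for (int i = 2; i < literals.length; i++) {
            max = Math.max(max, Math.abs(literals[i] - literals[i - 1]));
        }
        return max;
    }

    public static void main(String[] args) {
        long[] literals = {0, 10000, 10001, 10002, 10003, 10004, 10005};
        long deltaMax = deltaMaxOriginal(literals);
        long packedMax = largestPackedDelta(literals);
        // prints "deltaMax = 10000 -> 14 bits"
        System.out.println("deltaMax = " + deltaMax + " -> " + bitsRequired(deltaMax) + " bits");
        // prints "largest packed delta = 1 -> 1 bits"
        System.out.println("largest packed delta = " + packedMax + " -> " + bitsRequired(packedMax) + " bits");
    }
}
```

      Under this model, each of the five packed deltas in the example would be written with 14 bits even though 1 bit would be enough.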

      I believe deltaMax should be initialized to 0, so that the later (i > 1) check correctly excludes the first delta from the maximum.
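      The proposed change can be sketched the same way. The hypothetical DeltaMaxFix class below again models only the deltaMax logic; it illustrates the reporter's suggestion and is not the committed ORC patch. deltaMax starts at 0 and is updated only for i > 1, so the first delta, which is carried in Delta Base, no longer inflates the bit width.

```java
// Hypothetical sketch of the proposed initialization (not the ORC source).
public class DeltaMaxFix {

    // Largest absolute delta among the bit-packed deltas. The first delta is
    // stored separately as the Delta Base varint, so it is skipped.
    static long deltaMaxFixed(long[] literals, int numLiterals) {
        long deltaMax = 0; // proposed: start at 0 instead of initialDelta
        for (int i = 1; i < numLiterals; i++) {
            long currDelta = literals[i] - literals[i - 1];
            if (i > 1) {
                // only deltas after the first contribute to the bit width
                deltaMax = Math.max(deltaMax, Math.abs(currDelta));
            }
        }
        return deltaMax;
    }

    public static void main(String[] args) {
        long[] increasing = {0, 10000, 10001, 10002, 10003, 10004, 10005};
        long[] decreasing = {100, 50, 48, 45};
        // prints "deltaMax = 1"
        System.out.println("deltaMax = " + deltaMaxFixed(increasing, increasing.length));
        // prints "deltaMax = 3"
        System.out.println("deltaMax = " + deltaMaxFixed(decreasing, decreasing.length));
    }
}
```

      A regression test along these lines could assert that deltaMax is 1 for the example sequence, and 3 for a decreasing run such as {100, 50, 48, 45}, whose packed deltas have absolute values 2 and 3.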

      Sorry for not submitting a pull request with a regression test case; I'm not set up for Java development here. It is also possible that I'm reading this wrong.

            People

              Assignee: Douglas Drinka (ddrinka)
              Reporter: Douglas Drinka (ddrinka)
              Votes: 0
              Watchers: 3