ORC-517: Incorrect statistics written for decimal values

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.0
    • Fix Version/s: 1.5.6, 1.6.0
    • Component/s: Java
    • Labels: None

    Description

      I came across the following problem with min-max statistics while writing test cases for ORC with Spark (latest master). I created a table stored as ORC with a single decimal field, added a couple of negative numbers to this table, and used ORC tools to print the details of the resulting ORC file. I noticed that although the minimum value was correct, the maximum was 0 (instead of the largest negative number added). To better understand the problem, here is a unit test that demonstrates it:

        @Test
        public void testDecimalMinMaxStatistics() throws Exception {
          TypeDescription schema = TypeDescription.createDecimal()
            .withScale(2).withPrecision(7);
      
          Writer writer = OrcFile.createWriter(testFilePath,
            OrcFile.writerOptions(conf).setSchema(schema).stripeSize(100000)
              .bufferSize(10000));
          VectorizedRowBatch batch = new VectorizedRowBatch(1, 1024);
      
          DecimalColumnVector decimalColumnVector = new DecimalColumnVector(7, 2);
          batch.cols[0] = decimalColumnVector;
          batch.reset();
          batch.size = 2;
      
          decimalColumnVector.set(0, new HiveDecimalWritable("-99999.99"));
          decimalColumnVector.set(1, new HiveDecimalWritable("-88888.88"));
          writer.addRowBatch(batch);
          writer.close();
      
          Reader reader = OrcFile.createReader(testFilePath,
            OrcFile.readerOptions(conf).filesystem(fs));
          DecimalColumnStatistics statistics = (DecimalColumnStatistics) reader.getStatistics()[0];
          // Both values are negative, so the maximum must be -88888.88 rather than 0.
          assertEquals("Incorrect minimum value", new BigDecimal("-99999.99"), statistics.getMinimum().bigDecimalValue());
          assertEquals("Incorrect maximum value", new BigDecimal("-88888.88"), statistics.getMaximum().bigDecimalValue());
        }
      

      Note that this test fails only on the 1.5 branch and master; it passes on the 1.4 branch. Am I doing something wrong here? If this is indeed a bug, I don't think it causes correctness problems, but it might be a source of performance regression when min-max statistics are used with predicate pushdown.
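
To make the performance concern concrete, here is a minimal sketch of how a min-max filter might decide whether a stripe can be skipped for a predicate of the form "value > bound". This is not ORC code: the class `MinMaxPruningSketch` and the method `canSkipGreaterThan` are hypothetical names used only for illustration, and the bound -50000.00 is an arbitrary example value. With the correct maximum the stripe can be skipped; with the reported maximum of 0 it has to be read, which is still correct but slower.

```java
import java.math.BigDecimal;

public class MinMaxPruningSketch {

  /**
   * Hypothetical helper: returns true when no value in a stripe can satisfy
   * "value > bound", judging only by the recorded maximum statistic.
   */
  static boolean canSkipGreaterThan(BigDecimal recordedMax, BigDecimal bound) {
    // If even the largest recorded value is not above the bound, nothing matches.
    return recordedMax.compareTo(bound) <= 0;
  }

  public static void main(String[] args) {
    BigDecimal correctMax = new BigDecimal("-88888.88"); // expected statistic
    BigDecimal reportedMax = BigDecimal.ZERO;            // statistic observed in this report
    BigDecimal bound = new BigDecimal("-50000.00");      // example predicate: value > -50000.00

    // Correct statistic: the stripe can be skipped.
    System.out.println(canSkipGreaterThan(correctMax, bound));  // prints "true"
    // Too-large statistic: the stripe must be read, although no row matches.
    System.out.println(canSkipGreaterThan(reportedMax, bound)); // prints "false"
  }
}
```

A maximum that is too large only widens the recorded range, so rows are never wrongly filtered out; the cost is that fewer stripes can be eliminated.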

            People

              Assignee: Owen O'Malley (omalley)
              Reporter: Nándor Kollár (nkollar)
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m