Apache Drill
DRILL-5833

Parquet reader fails with assertion error for Decimal9, Decimal18 types


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.10.0
    • Fix Version/s: 1.13.0
    • Component/s: None
    • Labels: None

    Description

      The TestParquetWriter.testDecimal() test recently failed. As it turns out, this test never ran properly against the "old" Parquet reader. Because the store.parquet.use_new_reader option was left at whatever value a previous test had set, the test would sometimes run against the "new" reader (and succeed) and sometimes against the "old" reader (and fail).

      Once the test was forced to run against the "old" reader, it fails deep in the Parquet record reader in DrillParquetGroupConverter.getConverterForType().

      The code attempts to create a Decimal9 writer by calling SingleMapWriter.decimal9(String name). However, this overload only looks up an existing writer; it never creates one:

        public Decimal9Writer decimal9(String name) {
          // returns existing writer
          final FieldWriter writer = fields.get(name.toLowerCase());
          assert writer != null;
          return writer;
        }
      

      And, indeed, the assertion is triggered.

      As it turns out, the code for Decimal28 shows the proper solution:

      mapWriter.decimal28Sparse(name, metadata.getScale(), metadata.getPrecision())
      

      That is, pass the scale and precision to this form of the method, which actually creates the writer:

        public Decimal9Writer decimal9(String name, int scale, int precision) {
      

      Applying the same pattern for the Parquet Decimal9 and Decimal18 types allows the above test to get past the asserts. Given this error, it is clear that this code path could never have run successfully, and so the error in the Parquet reader went undetected.
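      To make the lookup-vs-create distinction concrete, here is a minimal, self-contained sketch of the two-overload pattern. The WriterCache class and its String "writer" stand-ins are hypothetical illustrations, not Drill's actual classes; only the shape of the two overloads mirrors the code above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the two-overload pattern seen in SingleMapWriter.
public class WriterCache {
    private final Map<String, String> fields = new HashMap<>();

    // One-arg overload: only looks up an existing writer, as in
    // SingleMapWriter.decimal9(String name); asserts if none exists.
    public String decimal9(String name) {
        final String writer = fields.get(name.toLowerCase());
        assert writer != null : "no writer created yet for: " + name;
        return writer;
    }

    // Overload taking scale and precision: creates the writer on first
    // use, mirroring decimal28Sparse(name, scale, precision).
    public String decimal9(String name, int scale, int precision) {
        return fields.computeIfAbsent(name.toLowerCase(),
            k -> "Decimal9Writer[" + k + ", scale=" + scale
                 + ", precision=" + precision + "]");
    }

    public static void main(String[] args) {
        WriterCache map = new WriterCache();
        // The create-on-first-use form works even with no existing writer.
        String w1 = map.decimal9("price", 2, 9);
        // The one-arg form now succeeds because the writer exists.
        String w2 = map.decimal9("price");
        System.out.println(w1.equals(w2)); // prints "true"
    }
}
```

      Calling the one-arg form first, before any writer exists, trips the assertion, which is exactly the failure seen in DrillParquetGroupConverter.getConverterForType().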

      It also turns out that the test itself is wrong, reversing the validation and test queries:

        public void runTestAndValidate(String selection, String validationSelection, String inputTable, String outputFile) throws Exception {
          try {
            deleteTableIfExists(outputFile);
            ...
            // Query reads from the input (JSON) table
            String query = String.format("SELECT %s FROM %s", selection, inputTable);
            String create = "CREATE TABLE " + outputFile + " AS " + query;
            // Validation query reads from the output (Parquet) table
            String validateQuery = String.format("SELECT %s FROM " + outputFile, validationSelection);
            test(create);

            testBuilder()
                .unOrdered()
                .sqlQuery(query)                 // Query under test is the input query
                .sqlBaselineQuery(validateQuery) // Baseline query is the output query
                .go();
            ...

      Given this reversal, when the test reports a mismatch, it is the Parquet data that is wrong, not the baseline.
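      With the roles restored, the builder calls would read something like the following fragment (same identifiers as the excerpt above; an untested sketch, not the committed fix):

```java
            testBuilder()
                .unOrdered()
                .sqlQuery(validateQuery) // query under test: read back the Parquet output
                .sqlBaselineQuery(query) // baseline: the original input (JSON) query
                .go();
```

      That way the freshly written Parquet data is what gets validated against the trusted input.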

      Attachments

        Issue Links

        Activity


          People

            Assignee: Paul.Rogers (Paul Rogers)
            Reporter: Paul.Rogers (Paul Rogers)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved:
