IMPALA-2515 (sub-task of IMPALA-5628 Parquet support for additional valid decimal representations)

Impala rejects Parquet schemas where decimal fixed_len_byte_array columns have unnecessary padding bytes


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: Impala 2.3.0
    • Fix Version/s: Impala 4.0.0
    • Component/s: Backend

    Description

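      Impala cannot read a decimal column whose fixed size is larger than its precision requires, for example 8 bytes for precision 10: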

      {"name": "tmp_1",
       "type": "fixed",
       "size": 8,
       "logicalType": "decimal",
       "precision": 10,
       "scale": 5}
      

      However, the same column declared with the minimal size of 5 bytes can be read:

      {"name": "tmp_1",
       "type": "fixed",
       "size": 5,
       "logicalType": "decimal",
       "precision": 10,
       "scale": 5}
      

      The size must be exactly the minimum number of bytes needed for the precision, as given by the following formula, or Impala is unable to read the decimal column:

      size = int(math.ceil((math.log(2, 10) + precision) / math.log(256, 10)))
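
As a quick check of the formula above, here is a minimal sketch that wraps it in a function and evaluates it for a few precisions. The function name `min_decimal_bytes` is made up for illustration and is not taken from Impala or any Parquet library.

```python
import math


def min_decimal_bytes(precision: int) -> int:
    """Smallest byte width whose signed range covers every value with
    `precision` decimal digits (same formula as in the description)."""
    return int(math.ceil((math.log(2, 10) + precision) / math.log(256, 10)))


if __name__ == "__main__":
    for precision in (9, 10, 18, 38):
        print(precision, min_decimal_bytes(precision))
    # prints:
    # 9 4
    # 10 5
    # 18 8
    # 38 16
```

For precision 10 this gives 5, which matches the schema that can be read; the schema that is rejected declares 8 bytes, i.e. 3 bytes of padding.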
      

      There is nothing in the Parquet spec that says decimal columns must use exactly the minimal size. Arguably it is a bug in the writer if it pads the column, because that just wastes space.
      https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal
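
To illustrate why padding is harmless to the value: the Parquet spec stores a decimal as the big-endian two's complement unscaled integer, so sign-extending it to a wider fixed size decodes to the same number. The sketch below is only an illustration in plain Python, not Impala's reader; the helper names `decode_fixed_decimal` and `pad_to_width` are hypothetical.

```python
from decimal import Decimal


def decode_fixed_decimal(raw: bytes, scale: int) -> Decimal:
    """Decode a big-endian two's complement unscaled value and apply the scale."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)


def pad_to_width(raw: bytes, width: int) -> bytes:
    """Sign-extend `raw` on the left to `width` bytes (hypothetical padding writer)."""
    fill = b"\xff" if raw[0] & 0x80 else b"\x00"
    return fill * (width - len(raw)) + raw


if __name__ == "__main__":
    for unscaled in (1234567890, -1234567890):
        exact = unscaled.to_bytes(5, byteorder="big", signed=True)  # size = 5
        padded = pad_to_width(exact, 8)                             # size = 8
        # Both encodings decode to the same decimal(10, 5) value.
        assert decode_fixed_decimal(exact, 5) == decode_fixed_decimal(padded, 5)
        print(decode_fixed_decimal(padded, 5))
```

A reader that accepts any width at least as large as the minimum only has to handle the extra leading sign bytes, which the signed integer conversion above does implicitly.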

      Attachments

        1. image-2020-02-07-11-48-35-179.png
          35 kB
          Onur Tokat
        2. image-2020-02-07-11-36-31-458.png
          21 kB
          Onur Tokat
        3. image-2020-02-07-11-34-04-220.png
          33 kB
          Onur Tokat
        4. image-2020-02-07-11-33-27-944.png
          21 kB
          Onur Tokat
        5. image-2020-02-07-11-31-43-641.png
          42 kB
          Onur Tokat
        6. image-2020-02-07-11-31-38-074.png
          42 kB
          Onur Tokat

            People

              Assignee: Tim Armstrong (tarmstrong)
              Reporter: Taras Bobrovytsky (tarasbob)
              Votes: 1
              Watchers: 6
