IMPALA-11444

Wrong results in reading wide rows from ORC

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 3.4.0, Impala 3.4.1
    • Fix Version/s: Impala 3.4.2
    • Component/s: Backend

    Description

      The bug exists only in the 3.4 branches, which include IMPALA-9228 but are missing IMPALA-9469.

      When reading from a wide table whose tuple size is larger than 4096 bytes (4KB), the ORC scanner produces wrong results. The issue can be reproduced using the attached CREATE TABLE statement and ORC file.

      $ bin/impala-shell.sh --quiet -f create-table-512cols.sql
      $ bin/impala-shell.sh -B --quiet -q 'show table stats orc_tbl_512cols'
      -1	0	0B	NOT CACHED	NOT CACHED	ORC	false	hdfs://localhost:20500/test-warehouse/orc_tbl_512cols
      $ hdfs dfs -put widerow_512cols.orc hdfs://localhost:20500/test-warehouse/orc_tbl_512cols
      $ bin/impala-shell.sh -q 'refresh orc_tbl_512cols'
      

      Then run the following query:

      $ bin/impala-shell.sh -B -q "select * from orc_tbl_512cols where col0 = '1'"
      

      The result should be exactly one row with all values equal to '1'. However, we get one row with all values equal to '1024'.
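The expected result described above can be checked mechanically. The following is a minimal sketch, not part of the original report: `check_row` is a hypothetical helper name, and it assumes the tab-delimited output that `impala-shell -B` produces by default. It succeeds only when its input is exactly one row in which every field is `1`.

```shell
# check_row (hypothetical helper): reads tab-delimited rows on stdin and
# exits 0 only if there is exactly one row and every field equals "1".
check_row() {
  awk -F'\t' '
    BEGIN { rows = 0; bad = 0 }
    {
      rows++
      # Count every field that is not the expected value "1".
      for (i = 1; i <= NF; i++) if ($i != "1") bad++
    }
    END { exit !(rows == 1 && bad == 0) }
  '
}
```

One possible way to use it with the query from this report:

      $ bin/impala-shell.sh -B --quiet -q "select * from orc_tbl_512cols where col0 = '1'" | check_row && echo OK || echo "wrong results"

On an affected build this would be expected to print "wrong results", since the returned row contains '1024' instead of '1'.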

      Attachments

        1. widerow_512cols.orc
          26 kB
          Quanlong Huang
        2. create-table-512cols.sql
          7 kB
          Quanlong Huang

            People

              Assignee: Quanlong Huang (stigahuang)
              Reporter: Quanlong Huang (stigahuang)