Apache Drill: DRILL-3577

Counting nested fields on CTAS-created Parquet file(s) reports inaccurate results

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Resolved
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.2.0
    • Component/s: Functions - Drill
    • Labels:
      None

      Description

      I have not tried this at a smaller scale or directly on the JSON file, but the following steps seem to reproduce the issue:

      1. Create an input file containing:
         20K rows of the form
         {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}
         then 200 rows of the form
         {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last entries only"}}
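      The input file from step 1 can be generated with a short Python sketch like the one below. It assumes one JSON object per line, which Drill's JSON reader accepts. The file name `data.json` and the helper name `write_input` are illustrative assumptions, not part of the original report.

```python
import json

# Record shapes copied from step 1 of the issue description.
COMMON = {"some": "yes", "others": {"other": "true", "all": "false", "sometimes": "yes"}}
WITH_ADDITIONAL = {
    "some": "yes",
    "others": {"other": "true", "all": "false", "sometimes": "yes",
               "additional": "last entries only"},
}

def write_input(path="data.json", common_rows=20_000, additional_rows=200):
    """Hypothetical helper: write common_rows plain records, then
    additional_rows records that also carry others.additional,
    one JSON object per line. Returns the total number of rows written."""
    with open(path, "w", encoding="utf-8") as f:
        for _ in range(common_rows):
            f.write(json.dumps(COMMON) + "\n")
        for _ in range(additional_rows):
            f.write(json.dumps(WITH_ADDITIONAL) + "\n")
    return common_rows + additional_rows
```

      Calling `write_input()` with the defaults writes 20,200 lines. That matches the record count the CTAS in step 2 should report.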

      2. Run CTAS as follows:

      CREATE TABLE dfs.`tmp`.`tp` as select * from dfs.`data.json` t
      

      This should report:

      Fragment  Number of records written
      0_0       20200
      

      3. Count a nested field with either of the following queries:

      select count(t.others.additional) from dfs.`tmp`.`tp` t
      OR
      select count(t.others.other) from dfs.`tmp`.`tp` t
      

      Either query reports a count of 0:

      EXPR$0
      0
      

      whereas

      select count(t.`some`) from dfs.`tmp`.`tp` t where t.others.additional is not null
      

      reports the expected count of 200:

      EXPR$0
      200
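      The correct answers follow from the input data. The plain-Python sketch below is a reference, not Drill code. It counts non-null values of a nested field the way SQL `COUNT(expr)` does, where a row counts only if the value exists and is not null. It runs over the same record shapes and row counts as step 1. The helper names `nested_get` and `sql_count` are hypothetical. The results show that `count(t.others.additional)` should return 200 and `count(t.others.other)` should return 20200, not 0.

```python
from typing import Any, Iterable

def nested_get(record: dict, path: tuple) -> Any:
    """Follow path through nested dicts; return None if any key is missing."""
    value: Any = record
    for key in path:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value

def sql_count(records: Iterable[dict], path: tuple) -> int:
    """Mimic SQL COUNT(expr): count rows whose nested value is not null."""
    return sum(1 for r in records if nested_get(r, path) is not None)

# Same record shapes and row counts as step 1 of the issue.
base = {"some": "yes", "others": {"other": "true", "all": "false", "sometimes": "yes"}}
extra = {"some": "yes", "others": {**base["others"], "additional": "last entries only"}}
records = [base] * 20_000 + [extra] * 200

print(sql_count(records, ("others", "additional")))  # 200
print(sql_count(records, ("others", "other")))       # 20200

# Equivalent of: count(t.`some`) ... where t.others.additional is not null
filtered = [r for r in records if nested_get(r, ("others", "additional")) is not None]
print(sql_count(filtered, ("some",)))                # 200
```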
      


    People

    • Assignee: Vitalii Diravka
    • Reporter: Hanifi Gunes
    • Votes: 0
    • Watchers: 2
