  1. Apache Drill
  2. DRILL-3577

Counting nested fields on CTAS-created-parquet file/s reports inaccurate results

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Resolved
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.2.0
    • Component/s: Functions - Drill
    • Labels: None

    Description

      I have not tried this at a smaller scale, nor directly against the JSON file, but the following steps seem to reproduce the issue:

      1. Create an input file as follows:
      20,000 rows of the form
      {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}
      then 200 rows of the form
      {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last entries only"}}
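
One way to produce the input file described in step 1 is sketched below. This is an illustration, not part of the original report: the helper name `write_input` is hypothetical, and it assumes newline-delimited JSON with the 200 extended rows written last (as the value "last entries only" suggests). Where the file must live so that dfs.`data.json` resolves to it depends on the local Drill storage configuration.

```python
import json

# Row shape shared by the first 20,000 records (taken from the report).
BASE = {"some": "yes",
        "others": {"other": "true", "all": "false", "sometimes": "yes"}}


def write_input(path, base_rows=20_000, extra_rows=200):
    """Write newline-delimited JSON: base_rows without `additional`,
    followed by extra_rows that carry the extra nested field.

    Returns the total number of records written.
    """
    # Same record, plus the nested field that only the last rows have.
    extended = {"some": "yes",
                "others": {**BASE["others"], "additional": "last entries only"}}
    with open(path, "w", encoding="utf-8") as f:
        for _ in range(base_rows):
            # Compact separators reproduce the exact text shown in the report.
            f.write(json.dumps(BASE, separators=(",", ":")) + "\n")
        for _ in range(extra_rows):
            f.write(json.dumps(extended, separators=(",", ":")) + "\n")
    return base_rows + extra_rows
```

Calling `write_input("data.json")` writes 20200 records, which matches the record count the CTAS step is expected to report.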

      2. CTAS as follows

      CREATE TABLE dfs.`tmp`.`tp` as select * from dfs.`data.json` t
      

      The CTAS output should read:

      Fragment Number of records written
      0_0	20200
      

      3. Count on nested fields via

      select count(t.others.additional) from dfs.`tmp`.`tp` t
      OR
      select count(t.others.other) from dfs.`tmp`.`tp` t
      

      Either query reports a count of 0, although 200 and 20200 are expected respectively:

      EXPR$0
      0
      

      While filtering on the same nested field

      select count(t.`some`) from dfs.`tmp`.`tp` t where t.others.additional is not null
      

      reports the expected count of 200:

      EXPR$0
      200
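
For reference, the counts a correct engine should return can be checked with a small model of SQL COUNT(expr), which counts only rows where the expression is non-null. The sketch below is illustrative and does not use Drill: `sql_count` is a hypothetical helper, and it assumes a missing nested field is treated as NULL.

```python
def sql_count(rows, path):
    """Mimic SQL COUNT(expr): count rows whose dotted path is non-null.

    Assumption: a field that is absent from a record counts as NULL.
    """
    keys = path.split(".")

    def resolve(value):
        # Walk the nested dicts; give up with None if a key is missing.
        for key in keys:
            if not isinstance(value, dict) or key not in value:
                return None
            value = value[key]
        return value

    return sum(1 for row in rows if resolve(row) is not None)


# In-memory copy of the data set from the report: 20,000 base rows + 200 extended rows.
others = {"other": "true", "all": "false", "sometimes": "yes"}
rows = [{"some": "yes", "others": dict(others)} for _ in range(20_000)]
rows += [{"some": "yes", "others": {**others, "additional": "last entries only"}}
         for _ in range(200)]

expected = {p: sql_count(rows, p)
            for p in ("others.additional", "others.other", "some")}
print(expected)
# {'others.additional': 200, 'others.other': 20200, 'some': 20200}
```

Under these assumptions, count(t.others.additional) should be 200 and count(t.others.other) should be 20200, so the reported 0 is wrong for both queries.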
      

            People

              vitalii Vitalii Diravka
              hgunes Hanifi Gunes