  1. Apache Drill
  2. DRILL-4996

Parquet Date auto-correction is not working in auto-partitioned parquet files generated by drill-1.6

Details

    Description

      git.commit.id.abbrev=4ee1d4c

      Steps I followed to generate the data:

      1. Generate a parquet file with a date column using Hive 1.2
      2. Use Drill 1.6 to create auto-partitioned parquet files, partitioned on the date column
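
Step 2 can be sketched as a Drill CTAS statement with a PARTITION BY clause. This is only an illustration of the kind of statement involved, not the exact one used for this report: the source path and the target table name are hypothetical placeholders, and `dfs.tmp` is assumed to be a writable workspace.

```sql
-- Hypothetical paths and names: the report does not give the location of the
-- Hive-generated file or the workspace the partitioned files were written to.
ALTER SESSION SET `store.format` = 'parquet';

-- Write auto-partitioned parquet files, partitioned on the date column.
CREATE TABLE dfs.tmp.`item_autopartitioned`
PARTITION BY (i_rec_start_date)
AS
SELECT *
FROM dfs.`/path/to/hive_generated/item`;
```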
      

      The query below now returns wrong results:

      select i_rec_start_date, i_size from dfs.`/drill/testdata/parquet_date/auto_partition/item_multipart_autorefresh`  group by i_rec_start_date, i_size;
      +-------------------+--------------+
      | i_rec_start_date  |    i_size    |
      +-------------------+--------------+
      | null              | large        |
      | 366-11-08         | extra large  |
      | 366-11-08         | medium       |
      | null              | medium       |
      | 366-11-08         | petite       |
      | 364-11-07         | medium       |
      | null              | petite       |
      | 365-11-07         | medium       |
      | 368-11-07         | economy      |
      | 365-11-07         | large        |
      | 365-11-07         | small        |
      | 366-11-08         | small        |
      | 365-11-07         | extra large  |
      | 364-11-07         | N/A          |
      | 366-11-08         | economy      |
      | 366-11-08         | large        |
      | 364-11-07         | small        |
      | null              | small        |
      | 364-11-07         | large        |
      | 364-11-07         | extra large  |
      | 368-11-07         | N/A          |
      | 368-11-07         | extra large  |
      | 368-11-07         | large        |
      | 365-11-07         | petite       |
      | null              | N/A          |
      | 365-11-07         | economy      |
      | 364-11-07         | economy      |
      | 364-11-07         | petite       |
      | 365-11-07         | N/A          |
      | 368-11-07         | medium       |
      | null              | extra large  |
      | 368-11-07         | small        |
      | 368-11-07         | petite       |
      | 366-11-08         | N/A          |
      +-------------------+--------------+
      34 rows selected (0.691 seconds)
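
One way to check whether the wrong values come from the auto-correction step itself is to read the same files with correction switched off and compare the raw dates. The sketch below assumes a Drill build whose parquet format plugin exposes the `autoCorrectCorruptDates` option through a table function; whether that option is available in the build identified by the commit id above has not been verified here.

```sql
-- Assumption: the parquet format plugin in this build accepts
-- autoCorrectCorruptDates as a table-function parameter.
SELECT i_rec_start_date, i_size
FROM table(dfs.`/drill/testdata/parquet_date/auto_partition/item_multipart_autorefresh`
           (type => 'parquet', autoCorrectCorruptDates => false))
GROUP BY i_rec_start_date, i_size;
```

If the dates returned with correction disabled differ from the table above, the reader is altering the stored values for these files; if they are identical, the stored values themselves would be the next thing to inspect.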
      

      However, when I generated the auto-partitioned parquet files using Drill 1.2 instead, the same query returned the right results.

      I attached the required data sets.

      Attachments

        1. item.tgz
          861 kB
          Rahul Kumar Challapalli

            People

              vitalii Vitalii Diravka
              rkins Rahul Kumar Challapalli
              Rahul Kumar Challapalli Rahul Kumar Challapalli