Apache Drill / DRILL-6874

CTAS from json to parquet is not working on S3 storage

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.14.0
    • Fix Version/s: 1.15.0
    • Component/s: None

      Description

      The JSON file "s3src.json" was uploaded to S3 storage.
      Querying the JSON file works fine:
      select * from s3.tmp.`s3src.json`;

      id  first_name   last_name
      1   first_name1  last_name1
      2   first_name2  last_name2
      3   first_name3  last_name3
      4   first_name4  last_name4
      5   first_name5  last_name5

      5 rows selected (2.803 seconds)

      CTAS from this JSON file completes successfully:
      create table s3.tmp.`ctasjsontoparquet` as select * from s3.tmp.`s3src.json`;

      Fragment  Number of records written
      0_0       5

      1 row selected (9.264 seconds)

      Querying the created Parquet table throws an error:
      select * from s3.tmp.`ctasjsontoparquet`;

      Error: INTERNAL_ERROR ERROR: Error in parquet record reader.
      Message: Failure in setting up reader
      Parquet Metadata: ParquetMetaData{FileMetaData{schema: message root {
        optional int64 id;
        optional binary first_name (UTF8);
        optional binary last_name (UTF8);
      }
      , metadata: {drill-writer.version=2, drill.version=1.15.0-SNAPSHOT}}, blocks: [BlockMetaData{5, 360 [ColumnMetaData{UNCOMPRESSED [id] optional int64 id  [BIT_PACKED, RLE, PLAIN], 4}, ColumnMetaData{UNCOMPRESSED [first_name] optional binary first_name (UTF8)  [BIT_PACKED, RLE, PLAIN], 111}, ColumnMetaData{UNCOMPRESSED [last_name] optional binary last_name (UTF8)  [BIT_PACKED, RLE, PLAIN], 241}]}]}
      
      Fragment 0:0
      
      Please, refer to logs for more information.
      
      [Error Id: 885723e4-8385-4fb0-87dd-c08b0570db95 on maprhost:31010] (state=,code=0)
      

      The same CTAS query works fine on MapRFS and on local file system storage.

      The log files, the JSON file, and the Parquet file created on S3 are attached.
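
The three statements from the description can also be replayed in order from a script, which makes it easier to re-check the behavior on another storage plugin or Drill build. The following is a minimal sketch, not part of the original report. It assumes a Drillbit whose REST API is reachable at `http://localhost:8047/query.json` and an `s3` storage plugin with a `tmp` workspace configured as in the report. The names `DRILL_URL`, `REPRO_QUERIES`, `build_payload`, `run_query`, and `reproduce` are hypothetical helpers introduced for illustration.

```python
import json
import urllib.error
import urllib.request

# Assumption: the Drillbit exposes its REST API on the default web port.
DRILL_URL = "http://localhost:8047/query.json"

# The three statements from this report, in order, without trailing semicolons.
REPRO_QUERIES = [
    "select * from s3.tmp.`s3src.json`",
    "create table s3.tmp.`ctasjsontoparquet` as select * from s3.tmp.`s3src.json`",
    "select * from s3.tmp.`ctasjsontoparquet`",
]


def build_payload(sql):
    """Return the JSON request body for one SQL statement, encoded as bytes."""
    return json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")


def run_query(sql, url=DRILL_URL):
    """POST one statement to Drill and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=build_payload(sql),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def reproduce(url=DRILL_URL):
    """Run the statements in order and stop at the first one that fails.

    Assumption: a failed query surfaces as an HTTP or connection error.
    Some Drill versions may instead report the failure in the JSON body.
    """
    for sql in REPRO_QUERIES:
        try:
            result = run_query(sql, url)
        except urllib.error.URLError as err:
            print("FAILED:", sql, "->", err)
            return False
        print("OK:", sql, "->", len(result.get("rows", [])), "rows")
    return True
```

Calling `reproduce()` against a build affected by this issue would be expected to report the first two statements as successful and the final select as failed, matching the description above. Pointing the same statements at a workspace on a different storage plugin gives a quick comparison run.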

        Attachments

        1. ctasjsontoparquet.zip
          0.7 kB
          Denys Ordynskiy
        2. drillbit_queries.json
          0.2 kB
          Denys Ordynskiy
        3. drillbit.log
          15 kB
          Denys Ordynskiy
        4. s3src.json
          0.3 kB
          Denys Ordynskiy
        5. sqlline.log
          15 kB
          Denys Ordynskiy


              People

              • Assignee:
                KazydubB Bohdan Kazydub
                Reporter:
                denysord88 Denys Ordynskiy
                Reviewer:
                Arina Ielchiieva
              • Votes:
                0
                Watchers:
                4
