Apache Drill / DRILL-6670

Error in parquet record reader - previously readable file fails to be read in 1.14

Details

    Description

      A Parquet file generated by PyArrow was readable in Apache Drill 1.12 and 1.13 but fails to be read in 1.14.

      Running the query "SELECT * FROM dfs.`foo.parquet`" produces the following error message in the Drill web UI:

      Query Failed: An Error Occurred
      
      org.apache.drill.common.exceptions.UserRemoteException: INTERNAL_ERROR ERROR: Error in parquet record reader. Message: Failure in setting up reader Parquet Metadata: ParquetMetaData{FileMetaData{schema: message schema { optional binary name (UTF8); optional binary creation_parameters (UTF8); optional int64 creation_date (TIMESTAMP_MICROS); optional int32 data_version; optional int32 schema_version; } , metadata: {pandas={"index_columns": [], "column_indexes": [], "columns": [{"name": "name", "field_name": "name", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "creation_parameters", "field_name": "creation_parameters", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "creation_date", "field_name": "creation_date", "pandas_type": "datetime", "numpy_type": "datetime64[ns]", "metadata": null}, {"name": "data_version", "field_name": "data_version", "pandas_type": "int32", "numpy_type": "int32", "metadata": null}, {"name": "schema_version", "field_name": "schema_version", "pandas_type": "int32", "numpy_type": "int32", "metadata": null}], "pandas_version": "0.22.0"}}}, blocks: [BlockMetaData{1, 27142 [ColumnMetaData{SNAPPY [name] optional binary name (UTF8) [PLAIN, RLE], 4}, ColumnMetaData{SNAPPY [creation_parameters] optional binary creation_parameters (UTF8) [PLAIN, RLE], 252}, ColumnMetaData{SNAPPY [creation_date] optional int64 creation_date (TIMESTAMP_MICROS) [PLAIN, RLE], 46334}, ColumnMetaData{SNAPPY [data_version] optional int32 data_version [PLAIN, RLE], 46478}, ColumnMetaData{SNAPPY [schema_version] optional int32 schema_version [PLAIN, RLE], 46593}]}]} Fragment 0:0 [Error Id: bdb2e4d5-5982-4cc6-b95e-244782f827d2 on f9d0456cddd2:31010] 
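      When comparing how different Drill versions handle this file, it can help to pull the flat schema out of the error text and list each column's physical and logical type (here, for example, the int64 column annotated TIMESTAMP_MICROS next to plain int32 and UTF8 binary columns). The sketch below is a hypothetical helper, parse_parquet_schema, that is not part of Drill or PyArrow. It assumes the flat "message schema { ... }" layout shown in the message above and does not handle nested groups.

```python
import re
from typing import List, NamedTuple, Optional


class ParquetColumn(NamedTuple):
    repetition: str              # "required", "optional" or "repeated"
    physical_type: str           # e.g. "binary", "int32", "int64"
    name: str
    logical_type: Optional[str]  # e.g. "UTF8", "TIMESTAMP_MICROS", or None


# Matches one flat field such as "optional int64 creation_date (TIMESTAMP_MICROS);"
_FIELD_RE = re.compile(
    r"(required|optional|repeated)\s+(\w+)\s+(\w+)(?:\s+\((\w+)\))?\s*;"
)


def parse_parquet_schema(message: str) -> List[ParquetColumn]:
    """Hypothetical helper: extract flat columns from 'message <name> { ... }' text.

    Assumes the schema body contains no nested braces (no group fields).
    Returns an empty list if no schema block is found.
    """
    match = re.search(r"message\s+\w+\s*\{(.*?)\}", message)
    if match is None:
        return []
    body = match.group(1)
    return [ParquetColumn(*m.groups()) for m in _FIELD_RE.finditer(body)]


# Excerpt of the Drill error message above (pandas metadata truncated).
SAMPLE_ERROR = (
    "Parquet Metadata: ParquetMetaData{FileMetaData{schema: message schema { "
    "optional binary name (UTF8); optional binary creation_parameters (UTF8); "
    "optional int64 creation_date (TIMESTAMP_MICROS); optional int32 data_version; "
    "optional int32 schema_version; } , metadata: {pandas=...}}"
)

if __name__ == "__main__":
    for col in parse_parquet_schema(SAMPLE_ERROR):
        print(col.name, col.physical_type, col.logical_type)
```

      Run against the excerpt, this prints one line per column, with None for the two int32 columns that carry no logical-type annotation.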
      

    Attachments

      1. example.parquet (2 kB, Dave Challis)


    People

      Oleksandr Kalinin (okalinin)
      Dave Challis (suicas)
      Arina Ielchiieva
