Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-6670

Error in parquet record reader - previously readable file fails to be read in 1.14

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    Description

      Parquet file which was generated by PyArrow was readable in Apache Drill 1.12 and 1.13, but fails to be read with 1.14.

      Running the query "SELECT * FROM dfs.`foo.parquet`" results in the following error message from the Drill web query UI:

      Query Failed: An Error Occurred
      
      org.apache.drill.common.exceptions.UserRemoteException: INTERNAL_ERROR ERROR: Error in parquet record reader. Message: Failure in setting up reader Parquet Metadata: ParquetMetaData{FileMetaData{schema: message schema { optional binary name (UTF8); optional binary creation_parameters (UTF8); optional int64 creation_date (TIMESTAMP_MICROS); optional int32 data_version; optional int32 schema_version; } , metadata: {pandas={"index_columns": [], "column_indexes": [], "columns": [{"name": "name", "field_name": "name", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "creation_parameters", "field_name": "creation_parameters", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "creation_date", "field_name": "creation_date", "pandas_type": "datetime", "numpy_type": "datetime64[ns]", "metadata": null}, {"name": "data_version", "field_name": "data_version", "pandas_type": "int32", "numpy_type": "int32", "metadata": null}, {"name": "schema_version", "field_name": "schema_version", "pandas_type": "int32", "numpy_type": "int32", "metadata": null}], "pandas_version": "0.22.0"}}}, blocks: [BlockMetaData{1, 27142 [ColumnMetaData{SNAPPY [name] optional binary name (UTF8) [PLAIN, RLE], 4}, ColumnMetaData{SNAPPY [creation_parameters] optional binary creation_parameters (UTF8) [PLAIN, RLE], 252}, ColumnMetaData{SNAPPY [creation_date] optional int64 creation_date (TIMESTAMP_MICROS) [PLAIN, RLE], 46334}, ColumnMetaData{SNAPPY [data_version] optional int32 data_version [PLAIN, RLE], 46478}, ColumnMetaData{SNAPPY [schema_version] optional int32 schema_version [PLAIN, RLE], 46593}]}]} Fragment 0:0 [Error Id: bdb2e4d5-5982-4cc6-b95e-244782f827d2 on f9d0456cddd2:31010] 
      

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            okalinin Oleksandr Kalinin
            suicas Dave Challis
            Arina Ielchiieva Arina Ielchiieva
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment