  1. Parquet
  2. PARQUET-2153

Cannot read schema from parquet file


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.12.3
    • Fix Version/s: None
    • Component/s: parquet-avro
    • Labels: None

    Description

      I'm trying to generate an Avro schema from a Parquet file. I get the error both when using https://github.com/benwatson528/intellij-avro-parquet-plugin in IntelliJ and when using my own schema-generation implementation.

      The parquet file contains nested entries and arrays: {"a": a, "b": [{"c": c}]}

      I get the following error message:

      org.apache.avro.SchemaParseException: Can't redefine: element
          at org.apache.avro.Schema$Names.put(Schema.java:1547)
          at org.apache.avro.Schema$NamedSchema.writeNameRef(Schema.java:810)
          at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:972)
          at org.apache.avro.Schema$UnionSchema.toJson(Schema.java:1239)
          at org.apache.avro.Schema$RecordSchema.fieldsToJson(Schema.java:1000)
          at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:984)
          at org.apache.avro.Schema$ArraySchema.toJson(Schema.java:1134)
          at org.apache.avro.Schema$UnionSchema.toJson(Schema.java:1239)
          at org.apache.avro.Schema$RecordSchema.fieldsToJson(Schema.java:1000)
          at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:984)
          at org.apache.avro.Schema$UnionSchema.toJson(Schema.java:1239)
          at org.apache.avro.Schema$RecordSchema.fieldsToJson(Schema.java:1000)
          at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:984)
          at org.apache.avro.Schema.toString(Schema.java:424)
          at org.apache.avro.Schema.toString(Schema.java:396)
          at uk.co.hadoopathome.intellij.viewer.fileformat.ParquetFileReader.getSchema(ParquetFileReader.java:65)
          at uk.co.hadoopathome.intellij.viewer.FileViewerToolWindow$2.doInBackground(FileViewerToolWindow.java:196)
          at uk.co.hadoopathome.intellij.viewer.FileViewerToolWindow$2.doInBackground(FileViewerToolWindow.java:184)
          at java.desktop/javax.swing.SwingWorker$1.call(SwingWorker.java:304)
          at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
          at java.desktop/javax.swing.SwingWorker.run(SwingWorker.java:343)
          at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
          at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
          at java.base/java.lang.Thread.run(Thread.java:829) 
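
      The trace shows the exception being raised from Schema$Names.put while a record is serialized via toJson. A minimal sketch of that kind of check follows, assuming the failure comes from two differently shaped records that share the name "element"; that cause is an assumption about this file, not something the report confirms. NameRegistrySketch, register and describe are hypothetical names for illustration only and are not Avro's actual code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of a per-name schema registry; not Avro's actual code.
class NameRegistrySketch {
    // Maps a record name to the field names of its first definition.
    private final Map<String, List<String>> seen = new HashMap<>();

    // Registers a record definition. Returns true if an identical definition
    // is already known (it could be written as a name reference), false if
    // the definition was newly added.
    boolean register(String name, List<String> fields) {
        List<String> previous = seen.get(name);
        if (fields.equals(previous)) {
            return true; // identical definition: reuse by name
        }
        if (previous != null) {
            // Same name, different shape: the assumed situation in this issue.
            throw new IllegalStateException("Can't redefine: " + name);
        }
        seen.put(name, fields);
        return false;
    }

    // Registers two definitions in order and reports what happened to the second.
    static String describe(String firstName, List<String> firstFields,
                           String secondName, List<String> secondFields) {
        NameRegistrySketch names = new NameRegistrySketch();
        names.register(firstName, firstFields);
        try {
            return names.register(secondName, secondFields) ? "reused" : "added";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Two records both named "element" but with different fields collide.
        System.out.println(describe("element", Arrays.asList("c"),
                                    "element", Arrays.asList("d", "e")));
        // The same definition seen twice is accepted.
        System.out.println(describe("element", Arrays.asList("c"),
                                    "element", Arrays.asList("c")));
    }
}
```

      In this model the first call reports "Can't redefine: element" and the second reports "reused". If the same mechanism applies here, it would suggest that the schema conversion, rather than the parquet file itself, produces the clashing names.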

      It appears to be an issue similar to PARQUET-1441 or PARQUET-1409.

      Or could it possibly be something wrong in my parquet file?


          People

            Assignee: Unassigned
            Reporter: Nils Broman (nils.scila)
