Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
$ parquet cat train-00000-of-00001-15a05aeec7726f9d.parquet
Unknown error
shaded.parquet.org.apache.avro.SchemaParseException: Illegal character in: original-instruction
at shaded.parquet.org.apache.avro.Schema.validateName(Schema.java:1607)
at shaded.parquet.org.apache.avro.Schema.access$400(Schema.java:92)
at shaded.parquet.org.apache.avro.Schema$Field.<init>(Schema.java:556)
at shaded.parquet.org.apache.avro.Schema$Field.<init>(Schema.java:595)
at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:295)
at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:279)
at org.apache.parquet.cli.util.Schemas.fromParquet(Schemas.java:89)
at org.apache.parquet.cli.BaseCommand.getAvroSchema(BaseCommand.java:405)
at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:66)
at org.apache.parquet.cli.Main.run(Main.java:163)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.parquet.cli.Main.main(Main.java:193)
The dataset in question is: https://huggingface.co/datasets/argilla/databricks-dolly-15k-curated-en/tree/main/data
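
The likely trigger is the hyphen in the column name original-instruction. The Avro specification requires a name to start with a letter or underscore and to contain only letters, digits, and underscores after that, and the trace shows parquet cat converting the Parquet schema to an Avro schema before printing. The following is a minimal sketch of that naming rule in Python, not the code Avro itself runs. The helper name invalid_avro_names and the sample column list are hypothetical; only original-instruction comes from the error message.

```python
import re

# Naming rule from the Avro specification: the first character is a letter
# or underscore, the rest are letters, digits, or underscores.
AVRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def invalid_avro_names(column_names):
    """Return the names that would fail the Avro naming rule (hypothetical helper)."""
    return [name for name in column_names if not AVRO_NAME.fullmatch(name)]


# Hypothetical column list; only "original-instruction" is taken from the error.
columns = ["instruction", "original-instruction", "response"]
print(invalid_avro_names(columns))  # prints ['original-instruction']
```

Running a check like this over a file's column names would show which fields need renaming before a tool that relies on Avro conversion can read the file.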