Parquet / PARQUET-2378

Problem with parquet cat


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      $ parquet cat train-00000-of-00001-15a05aeec7726f9d.parquet
      Unknown error
      shaded.parquet.org.apache.avro.SchemaParseException: Illegal character in: original-instruction
        at shaded.parquet.org.apache.avro.Schema.validateName(Schema.java:1607)
        at shaded.parquet.org.apache.avro.Schema.access$400(Schema.java:92)
        at shaded.parquet.org.apache.avro.Schema$Field.<init>(Schema.java:556)
        at shaded.parquet.org.apache.avro.Schema$Field.<init>(Schema.java:595)
        at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:295)
        at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:279)
        at org.apache.parquet.cli.util.Schemas.fromParquet(Schemas.java:89)
        at org.apache.parquet.cli.BaseCommand.getAvroSchema(BaseCommand.java:405)
        at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:66)
        at org.apache.parquet.cli.Main.run(Main.java:163)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.parquet.cli.Main.main(Main.java:193)

      The dataset in question is: https://huggingface.co/datasets/argilla/databricks-dolly-15k-curated-en/tree/main/data
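      The trace shows that parquet cat converts the Parquet schema to an Avro schema (AvroSchemaConverter.convertFields), and Avro's Schema.validateName rejects the column name "original-instruction". Per the Avro specification, a name must start with [A-Za-z_] and then contain only [A-Za-z0-9_], so the hyphen is illegal. The sketch below reproduces that rule without depending on Avro and shows a hypothetical renaming (hyphen to underscore) that would yield a legal name. The class name AvroNameCheck, the sanitize helper, and the other column names are illustrative assumptions, not part of parquet-cli or Avro.

```java
import java.util.List;
import java.util.regex.Pattern;

class AvroNameCheck {
    // Avro specification name rule: first character in [A-Za-z_],
    // every following character in [A-Za-z0-9_].
    private static final Pattern AVRO_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    static boolean isValidAvroName(String name) {
        return name != null && AVRO_NAME.matcher(name).matches();
    }

    // Hypothetical workaround (not a parquet-cli feature): replace every
    // character outside [A-Za-z0-9_] with '_' and prefix '_' when the
    // result is empty or starts with a digit.
    static String sanitize(String name) {
        String s = name.replaceAll("[^A-Za-z0-9_]", "_");
        if (s.isEmpty() || Character.isDigit(s.charAt(0))) {
            s = "_" + s;
        }
        return s;
    }

    public static void main(String[] args) {
        // Illustrative column names; only "original-instruction" comes from the report.
        List<String> columns = List.of("instruction", "original-instruction", "context");
        for (String c : columns) {
            String verdict = isValidAvroName(c) ? "ok" : "illegal, could be renamed to " + sanitize(c);
            System.out.println(c + " -> " + verdict);
        }
    }
}
```

      Renaming is only a workaround for the data; the underlying issue is that parquet cat requires every Parquet column name to also be a legal Avro name.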

      Attachments

        1. image-2023-11-16-21-40-07-628.png (315 kB, Jiashen Zhang)


          People

            Assignee: Unassigned
            Reporter: Rémy Léone
            Votes: 0
            Watchers: 4
