  1. Parquet
  2. PARQUET-1598

Make convert-csv work with an input filename that starts with a period or a numeric character

Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: parquet-cli

    Description

      I ran parquet-cli's convert-csv, without the --schema option, on an input file whose name starts with a numeric character and got the following error:

      $ java -cp 'target/*:target/dependency/*' org.apache.parquet.cli.Main convert-csv 0sample.csv -o sample.parquet
      Unknown error
      shaded.parquet.org.apache.avro.SchemaParseException: Illegal initial character: 0sample
      	at shaded.parquet.org.apache.avro.Schema.validateName(Schema.java:1498)
      	at shaded.parquet.org.apache.avro.Schema.access$200(Schema.java:86)
      	at shaded.parquet.org.apache.avro.Schema$Name.<init>(Schema.java:645)
      	at shaded.parquet.org.apache.avro.Schema.createRecord(Schema.java:182)
      	at shaded.parquet.org.apache.avro.SchemaBuilder$RecordBuilder.fields(SchemaBuilder.java:1805)
      	at org.apache.parquet.cli.csv.AvroCSV.inferSchemaInternal(AvroCSV.java:158)
      	at org.apache.parquet.cli.csv.AvroCSV.inferNullableSchema(AvroCSV.java:78)
      	at org.apache.parquet.cli.commands.ConvertCSVCommand.run(ConvertCSVCommand.java:160)
      	at org.apache.parquet.cli.Main.run(Main.java:147)
      	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
      	at org.apache.parquet.cli.Main.main(Main.java:177)
      

      This happens because convert-csv uses the input file name as the name of the output schema, while Avro requires a schema name to match the regex pattern [A-Za-z_][A-Za-z0-9_]*.
      Users therefore have to rename the input file or pass the --schema option explicitly, but the error message does not make that obvious.
      It would be nice if the message were improved, or if invalid characters in the schema name were replaced automatically to avoid this problem.
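
      The automatic replacement suggested above could look roughly like the following. This is only a sketch of the idea, not the attached patch: the class SchemaNameSanitizer and its sanitize method are hypothetical names, and the choice of '_' as the replacement character is an assumption.

```java
// Hypothetical helper (not part of parquet-cli): derives a schema name that
// matches [A-Za-z_][A-Za-z0-9_]* from an input file name.
final class SchemaNameSanitizer {

    static String sanitize(String fileName) {
        // Drop the last extension, e.g. "0sample.csv" -> "0sample".
        // A dot at index 0 is kept, so ".csv" is not reduced to an empty name.
        int dot = fileName.lastIndexOf('.');
        String base = dot > 0 ? fileName.substring(0, dot) : fileName;

        // Replace every character outside [A-Za-z0-9_] with '_' (assumed choice).
        String name = base.replaceAll("[^A-Za-z0-9_]", "_");

        // The first character must not be a digit, and the name must not be empty.
        if (name.isEmpty() || Character.isDigit(name.charAt(0))) {
            name = "_" + name;
        }
        return name;
    }

    public static void main(String[] args) {
        for (String f : new String[] {"0sample.csv", ".sample.csv", "sample.csv"}) {
            System.out.println(f + " -> " + sanitize(f));
        }
    }
}
```

      With this approach, 0sample.csv would yield the schema name _0sample and .sample.csv would yield _sample, while names that are already valid pass through unchanged.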

            People

              Assignee: Kengo Seki (sekikn)
              Reporter: Kengo Seki (sekikn)