Sqoop (Retired) / SQOOP-1780

Avro/Parquet schemas can't handle Sqoop-generated non-alphanumeric column names

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.4.5
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

    Description

      I was importing a MySQL table whose column names start with a digit (1QP, 2QP, etc.). It looks like Sqoop prepends an underscore to those names to make them compatible with Hive, but the Parquet/Avro schema check can't handle the non-alphanumeric character in a field name (or at least, at the start of it) and throws the following exception:

      java.lang.IllegalStateException: Deprecated: field names are not alphanumeric (plus '_'): sqoop_import_team._1QP, sqoop_import_team._2QP, sqoop_import_team._3QP, sqoop_import_team._4QP
      	at com.google.common.base.Preconditions.checkState(Preconditions.java:172)
      	at org.kitesdk.data.spi.Compatibility.checkSchema(Compatibility.java:119)
      	at org.kitesdk.data.spi.Compatibility.checkDescriptor(Compatibility.java:133)
      	at org.kitesdk.data.spi.hive.HiveManagedMetadataProvider.create(HiveManagedMetadataProvider.java:40)
      	at org.kitesdk.data.spi.hive.HiveManagedDatasetRepository.create(HiveManagedDatasetRepository.java:76)
      	at org.kitesdk.data.Datasets.create(Datasets.java:200)
      	at org.kitesdk.data.Datasets.create(Datasets.java:240)
      	at org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:81)
      	at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:70)
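
The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the actual Sqoop or Kite code: the class `FieldNameCheck`, its methods, and the `LEADING_LETTER` pattern are all hypothetical, and the rule "a field name must start with a letter" is an assumption inferred from the exception message rather than taken from the Kite source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class FieldNameCheck {
    // ASSUMPTION: a hypothetical compatibility rule (letter first, then
    // letters, digits, or '_'). The real Kite check may differ.
    private static final Pattern LEADING_LETTER =
            Pattern.compile("[A-Za-z][A-Za-z0-9_]*");

    // Mimics the renaming described in the report: a column name that
    // starts with a digit gets an underscore prefix.
    static String toIdentifier(String column) {
        if (!column.isEmpty() && Character.isDigit(column.charAt(0))) {
            return "_" + column;
        }
        return column;
    }

    // Returns the renamed fields that the assumed rule would reject.
    static List<String> rejectedNames(List<String> columns) {
        List<String> rejected = new ArrayList<>();
        for (String column : columns) {
            String field = toIdentifier(column);
            if (!LEADING_LETTER.matcher(field).matches()) {
                rejected.add(field);
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("1QP", "2QP", "team_id");
        // Prints [_1QP, _2QP]: the underscore prefix itself trips the assumed rule.
        System.out.println(rejectedNames(columns));
    }
}
```

Under that assumption, the rename that makes the column acceptable to one consumer is exactly what makes it unacceptable to the other, which matches the field names listed in the exception.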
      

          People

            Assignee: Unassigned
            Reporter: Josh Wills (jwills)
            Votes: 0
            Watchers: 4
