SQOOP-1395 (sub-task of SQOOP-1366: Propose to add Parquet support)

Potential naming conflict in Avro schema


    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.6
    • Component/s: tools
    • Labels:
      None

      Description

      If you import a table named "users", Sqoop generates an entity class named "users" (source file "users.java"). The class is compiled, submitted, and used by a MapReduce job. If the target file format is Avro or Parquet, an Avro schema is generated as well. Following the Avro specification, the entity class is described as a "record", and the name of that record is "users".

      For the Parquet file format, we use the Kite SDK to handle file reading and writing with minimal effort. Kite requires an Avro schema and expects all data records to be packed into GenericRecord instances. This is where the problem arises. Kite reads the schema first and tries to instantiate a record based on its name. In this case, Kite tries to instantiate a "users" class. Unfortunately, the class generated from "users.java" has exactly that name. This conflict causes the MapReduce job to fail.
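
      The conflict can be illustrated with a small sketch. It assumes that resolving a record by name amounts to a reflective class lookup; this is a simplification for illustration and not Kite's actual code. The class `NameCollisionSketch` and the method `resolve` are hypothetical, and the lowercase class `users` stands in for the entity class Sqoop generates.

```java
public class NameCollisionSketch {
    // Hypothetical stand-in for name-based record resolution:
    // treat the Avro record name as a class name and look it up.
    static String resolve(String recordName) {
        try {
            Class<?> c = Class.forName(recordName);
            return "found class " + c.getName();
        } catch (ClassNotFoundException e) {
            return "no class, use generic record";
        }
    }

    public static void main(String[] args) {
        // The record name equals the generated class name, so the lookup hits it.
        System.out.println(resolve("users"));
        // A prefixed record name matches no class on the classpath.
        System.out.println(resolve("sqoop_import_users"));
    }
}

// Stand-in for the entity class generated for table "users" (assumption:
// default package, same classpath as the job).
class users {
}
```

      With a record name that matches no generated class, the lookup in this sketch falls through to the generic path, which is the behaviour the fix relies on.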

      The patch proposes changing the AvroSchemaGenerator class so that the record name carries a prefix. In this example, the record name for the "users" table changes from "users" to "sqoop_import_users".
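
      A minimal sketch of the proposed naming rule follows. It is not the code from SQOOP-1395.patch: the class `RecordNaming` and its methods are hypothetical, and only the prefix "sqoop_import_" is taken from the example above. The schema is built as plain JSON text with an empty field list to keep the sketch free of dependencies.

```java
public class RecordNaming {
    // Prefix taken from the example in the description.
    static final String PREFIX = "sqoop_import_";

    // Derive the Avro record name from the table name so that it can no
    // longer coincide with the generated entity class name.
    static String recordNameFor(String tableName) {
        return PREFIX + tableName;
    }

    // Build a minimal Avro record schema as JSON text (no fields).
    static String minimalSchemaJson(String tableName) {
        return "{\"type\": \"record\", \"name\": \""
                + recordNameFor(tableName)
                + "\", \"fields\": []}";
    }

    public static void main(String[] args) {
        System.out.println(recordNameFor("users"));
        System.out.println(minimalSchemaJson("users"));
    }
}
```

      The generated entity class keeps its name "users"; only the record name in the schema changes.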

        Attachments

        1. SQOOP-1395.patch
          2 kB
          Qian Xu

              People

              • Assignee: Qian Xu (stanleyxu2005)
              • Reporter: Qian Xu (stanleyxu2005)
              • Votes: 0
              • Watchers: 10
