  1. Sqoop
  2. SQOOP-2921

support for nested data types

    Details

      Description

      It would be great if sqoop export and sqoop import supported
      nested collections natively.

      For example, Oracle supports nested data types directly:
      http://www.orafaq.com/wiki/NESTED_TABLE

      Hive/Impala/Spark also support nested collection data types, e.g. in Hive:
      https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-ComplexTypeConstructors

      We currently have to export the base table, create a staging table in Hive for each nested collection, and then sqoop each of them separately.

      A)

      • At best, sqoop would export the base table and all of its nested collections at once;

      B)

      • At a minimum, sqoop would export one given nested collection plus a few columns from the base table, e.g.:

      Let's say we have the following table:

      TABLE client_transactions
      (
      .. .
      , client_int int4
      , first_name text
      , transactions array<struct<trans_date:timestamp,trans_amount:int4>>
      , web_vists array<struct<page:text,visits:int4>>
      , .. .
      )
      stored as parquet;

      Then for the "B" functionality (minimal support for nested structures), we could call sqoop as e.g.:

      sqoop export ... \
      --nested-collection transactions \
      --columns client_int

      so that it would flatten the nested collection "transactions" into the following set of columns: client_int, trans_date, trans_amount, and sqoop the result as a regular table's dataset.
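
      To make the requested flattening concrete, here is a minimal Python sketch of the intended semantics. It is not Sqoop code: the function name flatten_collection and the sample row are hypothetical, and dropping rows whose collection is NULL or empty is an assumption, since the request does not say how those should be handled.

```python
from typing import Any, Dict, Iterator, List


def flatten_collection(
    row: Dict[str, Any], collection: str, columns: List[str]
) -> Iterator[Dict[str, Any]]:
    """Yield one flat record per element of row[collection].

    Each output record carries the selected base-table columns followed by
    the fields of one struct from the nested collection.
    """
    base = {name: row[name] for name in columns}
    # Assumption: a NULL or empty collection contributes no output rows.
    for element in row.get(collection) or []:
        yield {**base, **element}


# Hypothetical sample row shaped like the client_transactions table.
sample = {
    "client_int": 42,
    "first_name": "Ann",
    "transactions": [
        {"trans_date": "2016-05-01 10:00:00", "trans_amount": 100},
        {"trans_date": "2016-05-02 11:30:00", "trans_amount": 250},
    ],
    "web_vists": [{"page": "/home", "visits": 3}],
}

flat_rows = list(flatten_collection(sample, "transactions", ["client_int"]))
for record in flat_rows:
    print(record)
```

      Each element of "transactions" becomes one output row with the columns client_int, trans_date and trans_amount, which is the shape a regular table export expects.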

              People

              • Assignee:
                Unassigned
                Reporter:
                Tagar Ruslan Dautkhanov
              • Votes:
                0
                Watchers:
                1
