  1. Sqoop (Retired)
  2. SQOOP-2921

support for nested data types


Details

    Description

      It would be great if sqoop export and sqoop import supported
      nested collections natively.

      For example, Oracle supports nested data types directly, e.g.:
      http://www.orafaq.com/wiki/NESTED_TABLE

      Hive, Impala, and Spark also support nested collection data types; for Hive, see:
      https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-ComplexTypeConstructors

      Currently we have to export the base table, create a staging table in Hive for each nested collection, and then sqoop each of them separately.

      A)

      • Ideally, sqoop would export the base table and all of its nested collections at once.

      B)

      • At a minimum, sqoop should be able to export one given nested collection plus a few columns from the base table, e.g.:

      Let's say we have the following table:

      TABLE client_transactions
      (
      ...
      , client_int int4
      , first_name text
      , transactions array<struct<trans_date:timestamp,trans_amount:int4>>
      , web_vists array<struct<page:text,visits:int4>>
      , ...
      )
      stored as parquet;

      Then, for the "B" functionality (minimal support for nested structures), sqoop could be called like this:

      sqoop export ... \
      --nested-collection transactions \
      --columns client_int

      Sqoop would then flatten the nested collection "transactions" into the following set of columns: client_int, trans_date, trans_amount, and export the result as a regular table's dataset.
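
To make the requested "B" semantics concrete, here is a minimal Python sketch of the flattening step. It is only an illustration of the expected output shape, not Sqoop code: the function name `flatten_collection`, its parameters, and the sample rows are all hypothetical, and it assumes that a client with an empty collection produces no output rows.

```python
from typing import Any, Dict, Iterator, List


def flatten_collection(
    rows: List[Dict[str, Any]],
    collection: str,
    columns: List[str],
) -> Iterator[Dict[str, Any]]:
    """Yield one flat record per element of the nested collection.

    Hypothetical helper: mirrors the proposed
    `--nested-collection <collection> --columns <columns>` behaviour.
    """
    for row in rows:
        # Base-table columns that are repeated on every flattened row.
        base = {name: row[name] for name in columns}
        # Assumption: a missing or empty collection yields no rows.
        for element in row.get(collection) or []:
            yield {**base, **element}


# Made-up sample data shaped like the client_transactions table above.
rows = [
    {
        "client_int": 1,
        "first_name": "Ann",
        "transactions": [
            {"trans_date": "2016-05-01 10:00:00", "trans_amount": 25},
            {"trans_date": "2016-05-02 11:30:00", "trans_amount": 40},
        ],
        "web_vists": [{"page": "/home", "visits": 3}],
    },
    {"client_int": 2, "first_name": "Bob", "transactions": [], "web_vists": []},
]

flat = list(flatten_collection(rows, "transactions", ["client_int"]))
for record in flat:
    print(record)
```

Each output record carries exactly the columns client_int, trans_date, and trans_amount, which is the regular, flat dataset the export would then write to the target table.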

            People

              Unassigned
              Ruslan Dautkhanov (Tagar)
