Details
- Type: New Feature
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 2.0.0, 1.4.5, 1.99.7
- Fix Version/s: None
Description
It would be great if sqoop export and sqoop import supported
exporting and importing nested collections natively.
For example, Oracle supports nested data types directly, e.g.:
http://www.orafaq.com/wiki/NESTED_TABLE
Hive/Impala/Spark also support nested collection data types, e.g. in Hive:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-ComplexTypeConstructors
We currently have to export the base table, then create a staging table in Hive for each nested collection, and then sqoop all of them separately.
A) At best, it would be great if sqoop could export the base table and its nested collections at once.
B) At a minimum, it would be awesome if sqoop could export a single given nested collection plus a few columns from the base table, e.g.:
Let's say we have the following table:
TABLE client_transactions
(
  ...
, client_int int4
, first_name text
, transactions array<struct<trans_date:timestamp,trans_amount:int4>>
, web_vists array<struct<page:text,visits:int4>>
, ...
)
stored as parquet;
Then for the "B" functionality (minimal support for nested structures), we could call sqoop as follows, e.g.:
sqoop export ... \
--nested-collection transactions \
--columns client_int
so that it would flatten the nested collection "transactions" into the following set of columns: client_int, trans_date, trans_amount, and sqoop the result as a regular table's dataset.
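To make the requested flattening concrete, here is a minimal Python sketch of the row-level semantics I have in mind. It is not Sqoop code: the function name `flatten_nested_collection` and the sample rows are hypothetical and exist only for illustration. It also assumes that a base row with an empty collection produces no output rows, which is one possible choice and not something this request settles.

```python
from datetime import datetime


def flatten_nested_collection(rows, nested_collection, columns):
    """Yield one flat record per element of the chosen nested collection.

    rows: iterable of dicts, one per base-table row (hypothetical in-memory form).
    nested_collection: name of the array<struct<...>> column to flatten.
    columns: base-table columns to repeat on every flattened record.
    """
    for row in rows:
        # Values of the selected base-table columns, repeated for each element.
        base = {name: row[name] for name in columns}
        # Assumption: a missing or empty collection yields no records.
        for element in row.get(nested_collection) or []:
            # Base columns first, then the struct's own fields.
            yield {**base, **element}


# Hypothetical sample data shaped like the client_transactions table above.
rows = [
    {
        "client_int": 1,
        "first_name": "Ann",
        "transactions": [
            {"trans_date": datetime(2016, 1, 5), "trans_amount": 100},
            {"trans_date": datetime(2016, 2, 9), "trans_amount": 250},
        ],
        "web_vists": [{"page": "/home", "visits": 3}],
    },
    {"client_int": 2, "first_name": "Bob", "transactions": [], "web_vists": []},
]

flat = list(flatten_nested_collection(rows, "transactions", ["client_int"]))
for record in flat:
    print(record)
```

Each resulting record carries exactly the columns client_int, trans_date and trans_amount, so it could be exported like a row of a regular table.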
Issue Links
- is related to
  - SQOOP-1709 Column Type enhancements for complex types (Resolved)
  - SQOOP-1350 Sqoop2: Support all supported data types in the CSV Intermediate Data Format implementation (Resolved)
- relates to
  - SQOOP-2935 Support complex types with HCatatog integration (Open)