It would be great if sqoop export and sqoop import supported
exporting and importing nested collections natively.
For example, Oracle supports nested data types directly (nested tables and VARRAYs).
Hive, Impala and Spark also support nested collection data types, e.g. array, map and struct in Hive.
We currently have to export the base table, then create a staging table in Hive for each nested collection, and then sqoop each of them separately.
- (A) At best, it would be great if sqoop could export the base table and its nested collections at once;
- (B) At a minimum, it would be awesome if sqoop could export one given nested collection plus a few columns from the base table, e.g.:
Let's say we have the following table:
, client_int int4
, first_name text
, transactions array<struct<trans_date:timestamp,trans_amount:int4>>
, web_vists array<struct<page:text,visits:int4>>
, .. .
stored as parquet;
Then, for the "B" functionality (minimal support for nested structures), we could call sqoop like this:
sqoop export ... \
so that it would flatten the nested collection "transactions" into the following set of columns: client_int, trans_date, trans_amount, and sqoop the result as a regular table's dataset.
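To make the requested flattening concrete, here is a minimal Python sketch of the intended semantics. It is not Sqoop code and not a proposed implementation. The function name flatten_collection and the sample rows are made up for illustration, and it assumes that a base row with an empty collection produces no output rows (similar to a Hive LATERAL VIEW explode without OUTER).

```python
from typing import Any, Dict, Iterable, Iterator, List


def flatten_collection(
    rows: Iterable[Dict[str, Any]],
    key_columns: List[str],
    collection: str,
) -> Iterator[Dict[str, Any]]:
    """Yield one flat record per element of the nested collection.

    Each output record holds the selected base-table columns plus
    the fields of a single struct from the nested array.
    """
    for row in rows:
        # A missing or NULL collection is treated as empty (assumption).
        for element in row.get(collection) or []:
            flat = {col: row[col] for col in key_columns}
            flat.update(element)
            yield flat


# Hypothetical sample data shaped like the table above.
rows = [
    {
        "client_int": 1,
        "first_name": "Ann",
        "transactions": [
            {"trans_date": "2017-01-05 10:00:00", "trans_amount": 40},
            {"trans_date": "2017-01-09 12:30:00", "trans_amount": 15},
        ],
        "web_vists": [{"page": "/home", "visits": 3}],
    },
    {"client_int": 2, "first_name": "Bob", "transactions": [], "web_vists": []},
]

flat_rows = list(flatten_collection(rows, ["client_int"], "transactions"))
for record in flat_rows:
    print(record)
```

Each record in flat_rows has exactly the columns client_int, trans_date and trans_amount, which is the shape a regular relational target table would expect.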