Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
Sqoop currently has a --query option for import but not for export.
It would be nice to add an export --query option that supports HiveQL. Currently users have to create a temporary table and then export it, a two-step process that rewrites all of the to-be-exported data to disk in a new table before the sqoop export command starts.
Since Sqoop executes a distributed map-only job, I believe queries that need a reduce phase, such as joins, will see little performance improvement, because the intermediate map-to-reduce data has to be written anyway. However, we could save the final reduce-phase writes and turn this into a more convenient one-step process instead of a two-step one.
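For reference, the current two-step workaround can be sketched roughly as below. This is only an illustration: the table names (tmp_export, orders, order_totals), the JDBC URL, the user name, and the warehouse path are hypothetical placeholders, and the script defaults to printing the commands rather than running them.

```shell
#!/bin/sh
# Sketch of the two-step workaround: stage the query result in Hive,
# then export the staged files with sqoop export.
# Every table name, path, and connection detail below is a hypothetical placeholder.

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands; 0 = actually run them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Step 1: materialize the query result as a temporary Hive table.
# This is the extra full write to disk that the issue wants to avoid.
step1_stage() {
    run hive -e "CREATE TABLE tmp_export AS SELECT id, total FROM orders WHERE total > 100"
}

# Step 2: export the staged table's files to the target database table.
# '\001' is assumed here because Ctrl-A is Hive's default field delimiter.
step2_export() {
    run sqoop export \
        --connect jdbc:mysql://db.example.com/sales \
        --username exporter \
        --table order_totals \
        --export-dir /user/hive/warehouse/tmp_export \
        --input-fields-terminated-by '\001'
}

step1_stage
step2_export
```

With the requested feature, step 1 and the temporary table would no longer be needed; the exact syntax of such an option is not defined in this issue, so it is not shown here.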