Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 1.4.6
- Fix Version/s: None
- Component/s: None
- Environment: CDH 5.9
Description
Oracle clustered tables not supported by oraoop (direct Oracle mode)
------------------------------------------------------------------------
More about clustered tables here: https://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_5001.htm
While ingesting data using Sqoop's direct mode, we noticed that it does not support these tables and errors out, stating that there are no records in the source table.
The issue occurs because Sqoop is unable to determine the database extents for such tables, and hence the data chunks for splits, in src/java/org/apache/sqoop/manager/oracle/OraOopOracleQueries.java.
This code generates SQL similar to the query below, but for a clustered table the extents in dba_extents belong to the cluster's segment, so no segment_name matches the table's object name.
For example, in our case the ENCOUNTER table is clustered, and its dba_extents segment_name differs from its object_name.
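The mismatch can be confirmed directly from the data dictionary. The sketch below is a diagnostic we would expect to work (owner and table name taken from this report; the exact cluster name is whatever Oracle assigned):

```sql
-- For a clustered table, dba_tables.cluster_name is non-NULL,
-- and dba_extents records the extents under the cluster's
-- segment_name rather than the table's object_name.
SELECT table_name, cluster_name
FROM dba_tables
WHERE owner = 'V500'
AND table_name = 'ENCOUNTER';
```

If cluster_name is non-NULL (here CLU_ENCOUNTER0077), the join on o.object_name = e.segment_name used by OraOop can never match.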
The query below returns no data:
SELECT * FROM dba_objects o, dba_extents e
WHERE o.owner = 'V500'
AND o.object_name = 'ENCOUNTER'
AND e.owner = 'V500'
AND e.segment_name = 'ENCOUNTER'
AND (o.subobject_name = e.partition_name
OR (o.subobject_name IS NULL
AND e.partition_name IS NULL))
;
but this modified query, run against the cluster segment, does return data (note the name Oracle gave the cluster object, CLU_ENCOUNTER0077):
SELECT * FROM dba_objects o, dba_extents e
WHERE o.owner = 'V500'
AND o.object_name = 'CLU_ENCOUNTER0077'
AND e.owner = 'V500'
AND e.segment_name = 'CLU_ENCOUNTER0077'
AND (o.subobject_name = e.partition_name
OR (o.subobject_name IS NULL
AND e.partition_name IS NULL))
;
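A possible direction for a fix, sketched as SQL only (not a tested patch): resolve the owning cluster's segment name first, falling back to the table name for non-clustered tables, and use that name against dba_extents. Names below are the ones from this report.

```sql
-- Hypothetical lookup: use the cluster segment when the table is
-- clustered, otherwise the table's own segment.
SELECT e.*
FROM dba_extents e
WHERE e.owner = 'V500'
AND e.segment_name = COALESCE(
(SELECT t.cluster_name
FROM dba_tables t
WHERE t.owner = 'V500'
AND t.table_name = 'ENCOUNTER'),
'ENCOUNTER');
```

Note that with this approach the extents (and hence the splits) would cover the whole cluster, not just the one table, so rows from other tables in the cluster would need to be filtered out by the import query itself.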
The failing import command (connection details elided):
sqoop import --direct --connect ....
— cut from log —
17/05/01 09:35:09 DEBUG oracle.OraOopOracleQueries: getOracleDataChunksExtent() SQL Query =
SELECT data_object_id, file_id, relative_fno, file_batch, MIN (start_block_id) start_block_id, MAX (end_block_id) end_block_id, SUM (blocks) blocks FROM (SELECT o.data_object_id, e.file_id, e.relative_fno, e.block_id start_block_id, e.block_id + e.blocks - 1 end_block_id, e.blocks, CEIL ( SUM ( e.blocks) OVER (PARTITION BY o.data_object_id, e.file_id ORDER BY e.block_id ASC) / (SUM (e.blocks) OVER (PARTITION BY o.data_object_id, e.file_id) / 17)) file_batch FROM dba_extents e, dba_objects o, dba_tab_subpartitions tsp WHERE o.owner = V500 AND o.object_name = ENCOUNTER AND e.owner = V500 AND e.segment_name = ENCOUNTER AND o.owner = e.owner AND o.object_name = e.segment_name AND (o.subobject_name = e.partition_name OR (o.subobject_name IS NULL AND e.partition_name IS NULL)) AND o.owner = tsp.table_owner AND o.object_name = tsp.table_name AND o.subobject_name = tsp.subpartition_name ) GROUP BY data_object_id, file_id, relative_fno, file_batch ORDER BY data_object_id, file_id, relative_fno, file_batch
17/05/01 09:35:09 FATAL oracle.OraOopDataDrivenDBInputFormat: The table ENCOUNTER does not contain any data.
17/05/01 09:35:09 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/oracle/.staging/job_1493213922849_0970
17/05/01 09:35:09 DEBUG util.ClassLoaderStack: Restoring classloader: sun.misc.Launcher$AppClassLoader@2f7a2457
17/05/01 09:35:09 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: The table ENCOUNTER does not contain any data.
java.lang.RuntimeException: The table ENCOUNTER does not contain any data.
at org.apache.sqoop.manager.oracle.OraOopDataDrivenDBInputFormat.getSplits(OraOopDataDrivenDBInputFormat.java:108)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:305)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:322)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:200)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1325)
at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:203)
at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:176)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:273)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
at org.apache.sqoop.manager.oracle.OraOopConnManager.importTable(OraOopConnManager.java:284)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:507)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:615)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)