Details
-
Bug
-
Status: Open
-
Critical
-
Resolution: Unresolved
-
1.4.6
-
None
-
CENTOS
SQOOP - 1.4.6
Hadoop - 2.6.0
Description
I am trying to import data from MSSQL Server to hadoop using SQOOP with a fixed number of rows in each batch using top 'N' in the query.
Below is the code :
sqoop import --connect "jdbc:sqlserver://myserverIp;database=organization" \
--connection-manager org.apache.sqoop.manager.SQLServerManager \
--driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
--username hadoop --password hadoop123 \
--target-dir /user/hive/warehouse/organization.db/employees \
--query "SELECT TOP 5 [employee_id] \
,[first_name] \
,[last_name] \
,[email] \
,[phone_number] \
,[hire_date] \
,[job_id] \
,[salary] \
,[commission_pct] \
,[manager_id] \
,[department_id] \
FROM [dbo].[employees] where \$CONDITIONS" \
--m 1 --incremental append --check-column employee_id --last-value 99;
when i execute, SQOOP is fetching the upper bound value with the below query
SELECT MAX([employee_id]) FROM (SELECT TOP 5 [employee_id] ,[first_name] ,[last_name] ,[email] ,[phone_number] ,[hire_date] ,[job_id] ,[salary] ,[commission_pct] ,[manager_id] ,[department_id] FROM [dbo].[employees] where (1 = 1)) sqoop_import_query_alias
which is returning 206, this is the highest value in employees table
But I am reading only five rows so the returned value should be max of returned 5 rows that is 104 in this case. therefore this will be an issue when I run the job again.
I have posting this issue with MAX function on MSDN forum and I got some responses asking to use order by clause in the sub query.
I believe this should be a fix at SQOOP when using with MSSQL