Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.4.6
Fix Version/s: None
Component/s: None
Environment: sqoop 1.4.6 hadoop 2.6.0-amzn-1
Description
Sqoop does not preserve UTF-8 characters when running an import with --direct against a MySQL table.
Here is the key comma-delimited output from the attached example script, first without --direct and then with --direct:
1,Τη γλώσσα,"/fox/\jumps
1,���� ������������,"/fox/\jumps
I looked over the sqoop --verbose output and the Hadoop logs but couldn't find anything suspicious.
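One observation that may help narrow this down: the garbled field appears to contain two replacement characters for every Greek letter, which is what you would see if each byte of the UTF-8 data were decoded on its own by a single-byte charset such as US-ASCII. The sketch below is only a hypothetical diagnostic in Python, not Sqoop code, and the assumption that an ASCII decode is involved is a guess based on the output pattern, not something confirmed from the Sqoop source.

```python
# Hypothetical diagnostic, not part of Sqoop.
# Assumption: the UTF-8 bytes are being decoded as US-ASCII somewhere in
# the --direct path, with undecodable bytes replaced by U+FFFD.

original = "Τη γλώσσα"

# Each Greek letter takes two bytes in UTF-8; the space takes one.
utf8_bytes = original.encode("utf-8")

# Decode as ASCII, substituting U+FFFD for every byte ASCII cannot represent.
garbled = utf8_bytes.decode("ascii", errors="replace")

print(garbled)
print(len(original), len(utf8_bytes), len(garbled))
```

If the pattern printed here matches the --direct output, the charset used when reading the data in the --direct path would be a reasonable place to look next.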
As an aside, running the example script with --mysql-delimiters produces this puzzling comma-delimited output:
1,Τη γλώσσα,"/fox/\\jumps
1,'���� ������������','\"/fox/\\jumps'
Note the difference between the text fields containing the word "fox". The two outputs should be identical, but the fields are quoted differently.
Attached are the script that creates the MySQL utest example table and the bash script I used to demonstrate the --direct problem.
Environment
$ sqoop version
Warning: /home/hadoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
15/10/20 17:28:21 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Sqoop 1.4.6
git commit id c0c5a81723759fa575844a0a1eae8f510fa32c25
Compiled by root on Mon Apr 27 14:38:36 CST 2015
$ hadoop version
Hadoop 2.6.0-amzn-1
Subversion git@aws157git.com:/pkg/Aws157BigTop -r edd5a97db145470a8723dde24f38c83724e0959c
Compiled by ec2-user on 2015-09-25T14:59Z
Compiled with protoc 2.5.0
From source with checksum 7beeae31f3c4554b23d92f1e63dc85
This command was run using /usr/lib/hadoop/hadoop-common-2.6.0-amzn-1.jar