Details
Type: Bug
Status: Open
Priority: Blocker
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None
Flags: Important
Description
When the enclosed-by and escaped-by characters are both the double quote ("), Sqoop escapes embedded double quotes twice, so every double quote inside a field is written out as four double quotes instead of two.
Example:
gcloud dataproc jobs submit hadoop --cluster=sqoop --region=europe-west4 --class=org.apache.sqoop.Sqoop --jars=$libs -- import -Dmapreduce.job.user.classpath.first=true --connect=jdbc:**** --target-dir=gs://my-oracle-extract/EMPLOYEES --table=HR.EMPLOYEES --enclosed-by '\"' --escaped-by \" --fields-terminated-by '|' --null-string '' --null-non-string '' --as-textfile
This command turns the field <test field "> into <"test field """"">: the embedded double quote is written as four double quotes, although doubling it once (<"test field """>) would be enough.
BigQuery requires the double quote as the escape character, and fields must also be enclosed in double quotes so that embedded newlines are loaded correctly.
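For reference, a minimal sketch of the output BigQuery expects for one field, assuming '"' is both the quote and the escape character and that the load job allows quoted newlines. The class BigQueryCsvQuoting and its quote() method are hypothetical illustrations, not part of Sqoop: each embedded quote is doubled exactly once and the whole field is enclosed.

```java
// Hypothetical helper (not part of Sqoop): quotes one field the way
// BigQuery's CSV loader expects when '"' is both quote and escape character.
final class BigQueryCsvQuoting {
    private BigQueryCsvQuoting() {}

    static String quote(String field) {
        // Double every embedded quote exactly once...
        String escaped = field.replace("\"", "\"\"");
        // ...then enclose the whole field, so an embedded '|' or newline stays inside it.
        return "\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        System.out.println(quote("test field \"")); // prints "test field """
        System.out.println(quote("line one\nline two"));
    }
}
```

With this scheme the example field comes out as <"test field """>, which is the value the import should produce.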
The code in FieldFormatter.java should be changed from this:
    if (escapingLegal) {
      // escaping is legal. Escape any instances of the escape char itself.
      withEscapes = str.replace("" + escape, "" + escape + escape);
    } else {
      // no need to double-escape
      withEscapes = str;
    }

    // if we have an enclosing character, and escaping is legal, then the
    // encloser must always be escaped.
    if (escapingLegal) {
      withEscapes = withEscapes.replace("" + enclose, "" + escape + enclose);
    }
to this:

    if (escapingLegal) {
      // escaping is legal. Escape any instances of the escape char itself.
      withEscapes = str.replace("" + escape, "" + escape + escape);
    } else {
      // no need to double-escape
      withEscapes = str;
    }

    // if we have an enclosing character, and escaping is legal, then the
    // encloser must always be escaped, unless it is the escape character
    // itself: in that case the first replace() has already doubled it.
    if (escapingLegal && enclose != escape) {
      withEscapes = withEscapes.replace("" + enclose, "" + escape + enclose);
    }
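To check the proposed change in isolation, here is a simplified, self-contained sketch of only the escape/enclose steps. It assumes an escape and an enclose character are always set and that the field is always enclosed; the class EscapeAndEncloseDemo and its methods are hypothetical and do not reproduce the full FieldFormatter API. current() runs both replacements unconditionally, while proposed() skips the second one when the two characters are equal.

```java
// Hypothetical, simplified reproduction of the escape/enclose steps in
// FieldFormatter: escape and enclose are always set and the field is always enclosed.
final class EscapeAndEncloseDemo {
    private EscapeAndEncloseDemo() {}

    // Current logic: both replacements always run.
    static String current(String str, char escape, char enclose) {
        String withEscapes = str.replace("" + escape, "" + escape + escape);
        withEscapes = withEscapes.replace("" + enclose, "" + escape + enclose);
        return enclose + withEscapes + enclose;
    }

    // Proposed logic: when escape == enclose, the first pass has already
    // doubled every enclose character, so the second pass is skipped.
    static String proposed(String str, char escape, char enclose) {
        String withEscapes = str.replace("" + escape, "" + escape + escape);
        if (enclose != escape) {
            withEscapes = withEscapes.replace("" + enclose, "" + escape + enclose);
        }
        return enclose + withEscapes + enclose;
    }

    public static void main(String[] args) {
        String field = "test field \"";
        System.out.println(current(field, '"', '"'));   // prints "test field """""
        System.out.println(proposed(field, '"', '"'));  // prints "test field """
        // With a distinct escape character both versions agree:
        System.out.println(current(field, '\\', '"'));  // prints "test field \""
        System.out.println(proposed(field, '\\', '"')); // prints "test field \""
    }
}
```

The extra condition only changes the case where the escape and enclose characters are the same, so existing imports that use a different escape character (for example a backslash) keep their current output.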