Details
Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 1.4.5
Fix Version/s: None
Component/s: None
Labels: None
Description
In DelimiterSet there is the following comment above two option variables:
// If these next two fields are '\000', then they are ignored.
private char enclosedBy;
private char escapedBy;
We found a problem with this while running a Sqoop export without setting the enclosing or escaping parameters (i.e. both are left at their default of \000). Looking at the code in RecordParser, it appears that although the comment says the fields are ignored when set to \000, they actually aren't.
For some reason, some of the records we're trying to export contain \000 in a column. This is fine as long as the \000 isn't immediately before the delimiter.
This is fine: foo\000bar|moo - two columns are exported.
This isn't fine: foo\000|bar - only one column is exported.
Looking through RecordParser, the problem appears to be that our \000 character is treated as an enclosing character, so the parser then assumes the delimiter that follows is part of a value. enclosedBy is left at its default of \000, which according to the comment should mean it is ignored, but when the parser encounters \000 in the data it still matches it against enclosedBy.
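The reported behaviour can be reproduced with a minimal sketch. This is not Sqoop's RecordParser: the class NulSentinelSketch, the parse method and the honorSentinel flag are hypothetical names used only for illustration. The sketch assumes a simplified parser that recognises an enclosing character only at the start of a field and an escape character anywhere, with both left at '\000'. In this simplified model it is the escape comparison that swallows the delimiter; which comparison is responsible in the real code is not confirmed here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified parser; NOT the actual Sqoop RecordParser.
public class NulSentinelSketch {
    private static final char UNSET = '\000';

    // honorSentinel == false: '\000' is compared like any other character (reported behaviour).
    // honorSentinel == true:  '\000' means "not set", so the check is skipped (documented behaviour).
    static List<String> parse(String record, char fieldDelim,
                              char enclosedBy, char escapedBy, boolean honorSentinel) {
        boolean useEnclose = !(honorSentinel && enclosedBy == UNSET);
        boolean useEscape = !(honorSentinel && escapedBy == UNSET);

        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean atStart = true;   // at the first character of a field
        boolean enclosed = false; // inside an enclosed field
        boolean escaped = false;  // previous character was the escape character

        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (escaped) {
                cur.append(c);            // escaped character is taken literally
                escaped = false;
            } else if (atStart && useEnclose && c == enclosedBy) {
                enclosed = true;          // field opens with the enclosing character
                atStart = false;
            } else if (useEscape && c == escapedBy) {
                escaped = true;           // next character loses its special meaning
                atStart = false;
            } else if (enclosed) {
                if (c == enclosedBy) {
                    enclosed = false;     // closing enclosing character
                } else {
                    cur.append(c);
                }
            } else if (c == fieldDelim) {
                fields.add(cur.toString());
                cur.setLength(0);
                atStart = true;
            } else {
                cur.append(c);
                atStart = false;
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Without the sentinel check, the '|' after '\000' is taken as data.
        System.out.println(parse("foo\000|bar", '|', UNSET, UNSET, false).size());    // 1
        System.out.println(parse("foo\000bar|moo", '|', UNSET, UNSET, false).size()); // 2
        // With the sentinel check, '\000' is ordinary data and '|' still delimits.
        System.out.println(parse("foo\000|bar", '|', UNSET, UNSET, true).size());     // 2
    }
}
```

With honorSentinel set to true the sketch behaves the way the comment in DelimiterSet describes: a value of '\000' disables the check, so a \000 in the data can never be mistaken for a control character.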