Sqoop / SQOOP-2750

Support --fields-terminated-by value greater than 127 when using --hive-import

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.99.6
    • Fix Version/s: None
    • Component/s: hive-integration
    • Labels:

      Description

      Using a --fields-terminated-by value greater than 127 produces a file with the correct delimiter but throws an exception when combined with --hive-import. The relevant code is in src/java/org/apache/sqoop/hive/TableDefWriter.java:
      https://github.com/apache/sqoop/blob/f19e2a523579db8c28a96febfd3cf35a5d58adc6/src/java/org/apache/sqoop/hive/TableDefWriter.java#L278-L300

      The assumption in that code, that Hive cannot accept delimiters above 127, is only half true. Hive supports delimiters only up to 127 in octal form, but it also supports delimiters up to 255 when they are written as signed characters (two's complement).
      For example, a --fields-terminated-by value of '\0376' (decimal 254) is valid for Sqoop, but in a Hive table definition it should be written as '-2' (with single quotes), since 254 - 256 = -2.

      I suggest rejecting delimiters over 255, converting delimiters over 127 to two's complement signed characters, and leaving delimiters at or below 127 in octal form.
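
      A minimal sketch of the suggested mapping, for illustration only. The class name HiveDelimiterSketch and the method toHiveDelimiter are hypothetical and not part of Sqoop; the sketch assumes the caller adds the surrounding single quotes when it writes the Hive table definition, and it also rejects negative values, which the suggestion above does not mention.

```java
// Hypothetical helper, not part of Sqoop: sketches the proposed mapping
// from a delimiter value to the text used in a Hive table definition.
final class HiveDelimiterSketch {

    private HiveDelimiterSketch() { }

    /**
     * Returns the delimiter text without surrounding quotes (assumption:
     * the caller adds the single quotes when building the DDL).
     *
     * @param charNum delimiter value, expected to be in [0, 255]
     * @throws IllegalArgumentException if charNum is outside [0, 255]
     */
    static String toHiveDelimiter(int charNum) {
        if (charNum < 0 || charNum > 255) {
            throw new IllegalArgumentException(
                "Character " + charNum + " is an out-of-range delimiter");
        }
        if (charNum > 127) {
            // Narrowing to byte reinterprets 128..255 as -128..-1,
            // which is the two's complement signed form.
            return Integer.toString((byte) charNum);
        }
        // Values up to 127 keep the three-digit octal form, e.g. \001.
        return String.format("\\%03o", charNum);
    }

    public static void main(String[] args) {
        System.out.println(toHiveDelimiter(1));    // prints \001
        System.out.println(toHiveDelimiter(127));  // prints \177
        System.out.println(toHiveDelimiter(254));  // prints -2
    }
}
```

      With this mapping, the '\0376' example above yields -2, while values such as 256 are rejected with an IllegalArgumentException.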

      (Work estimate inflated to account for the number of tests that may need to be modified.)

            People

            • Assignee: Unassigned
            • Reporter: Marcus Truscello (mtruscello)
            • Votes: 0
            • Watchers: 2

                Time Tracking

                • Original Estimate: 4h
                • Remaining Estimate: 4h
                • Time Spent: Not Specified