SQOOP-319

The --hive-drop-import-delims option should accept a replacement string

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.3.0
    • Fix Version/s: 1.4.0-incubating
    • Component/s: hive-integration
    • Labels:
      None

      Description

      When importing data into Hive, you have the option of dropping the Hive delimiters from data fields. It would be more useful to replace the delimiters with a user-defined string. The dropped delimiters (like \n) often separate words, so if I split on whitespace in my Hive queries, I now get two words merged together. A more desirable behavior would be to replace each delimiter with a space. Making the replacement user-configurable gives the most flexibility.
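
      The word-merging problem described above can be illustrated with a small, self-contained Java sketch. It is not Sqoop code: the class and method names are hypothetical, and the delimiter set (\n, \r and \01) is assumed from the documented behavior of --hive-drop-import-delims.

```java
import java.util.Arrays;

// Hypothetical demo class; not part of Sqoop.
final class DelimiterDropDemo {
    // Assumed Hive delimiter characters: newline, carriage return, and \01.
    static final String HIVE_DELIMS = "[\n\r\u0001]";

    // Replace every delimiter with `replacement`, then split on whitespace,
    // as a Hive query splitting the field into words might do.
    static String[] wordsAfter(String field, String replacement) {
        String cleaned = field.replaceAll(HIVE_DELIMS, replacement);
        return cleaned.split("\\s+");
    }

    public static void main(String[] args) {
        String field = "first line\nsecond line";
        // Dropping the newline merges "line" and "second".
        System.out.println(Arrays.toString(wordsAfter(field, "")));
        // prints [first, linesecond, line]
        // Replacing it with a space keeps the words apart.
        System.out.println(Arrays.toString(wordsAfter(field, " ")));
        // prints [first, line, second, line]
    }
}
```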

      1. SQOOP-319-1.patch
        10 kB
        Joey Echeverria
      2. SQOOP-319-2.patch
        13 kB
        Joey Echeverria

        Activity

        Joey Echeverria added a comment -

        I added a new option, --hive-delims-replacement, which lets you pass in a replacement string. I did it with a new option to remain backwards compatible with the existing interface. I added a test for the new option.

        Joey Echeverria added a comment -

        I also tested the feature by hand. It works, but I found a bug when doing --direct (at least with MySQL). It doesn't end up calling the hiveStringDropDelims() function. Some other kind of escaping is going on. I'll file that as a separate JIRA.

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/1598/
        -----------------------------------------------------------

        Review request for Sqoop.

        Summary
        -------

        I added a new option, --hive-delims-replacement, which lets you pass in a replacement string. I did it with a new option to remain backwards compatible with the existing interface.

        This addresses bug SQOOP-319.
        https://issues.apache.org/jira/browse/SQOOP-319

        Diffs
        -----

        src/docs/user/hive-args.txt 7e6b7a0
        src/docs/user/hive.txt 059d7cb
        src/java/com/cloudera/sqoop/SqoopOptions.java d760d39
        src/java/com/cloudera/sqoop/lib/FieldFormatter.java 41536e1
        src/java/com/cloudera/sqoop/orm/ClassWriter.java dd3994e
        src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java 8f629f1
        src/test/com/cloudera/sqoop/hive/TestHiveImport.java 35de2fd
        testdata/hive/scripts/fieldWithNewlineReplacementImport.q PRE-CREATION

        Diff: https://reviews.apache.org/r/1598/diff

        Testing
        -------

        I added a unit test for the new option. I also tested the feature by hand. It works, but I found a bug when doing --direct (at least with MySQL). It doesn't end up calling the hiveStringDropDelims() function. Some other kind of escaping is going on. I'll file that as a separate JIRA.

        Thanks,

        Joey

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/1598/#review1579
        -----------------------------------------------------------

        Thanks for the patch, Joey. A high-level suggestion: please add validation that stops users from combining --hive-drop-import-delims with the option you are introducing, as the two are logically incompatible.

        A refactoring suggestion and minor checkstyle comments below.

        src/java/com/cloudera/sqoop/lib/FieldFormatter.java
        <https://reviews.apache.org/r/1598/#comment3565>

        It would be better to create another method, hiveStringReplaceDelims(String, String), and have the original method call it with the replacement string set to the empty string.
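
        A minimal sketch of the suggested refactoring, assuming the delimiters in question are \n, \r and \01. The class name is hypothetical and this is not the code from the patch; the real FieldFormatter methods may take additional arguments and do further escaping.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for FieldFormatter; illustrates the delegation only.
final class FieldFormatterSketch {
    // Assumed Hive delimiter characters: newline, carriage return, and \01.
    private static final Pattern HIVE_DELIMS = Pattern.compile("[\n\r\u0001]");

    private FieldFormatterSketch() { }

    // General case: replace every Hive delimiter with the given string.
    static String hiveStringReplaceDelims(String str, String replacement) {
        if (str == null) {
            return null;
        }
        // quoteReplacement keeps '$' and '\' in the user's string literal.
        return HIVE_DELIMS.matcher(str)
            .replaceAll(Matcher.quoteReplacement(replacement));
    }

    // Original behavior: dropping is replacing with the empty string.
    static String hiveStringDropDelims(String str) {
        return hiveStringReplaceDelims(str, "");
    }
}
```

        With this shape the existing drop behavior stays unchanged for callers, and the replacement logic lives in one place.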

        src/java/com/cloudera/sqoop/orm/ClassWriter.java
        <https://reviews.apache.org/r/1598/#comment3566>

        Longer than 80.

        src/java/com/cloudera/sqoop/orm/ClassWriter.java
        <https://reviews.apache.org/r/1598/#comment3567>

        Longer than 80.

        src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java
        <https://reviews.apache.org/r/1598/#comment3569>

        Longer than 80.

        src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java
        <https://reviews.apache.org/r/1598/#comment3568>

        Longer than 80.

        src/test/com/cloudera/sqoop/hive/TestHiveImport.java
        <https://reviews.apache.org/r/1598/#comment3570>

        Longer than 80.

        src/test/com/cloudera/sqoop/hive/TestHiveImport.java
        <https://reviews.apache.org/r/1598/#comment3571>

        Longer than 80.

        • Arvind

        On 2011-08-19 18:52:15, Joey Echeverria wrote:

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/1598/
        -----------------------------------------------------------

        (Updated 2011-08-19 18:52:15)

        Review request for Sqoop.

        Summary
        -------

        I added a new option, --hive-delims-replacement, which lets you pass in a replacement string. I did it with a new option to remain backwards compatible with the existing interface.

        This addresses bug SQOOP-319.
        https://issues.apache.org/jira/browse/SQOOP-319

        Diffs
        -----

        src/docs/user/hive-args.txt 7e6b7a0
        src/docs/user/hive.txt 059d7cb
        src/java/com/cloudera/sqoop/SqoopOptions.java d760d39
        src/java/com/cloudera/sqoop/lib/FieldFormatter.java 41536e1
        src/java/com/cloudera/sqoop/orm/ClassWriter.java dd3994e
        src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java 8f629f1
        src/test/com/cloudera/sqoop/hive/TestHiveImport.java 35de2fd
        testdata/hive/scripts/fieldWithNewlineReplacementImport.q PRE-CREATION

        Diff: https://reviews.apache.org/r/1598/diff

        Testing
        -------

        I added a unit test for the new option. I also tested the feature by hand. It works, but I found a bug when doing --direct (at least with MySQL). It doesn't end up calling the hiveStringDropDelims() function. Some other kind of escaping is going on. I'll file that as a separate JIRA.

        Thanks,

        Joey

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/1598/
        -----------------------------------------------------------

        (Updated 2011-08-23 23:01:05.651698)

        Review request for Sqoop.

        Changes
        -------

        I added a hiveStringReplaceDelims() method and implemented hiveStringDropDelims() by calling that method. I added validation to throw an error if both --hive-drop-import-delims and --hive-delims-replacement are used. I also fixed the checkstyle issues that you found.

        I added a test case for the validation code and also did manual testing of the feature.
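
        The validation described above could look roughly like the following sketch. The class, method, and parameter names are hypothetical, and IllegalArgumentException is only a stand-in for whatever exception type the patch actually throws.

```java
// Hypothetical sketch of the mutual-exclusion check; not the code from the patch.
final class HiveOptionValidationSketch {
    private HiveOptionValidationSketch() { }

    // dropDelims: true if --hive-drop-import-delims was given.
    // delimsReplacement: the --hive-delims-replacement value, or null if absent.
    static void validate(boolean dropDelims, String delimsReplacement) {
        if (dropDelims && delimsReplacement != null) {
            throw new IllegalArgumentException(
                "--hive-drop-import-delims and --hive-delims-replacement "
                + "cannot be used together");
        }
    }

    public static void main(String[] args) {
        validate(true, null);   // only dropping: accepted
        validate(false, " ");   // only replacing: accepted
        try {
            validate(true, " "); // both: rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```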

        Summary
        -------

        I added a new option, --hive-delims-replacement, which lets you pass in a replacement string. I did it with a new option to remain backwards compatible with the existing interface.

        This addresses bug SQOOP-319.
        https://issues.apache.org/jira/browse/SQOOP-319

        Diffs (updated)
        -----

        src/docs/user/hive-args.txt 7e6b7a0
        src/docs/user/hive.txt 059d7cb
        src/java/com/cloudera/sqoop/SqoopOptions.java d760d39
        src/java/com/cloudera/sqoop/lib/FieldFormatter.java 41536e1
        src/java/com/cloudera/sqoop/orm/ClassWriter.java dd3994e
        src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java 8f629f1
        src/java/com/cloudera/sqoop/tool/ImportTool.java 66e60bd
        src/test/com/cloudera/sqoop/hive/TestHiveImport.java 35de2fd
        testdata/hive/scripts/fieldWithNewlineReplacementImport.q PRE-CREATION

        Diff: https://reviews.apache.org/r/1598/diff

        Testing
        -------

        I added a unit test for the new option. I also tested the feature by hand. It works, but I found a bug when doing --direct (at least with MySQL). It doesn't end up calling the hiveStringDropDelims() function. Some other kind of escaping is going on. I'll file that as a separate JIRA.

        Thanks,

        Joey

        Joey Echeverria added a comment -

        Updated patch based on review board feedback.

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/1598/#review1619
        -----------------------------------------------------------

        Ship it!

        +1

        • Arvind

        Arvind Prabhakar added a comment -

        Patch committed. Thanks Joey!

        Hudson added a comment -

        Integrated in Sqoop-jdk-1.6 #17 (See https://builds.apache.org/job/Sqoop-jdk-1.6/17/)
        SQOOP-319. Support for replacing Hive delimiters.

        (Joey Echeverria via Arvind Prabhakar)

        arvind : http://svn.apache.org/viewvc/?view=rev&rev=1161382
        Files :

        • /incubator/sqoop/trunk/src/docs/user/hive.txt
        • /incubator/sqoop/trunk/src/java/com/cloudera/sqoop/SqoopOptions.java
        • /incubator/sqoop/trunk/src/java/com/cloudera/sqoop/lib/FieldFormatter.java
        • /incubator/sqoop/trunk/src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java
        • /incubator/sqoop/trunk/src/docs/user/hive-args.txt
        • /incubator/sqoop/trunk/src/java/com/cloudera/sqoop/orm/ClassWriter.java
        • /incubator/sqoop/trunk/src/test/com/cloudera/sqoop/hive/TestHiveImport.java
        • /incubator/sqoop/trunk/testdata/hive/scripts/fieldWithNewlineReplacementImport.q
        • /incubator/sqoop/trunk/src/java/com/cloudera/sqoop/tool/ImportTool.java
        Anitha added a comment -

        Hi Team,
        The --hive-delims-replacement option is not working for Oracle CLOB data containing 0D 0A (CR LF) characters. Can you please check on this?

        Thanks,
        AnithaKS


          People

          • Assignee:
            Joey Echeverria
            Reporter:
            Joey Echeverria
          • Votes:
            0
            Watchers:
            1
