IMPALA / IMPALA-2998

impala-shell -B and --output_delimiter do not work if a string contains the delimiter or tabs

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: Impala 2.3.0
    • Fix Version/s: None
    • Component/s: Clients

    Description

      See the test case below:

      impala-shell -q "DROP TABLE IF EXISTS tabtest";
      
      impala-shell -q "CREATE TABLE tabtest(col1 string,col2 string) ROW FORMAT DELIMITED FIELDS TERMINATED by ','";
      
      impala-shell -q 'INSERT OVERWRITE TABLE tabtest VALUES ("test", "\t\t\tTest"), ("test2", "Test\t\t\tTest"), ("test3", "test\tTest"), ("test4", "test,Test");';
      
      impala-shell -o out.csv -q "SELECT * FROM tabtest" --output_delimiter="," -B
      
      cat out.csv
      

      The output is:

      test,,,,Test
      test2,Test,,,Test
      test3,test,Test
      test4,test
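      The tab symptom in the first three rows above can be reproduced outside Impala. The sketch below assumes (this report does not confirm it) that each result row reaches the formatter as one tab-joined string and that every tab in it is then treated as a column separator. `naive_reformat` is a hypothetical helper, not impala-shell code. Under that assumption, tabs inside a value turn into extra delimiters exactly as observed.

```shell
#!/usr/bin/env bash
# Hypothetical reconstruction of the symptom (not impala-shell code).
# Assumption: the row is joined with tabs, then every tab is replaced by
# the output delimiter, so tabs that belong to a value are replaced too.
naive_reformat() {
  local delim=$1
  shift
  local joined out
  # "$*" joins the remaining arguments with the first character of IFS (a tab).
  joined=$(IFS=$'\t'; printf '%s' "$*")
  # Replace every tab, including the ones that were part of a column value.
  out=${joined//$'\t'/$delim}
  printf '%s\n' "$out"
}

naive_reformat ',' 'test' $'\t\t\tTest'       # test,,,,Test
naive_reformat ',' 'test2' $'Test\t\t\tTest'  # test2,Test,,,Test
naive_reformat ',' 'test3' $'test\tTest'      # test3,test,Test
```

      The three calls print the same lines as the first three rows of out.csv.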
      

      Two issues are visible here:

      1. When a string contains tabs, every tab is replaced by the delimiter.
      2. If a string contains the delimiter, the data after the delimiter is lost (see "test4"). This contradicts the documentation at http://www.cloudera.com/documentation/enterprise/latest/topics/impala_shell_options.html, which states:

      If an output value contains the delimiter character, that field is quoted and/or escaped
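      For comparison, here is a minimal sketch of the documented behaviour. It assumes RFC 4180-style quoting, since the documentation only says "quoted and/or escaped" and does not name a scheme: a field containing the delimiter, a double quote, or a newline is wrapped in double quotes, and embedded double quotes are doubled. `quote_field` and `format_row` are hypothetical helpers for illustration only. With this rule, tabs stay inside their field, and if the inserted value `test,Test` reached the shell intact, the "test4" row would print as `test4,"test,Test"`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not impala-shell code) illustrating RFC 4180-style
# quoting, which is an assumption about what "quoted and/or escaped" means.

# Print one field, quoted if it contains the delimiter, a double quote,
# or a newline; embedded double quotes are doubled.
quote_field() {
  local value=$1 delim=$2 dq='"'
  if [[ $value == *"$delim"* || $value == *"$dq"* || $value == *$'\n'* ]]; then
    value=${value//$dq/$dq$dq}
    printf '"%s"' "$value"
  else
    printf '%s' "$value"
  fi
}

# Print one row: each field passed through quote_field, joined by the delimiter.
format_row() {
  local delim=$1
  shift
  local first=1 field
  for field in "$@"; do
    if (( ! first )); then
      printf '%s' "$delim"
    fi
    quote_field "$field" "$delim"
    first=0
  done
  printf '\n'
}

format_row ',' 'test3' $'test\tTest'  # the tab stays inside the second field
format_row ',' 'test4' 'test,Test'    # test4,"test,Test"
```

      Quoting only the fields that need it keeps ordinary rows unchanged, so consumers that never see the delimiter inside a value are unaffected.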

      Looking at the underlying data file:

      hadoop fs -cat /user/hive/warehouse/tabtest/ba44046cba4c7c80-d6c4c08afd8c0cb0_1055158928_data.0.
      test,			Test
      test2,Test			Test
      test3,test	Test
      test4,test,Test
      

      The data is not stored properly: strings that contain the delimiter character should be stored in quotes.

      This is both a data-write issue and a read/parse issue.
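      The read side can be sketched the same way. Assuming, as the "test4" row suggests, that a reader for this two-column table splits each stored line on every ',' and ignores fields beyond the declared columns, the unquoted line `test4,test,Test` yields `col1=test4` and `col2=test`, and `Test` is dropped. `read_two_columns` is a hypothetical awk-based stand-in, not Impala's text scanner.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a reader of a two-column table declared with
# FIELDS TERMINATED BY ','. It splits on every comma and keeps only the
# first two fields; this behaviour is an assumption inferred from "test4".
read_two_columns() {
  awk -F',' '{ print "col1=" $1 " col2=" $2 }'
}

# The stored line from the data file above; the third field is discarded.
printf 'test4,test,Test\n' | read_two_columns   # col1=test4 col2=test
```

      Because the writer emitted the comma unquoted, no reader that splits on the declared delimiter can recover the original two values.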

            People

              Assignee: Unassigned
              Reporter: Eric Lin (ericlin)
              Votes: 2
              Watchers: 5
