IMPALA-1127

Impala and Hive's default TEXTFILE serialisation can't escape commas


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: Impala 1.4
    • Fix Version/s: None
    • Component/s: Frontend
    • Labels: None

    Description

      I thought that in the past I had put strings containing commas into an Impala CSV table and it did the right thing automatically (escaping the commas with \, since there is no notion of optional double quotes as in some text input formats). I tried just now with 1.4, and Impala always interpreted the comma as a separator, regardless of escaping.

      [localhost:21000] > create table csv (c1 string, c2 string, c3 string) row format delimited fields terminated by "," stored as textfile;
      [localhost:21000] > insert into csv values ("one","two","three"), ('double " quote',"single \' quote","and , comma");
      [localhost:21000] > select * from csv;
      +----------------+----------------+-------+
      | c1             | c2             | c3    |
      +----------------+----------------+-------+
      | one            | two            | three |
      | double " quote | single ' quote | and   |
      +----------------+----------------+-------+

      The bottom row of c3 is truncated where the comma appeared in the input string.

      [localhost:21000] > insert overwrite csv values ("one","two","three"), ('double " quote',"single \' quote","and \, comma");
      [localhost:21000] > select * from csv;
      +----------------+----------------+-------+
      | c1             | c2             | c3    |
      +----------------+----------------+-------+
      | one            | two            | three |
      | double " quote | single ' quote | and   |
      +----------------+----------------+-------+

      Adding a \ escape before the comma didn't help; the value is still truncated.

      Maybe the escape character is being misinterpreted and I need to double it somehow, to get the \ actually into the text file:

      [localhost:21000] > insert overwrite csv values ("one","two","three"), ('double " quote',"single \' quote","and \\, comma");
      [localhost:21000] > select * from csv;
      +----------------+----------------+-------+
      | c1             | c2             | c3    |
      +----------------+----------------+-------+
      | one            | two            | three |
      | double " quote | single ' quote | and \ |
      +----------------+----------------+-------+

      No, the \ shows up but the comma is still treated as a separator by the query.
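The three results above are consistent with a reader that has no escape character configured for the table. The CREATE TABLE statement above has no ESCAPED BY clause, and the sketch below assumes that this means every comma in the data file ends a field. It is a toy model in Python, not Impala's actual text scanner; the function name `split_fields` and its parameters are hypothetical.

```python
def split_fields(line, delimiter=",", escape=None):
    """Split one text row into fields (toy model, not Impala's scanner).

    With escape=None, every delimiter ends a field. With an escape
    character set, the character following it is kept as literal data.
    """
    fields, current = [], []
    i = 0
    while i < len(line):
        ch = line[i]
        if escape is not None and ch == escape and i + 1 < len(line):
            # Escape character: take the next character literally.
            current.append(line[i + 1])
            i += 2
            continue
        if ch == delimiter:
            # Unescaped delimiter: close the current field.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields


# The row as it would sit in the data file after the third insert,
# assuming the writer stores the string value unchanged.
row = 'double " quote,single \' quote,and \\, comma'

print(split_fields(row))               # no escape character declared
print(split_fields(row, escape="\\"))  # backslash declared as escape
```

Under this model the first call yields four fields, the third being `and \`, which matches the value shown for c3 above. The second call yields three fields with `and , comma` intact, which is the behaviour I expected from the table.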


            People

              Assignee: Unassigned
              Reporter: John Russell (jrussell)
              Votes: 1
              Watchers: 3
