Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-7056

Changing Text Delimiter Does Not Work

    XMLWordPrintableJSON

Details

    • ghx-label-7

    Description

      The wording on https://impala.apache.org/docs/build/html/topics/impala_alter_table.html makes it seem like you can change the delimiter of text tables after they are created.

      I did the following to simulate a table that needed to switch between comma and pipe delimited:

      hadoop fs -mkdir /user/alanj
      hadoop fs -mkdir /user/alanj/test_delim
      echo "A,B|C" > delim.txt
      hadoop fs -put delim.txt /user/alanj/test_delim
      

      Then created in impala and tried to change delimiters:

      > create external table default.alanj_test_delim(A string, B string) ROW FORMAT DELIMITED FIELDS TERMINATED BY "," LOCATION '/user/alanj/test_delim';
      > select * from default.alanj_test_delim;
      Query: select * from default.alanj_test_delim
      +---+-----+
      | a | b   |
      +---+-----+
      | A | B|C |
      +---+-----+
      > alter table default.alanj_test_delim set SERDEPROPERTIES ('serialization.format'='|', 'field.delim'='|');
      > select * from default.alanj_test_delim;
      +---+-----+
      | a | b   |
      +---+-----+
      | A | B|C |
      +---+-----+
      > show create table default.alanj_test_delim;
      +----------------------------------------------------------------------------------------------------------------------+
      | result                                                                                                               |
      +----------------------------------------------------------------------------------------------------------------------+
      | CREATE EXTERNAL TABLE default.alanj_test_delim (                                                                     |
      |   a STRING,                                                                                                          |
      |   b STRING                                                                                                           |
      | )                                                                                                                    |
      | ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'                                                                        |
      | WITH SERDEPROPERTIES ('field.delim'='|', 'serialization.format'='|')                                                 |
      | STORED AS TEXTFILE                                                                                                   |
      | LOCATION 'hdfs://namenode:8020/user/alanj/test_delim'                                  |
      | TBLPROPERTIES ('COLUMN_STATS_ACCURATE'='false', 'numFiles'='0', 'numRows'='-1', 'rawDataSize'='-1', 'totalSize'='0') |
      +----------------------------------------------------------------------------------------------------------------------+
      

      So it shows the right serdeproperties, but impala doesn't actually use them to read the data.

      If you then insert data (as the docs suggest), it writes that data with the new delimiter:

      > insert into default.alanj_test_delim values('D', 'E,F');
      > select * from alanj_test_delim;
      +-----+-----+
      | a   | b   |
      +-----+-----+
      | A,B | C   |
      | D   | E,F |
      +-----+-----+
      # hadoop fs -cat /user/alanj/test_delim/a54bb0ec14646492-a738811400000000_1498283208_data.0.
      D|E,F
      

      Attachments

        Activity

          People

            arodoni Alexandra Rodoni
            jackowaya Alan Jackoway
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: