IMPALA-3464: Show Create Table with Unusual Delimiters Incorrect

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: Impala 2.5.0
    • Fix Version/s: Impala 3.0
    • Component/s: Frontend

      Description

      When dealing with text tables I tend to use ASCII delimited text if possible: https://en.wikipedia.org/wiki/Delimiter#ASCII_delimited_text
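
      For context, the linked convention uses two ASCII control characters: the unit separator (US, 0x1F, decimal 31) between fields and the record separator (RS, 0x1E, decimal 30) between records. The Python sketch below is only an illustration of that convention, not code from this report; the `encode` and `decode` helpers are hypothetical names.

```python
# ASCII control characters reserved for delimiting data.
UNIT_SEPARATOR = "\x1f"    # US, decimal 31: separates fields
RECORD_SEPARATOR = "\x1e"  # RS, decimal 30: separates records


def encode(rows):
    """Join fields with US and records with RS."""
    return RECORD_SEPARATOR.join(UNIT_SEPARATOR.join(fields) for fields in rows)


def decode(text):
    """Split text back into a list of rows, each a list of fields."""
    return [record.split(UNIT_SEPARATOR) for record in text.split(RECORD_SEPARATOR)]


# Commas and newlines inside fields need no escaping, because the
# delimiters are control characters that rarely occur in real text.
rows = [["a,1", "b\n2"], ["c", "d"]]
encoded = encode(rows)
assert decode(encoded) == rows
```

      Because neither delimiter is printable, a tool that prints them back has to use an escape sequence, which is where the output below goes wrong.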

      However, Impala gives the wrong SHOW CREATE TABLE output for delimiters that aren't printable ASCII characters:

      [hostname:21000] > create table u_alanj.test_ascii_delimited(a string, b string) row format delimited fields terminated by '\u0031' lines terminated by '\030';
      Query: create table u_alanj.test_ascii_delimited(a string, b string) row format delimited fields terminated by '\u0031' lines terminated by '\030'
      
      Fetched 0 row(s) in 10.59s
      [hostname:21000] > show create table u_alanj.test_ascii_delimited;
      Query: show create table u_alanj.test_ascii_delimited
      +-------------------------------------------------------------------------------------------------------+
      | result                                                                                                |
      +-------------------------------------------------------------------------------------------------------+
      | CREATE TABLE u_alanj.test_ascii_delimited (                                                           |
      |   a STRING,                                                                                           |
      |   b STRING                                                                                            |
      | )                                                                                                     |
      | ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u001F' LINES TERMINATED BY '\u0018'                       |
      | WITH SERDEPROPERTIES ('line.delim'='\u0018', 'field.delim'='\u001F', 'serialization.format'='\u001F') |
      | STORED AS TEXTFILE                                                                                    |
      | LOCATION 'hdfs://nameservice1/user/hive/warehouse/u_alanj.db/test_ascii_delimited'                    |
      | TBLPROPERTIES ('transient_lastDdlTime'='1462204213')                                                  |
      +-------------------------------------------------------------------------------------------------------+
      Fetched 1 row(s) in 4.94s
      

      If you put the ROW FORMAT clause from the SHOW CREATE TABLE output into a new CREATE TABLE statement, the new table does not end up with the same delimiters, so it will not work with the same data.

      [hostname:21000] > create table u_alanj.test_ascii_delimited2(a string, b string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u001F' LINES TERMINATED BY '\u0018';
      Query: create table u_alanj.test_ascii_delimited2(a string, b string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u001F' LINES TERMINATED BY '\u0018'
      
      Fetched 0 row(s) in 7.83s
      [hostname:21000] > show create table u_alanj.test_ascii_delimited2;
      Query: show create table u_alanj.test_ascii_delimited2
      +-------------------------------------------------------------------------------------------------------+
      | result                                                                                                |
      +-------------------------------------------------------------------------------------------------------+
      | CREATE TABLE u_alanj.test_ascii_delimited2 (                                                          |
      |   a STRING,                                                                                           |
      |   b STRING                                                                                            |
      | )                                                                                                     |
      | ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0019' LINES TERMINATED BY '\u0012'                       |
      | WITH SERDEPROPERTIES ('line.delim'='\u0012', 'field.delim'='\u0019', 'serialization.format'='\u0019') |
      | STORED AS TEXTFILE                                                                                    |
      | LOCATION 'hdfs://nameservice1/user/hive/warehouse/u_alanj.db/test_ascii_delimited2'                   |
      | TBLPROPERTIES ('transient_lastDdlTime'='1462204481')                                                  |
      +-------------------------------------------------------------------------------------------------------+
      Fetched 1 row(s) in 3.17s
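
      The mismatch in the two sessions above can be tabulated with a short Python sketch. The pairs are copied from the transcripts. The final three assertions record one possible reading of the numbers (digits after `\u` taken as decimal, digits after a bare backslash taken as octal); that reading is an assumption made for illustration, not a description of Impala's parser, and it does not explain `\u001F` coming back as `\u0019`.

```python
# (delimiter as written in CREATE TABLE, delimiter reported by SHOW CREATE TABLE),
# copied from the two sessions in this report.
observed = [
    (r"\u0031", r"\u001F"),  # first table, field delimiter
    (r"\030", r"\u0018"),    # first table, line delimiter
    (r"\u001F", r"\u0019"),  # second table, field delimiter
    (r"\u0018", r"\u0012"),  # second table, line delimiter
]


def reported_code_point(escape):
    """Code point named by a \\uXXXX escape in the SHOW CREATE TABLE output."""
    return int(escape[2:], 16)


# None of the four delimiters is echoed back the way it was written.
assert not any(written == reported for written, reported in observed)

# Assumed reading, for illustration only: the digits are not parsed as hex.
assert int("0031", 10) == reported_code_point(r"\u001F")  # decimal 31
assert int("030", 8) == reported_code_point(r"\u0018")    # octal 030 is 24
assert int("0018", 10) == reported_code_point(r"\u0012")  # decimal 18
```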
      

      In case it matters, Hive also does this wrong.

      PS: Jira doesn't have component marked as required, but wouldn't let me create this without a component.

            People

            • Assignee: Adam Holley (aholley)
            • Reporter: Alan Jackoway (alanj_impala_5a78)