Hive / HIVE-14989

FIELDS TERMINATED BY parsing broken when delimiter is more than 1 byte

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.13.0, 0.13.1
    • Fix Version/s: 0.14.1
    • Component/s: File Formats, Parser, Reader
    • Labels: None

    Description

      FIELDS TERMINATED BY parsing is broken when the delimiter is longer than 1 byte. Every delimiter character from the second one onward becomes part of the returned data instead of being consumed as part of the delimiter.

      Test case:

      CREATE EXTERNAL TABLE test_muldelim
      (  string1 STRING,
         string2 STRING,
         string3 STRING
      )
      ROW FORMAT DELIMITED
        FIELDS TERMINATED BY '<>'
        LINES TERMINATED BY '\n'
      STORED AS TEXTFILE
      LOCATION '/user/hive/test_muldelim';
      

      Create a text file under /user/hive/test_muldelim with the following 2 lines:

      data1<>data2<>data3
      aa<>bb<>cc
      

      Notice that the two-character delimiter was not parsed properly:

      jdbc:hive2://host.domain.com:1> select * from ruslan_test.test_muldelim ;
      +------------------------+------------------------+------------------------+--+
      | test_muldelim.string1  | test_muldelim.string2  | test_muldelim.string3  |
      +------------------------+------------------------+------------------------+--+
      | data1                  | >data2                 | >data3                 |
      | aa                     | >bb                    | >cc                    |
      +------------------------+------------------------+------------------------+--+
      2 rows selected (0.453 seconds)
      

      The delimiter's second character ('>') became part of the columns to its right (`string2` and `string3`).
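
      The result above is what one would get if only the first character of the delimiter ('<') were treated as the field separator. As an illustration only (the cause is an assumption, not something this report confirms), splitting the first data line on '<' alone with Hive's built-in split() function gives the same three values:

      ```sql
      -- Illustration: split on just the first delimiter character.
      -- The literal is the first line of the test file above.
      SELECT split('data1<>data2<>data3', '<');
      -- ["data1",">data2",">data3"]
      ```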

      Table DDL:

      0: jdbc:hive2://host.domain.com:1> show create table default.test_muldelim ;
      +-----------------------------------------------------------------+--+
      |                         createtab_stmt                          |
      +-----------------------------------------------------------------+--+
      | CREATE EXTERNAL TABLE `default.test_muldelim`(                  |
      |   `string1` string,                                             |
      |   `string2` string,                                             |
      |   `string3` string)                                             |
      | ROW FORMAT DELIMITED                                            |
      |   FIELDS TERMINATED BY '<>'                                     |
      |   LINES TERMINATED BY '\n'                                      |
      | STORED AS INPUTFORMAT                                           |
      |   'org.apache.hadoop.mapred.TextInputFormat'                    |
      | OUTPUTFORMAT                                                    |
      |   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'  |
      | LOCATION                                                        |
      |   'hdfs://epsdatalake/user/hive/test_muldelim'                  |
      | TBLPROPERTIES (                                                 |
      |   'transient_lastDdlTime'='1476727100')                         |
      +-----------------------------------------------------------------+--+
      15 rows selected (0.286 seconds)
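
      A possible workaround, sketched under assumptions that go beyond this report: on Hive 0.14 or later with the hive-contrib jar available, MultiDelimitSerDe accepts a multi-character field.delim. The table name test_muldelim_serde is hypothetical, and the SerDe's package name may differ in newer releases.

      ```sql
      -- Hypothetical table over the same files, read through MultiDelimitSerDe
      -- instead of the default delimited row format.
      CREATE EXTERNAL TABLE test_muldelim_serde (
        string1 STRING,
        string2 STRING,
        string3 STRING
      )
      ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
      WITH SERDEPROPERTIES ("field.delim" = "<>")
      STORED AS TEXTFILE
      LOCATION '/user/hive/test_muldelim';
      ```

      Because both tables are external and share one location, the original files stay in place; only the way each row is split into columns changes.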
      

            People

              Assignee: Unassigned
              Reporter: Ruslan Dautkhanov (Tagar)
              Votes: 0
              Watchers: 2
