HIVE-11785: Support escaping carriage return and new line for LazySimpleSerDe

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.0
    • Component/s: Query Processor
    • Hadoop Flags: Incompatible change
    • Release Note: This change, together with HIVE-12820, adds support for carriage return and new line characters in fields. Previously, users had to preprocess the text and replace these characters with something else for the files to be processed properly. With this change, they are escaped automatically if the {{serialization.escape.crlf}} serde property is set to true. One incompatible change: the characters 'r' and 'n' can no longer be used as a separator or field delimiter.
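As a rough illustration of the escaping described in the release note, here is a minimal Python sketch. It is not Hive's implementation. It assumes a backslash escape character and assumes that CR and LF are written as the escape character followed by 'r' and 'n'; that mapping is a guess, chosen because it is consistent with the note that 'r' and 'n' can no longer be delimiters. The function names are hypothetical.

```python
ESCAPE = "\\"  # assumed escape character; not taken from the issue


def escape_field(value: str, escape: str = ESCAPE) -> str:
    """Escape the escape character, CR and LF so the field stays on one line."""
    out = []
    for ch in value:
        if ch == escape:
            out.append(escape + escape)
        elif ch == "\r":
            out.append(escape + "r")  # assumed encoding of CR
        elif ch == "\n":
            out.append(escape + "n")  # assumed encoding of LF
        else:
            out.append(ch)
    return "".join(out)


def unescape_field(text: str, escape: str = ESCAPE) -> str:
    """Reverse escape_field: turn escape+'r' / escape+'n' back into CR / LF."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == escape and i + 1 < len(text):
            nxt = text[i + 1]
            # Any other escaped character (including the escape itself) is kept as-is.
            out.append({"r": "\r", "n": "\n"}.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Under this assumed encoding, a delimiter of 'r' or 'n' would be indistinguishable from the second character of an escape sequence, which may be why those two characters are excluded.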

    Description

      Create the table and run the queries as follows. The results differ depending on the value of hive.fetch.task.conversion.
      The expected result is:

      1	newline
      here
      2	carriage return
      3	both
      here
      
      hive> create table repo (lvalue int, charstring string) stored as parquet;
      OK
      Time taken: 0.34 seconds
      hive> load data inpath '/tmp/repo/test.parquet' overwrite into table repo;
      Loading data to table default.repo
      chgrp: changing ownership of 'hdfs://nameservice1/user/hive/warehouse/repo/test.parquet': User does not belong to hive
      Table default.repo stats: [numFiles=1, numRows=0, totalSize=610, rawDataSize=0]
      OK
      Time taken: 0.732 seconds
      hive> set hive.fetch.task.conversion=more;
      hive> select * from repo;
      OK
      1	newline
      here
      here	carriage return
      3	both
      here
      Time taken: 0.253 seconds, Fetched: 3 row(s)
      hive> set hive.fetch.task.conversion=none;
      hive> select * from repo;
      Query ID = root_20150909113535_e081db8b-ccd9-4c44-aad9-d990ffb8edf3
      Total jobs = 1
      Launching Job 1 out of 1
      Number of reduce tasks is set to 0 since there's no reduce operator
      Starting Job = job_1441752031022_0006, Tracking URL = http://host-10-17-81-63.coe.cloudera.com:8088/proxy/application_1441752031022_0006/
      Kill Command = /opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/lib/hadoop/bin/hadoop job  -kill job_1441752031022_0006
      Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
      2015-09-09 11:35:54,127 Stage-1 map = 0%,  reduce = 0%
      2015-09-09 11:36:04,664 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.98 sec
      MapReduce Total cumulative CPU time: 2 seconds 980 msec
      Ended Job = job_1441752031022_0006
      MapReduce Jobs Launched:
      Stage-Stage-1: Map: 1   Cumulative CPU: 2.98 sec   HDFS Read: 4251 HDFS Write: 51 SUCCESS
      Total MapReduce CPU Time Spent: 2 seconds 980 msec
      OK
      1	newline
      NULL	NULL
      2	carriage return
      NULL	NULL
      3	both
      NULL	NULL
      Time taken: 25.131 seconds, Fetched: 6 row(s)
      hive>
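One plausible reading of the six-row result above is that the job output is written as delimited text and read back line by line, so any CR, LF or CRLF inside a field starts a new record. The Python sketch below models only that reading. It is not Hive code, and the sample values are assumptions reconstructed from the printed output, since the contents of test.parquet are not shown. The helper names and the `escape_crlf` flag are hypothetical.

```python
import re

# Assumed table contents, reconstructed from the output above.
SAMPLE = [
    (1, "newline\nhere"),
    (2, "carriage return\rhere"),
    (3, "both\r\nhere"),
]


def write_rows(rows, escape_crlf=False):
    """Serialize rows as tab-delimited text, one record per line."""
    lines = []
    for lvalue, charstring in rows:
        if escape_crlf:
            # Assumed escaping: backslash, CR and LF become two-character sequences.
            charstring = (
                charstring.replace("\\", "\\\\")
                .replace("\r", "\\r")
                .replace("\n", "\\n")
            )
        lines.append(f"{lvalue}\t{charstring}")
    return "\n".join(lines) + "\n"


def read_rows(text):
    """Split on CR, LF or CRLF and parse each line as (int, string)."""
    rows = []
    for line in re.split(r"\r\n|\r|\n", text):
        if line == "":
            continue
        cols = line.split("\t")
        try:
            lvalue = int(cols[0])
        except ValueError:
            lvalue = None  # a non-numeric first column becomes NULL
        charstring = cols[1] if len(cols) > 1 else None  # a missing column becomes NULL
        rows.append((lvalue, charstring))
    return rows
```

In this model, `read_rows(write_rows(SAMPLE))` yields six rows, with every second row being `(None, None)`, while `read_rows(write_rows(SAMPLE, escape_crlf=True))` yields three rows whose strings still carry the escape sequences.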
      

      Attachments

        1. test.parquet
          0.6 kB
          Aihua Xu
        2. HIVE-11785.patch
          19 kB
          Aihua Xu
        3. HIVE-11785.2.patch
          293 kB
          Aihua Xu
        4. HIVE-11785.3.patch
          294 kB
          Aihua Xu

            People

              Assignee: Aihua Xu
              Reporter: Aihua Xu
              Votes: 0
              Watchers: 8
