HIVE-5871

Use multiple characters as field delimiter

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.12.0
    • Fix Version/s: 0.14.0
    • Component/s: Contrib
    • Labels:

      Description

      By default, Hive only allows a single character as the field delimiter. RegexSerDe can handle multi-character delimiters, but it can be daunting to use, especially for newcomers.
      The patch adds a new SerDe named MultiDelimitSerDe. With MultiDelimitSerDe, users can specify a multi-character field delimiter when creating a table, in a way that closely resembles ordinary table creation. For example:

      create table test (
        id string,
        hivearray array<binary>,
        hivemap map<string,int>
      )
      ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
      WITH SERDEPROPERTIES ("field.delim"="[,]", "collection.delim"=":", "mapkey.delim"="@");
      

      where field.delim is the field delimiter, and collection.delim and mapkey.delim are the delimiters for collection items and for map key-value pairs, respectively. Of these, field.delim is mandatory and may consist of multiple characters, while collection.delim and mapkey.delim are optional and support only a single character.

      To use MultiDelimitSerDe, you have to add the hive-contrib jar to the classpath, e.g. with the add jar command.
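
      As a rough sketch of that setup, the session below registers the jar, loads a file into the example table, and queries it. The jar path and the data file /tmp/test.txt are hypothetical placeholders, and the row layout shown in the comments assumes that field.delim is matched literally; adjust all of these to your installation.

      ```sql
      -- Hypothetical path: point this at the hive-contrib jar shipped with your Hive build.
      ADD JAR /path/to/hive-contrib.jar;

      -- Assumed layout of one line in the hypothetical file /tmp/test.txt,
      -- given "field.delim"="[,]", "collection.delim"=":" and "mapkey.delim"="@":
      --   <id>[,]<item>:<item>[,]<key>@<value>:<key>@<value>

      -- Load the file into the table created with MultiDelimitSerDe.
      LOAD DATA LOCAL INPATH '/tmp/test.txt' INTO TABLE test;

      -- Read the rows back; the SerDe splits each line on the delimiters above.
      SELECT id, hivemap FROM test;
      ```

      The add jar command only affects the current session, so it has to be repeated in every session that reads or writes the table unless the jar is placed on the classpath some other way.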

      1. HIVE-5871.2.patch (16 kB, Rui Li)
      2. HIVE-5871.3.patch (16 kB, Rui Li)
      3. HIVE-5871.4.patch (16 kB, Rui Li)
      4. HIVE-5871.5.patch (17 kB, Rui Li)
      5. HIVE-5871.6.patch (17 kB, Rui Li)
      6. HIVE-5871.patch (16 kB, Rui Li)

        Issue Links

          Activity

          Rui Li created issue -
          Rui Li made changes -
          Field Original Value New Value
          Attachment HIVE-5871.patch [ 12615326 ]
          Rui Li made changes -
          Attachment HIVE-5871-v2.patch [ 12644378 ]
          Rui Li made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Rui Li made changes -
          Description
            Original Value: Add a new SerDe named MultiDelimitSerDe. With MultiDelimitSerDe, users can specify a multiple-character field delimiter when creating tables.
            New Value: By default, hive only allows user to use single character as field delimiter. Although there's RegexSerDe to specify multiple-character delimiter, it can be daunting to use, especially for amateurs. In the patch, I add a new SerDe named MultiDelimitSerDe. With MultiDelimitSerDe, users can specify a multiple-character field delimiter when creating tables, in a way most similar to typical table creations.
          Brock Noland made changes -
          Assignee Rui Li [ lirui ]
          Rui Li made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Rui Li made changes -
          Attachment HIVE-5871-v2.patch [ 12644378 ]
          Rui Li made changes -
          Attachment HIVE-5871.2.patch [ 12660057 ]
          Rui Li made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Rui Li made changes -
          Attachment HIVE-5871.3.patch [ 12660075 ]
          Rui Li made changes -
          Remote Link This issue links to "RB request (Web Link)" [ 16695 ]
          Rui Li made changes -
          Attachment HIVE-5871.4.patch [ 12660319 ]
          Rui Li made changes -
          Attachment HIVE-5871.5.patch [ 12661385 ]
          Rui Li made changes -
          Attachment HIVE-5871.6.patch [ 12661600 ]
          Brock Noland made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Fix Version/s 0.14.0 [ 12326450 ]
          Resolution Fixed [ 1 ]
          Lefty Leverenz made changes -
          Labels TODOC14
          Rui Li made changes -
          Description

          Rui Li made changes -
          Description
          Thejas M Nair made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Damien Carol made changes -
          Link This issue is related to HIVE-9172 [ HIVE-9172 ]

            People

            • Assignee: Rui Li
            • Reporter: Rui Li
            • Votes: 0
            • Watchers: 9
