Description
By default, Hive only allows users to use a single character as the field delimiter. Although RegexSerDe can be used to specify a multi-character delimiter, it can be daunting to use, especially for beginners.
The patch adds a new SerDe named MultiDelimitSerDe. With MultiDelimitSerDe, users can specify a multi-character field delimiter when creating tables, using syntax very close to that of an ordinary table creation. For example:
create table test (id string,hivearray array<binary>,hivemap map<string,int>) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' WITH SERDEPROPERTIES ("field.delim"="[,]","collection.delim"=":","mapkey.delim"="@");
where field.delim is the field delimiter, and collection.delim and mapkey.delim are the delimiters for collection items and key-value pairs, respectively. Among these delimiters, field.delim is mandatory and can consist of multiple characters, while collection.delim and mapkey.delim are optional and only support a single character.
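To make the roles of the three delimiters concrete, here is a minimal Java sketch of how a raw text row could be broken up using the values from the example above. It is not the MultiDelimitSerDe implementation: the class name MultiDelimitSketch, its methods, and the sample row are hypothetical, and it assumes the field delimiter is matched as a literal string rather than as a regular expression.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative only: a hypothetical helper, not the code added by the patch.
final class MultiDelimitSketch {

    // Split a raw row on a literal, possibly multi-character, field delimiter.
    static List<String> splitFields(String row, String fieldDelim) {
        // Pattern.quote makes "[,]" match literally instead of as a regex
        // character class; the -1 limit keeps trailing empty fields.
        return Arrays.asList(row.split(Pattern.quote(fieldDelim), -1));
    }

    // Split one field into collection items on a single-character delimiter.
    static List<String> splitCollection(String field, char collectionDelim) {
        String quoted = Pattern.quote(String.valueOf(collectionDelim));
        return Arrays.asList(field.split(quoted, -1));
    }

    // Split one field into map entries, then each entry into key and int value.
    static Map<String, Integer> splitMap(String field, char collectionDelim, char mapKeyDelim) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String entry : splitCollection(field, collectionDelim)) {
            int at = entry.indexOf(mapKeyDelim);
            if (at < 0) {
                throw new IllegalArgumentException("Missing map key delimiter in: " + entry);
            }
            result.put(entry.substring(0, at), Integer.valueOf(entry.substring(at + 1)));
        }
        return result;
    }
}
```

Under these assumptions, the made-up row `1[,]a:b:c[,]k1@10:k2@20` splits into three fields with `splitFields(row, "[,]")`; the second field yields the items a, b and c with `splitCollection(field, ':')`, and the third yields the entries k1=10 and k2=20 with `splitMap(field, ':', '@')`.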
To use MultiDelimitSerDe, you have to add the hive-contrib jar to the classpath, for example with the add jar command.
Attachments
Issue Links
- is duplicated by
  - HIVE-14989 FIELDS TERMINATED BY parsing broken when delimiter is more than 1 byte (Resolved)
- is related to
  - HIVE-14989 FIELDS TERMINATED BY parsing broken when delimiter is more than 1 byte (Resolved)
  - HIVE-9172 Merging HIVE-5871 into LazySimpleSerDe (Patch Available)
- supercedes
  - HIVE-1762 Multi-character delimiter strings do not work correctly (Resolved)
- links to