Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 1.2.2
- Fix Version/s: None
- Component/s: None
- Labels: None
- Environment: Hive version: 1.1.0-cdh5.5.1 (bundled with cloudera 5.5.1)
Description
When Avro files that back an external table contain \n characters, the result of SELECT queries may be corrupt. I encountered this error when querying Hive both from beeline and from JDBC.
Steps to reproduce (used files are attached to ticket)
- Create an .avro file that contains newline characters in a value of a map:
avro-tools fromjson --schema-file test.schema test.json > test.avro
- Copy the .avro file to HDFS:
hdfs dfs -copyFromLocal test.avro /some/location/
- Create an external table in beeline containing this .avro:
beeline> CREATE EXTERNAL TABLE broken_newline_map
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED AS
    INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  LOCATION '/some/location/'
  TBLPROPERTIES ('avro.schema.literal'='
  {
    "type" : "record",
    "name" : "myEntry",
    "namespace" : "myNamespace",
    "fields" : [
      { "name" : "foo", "type" : "long" },
      { "name" : "bar", "type" : { "type" : "map", "values" : "string" } }
    ]
  }
  ');
- Now, selecting may return corrupt results:
jdbc:hive2://my-server:10000/> select * from broken_newline_map;
+-------------------------+---------------------------------------------------+--+
| broken_newline_map.foo  | broken_newline_map.bar                            |
+-------------------------+---------------------------------------------------+--+
| 1                       | {"key2":"value2","key1":"value1\nafter newline"}  |
| 2                       | {"key2":"new value2","key1":"new value"}          |
+-------------------------+---------------------------------------------------+--+
2 rows selected (1.661 seconds)

jdbc:hive2://my-server:10000/> select foo, map_keys(bar), map_values(bar) from broken_newline_map;
+-------+------------------+-----------------------------+--+
| foo   | _c1              | _c2                         |
+-------+------------------+-----------------------------+--+
| 1     | ["key2","key1"]  | ["value2","value1"]         |
| NULL  | NULL             | NULL                        |
| 2     | ["key2","key1"]  | ["new value2","new value"]  |
+-------+------------------+-----------------------------+--+
3 rows selected (28.05 seconds)
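The input files used in the first step are attached to the ticket and not reproduced in this text. As a rough sketch, equivalent files can be generated with the Python script below. The schema is copied from the TBLPROPERTIES above, and the two records are inferred from the first result set; the attached files may differ in detail, so treat the record contents and the helper name render_inputs as assumptions.

```python
import json
from pathlib import Path

# Schema copied from the 'avro.schema.literal' table property in the ticket.
SCHEMA = {
    "type": "record",
    "name": "myEntry",
    "namespace": "myNamespace",
    "fields": [
        {"name": "foo", "type": "long"},
        {"name": "bar", "type": {"type": "map", "values": "string"}},
    ],
}

# Assumption: records inferred from the first SELECT output, not taken
# from the attached test.json.
RECORDS = [
    {"foo": 1, "bar": {"key1": "value1\nafter newline", "key2": "value2"}},
    {"foo": 2, "bar": {"key1": "new value", "key2": "new value2"}},
]


def render_inputs():
    """Return (schema_text, json_text) suitable for avro-tools fromjson."""
    schema_text = json.dumps(SCHEMA, indent=2)
    # json.dumps escapes the embedded newline as \n, so each record
    # stays on a single physical line in test.json.
    json_text = "\n".join(json.dumps(record) for record in RECORDS) + "\n"
    return schema_text, json_text


if __name__ == "__main__":
    schema_text, json_text = render_inputs()
    Path("test.schema").write_text(schema_text)
    Path("test.json").write_text(json_text)
```

Running the script writes test.schema and test.json to the current directory, after which the avro-tools command from the first step can be used unchanged.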
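The second result set is clearly wrong: it returns three rows for a two-row table. Row 1 is incorrect, because the map value is cut off at the newline ("value1" instead of "value1\nafter newline"), and row 2 is a corrupt row that contains only NULLs. I see the same behavior when running this query through JDBC.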
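The ticket does not state a root cause. One plausible explanation, suggested by the related issue about escaping newlines in LazySimpleSerDe, is that rows pass through a newline-delimited text representation at some point without the embedded newline being escaped. The toy model below is not Hive code; it only shows that such a round trip would produce exactly the three rows seen above. The function names are hypothetical, and the separators are Hive's default text delimiters.

```python
FIELD_SEP = "\x01"  # Ctrl-A, Hive's default field delimiter for text rows
ITEM_SEP = "\x02"   # Ctrl-B, Hive's default collection item delimiter
ROW_SEP = "\n"      # rows are separated by newlines


def serialize(rows):
    """Write rows as delimited text WITHOUT escaping embedded newlines."""
    lines = []
    for foo, keys, values in rows:
        fields = [str(foo), ITEM_SEP.join(keys), ITEM_SEP.join(values)]
        lines.append(FIELD_SEP.join(fields))
    return ROW_SEP.join(lines)


def parse_line(line):
    """Parse one text line; missing or unparsable fields become None (NULL)."""
    fields = line.split(FIELD_SEP)
    fields += [None] * (3 - len(fields))
    try:
        foo = int(fields[0])
    except (TypeError, ValueError):
        foo = None
    keys = fields[1].split(ITEM_SEP) if fields[1] is not None else None
    values = fields[2].split(ITEM_SEP) if fields[2] is not None else None
    return (foo, keys, values)


# The two logical rows of the second query (foo, map_keys, map_values).
rows = [
    (1, ["key2", "key1"], ["value2", "value1\nafter newline"]),
    (2, ["key2", "key1"], ["new value2", "new value"]),
]

# Splitting on the row separator also splits inside the map value.
parsed = [parse_line(line) for line in serialize(rows).split(ROW_SEP)]

for row in parsed:
    print(row)
```

In this model the first row loses everything after the newline, the leftover fragment "after newline" is read as a separate row whose fields cannot be parsed, and the last row is unaffected.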
Attachments
Issue Links
- is related to HIVE-11785 Support escaping carriage return and new line for LazySimpleSerDe (Closed)