Details
Bug
Status: Open
Minor
Resolution: Unresolved
Impala 1.0.1, Impala 2.3.0
None
Description
SUMMARY
If a default value is provided for a union-type Avro field (i.e. a union of "null" and some other type, since other unions are not supported by Impala), the default value must match the first type in the union. Otherwise Impala will return the following error when trying to query the table:
Failed to parse table schema: Invalid JSON integer in json_t_to_avro_value_helper
For example, the following field definition will produce this error:
{"name": "i", "type": ["int", "null"], "default": null}
This is technically not a bug since this is what the Avro spec dictates. However, it isn't very user-friendly.
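The rule above can be checked before a schema ever reaches Impala. The following is a minimal sketch, not part of Impala or of any Avro library: the helper names `default_matches` and `find_mismatched_defaults` are hypothetical, only top-level record fields are inspected, and references to named types are assumed to be fine rather than resolved.

```python
import json

# JSON value types accepted as a default for each Avro type name.
_JSON_TYPES = {
    "null": (type(None),),
    "boolean": (bool,),
    "int": (int,),
    "long": (int,),
    "float": (int, float),
    "double": (int, float),
    "string": (str,),
    "bytes": (str,),
    "enum": (str,),
    "fixed": (str,),
    "array": (list,),
    "map": (dict,),
    "record": (dict,),
}


def default_matches(avro_type, default):
    """Return True if `default` is a plausible JSON default for `avro_type`."""
    name = avro_type["type"] if isinstance(avro_type, dict) else avro_type
    allowed = _JSON_TYPES.get(name)
    if allowed is None:
        return True  # reference to a named type; not checked in this sketch
    if isinstance(default, bool) and bool not in allowed:
        return False  # bool is a subclass of int in Python, so reject it explicitly
    return isinstance(default, allowed)


def find_mismatched_defaults(schema):
    """List top-level union fields whose default does not match the first branch."""
    bad = []
    for field in schema.get("fields", []):
        ftype = field["type"]
        if isinstance(ftype, list) and ftype and "default" in field:
            if not default_matches(ftype[0], field["default"]):
                bad.append(field["name"])
    return bad


schema = json.loads("""
{"type": "record", "name": "example", "fields": [
  {"name": "i", "type": ["int", "null"], "default": null},
  {"name": "j", "type": ["null", "int"], "default": null},
  {"name": "k", "type": "int"}
]}
""")
print(find_mismatched_defaults(schema))  # prints ['i']
```

Only field `i` is reported, because its default (`null`) does not match the first branch of its union (`int`).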
WORKAROUND
Switch the order of the types in the union before writing the files. If you have existing files written with a problematic schema, you may need to rewrite those files with the fixed schema because Avro embeds the schema in the file.
For example, the following field definition can be queried successfully:
{"name": "i", "type": ["null", "int"], "default": null}
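The reordering described in this workaround can be applied to a schema definition mechanically. The sketch below uses only the Python standard library; the function name `null_first` is hypothetical, and it handles only top-level fields whose default is `null`.

```python
import copy
import json


def null_first(schema):
    """Return a copy of `schema` in which every top-level union field with a
    null default lists "null" as its first branch."""
    fixed = copy.deepcopy(schema)
    for field in fixed.get("fields", []):
        ftype = field["type"]
        has_null_default = "default" in field and field["default"] is None
        if isinstance(ftype, list) and "null" in ftype and has_null_default:
            # Move "null" to the front and keep the other branches in order.
            field["type"] = ["null"] + [t for t in ftype if t != "null"]
    return fixed


schema = json.loads("""
{"type": "record", "name": "example", "fields": [
  {"name": "i", "type": ["int", "null"], "default": null}
]}
""")
fixed = null_first(schema)
print(json.dumps(fixed["fields"][0]))
# prints {"name": "i", "type": ["null", "int"], "default": null}
```

This only corrects the schema definition. As noted above, data files that already embed the old schema may still need to be rewritten with the corrected one.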
Original description
I have an Avro-backed table. Hive and the avro-tools jar can read the files, and Impala can describe the table. However, selecting from the table in Impala causes several daemons to crash.
I1021 11:01:18.022570 8623 status.cc:44] Failed to parse file schema: Invalid JSON float in json_t_to_avro_value_helper
@ 0x83af7d (unknown)
@ 0x922a00 (unknown)
@ 0x92309b (unknown)
@ 0x95e44d (unknown)
@ 0x910a8f (unknown)
@ 0x90a680 (unknown)
@ 0x9a36c4 (unknown)
@ 0x3681c07851 (unknown)
@ 0x36818e811d (unknown)
I1021 11:01:18.030833 5229 progress-updater.cc:56] Query 9c4f2e4eebf1c7a9:811b8dc272d75e8a: 6% Complete (1951 out of 29457)
My schema is
{
"type" : "record",
"name" : "points",
"fields" : [
,
{ "name" : "c2", "type" : [ "string", "null" ], "default" : null },
{ "name" : "c3", "type" : [ "string", "null" ], "default" : null },
{ "name" : "c4", "type" : [ "string", "null" ], "default" : null },
{ "name" : "c5", "type" : [ "double", "null" ], "default" : null },
{ "name" : "c6", "type" : [ "double", "null" ], "default" : null },
{ "name" : "c7", "type" : [ "string", "null" ], "default" : null },
{ "name" : "c8", "type" : [ "string", "null" ], "default" : null },
{ "name" : "c9", "type" : [ "double", "null" ], "default" : null },
{ "name" : "c10", "type" : [ "double", "null" ], "default" : null },
{ "name" : "c11", "type" : [ "double", "null" ], "default" : null },
{ "name" : "c12", "type" : [ "double", "null" ], "default" : null },
{ "name" : "c13", "type" : [ "double", "null" ], "default" : null },
{ "name" : "c14", "type" : [ "double", "null" ], "default" : null },
{ "name" : "c15", "type" : [ "double", "null" ], "default" : null },
{ "name" : "c16", "type" : [ "double", "null" ], "default" : null },
{ "name" : "c17", "type" : [ "double", "null" ], "default" : null },
{ "name" : "c18", "type" : [ "double", "null" ], "default" : null },
{ "name" : "id1", "type" : "int" },
{ "name" : "id2", "type" : "int" },
{ "name" : "root_id", "type" : "string" } ]
}
Describing the table in Impala works. The table is partitioned by columns that are not in the Avro files (Flume creates the directories).
Query: describe points
Query finished, fetching results ...
-----------------------------------------
name       | type   | comment           |
-----------------------------------------
c1         | double | from deserializer |
c2         | string | from deserializer |
c3         | string | from deserializer |
c4         | string | from deserializer |
c5         | double | from deserializer |
c6         | double | from deserializer |
c7         | string | from deserializer |
c8         | string | from deserializer |
c9         | double | from deserializer |
c10        | double | from deserializer |
c11        | double | from deserializer |
c12        | double | from deserializer |
c13        | double | from deserializer |
c14        | double | from deserializer |
c15        | double | from deserializer |
c16        | double | from deserializer |
c17        | double | from deserializer |
c18        | double | from deserializer |
id1        | int    | from deserializer |
id2        | int    | from deserializer |
root_id    | string | from deserializer |
deployment | string |                   |
date_id    | int    |                   |
hour       | int    |                   |
q_strategy | string |                   |
q_fund     | string |                   |
q_expiry   | string |                   |
-----------------------------------------
Returned 27 row(s) in 29.33s