This command fails with the following error:
hive/bin/hive -e "INSERT OVERWRITE LOCAL DIRECTORY 'abc' select null from zshao_tt"
FAILED: Error in semantic analysis:
java.lang.RuntimeException: Internal error: Cannot find ObjectInspector for VOID
When 'null' is replaced by '' it works.
NULL is not handled correctly in join
select null from null from zshao_tt ?? or is it select null from zshao_tt.
corrected. it's "select null from zshao_tt"
Thanks guys! I figured that out too. Debugging now.
The error is threw during semantic analysis where the parser try to instantiate an ObjectInspector for type "void" (corresponding to null).
The following similar query succeeded:
select 2 from zshao_tt;
since the semantic analyzer is able to deduct type (int) from the constant 2.
Since null can also be treated as a constant, I think we need to write an ObjectInspector for its type (void) too. And the object inspector will always return null for each row.
+1 on the idea. We have WritableVoidObjectInspector, JavaVoidObjectInspector etc but none are actually implemented or used.
Can we use JavaVoidObjectInspector for null constant? We probably need to modify ObjectInspectorConverters to allow conversion from void type to any type as well.
Thanks for the pointer Zheng. We certainly can use JavaVoidObjecInspector for NULL. However it turns out NULL is more complicated than what it sounds like (see http://en.wikipedia.org/wiki/Null_(SQL) ). NULL has to be handled not only in this simple select cause, but also in much more complex cases in the where-clause and joins. Let's have a talk.
We do handle NULL in the WHERE clause and "AND", "OR" functions using the 3-value booleans. I am not sure whether we ignore NULL in join/group-by but that seems easy enough to do.
Do you have a specific example that the current Hive code breaks?
I wasn't able to get an example in HIVE. I tried to insert NULL to a table by modifying the test query input4_cb_delim.q (actually by only modifying the input data to be
and run the query
select * from input4_cb where key is not null;
and both rows are returned. So either the NULL values are not inserted correctly, or the evaluation is not correct about NULL.
With LazySimpleSerDe (the default), we represent NULL by two bytes: "\" and "N". This is in sync with MySQL export format.
Can you try again with that?
That's great! It works (and my test queries on NULL also works too). In this case I guess it is much simpler since what we need here is just the VOID type ObjectInspector. I'll working on it.
BTW, I tried joins and it doesn't handle NULL semantics correctly. Here's the data and results:
select * from input4_cb a join input4_cb b on a.key = b.value;
NULL 325 18 NULL
The correct result should be empty set.
I guess the NULL vs. NULL comparison is incorrectly evaluated.
added the JavaVoidObjectInspector support for LazySimpleSerDe. Will add the same support to other serdes.
Good finding! Let's fix the join bug on a separate JIRA. Can you open a jira on that?
I am not sure about why we need to add JavaVoidObjectInspector support to LazySimpleSerDe - data from SerDe already have types (from columns). We only need to handle NULL that user specified in the query. Thoughts?
In the failing query, there is no column corresponding to null (select null from tt), so the semantic analyzer doesn't know the type of the NULL, therefore it is VOID type. That's why JavaVoidObjectInspector needs to be created. It is the similar that in (select 2 from tt), An JavaIntegerObjectInspector is needed for the constant 2.
Discussed with Ning offline. SemanticAnalyzer.genFileSinkPlan() will be the place for the change - in case the target is a local/remote directory (instead of a table), when we create table column names and types, we should use string type if the input is a void type. Later in the same function, we are using genConversionSelectOperator() to convert the column types.
So ODBC/JDBC will see this column as string type. Raghu confirmed that MySQL is creating a table with a column typed "BINARY(0)" when running "CREATE TABLE AS SELECT NULL, ..."
hive> select * from (select NULL from zshao_tt union all select 1 from zshao_tt)x;
select * from (select NULL from zshao_tt union all select 1 from zshao_tt)x;
FAILED: Error in semantic analysis: Schema of both sides of union should match: Column _c0 is of type void on first table and type int on second table
Do you want to handle this also in this patch, or file a new jira for above ?
Namit, thanks for the comments! I have started a new JIRA HIVE-747 for that. I'll look at this one soon.
looks good - will commit if the tests pass
Committed. Thanks Ning
Thank you very much!