Details
-
Bug
-
Status: Resolved
-
Critical
-
Resolution: Fixed
-
None
Description
Gandiva's caching mechanism for filters relies on FilterCacheKey to return the correct cached filter. FilterCacheKey's hash function factors in the string representation of a given expression, however Expression::ToString() doesn't really distinguish null and string literal 'null'. As a result, incorrect filters may be returned from the cache.
In our case, we are building a SQL parser on top of gandiva, but, for instance, both of
WHERE null = null
and
WHERE 'null' = 'null'
result in the same string representation of gandiva expression:
bool equal((const string) null, (const string) null)
A simple test to demonstrate the issue:
auto f = field("foo", utf8()); auto schema = arrow::schema({f}); auto node_a = TreeExprBuilder::MakeStringLiteral("null"); auto node_b = TreeExprBuilder::MakeStringLiteral("null"); auto equal_func = TreeExprBuilder::MakeFunction("equal", {node_a, node_b}, arrow::boolean()); auto condition = TreeExprBuilder::MakeCondition(equal_func); std::shared_ptr<Filter> filter1; auto status = Filter::Make(schema, condition, &filter1); EXPECT_TRUE(status.ok()); auto string_type = std::make_shared<arrow::StringType>(); node_a = TreeExprBuilder::MakeNull(string_type); node_b = TreeExprBuilder::MakeNull(string_type); equal_func = TreeExprBuilder::MakeFunction("equal", {node_a, node_b}, arrow::boolean()); condition = TreeExprBuilder::MakeCondition(equal_func); std::shared_ptr<Filter> filter2; status = Filter::Make(schema, condition, &filter2); EXPECT_TRUE(status.ok()); EXPECT_TRUE(filter1.get() != filter2.get()); // fails here
Making LiteralToStream add quotes around the literal seems like a quick-and-dirty fix.