HBASE-26211

[hbase-connectors] Pushdown filters in Spark do not work correctly with long types



    Description

      Reading from an HBase table and filtering on a LONG column does not seem to work correctly.

      {{Dataset<Row> df = spark.read()
        .format("org.apache.hadoop.hbase.spark")
        .option("hbase.columns.mapping", "id STRING :key, v LONG cf:v")
        ...
        .load();
      df.filter("v > 100").show();}}

      The expected behaviour is to show the rows where cf:v > 100, but instead an empty dataset is returned.

      Moreover, replacing "v > 100" with "v >= 100" results in a dataset where some rows have values of v less than 100.

      The problem appears to be that long values are decoded incorrectly as 4-byte integers in NaiveEncoder.filter, so only the high four bytes of each 8-byte value take part in the comparison:

      {{case LongEnc | TimestampEnc =>
        val in = Bytes.toInt(input, offset1)
        val value = Bytes.toInt(filterBytes, offset2 + 1)
        compare(in.compareTo(value), ops)}}
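
      To see why this misbehaves: Bytes.toBytes(Long) produces eight big-endian bytes, and Bytes.toInt reads only the first four of them, which are zero for any small non-negative value. A minimal sketch, runnable in the Scala REPL (only the standard org.apache.hadoop.hbase.util.Bytes API is assumed):

      {{import org.apache.hadoop.hbase.util.Bytes

      val cell   = Bytes.toBytes(150L) // 8 big-endian bytes: 00 00 00 00 00 00 00 96
      val filter = Bytes.toBytes(100L) // 8 big-endian bytes: 00 00 00 00 00 00 00 64

      // Bytes.toInt reads only the first 4 bytes (the high word), which is 0 for
      // both values, so 150 and 100 compare as equal: "v > 100" matches nothing,
      // while "v >= 100" also matches rows where v < 100.
      println(Bytes.toInt(cell).compareTo(Bytes.toInt(filter)))   // prints 0
      // Bytes.toLong uses all 8 bytes and compares correctly.
      println(Bytes.toLong(cell).compareTo(Bytes.toLong(filter))) // prints a positive number}}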

      It looks like the error has gone unnoticed because DynamicLogicExpressionSuite lacks test cases with long values.

      The erroneous code is also present in the master branch. We have extended the test suite, implemented a quick fix, and will open a PR on GitHub.
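
      For context, a minimal sketch of what the corrected branch could look like, assuming the fix simply swaps Bytes.toInt for Bytes.toLong (the authoritative change will be in the PR):

      {{case LongEnc | TimestampEnc =>
        val in = Bytes.toLong(input, offset1)
        val value = Bytes.toLong(filterBytes, offset2 + 1)
        compare(in.compareTo(value), ops)}}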

            People

              Assignee: Hristo Iliev
              Reporter: Hristo Iliev
