HBASE-26211

[hbase-connectors] Pushdown filters in Spark do not work correctly with long types



    Description

      Reading from an HBase table and filtering on a LONG column does not seem to work correctly.

      {{Dataset<Row> df = spark.read()
        .format("org.apache.hadoop.hbase.spark")
        .option("hbase.columns.mapping", "id STRING :key, v LONG cf:v")
        ...
        .load();
      df.filter("v > 100").show();}}

      The expected behaviour is to show the rows where cf:v > 100, but instead an empty dataset is returned.

      Moreover, replacing "v > 100" with "v >= 100" results in a dataset where some rows have values of v less than 100.

      The problem appears to be that long values are decoded incorrectly as 4-byte integers in NaiveEncoder.filter, so only the high four bytes of each 8-byte value take part in the comparison:

      {{case LongEnc | TimestampEnc =>
        val in = Bytes.toInt(input, offset1)
        val value = Bytes.toInt(filterBytes, offset2 + 1)
        compare(in.compareTo(value), ops)}}
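
      To see why this misbehaves: Bytes.toBytes(Long) produces eight big-endian bytes, and Bytes.toInt reads only the first four of them, which are zero for any small non-negative value. A minimal sketch, runnable in the Scala REPL (only the standard org.apache.hadoop.hbase.util.Bytes API is assumed):

      {{import org.apache.hadoop.hbase.util.Bytes

      val cell   = Bytes.toBytes(150L) // 8 big-endian bytes: 00 00 00 00 00 00 00 96
      val filter = Bytes.toBytes(100L) // 8 big-endian bytes: 00 00 00 00 00 00 00 64

      // Bytes.toInt reads only the first 4 bytes (the high word), which is 0 for
      // both values, so 150 and 100 compare as equal: "v > 100" matches nothing,
      // while "v >= 100" also matches rows where v < 100.
      println(Bytes.toInt(cell).compareTo(Bytes.toInt(filter)))   // prints 0
      // Bytes.toLong uses all 8 bytes and compares correctly.
      println(Bytes.toLong(cell).compareTo(Bytes.toLong(filter))) // prints a positive number}}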

      It looks like the error has gone unnoticed because DynamicLogicExpressionSuite lacks test cases with long values.

      The erroneous code is also present in the master branch. We have extended the test suite, implemented a quick fix, and will open a PR on GitHub.
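
      For context, a minimal sketch of what the corrected branch could look like, assuming the fix simply swaps Bytes.toInt for Bytes.toLong (the authoritative change will be in the PR):

      {{case LongEnc | TimestampEnc =>
        val in = Bytes.toLong(input, offset1)
        val value = Bytes.toLong(filterBytes, offset2 + 1)
        compare(in.compareTo(value), ops)}}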

            People

              Assignee: Hristo Iliev
              Reporter: Hristo Iliev
