Hive
  1. Hive
  2. HIVE-2903

Numeric binary type keys are not compared properly

    Details

    • Type: Bug Bug
    • Status: Patch Available
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: HBase Handler
    • Labels:
      None

      Description

      In current binary format for numbers, minus values are always greater than plus values, for example.

      System.our.println(Bytes.compareTo(Bytes.toBytes(-100), Bytes.toBytes(100))); // 255
      

        Issue Links

          Activity

          Hide
          Nick Dimiduk added a comment -

          Hi Navis. Mind having a look at the Ordered* types implemented in HBASE-8693, and the associated encoding strategies in HBASE-8201? You should find some useful implementations there.

          Show
          Nick Dimiduk added a comment - Hi Navis. Mind having a look at the Ordered* types implemented in HBASE-8693 , and the associated encoding strategies in HBASE-8201 ? You should find some useful implementations there.
          Hide
          Navis added a comment -

          It would be better to be handled in hbase rather then hive, IMHO. Before that, I've separated the code by option 'lbinary' ('signedbinary' conflicts with 'string' which starts with 's').

          Show
          Navis added a comment - It would be better to be handled in hbase rather then hive, IMHO. Before that, I've separated the code by option 'lbinary' ('signedbinary' conflicts with 'string' which starts with 's').
          Hide
          Phabricator added a comment -

          navis updated the revision "HIVE-2903 [jira] Numeric binary type keys are not compared properly".
          Reviewers: JIRA

          1. Separated storage option 'lbinary' for handling minus values.
          2. Added COLUMNS, COLUMN_TYPES to HBase table job property

          REVISION DETAIL
          https://reviews.facebook.net/D2481

          AFFECTED FILES
          hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java
          hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java
          hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
          hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java
          hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java
          hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java
          hbase-handler/src/test/queries/external_table_ppd.q
          hbase-handler/src/test/results/external_table_ppd.q.out

          Show
          Phabricator added a comment - navis updated the revision " HIVE-2903 [jira] Numeric binary type keys are not compared properly". Reviewers: JIRA 1. Separated storage option 'lbinary' for handling minus values. 2. Added COLUMNS, COLUMN_TYPES to HBase table job property REVISION DETAIL https://reviews.facebook.net/D2481 AFFECTED FILES hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java hbase-handler/src/test/queries/external_table_ppd.q hbase-handler/src/test/results/external_table_ppd.q.out
          Hide
          Enis Soztutar added a comment -

          Well, it is not a "bug" of hbase. HBase only provides int -> byte[] conversion as a convenience, and it seems that Bytes.toBytes(int) and others only guarantees lexicographic ordering for unsigned numbers. We can definitely add something like Bytes.toSignedBytes() in HBase so that you can ensure signed numbers are sorted correctly in lexicographic order.

          Coming to Hive, I think Ashutosh is right, that we have to keep supporting already existing data in hbase serialized through Bytes.toBytes(). So, I would suggest we add another storage type (hbase.table.default.storage.type), like "signedbinary", which should do the hive-specific signed byte conversion.

          So, we would have:

          • cf:col#string : serialize as string
          • cf:col#binary : serialize as binary, compatible with Bytes.toBytes()
          • cf:col#signedBinary : serialize as signed binary.

          I would also suggest that, people might be interested in custom ser/de from Hive types to byte[], but I am not sure how feasible that would be to implement.

          Show
          Enis Soztutar added a comment - Well, it is not a "bug" of hbase. HBase only provides int -> byte[] conversion as a convenience, and it seems that Bytes.toBytes(int) and others only guarantees lexicographic ordering for unsigned numbers. We can definitely add something like Bytes.toSignedBytes() in HBase so that you can ensure signed numbers are sorted correctly in lexicographic order. Coming to Hive, I think Ashutosh is right, that we have to keep supporting already existing data in hbase serialized through Bytes.toBytes(). So, I would suggest we add another storage type (hbase.table.default.storage.type), like "signedbinary", which should do the hive-specific signed byte conversion. So, we would have: cf:col#string : serialize as string cf:col#binary : serialize as binary, compatible with Bytes.toBytes() cf:col#signedBinary : serialize as signed binary. I would also suggest that, people might be interested in custom ser/de from Hive types to byte[], but I am not sure how feasible that would be to implement.
          Hide
          Ashutosh Chauhan added a comment -

          Yeah, I am having hard time believing that hbase lets you do this. I am not sure if the bug is present in some form in hbase. Your experiment does suggest its there in hbase. If it is, then it certainly makes sense to patch hbase, instead of special handling it ourselves.

          Show
          Ashutosh Chauhan added a comment - Yeah, I am having hard time believing that hbase lets you do this. I am not sure if the bug is present in some form in hbase. Your experiment does suggest its there in hbase. If it is, then it certainly makes sense to patch hbase, instead of special handling it ourselves.
          Hide
          Navis added a comment -

          @Ashutosh Chauhan,
          You are right. This is just a hive specific binary format and not a general solution. So other non-hive hbase client should know about that apriori for using it. (Should this be a patch for hbase?)
          But without this, rows with minus keys are popped-up especially after applying HIVE-2897. I don't know how this should be handled.

          Show
          Navis added a comment - @Ashutosh Chauhan, You are right. This is just a hive specific binary format and not a general solution. So other non-hive hbase client should know about that apriori for using it. (Should this be a patch for hbase?) But without this, rows with minus keys are popped-up especially after applying HIVE-2897 . I don't know how this should be handled.
          Hide
          Ashutosh Chauhan added a comment -

          @Navis,
          The way you have fixed it, it will work only if data is written from hive into hbase and then queries are run from hive client against hbase. What if data was written in hbase through hbase client and then queried from hive client, this bug will still be there, isn't it?
          This also makes me wonder that this problem is not limited to hive, but for hbase in general. If you are writing data through hbase client and then do range scans, you will have same bug. There must be some solution in hbase space for this.

          Show
          Ashutosh Chauhan added a comment - @Navis, The way you have fixed it, it will work only if data is written from hive into hbase and then queries are run from hive client against hbase. What if data was written in hbase through hbase client and then queried from hive client, this bug will still be there, isn't it? This also makes me wonder that this problem is not limited to hive, but for hbase in general. If you are writing data through hbase client and then do range scans, you will have same bug. There must be some solution in hbase space for this.
          Hide
          Navis added a comment -

          Passed all tests.

          Show
          Navis added a comment - Passed all tests.
          Hide
          Phabricator added a comment -

          navis requested code review of "HIVE-2903 [jira] Numeric binary type keys are not compared properly".
          Reviewers: JIRA

          DPAL-1007 Numeric binary type keys are not compared properly

          In current binary format for numbers, minus values are always greater than plus values, for example.

          System.our.println(Bytes.compareTo(Bytes.toBytes(-100), Bytes.toBytes(100))); // 255

          TEST PLAN
          EMPTY

          REVISION DETAIL
          https://reviews.facebook.net/D2481

          AFFECTED FILES
          hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java
          hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
          hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java
          hbase-handler/src/test/queries/external_table_ppd.q
          hbase-handler/src/test/results/external_table_ppd.q.out

          MANAGE HERALD DIFFERENTIAL RULES
          https://reviews.facebook.net/herald/view/differential/

          WHY DID I GET THIS EMAIL?
          https://reviews.facebook.net/herald/transcript/5565/

          Tip: use the X-Herald-Rules header to filter Herald messages in your client.

          Show
          Phabricator added a comment - navis requested code review of " HIVE-2903 [jira] Numeric binary type keys are not compared properly". Reviewers: JIRA DPAL-1007 Numeric binary type keys are not compared properly In current binary format for numbers, minus values are always greater than plus values, for example. System.our.println(Bytes.compareTo(Bytes.toBytes(-100), Bytes.toBytes(100))); // 255 TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D2481 AFFECTED FILES hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java hbase-handler/src/test/queries/external_table_ppd.q hbase-handler/src/test/results/external_table_ppd.q.out MANAGE HERALD DIFFERENTIAL RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/5565/ Tip: use the X-Herald-Rules header to filter Herald messages in your client.

            People

            • Assignee:
              Navis
              Reporter:
              Navis
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:

                Development