HBASE-15287

mapreduce.RowCounter returns incorrect result with binary row key inputs

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.1
    • Fix Version/s: 1.3.0, 2.0.0
    • Component/s: mapreduce, util
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      org.apache.hadoop.hbase.mapreduce.RowCounter takes an optional start/end key as input (-range option). This works only when the row key is plain text, so that the bytes of the key are the same as the bytes of the string typed on the command line. When the row key is binary, it has to be written in escaped form, for example "\x00\x01", and the current implementation incorrectly interprets that input as an 8-character string instead of a 2-byte key:

      https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/RowCounter.java

      To fix that, we need to change how the value is converted from the command-line input:

      Change
      scan.setStartRow(Bytes.toBytes(startKey));
      to
      scan.setStartRow(Bytes.toBytesBinary(startKey));

      Apply the same conversion to the end key as well.
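The difference between the two conversions can be illustrated without an HBase dependency. The following is a minimal sketch, not the HBase source: `parseEscaped` is a hypothetical stand-in that only approximates what `Bytes.toBytesBinary` does with `\xNN` escapes, and the plain UTF-8 conversion stands in for `Bytes.toBytes(String)`. The class and method names are assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BinaryKeyDemo {

    // Hypothetical helper (not the HBase implementation): decodes each
    // "\xNN" escape into one byte and copies every other character
    // through as a single byte.
    static byte[] parseEscaped(String in) {
        byte[] buf = new byte[in.length()];
        int size = 0;
        for (int i = 0; i < in.length(); i++) {
            char ch = in.charAt(i);
            boolean isEscape = ch == '\\'
                    && i + 3 < in.length()
                    && in.charAt(i + 1) == 'x'
                    && Character.digit(in.charAt(i + 2), 16) >= 0
                    && Character.digit(in.charAt(i + 3), 16) >= 0;
            if (isEscape) {
                int hi = Character.digit(in.charAt(i + 2), 16);
                int lo = Character.digit(in.charAt(i + 3), 16);
                buf[size++] = (byte) ((hi << 4) | lo);
                i += 3; // skip 'x' and the two hex digits
            } else {
                buf[size++] = (byte) ch;
            }
        }
        return Arrays.copyOf(buf, size);
    }

    public static void main(String[] args) {
        // What the user types on the command line: 8 characters.
        String startKey = "\\x00\\x01";

        // Plain conversion: every character becomes a byte of the key.
        byte[] asText = startKey.getBytes(StandardCharsets.UTF_8);

        // Escape-aware conversion: the two escapes become two bytes.
        byte[] asBinary = parseEscaped(startKey);

        System.out.println("plain conversion length:   " + asText.length);
        System.out.println("escaped conversion length: " + asBinary.length);
    }
}
```

With the plain conversion the scan would start at an 8-byte key beginning with a backslash character, which is not the 2-byte key the user intended, so the counted range is wrong.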

      The issue was discovered when the utility was used to calculate the distribution of rows across the regions of a table with binary row keys. The hbase:meta table shows the start key of each region in the escaped format of the example above.
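To show why a region start key copied from hbase:meta arrives in escaped form, here is a small sketch of the encoding direction. It is an illustration only: `escape` is a hypothetical helper whose set of printable characters is an assumption, and it may differ in detail from the formatting HBase itself applies.

```java
public class RegionKeyFormatDemo {

    // Hypothetical helper: keeps printable ASCII (except backslash) as-is
    // and writes every other byte as a \xNN escape.
    static String escape(byte[] key) {
        StringBuilder sb = new StringBuilder();
        for (byte b : key) {
            int v = b & 0xFF; // treat the byte as unsigned
            if (v >= 0x20 && v < 0x7F && v != '\\') {
                sb.append((char) v);
            } else {
                sb.append(String.format("\\x%02X", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] startKey = {0x00, 0x01};
        // The 2-byte key is displayed as an 8-character string.
        System.out.println(escape(startKey));
    }
}
```

A key displayed this way has to be decoded by an escape-aware conversion before it is used as a scan boundary; otherwise the displayed characters themselves become the key.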

        Attachments

        1. hbase-15287-v2.patch
          16 kB
          Matt Warhaftig
        2. hbase-15287-v1.patch
          9 kB
          Matt Warhaftig
        3. hbase-15287-branch-1-v1.patch
          15 kB
          Matt Warhaftig
        4. 15287-v2.patch
          16 kB
          Ted Yu

            People

            • Assignee:
              mwarhaftig Matt Warhaftig
              Reporter:
              ruweih Randy Hu
            • Votes:
              0
              Watchers:
              9

                Time Tracking

                Estimated:
                24h
                Remaining:
                24h
                Logged:
                Not Specified