Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-15235

Inconsistent cell reads over multiple bulk-loaded HFiles

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      If there are two bulkloaded hfiles in a region with same seqID and duplicate keys*, get and scan may return different values for a key.
      More details:

      • one of the rows had 200k+ columns. say row is 'r', column family is 'cf' and column qualifiers are 1 to 1000.
      • hfiles were split somewhere along that row, but there were a range of columns in both hfiles. For eg, something like - hfile1: ["", r:cf:70) and hfile2: [r:cf:40, ....).
      • Between columns 40 to 70, some (not all) columns were in both the files with different values. Whereas other were only in one of the files.

      In such a case, depending on file size (because we take it into account when sorting hfiles internally), we may get different values for the same cell (say "r", "cf:50") depending on what we call: get "r" "cf:50" or get "r" "cf:".

      I have been able to replicate this issue, will post the instructions shortly.

      • not sure how this would happen. These files are small ~50M, nor could i find any setting for max file size that could lead to splits. Need to investigate more.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                appy Apekshit Sharma
                Reporter:
                appy Apekshit Sharma
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: