HBase
  1. HBase
  2. HBASE-8055

Null check missing in StoreFile.Reader.getMaxTimestamp()

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.98.0, 0.94.6, 0.95.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      We just ran into a scenario where we got the following NPE:

      13/03/08 11:52:13 INFO regionserver.Store: Successfully loaded store file file:/tmp/hfile-import-00Dxx0000001lmJ-09Cxx00000000Jm/COLFAM/file09Cxx00000000Jm into store COLFAM (new location: file:/tmp/localhbase/data/SFDC.ENTITY_HISTORY_ARCHIVE/aeacee43aaf1748c6e60b9cc12bcac3d/COLFAM/120d683414e44478984b50ddd79b6826)
      13/03/08 11:52:13 ERROR regionserver.HRegionServer: Failed openScanner
      java.lang.NullPointerException
          at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.getMaxTimestamp(StoreFile.java:1702)
          at org.apache.hadoop.hbase.regionserver.StoreFileScanner.requestSeek(StoreFileScanner.java:301)
          at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:127)
          at org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:2070)
          at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:3383)
          at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1628)
          at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1620)
          at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1596)
          at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2342)
          at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at java.lang.reflect.Method.invoke(Method.java:597)
          at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
          at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1400)
      13/03/08 11:52:14 ERROR regionserver.HRegionServer: Failed openScanner
      

      It's not clear, yet, how we got into this situation (we are generating HFiles via HFileOutputFormat and bulk load those). It seems that can only happen when the HFile itself is corrupted.

      Looking at the code, though, I see this is the only place where we access StoreFile.reader.timeRangeTracker without a null check. So it appears we are expecting scenarios in which it can be null.

      A simple fix would be:

          public long getMaxTimestamp() {
            return timeRangeTracker == null ? Long.MAX_VALUE : timeRangeTracker.maximumTimestamp;
          }
      
      1. 8055-0.94.txt
        0.5 kB
        Lars Hofhansl

        Activity

        Hide
        Lars Hofhansl added a comment - - edited

        If anybody has any ideas about how we can encounter a null timeRangeTracker here, I would love to hear them.
        (By the way, this code has been around for a while - since HBASE-4465)

        Show
        Lars Hofhansl added a comment - - edited If anybody has any ideas about how we can encounter a null timeRangeTracker here, I would love to hear them. (By the way, this code has been around for a while - since HBASE-4465 )
        Hide
        Lars Hofhansl added a comment -

        Looks like HBASE-7581 is related.

        Show
        Lars Hofhansl added a comment - Looks like HBASE-7581 is related.
        Hide
        Ted Yu added a comment -

        HBASE-7581 was about TestAccessController.
        Could there be a typo above ?

        Show
        Ted Yu added a comment - HBASE-7581 was about TestAccessController. Could there be a typo above ?
        Hide
        Lars Hofhansl added a comment -

        I do not understand the fix in HBASE-7581. Are we saying after a bulk load we have to disable/enable the table?!

        Show
        Lars Hofhansl added a comment - I do not understand the fix in HBASE-7581 . Are we saying after a bulk load we have to disable/enable the table?!
        Hide
        Lars Hofhansl added a comment -

        Ted Yu Check the stack trace on HBASE-7581.

        Show
        Lars Hofhansl added a comment - Ted Yu Check the stack trace on HBASE-7581 .
        Hide
        Ted Yu added a comment -
            try {
              byte [] timerangeBytes = metadataMap.get(TIMERANGE_KEY);
              if (timerangeBytes != null) {
                this.reader.timeRangeTracker = new TimeRangeTracker();
                Writables.copyWritable(timerangeBytes, this.reader.timeRangeTracker);
              }
            } catch (IllegalArgumentException e) {
              LOG.error("Error reading timestamp range data from meta -- " +
                  "proceeding without", e);
              this.reader.timeRangeTracker = null;
        

        If you don't find "Error reading timestamp range data from meta – " in log file, I guess metadataMap.get(TIMERANGE_KEY) returned null.

        Show
        Ted Yu added a comment - try { byte [] timerangeBytes = metadataMap.get(TIMERANGE_KEY); if (timerangeBytes != null ) { this .reader.timeRangeTracker = new TimeRangeTracker(); Writables.copyWritable(timerangeBytes, this .reader.timeRangeTracker); } } catch (IllegalArgumentException e) { LOG.error( "Error reading timestamp range data from meta -- " + "proceeding without" , e); this .reader.timeRangeTracker = null ; If you don't find "Error reading timestamp range data from meta – " in log file, I guess metadataMap.get(TIMERANGE_KEY) returned null.
        Hide
        Lars Hofhansl added a comment -

        Yep. That was my conclusion as well.
        Still not sure how that happened. HBASE-7581 seems to show one such scenario

        Show
        Lars Hofhansl added a comment - Yep. That was my conclusion as well. Still not sure how that happened. HBASE-7581 seems to show one such scenario
        Hide
        Ted Yu added a comment -

        I think reader.timeRangeTracker shouldn't be null upon exiting open().
        reader.timeRangeTracker should be initialized in the above two cases.

        Show
        Ted Yu added a comment - I think reader.timeRangeTracker shouldn't be null upon exiting open(). reader.timeRangeTracker should be initialized in the above two cases.
        Hide
        Lars Hofhansl added a comment -

        I'd agree. I wonder why there are all the other null checks, though.

        Show
        Lars Hofhansl added a comment - I'd agree. I wonder why there are all the other null checks, though.
        Hide
        Lars Hofhansl added a comment -

        I spent some time looking through the code. I can't see where this goes wrong.
        Checked the following:

        • bulk load will open the reader in all code paths (if the open was missing the metadata would not have been loaded)
        • in all circumstances the StoreFile's metadata is written. I had initially suspected the adhoc splitting in bulk load, but that code copies the metadata from the original file.
        • record writer in HFileOutputFormat writes the metadata

        Not sure where else to look. In any case, we should either remove all the other null-checks for timeRangeTracker or add the same null-check to the only methods where this is not done.
        Even then, I'd be worried about how this came about.

        Show
        Lars Hofhansl added a comment - I spent some time looking through the code. I can't see where this goes wrong. Checked the following: bulk load will open the reader in all code paths (if the open was missing the metadata would not have been loaded) in all circumstances the StoreFile's metadata is written. I had initially suspected the adhoc splitting in bulk load, but that code copies the metadata from the original file. record writer in HFileOutputFormat writes the metadata Not sure where else to look. In any case, we should either remove all the other null-checks for timeRangeTracker or add the same null-check to the only methods where this is not done. Even then, I'd be worried about how this came about.
        Hide
        Lars Hofhansl added a comment -

        Oh and one more comment - just so it's all here: If a table disable/re-enable fixed the problem in HBASE-7581 that would point to a problem during bulk load.

        Show
        Lars Hofhansl added a comment - Oh and one more comment - just so it's all here: If a table disable/re-enable fixed the problem in HBASE-7581 that would point to a problem during bulk load.
        Hide
        Lars Hofhansl added a comment -

        Apparently on our side that happened when a generated HFile somehow bypassed our HFile validation.

        At the very least I would like to add the simple missing null-check here (as suggested in the description).
        I will do so, unless I hear objections.

        (In that case we might be able to undo the changes in HBASE-7581, but I won't do that as part of this jira)

        Show
        Lars Hofhansl added a comment - Apparently on our side that happened when a generated HFile somehow bypassed our HFile validation. At the very least I would like to add the simple missing null-check here (as suggested in the description). I will do so, unless I hear objections. (In that case we might be able to undo the changes in HBASE-7581 , but I won't do that as part of this jira)
        Hide
        stack added a comment -

        Lets see a patch. We have a null-check now (NPE – smile). Interested to see what you will do when you encounter null down here.

        Show
        stack added a comment - Lets see a patch. We have a null-check now (NPE – smile). Interested to see what you will do when you encounter null down here.
        Hide
        Lars Hofhansl added a comment -

        Here's a trivial 0.94 patch.
        In the absence of any information we'll assume the timerange of this StoreFile extends to the end of time.
        (This is similar to other null checks we're doing for example in Store.isMajorCompaction(...))

        Show
        Lars Hofhansl added a comment - Here's a trivial 0.94 patch. In the absence of any information we'll assume the timerange of this StoreFile extends to the end of time. (This is similar to other null checks we're doing for example in Store.isMajorCompaction(...))
        Hide
        Ted Yu added a comment -

        Patch looks good.

        Show
        Ted Yu added a comment - Patch looks good.
        Hide
        stack added a comment -

        +1

        Show
        stack added a comment - +1
        Hide
        Lars Hofhansl added a comment -

        Committed to 0.94, 0.95, and 0.98.

        Show
        Lars Hofhansl added a comment - Committed to 0.94, 0.95, and 0.98.
        Hide
        Hudson added a comment -

        Integrated in HBase-TRUNK #3948 (See https://builds.apache.org/job/HBase-TRUNK/3948/)
        HBASE-8055 Null check missing in StoreFile.Reader.getMaxTimestamp() (Revision 1455403)

        Result = FAILURE
        larsh :
        Files :

        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
        Show
        Hudson added a comment - Integrated in HBase-TRUNK #3948 (See https://builds.apache.org/job/HBase-TRUNK/3948/ ) HBASE-8055 Null check missing in StoreFile.Reader.getMaxTimestamp() (Revision 1455403) Result = FAILURE larsh : Files : /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
        Hide
        Hudson added a comment -

        Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #442 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/442/)
        HBASE-8055 Null check missing in StoreFile.Reader.getMaxTimestamp() (Revision 1455403)

        Result = FAILURE
        larsh :
        Files :

        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
        Show
        Hudson added a comment - Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #442 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/442/ ) HBASE-8055 Null check missing in StoreFile.Reader.getMaxTimestamp() (Revision 1455403) Result = FAILURE larsh : Files : /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
        Hide
        Hudson added a comment -

        Integrated in HBase-0.94 #892 (See https://builds.apache.org/job/HBase-0.94/892/)
        HBASE-8055 Null check missing in StoreFile.Reader.getMaxTimestamp() (Revision 1455400)

        Result = SUCCESS
        larsh :
        Files :

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
        Show
        Hudson added a comment - Integrated in HBase-0.94 #892 (See https://builds.apache.org/job/HBase-0.94/892/ ) HBASE-8055 Null check missing in StoreFile.Reader.getMaxTimestamp() (Revision 1455400) Result = SUCCESS larsh : Files : /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
        Hide
        Hudson added a comment -

        Integrated in hbase-0.95 #64 (See https://builds.apache.org/job/hbase-0.95/64/)
        HBASE-8055 Null check missing in StoreFile.Reader.getMaxTimestamp() (Revision 1455402)

        Result = FAILURE
        larsh :
        Files :

        • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
        Show
        Hudson added a comment - Integrated in hbase-0.95 #64 (See https://builds.apache.org/job/hbase-0.95/64/ ) HBASE-8055 Null check missing in StoreFile.Reader.getMaxTimestamp() (Revision 1455402) Result = FAILURE larsh : Files : /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
        Hide
        Hudson added a comment -

        Integrated in hbase-0.95-on-hadoop2 #22 (See https://builds.apache.org/job/hbase-0.95-on-hadoop2/22/)
        HBASE-8055 Null check missing in StoreFile.Reader.getMaxTimestamp() (Revision 1455402)

        Result = FAILURE
        larsh :
        Files :

        • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
        Show
        Hudson added a comment - Integrated in hbase-0.95-on-hadoop2 #22 (See https://builds.apache.org/job/hbase-0.95-on-hadoop2/22/ ) HBASE-8055 Null check missing in StoreFile.Reader.getMaxTimestamp() (Revision 1455402) Result = FAILURE larsh : Files : /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
        Hide
        Lars Hofhansl added a comment -

        OK... I tracked down our problem here. In one of our tests written by the platform team we are creating HFile directly via HFileWriter (HFile.getWriterFactory(conf, new CacheConfig(conf))...create()).
        That does not write the StoreFile metadata, we should have used StoreFile.Writer (via StoreFile.WriterBuilder

        Now, with that in mind. Should I:

        • Remove the null check added here (and all the other null checks for timeRangeTracker)? After all without the NPE we would not have encountered the problem (stack, I think that is what you were getting at, right?)
        • Check the HFile during bulk load (LoadIncrementalHFiles.groupOrSplit would be a reasonable spot). Although it is not entirely clear whether there are other situation in which we have HFiles without metdata.

        Lastly, how can we discourage the direct use of HFileWriter? In trunk HFile and HFileWriteV* are tagged with @InterfaceAudience.Private, maybe that is good enough...?

        Show
        Lars Hofhansl added a comment - OK... I tracked down our problem here. In one of our tests written by the platform team we are creating HFile directly via HFileWriter ( HFile.getWriterFactory(conf, new CacheConfig(conf))...create() ). That does not write the StoreFile metadata, we should have used StoreFile.Writer ( via StoreFile.WriterBuilder Now, with that in mind. Should I: Remove the null check added here (and all the other null checks for timeRangeTracker)? After all without the NPE we would not have encountered the problem ( stack , I think that is what you were getting at, right?) Check the HFile during bulk load (LoadIncrementalHFiles.groupOrSplit would be a reasonable spot). Although it is not entirely clear whether there are other situation in which we have HFiles without metdata. Lastly, how can we discourage the direct use of HFileWriter? In trunk HFile and HFileWriteV* are tagged with @InterfaceAudience.Private , maybe that is good enough...?
        Hide
        Hudson added a comment -

        Integrated in HBase-0.94-security #121 (See https://builds.apache.org/job/HBase-0.94-security/121/)
        HBASE-8055 Null check missing in StoreFile.Reader.getMaxTimestamp() (Revision 1455400)

        Result = FAILURE
        larsh :
        Files :

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
        Show
        Hudson added a comment - Integrated in HBase-0.94-security #121 (See https://builds.apache.org/job/HBase-0.94-security/121/ ) HBASE-8055 Null check missing in StoreFile.Reader.getMaxTimestamp() (Revision 1455400) Result = FAILURE larsh : Files : /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
        Hide
        stack added a comment -

        It is excellent you figured it Lars Hofhansl An HFile w/o metadata is legit, no? Yes, it would be good to shutdown HFileWriter... getting it directly. Can you make it protected or package private w/ javadoc pointing at Store.Writer?

        Show
        stack added a comment - It is excellent you figured it Lars Hofhansl An HFile w/o metadata is legit, no? Yes, it would be good to shutdown HFileWriter... getting it directly. Can you make it protected or package private w/ javadoc pointing at Store.Writer?
        Hide
        Hudson added a comment -

        Integrated in HBase-0.94-security-on-Hadoop-23 #13 (See https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/13/)
        HBASE-8055 Null check missing in StoreFile.Reader.getMaxTimestamp() (Revision 1455400)

        Result = FAILURE
        larsh :
        Files :

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
        Show
        Hudson added a comment - Integrated in HBase-0.94-security-on-Hadoop-23 #13 (See https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/13/ ) HBASE-8055 Null check missing in StoreFile.Reader.getMaxTimestamp() (Revision 1455400) Result = FAILURE larsh : Files : /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java

          People

          • Assignee:
            Lars Hofhansl
            Reporter:
            Lars Hofhansl
          • Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development