  HBase / HBASE-18771

Incorrect StoreFileRefresh leading to split and compaction failures



    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.3.1
    • Fix Version/s: 1.4.0, 1.3.2, 2.0.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed


      We ran into compaction and split failures on 1.3, similar to HBASE-18186 and HBASE-17406. Here is what I believe is happening:

      Let's say we have 4 store files that are compacted into a new one. At this point there are 5 store files on disk, but only one (the newly created file) is open for the store; the other 4 are waiting to be archived by HFileArchiver.
      Before those files are archived, a scanner hits a FileNotFoundException (FNFE). This calls HRegion.RegionScannerImpl.handleFileNotFound(FileNotFoundException fnfe), which in turn calls region.refreshStoreFiles(true) -> HStore.refreshStoreFiles().
      HStore.refreshStoreFiles() then lists the store's HDFS directory and adds the previously compacted files back to the store, even though these files are also in StoreFileManager's compactedFiles list. HFileArchiver then runs, checks the compactedFiles list, and moves these files into the archive directory, so the store is left referencing files that no longer exist at their original path.
      The next time compaction runs, it fails with:
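The sequence above can be reduced to a toy model. The sketch below is a hypothetical in-memory simulation, not HBase code: ToyStore, RefreshRaceDemo and all of their methods are invented for illustration, a set of names stands in for the store's HDFS directory, and "archiving" just removes names from that set. It compares a refresh that re-adds every file found in the directory with one that skips files already tracked as compacted (an assumed guard, shown only to make the failure mode concrete).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of a store's file bookkeeping.
// None of these names are real HBase classes or methods.
class ToyStore {
    final Set<String> hdfsDir = new LinkedHashSet<>();        // files present in the store directory
    final Set<String> storeFiles = new LinkedHashSet<>();     // files the store reads and compacts
    final Set<String> compactedFiles = new LinkedHashSet<>(); // compacted inputs awaiting archival

    /** Compact inputs into one output; the inputs stay on disk until archived. */
    void compact(List<String> inputs, String output) {
        hdfsDir.add(output);
        storeFiles.removeAll(inputs);
        storeFiles.add(output);
        compactedFiles.addAll(inputs);
    }

    /** Models the reported behaviour: everything in the directory becomes a store file again. */
    void refreshBuggy() {
        storeFiles.addAll(hdfsDir);
    }

    /** Assumed guard: skip files that are already tracked as compacted. */
    void refreshExcludingCompacted() {
        for (String f : hdfsDir) {
            if (!compactedFiles.contains(f)) {
                storeFiles.add(f);
            }
        }
    }

    /** Archiver: consults only compactedFiles and moves those files out of the directory. */
    void archiveCompacted() {
        hdfsDir.removeAll(compactedFiles);
        compactedFiles.clear();
    }

    /** Store files that no longer exist in the directory; reading any of them would raise an FNFE. */
    List<String> danglingStoreFiles() {
        List<String> missing = new ArrayList<>();
        for (String f : storeFiles) {
            if (!hdfsDir.contains(f)) {
                missing.add(f);
            }
        }
        return missing;
    }

    /** State right after 4 files were compacted into hf5. */
    static ToyStore afterCompaction() {
        ToyStore s = new ToyStore();
        for (String f : List.of("hf1", "hf2", "hf3", "hf4")) {
            s.hdfsDir.add(f);
            s.storeFiles.add(f);
        }
        s.compact(List.of("hf1", "hf2", "hf3", "hf4"), "hf5");
        return s;
    }
}

class RefreshRaceDemo {
    /** Refresh (buggy or guarded), then archive, then report dangling store files. */
    static List<String> run(boolean guarded) {
        ToyStore s = ToyStore.afterCompaction();
        if (guarded) {
            s.refreshExcludingCompacted();
        } else {
            s.refreshBuggy();
        }
        s.archiveCompacted();
        return s.danglingStoreFiles();
    }

    public static void main(String[] args) {
        System.out.println("buggy refresh   -> dangling " + run(false));
        System.out.println("guarded refresh -> dangling " + run(true));
    }
}
```

In this model the unguarded refresh leaves hf1 through hf4 listed as store files after archival, while the guarded refresh leaves none, which mirrors the stale entries that later trip compaction selection.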

      2017-09-04 12:30:13,899 ERROR [ctions-1504505399609] regionserver.CompactSplitThread - Compaction selection failed regionName = xxxx, storeName = 0, priority = 26, time = 1504528213899
      java.io.FileNotFoundException: File does not exist: hdfs://xxxx
      at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1337)
      at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1329)
      at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
      at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1329)
      at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
      at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
      at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
      at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
      at org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:325)
      at org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
      at org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:65)
      at org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
      at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
      at org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1679)
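The trace fails inside StoreUtils.getLowestTimestamp because it asks for the modification time of every store file, so one stale entry is enough to abort compaction selection. Below is a minimal local-filesystem analogue, assuming java.nio in place of HDFS: NoSuchFileException plays the role of FileNotFoundException, and LowestTimestampDemo and its methods are hypothetical names, not the HBase implementation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.List;

// Hypothetical analogue of the failing path: a helper that stats every
// listed file throws as soon as one of them has been moved away.
class LowestTimestampDemo {
    /** Minimum modification time across files; throws if any file is missing. */
    static long lowestTimestamp(List<Path> files) throws IOException {
        long min = Long.MAX_VALUE;
        for (Path p : files) {
            // Throws NoSuchFileException if p no longer exists.
            min = Math.min(min, Files.getLastModifiedTime(p).toMillis());
        }
        return min;
    }

    /** True if selection fails once a still-listed file is moved to the archive dir. */
    static boolean selectionFailsAfterArchival() {
        try {
            Path storeDir = Files.createTempDirectory("store");
            Path archiveDir = Files.createTempDirectory("archive");
            Path compacted = Files.createFile(storeDir.resolve("hf1"));
            Path live = Files.createFile(storeDir.resolve("hf5"));
            // Stale list: hf1 was re-added by the bad refresh.
            List<Path> storeFiles = List.of(compacted, live);
            lowestTimestamp(storeFiles); // succeeds while both files exist
            // The archiver moves hf1 out from under the store.
            Files.move(compacted, archiveDir.resolve("hf1"));
            try {
                lowestTimestamp(storeFiles);
                return false;
            } catch (NoSuchFileException e) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("selection fails: " + selectionFailsAfterArchival());
    }
}
```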

      Similarly, if a split happens after archival, it fails after the PONR (point of no return) while opening the daughter regions, again because of the FNFE. This leaves the parent offline and the daughters in limbo, since they cannot be opened. Because the error occurs after the PONR, the RegionServer also aborts.


        Attachments

        1. HBASE-18771.master.003.patch (11 kB, Abhishek Singh Chouhan)
        2. HBASE-18771.master.002.patch (10 kB, Abhishek Singh Chouhan)
        3. HBASE-18771.master.001.patch (10 kB, Abhishek Singh Chouhan)
        4. HBASE-18771.branch-1.3.005.patch (11 kB, Abhishek Singh Chouhan)
        5. HBASE-18771.branch-1.3.004.patch (11 kB, Abhishek Singh Chouhan)
        6. HBASE-18771.branch-1.3.003.patch (10 kB, Abhishek Singh Chouhan)
        7. HBASE-18771.branch-1.3.002.patch (9 kB, Abhishek Singh Chouhan)
        8. HBASE-18771.branch-1.3.001.patch (8 kB, Abhishek Singh Chouhan)

        Assignee: Abhishek Singh Chouhan (abhishek.chouhan)
        Reporter: Abhishek Singh Chouhan (abhishek.chouhan)
        Votes: 0
        Watchers: 11