HBase / HBASE-1052

Stopping a HRegionServer with unflushed cache causes data loss from org.apache.hadoop.hbase.DroppedSnapshotException

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.18.0, 0.18.1
    • Fix Version/s: 0.19.0
    • Component/s: regionserver
    • Labels:
      None

      Description

      1. Start an HBase cluster.
      2. Create a table t1: create 't1', {NAME => 'f1'}
      3. Put a cell in the table: put 't1', 'r1', 'f1:', 'value'
      4. Scan it and confirm the cell is there.
      5. Stop the HRegionServer hosting the t1 region: hbase/bin/hbase-daemon.sh stop regionserver
      6. Watch the region being reassigned from the original HRegionServer.
      7. Scan the t1 table again. It is now empty.

      If the cache is flushed between step 4 and step 5 (e.g. by an HBase cluster restart), no data is lost. However, this means that stopping a region server with a dirty cache loses the unflushed data.

      HRegionServer log after issuing hbase-daemon.sh stop regionserver:

      2008-12-09 06:37:46,873 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 60020: exiting
      2008-12-09 06:37:46,873 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 60020: exiting
      2008-12-09 06:37:46,873 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 60020: exiting
      2008-12-09 06:37:46,873 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 60020: exiting
      2008-12-09 06:37:46,874 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
      2008-12-09 06:37:46,874 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
      2008-12-09 06:37:46,874 INFO org.mortbay.util.ThreadedServer: Stopping Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=60030]
      2008-12-09 06:37:46,886 INFO org.mortbay.http.SocketListener: Stopped SocketListener on 0.0.0.0:60030
      2008-12-09 06:37:46,948 INFO org.mortbay.util.Container: Stopped HttpContext[/static,/static]
      2008-12-09 06:37:47,007 INFO org.mortbay.util.Container: Stopped HttpContext[/logs,/logs]
      2008-12-09 06:37:47,007 INFO org.mortbay.util.Container: Stopped org.mortbay.jetty.servlet.WebApplicationHandler@60ded0f0
      2008-12-09 06:37:47,094 INFO org.mortbay.util.Container: Stopped WebApplicationContext[/,/]
      2008-12-09 06:37:47,094 INFO org.mortbay.util.Container: Stopped org.mortbay.jetty.Server@6490832e
      2008-12-09 06:37:47,094 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: closing region t1,,1228833363456
      2008-12-09 06:37:47,094 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Compactions and cache flushes disabled for region t1,,1228833363456
      2008-12-09 06:37:47,094 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Scanners disabled for region t1,,1228833363456
      2008-12-09 06:37:47,094 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: No more active scanners for region t1,,1228833363456
      2008-12-09 06:37:47,095 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for region t1,,1228833363456
      2008-12-09 06:37:47,095 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: No more row locks outstanding on region t1,,1228833363456
      2008-12-09 06:37:47,095 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memcache flush for region t1,,1228833363456. Current region memcache size 18.0
      2008-12-09 06:37:47,095 INFO org.apache.hadoop.hbase.regionserver.Flusher: regionserver/0:0:0:0:0:0:0:0:60020.cacheFlusher exiting
      2008-12-09 06:37:47,096 INFO org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting.
      2008-12-09 06:37:47,096 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: regionserver/0:0:0:0:0:0:0:0:60020.compactor exiting
      2008-12-09 06:37:47,099 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: error closing region t1,,1228833363456
      org.apache.hadoop.hbase.DroppedSnapshotException: region: t1,,1228833363456
      at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1071)
      at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:619)
      at org.apache.hadoop.hbase.regionserver.HRegionServer.closeAllRegions(HRegionServer.java:951)
      at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:459)
      at java.lang.Thread.run(Thread.java:619)
      Caused by: java.io.IOException: Filesystem closed
      at org.apache.hadoop.dfs.DFSClient.checkOpen(DFSClient.java:196)
      at org.apache.hadoop.dfs.DFSClient.getFileInfo(DFSClient.java:564)
      at org.apache.hadoop.dfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:390)
      at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:667)
      at org.apache.hadoop.hbase.regionserver.HStoreFile.<init>(HStoreFile.java:152)
      at org.apache.hadoop.hbase.regionserver.HStore.internalFlushCache(HStore.java:599)
      at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:577)
      at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1058)
      ... 4 more
      2008-12-09 06:37:47,100 DEBUG org.apache.hadoop.hbase.regionserver.HLog: closing log writer in hdfs://h1:54310/hbase/log_10.131.237.51_1228833326838_60020
      2008-12-09 06:37:47,101 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Close and delete failed
      java.io.IOException: Filesystem closed
      at org.apache.hadoop.dfs.DFSClient.checkOpen(DFSClient.java:196)
      at org.apache.hadoop.dfs.DFSClient.access$600(DFSClient.java:59)
      at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:2689)
      at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:2655)
      at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:59)
      at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:79)
      at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:962)
      at org.apache.hadoop.hbase.regionserver.HLog.close(HLog.java:349)
      at org.apache.hadoop.hbase.regionserver.HLog.closeAndDelete(HLog.java:333)
      at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:461)
      at java.lang.Thread.run(Thread.java:619)
      2008-12-09 06:37:47,102 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: telling master that region server is shutting down at: 10.131.237.51:60020
      2008-12-09 06:37:47,104 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server at: 10.131.237.51:60020
      2008-12-09 06:37:47,882 INFO org.apache.hadoop.hbase.Leases: regionserver/0:0:0:0:0:0:0:0:60020.leaseChecker closing leases
      2008-12-09 06:37:47,882 INFO org.apache.hadoop.hbase.Leases: regionserver/0:0:0:0:0:0:0:0:60020.leaseChecker closed leases
      2008-12-09 06:37:54,919 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
      2008-12-09 06:37:54,920 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/0:0:0:0:0:0:0:0:60020 exiting
      2008-12-09 06:37:54,920 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
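
      The trace above shows the memcache flush inside HRegion.close failing with "Filesystem closed", which suggests that the DFS client had already been shut down when the regions were closed. The following is a minimal sketch of that ordering problem, assuming this is the cause. ToyFileSystem, ToyRegion and stopServer are hypothetical names invented for illustration; none of this is HBase or Hadoop code, and the model ignores the write-ahead log.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the shutdown ordering; every name here is invented.
class ShutdownOrderSketch {

    /** Stand-in for a DFS client that rejects writes once it is closed. */
    static class ToyFileSystem {
        private final List<String> storeFile = new ArrayList<>();
        private boolean open = true;

        void close() { open = false; }

        void write(List<String> cells) throws IOException {
            if (!open) throw new IOException("Filesystem closed");
            storeFile.addAll(cells);
        }

        /** What a later reader (e.g. the server that takes over the region) finds on disk. */
        List<String> read() { return new ArrayList<>(storeFile); }
    }

    /** Stand-in for a region: edits stay in memory until a flush persists them. */
    static class ToyRegion {
        private final List<String> memcache = new ArrayList<>();
        private final ToyFileSystem fs;

        ToyRegion(ToyFileSystem fs) { this.fs = fs; }

        void put(String cell) { memcache.add(cell); }

        /** Closing flushes the memcache; if the flush fails, nothing reaches disk. */
        void close() throws IOException {
            try {
                fs.write(memcache);
                memcache.clear();
            } catch (IOException e) {
                throw new IOException("dropped snapshot", e);
            }
        }
    }

    /** Stops a toy server and returns the cells that survived on disk. */
    static List<String> stopServer(boolean closeFileSystemFirst) {
        ToyFileSystem fs = new ToyFileSystem();
        ToyRegion region = new ToyRegion(fs);
        region.put("r1/f1:=value");
        if (closeFileSystemFirst) fs.close();   // the ordering suspected in this report
        try {
            region.close();
        } catch (IOException e) {
            System.out.println("error closing region: " + e.getMessage()
                    + ", caused by: " + e.getCause().getMessage());
        }
        if (!closeFileSystemFirst) fs.close();  // the safe ordering: flush, then close
        return fs.read();
    }

    public static void main(String[] args) {
        System.out.println("filesystem closed first: " + stopServer(true));
        System.out.println("region closed first:     " + stopServer(false));
    }
}
```

      In this model the cell survives only when the region is closed before the filesystem. With the filesystem closed first, the flush fails and the surviving data is empty, which matches the empty scan in step 7.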

        Activity

        Cosmin Lehene created issue
        Jim Kellerman made changes -
        Field Original Value New Value
        Status Open [ 1 ] Resolved [ 5 ]
        Fix Version/s 0.19.0 [ 12313364 ]
        Assignee Jim Kellerman [ jimk ]
        Fix Version/s 0.18.2 [ 12313512 ]
        Resolution Fixed [ 1 ]
        Jim Kellerman made changes -
        Status Resolved [ 5 ] Reopened [ 4 ]
        Resolution Fixed [ 1 ]
        Jim Kellerman made changes -
        Fix Version/s 0.18.2 [ 12313512 ]
        Jim Kellerman made changes -
        Resolution Fixed [ 1 ]
        Status Reopened [ 4 ] Resolved [ 5 ]
        Jim Kellerman made changes -
        Status Resolved [ 5 ] Reopened [ 4 ]
        Resolution Fixed [ 1 ]
        Jim Kellerman made changes -
        Status Reopened [ 4 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        stack made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Jean-Daniel Cryans made changes -
        Fix Version/s 0.18.2 [ 12313512 ]

          People

          • Assignee:
            Jim Kellerman
            Reporter:
            Cosmin Lehene
          • Votes:
            0
            Watchers:
            0
