HBase
  1. HBase
  2. HBASE-1076

CME in HStore.notifyChangedReadersObservers

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Blocker Blocker
    • Resolution: Won't Fix
    • Affects Version/s: 0.19.1
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Running latest trunk plus jimk's patch for HBASE-543:

      2008-12-21 12:47:31,741 DEBUG org.apache.hadoop.hbase.regionserver.HStoreScanner
      : Added a StoreFileScanner to outstanding HStoreScanner
      2008-12-21 12:47:31,741 FATAL org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Replay of hlog required. Forcing server shutdown
      org.apache.hadoop.hbase.DroppedSnapshotException: region: urls,http|playvideogame.net|,1229725620550
      at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:880)
      at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:773)
      at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.flushRegion(MemcacheFlusher.java:228)
      at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.flushSomeRegions(MemcacheFlusher.java:292)
      at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.reclaimMemcacheMemory(MemcacheFlusher.java:262)
      at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1594)
      at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
      at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:894)
      Caused by: java.util.ConcurrentModificationException
      at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
      at java.util.HashMap$KeyIterator.next(HashMap.java:828)
      at org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:736)
      at org.apache.hadoop.hbase.regionserver.HStore.updateReaders(HStore.java:724)
      at org.apache.hadoop.hbase.regionserver.HStore.internalFlushCache(HStore.java:693)
      at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:629)
      at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:865)
      ... 10 more

        Issue Links

          Activity

          Hide
          stack added a comment -

          You were riding with HBASE-1059 in place Andrew?

          Show
          stack added a comment - You were riding with HBASE-1059 in place Andrew?
          Hide
          Andrew Purtell added a comment -

          Yes. Just checked the source now to confirm.

          Show
          Andrew Purtell added a comment - Yes. Just checked the source now to confirm.
          Hide
          stack added a comment -

          Not sure why we're getting this (Maybe someone else has an opinion).

          Attached patch tries new tactic – uses concurrent Set that does not throw a CME (loose view). Will try it on my own cluster here first before committing so others can try.

          Show
          stack added a comment - Not sure why we're getting this (Maybe someone else has an opinion). Attached patch tries new tactic – uses concurrent Set that does not throw a CME (loose view). Will try it on my own cluster here first before committing so others can try.
          Hide
          stack added a comment -

          Deleted patch; needs comparables.

          Show
          stack added a comment - Deleted patch; needs comparables.
          Hide
          Andrew Purtell added a comment -

          Latest trunk.

          2008-12-31 04:38:12,941 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memcache flush for region urls,http|languroo.com|,1230685581293. Current region memcache size 64.0m
          2008-12-31 04:38:12,987 DEBUG org.apache.hadoop.hbase.regionserver.HStoreScanner: Added a StoreFileScanner to outstanding HStoreScanner
          2008-12-31 04:38:12,987 FATAL org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Replay of hlog required. Forcing server shutdown
          org.apache.hadoop.hbase.DroppedSnapshotException: region: urls,http|languroo.com|,1230685581293
          at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:880)
          at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:773)
          at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.flushRegion(MemcacheFlusher.java:227)
          at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.run(MemcacheFlusher.java:137)
          Caused by: java.util.ConcurrentModificationException
          at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
          at java.util.HashMap$KeyIterator.next(HashMap.java:828)
          at org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:735)
          at org.apache.hadoop.hbase.regionserver.HStore.updateReaders(HStore.java:723)
          at org.apache.hadoop.hbase.regionserver.HStore.internalFlushCache(HStore.java:692)
          at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:628)
          at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:865)
          ... 3 more

          Show
          Andrew Purtell added a comment - Latest trunk. 2008-12-31 04:38:12,941 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memcache flush for region urls,http|languroo.com|,1230685581293. Current region memcache size 64.0m 2008-12-31 04:38:12,987 DEBUG org.apache.hadoop.hbase.regionserver.HStoreScanner: Added a StoreFileScanner to outstanding HStoreScanner 2008-12-31 04:38:12,987 FATAL org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Replay of hlog required. Forcing server shutdown org.apache.hadoop.hbase.DroppedSnapshotException: region: urls,http|languroo.com|,1230685581293 at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:880) at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:773) at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.flushRegion(MemcacheFlusher.java:227) at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.run(MemcacheFlusher.java:137) Caused by: java.util.ConcurrentModificationException at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793) at java.util.HashMap$KeyIterator.next(HashMap.java:828) at org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:735) at org.apache.hadoop.hbase.regionserver.HStore.updateReaders(HStore.java:723) at org.apache.hadoop.hbase.regionserver.HStore.internalFlushCache(HStore.java:692) at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:628) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:865) ... 3 more
          Hide
          stack added a comment -
          Show
          stack added a comment - Removed synchronzation. Put in place instead http://java.sun.com/javase/6/docs/api/java/util/concurrent/CopyOnWriteArraySet.html .
          Hide
          Andrew Purtell added a comment -

          +1 on the patch. Strategy of it is sound. Haven't seen the CME on my cluster after heavy testing for 12 hours. Will continue to watch for it. Suggest resolving this issue. We can reopen it if necessary.

          Show
          Andrew Purtell added a comment - +1 on the patch. Strategy of it is sound. Haven't seen the CME on my cluster after heavy testing for 12 hours. Will continue to watch for it. Suggest resolving this issue. We can reopen it if necessary.
          Hide
          stack added a comment -

          Thanks for testing and review Andrew. Resolving.

          Show
          stack added a comment - Thanks for testing and review Andrew. Resolving.
          Hide
          Andrew Purtell added a comment -

          This was not actually fixed in 0.19.0, because it has been seen again on 0.19.1:

          2009-04-28 22:06:47,712 FATAL org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Replay of
          hlog required. Forcing server shutdown
          org.apache.hadoop.hbase.DroppedSnapshotException: region: urls,067656272028042009-0000000000000046441-10.64.67.48,1240925392960
                 at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:896)
                 at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:789)
                 at 
          org.apache.hadoop.hbase.regionserver.MemcacheFlusher.flushRegion(MemcacheFlusher.java:227)
                 at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.run(MemcacheFlusher.java:137)
          Caused by: java.lang.NullPointerException
                 at org.apache.hadoop.hbase.regionserver.HStoreScanner.updateReaders(HStoreScanner.java:330)
                 at org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:737)
                 at org.apache.hadoop.hbase.regionserver.HStore.updateReaders(HStore.java:725)
                 at org.apache.hadoop.hbase.regionserver.HStore.internalFlushCache(HStore.java:694)
                 at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:630)
                 at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:881)
                 ... 3 more
          
          Show
          Andrew Purtell added a comment - This was not actually fixed in 0.19.0, because it has been seen again on 0.19.1: 2009-04-28 22:06:47,712 FATAL org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Replay of hlog required. Forcing server shutdown org.apache.hadoop.hbase.DroppedSnapshotException: region: urls,067656272028042009-0000000000000046441-10.64.67.48,1240925392960 at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:896) at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:789) at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.flushRegion(MemcacheFlusher.java:227) at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.run(MemcacheFlusher.java:137) Caused by: java.lang.NullPointerException at org.apache.hadoop.hbase.regionserver.HStoreScanner.updateReaders(HStoreScanner.java:330) at org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:737) at org.apache.hadoop.hbase.regionserver.HStore.updateReaders(HStore.java:725) at org.apache.hadoop.hbase.regionserver.HStore.internalFlushCache(HStore.java:694) at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:630) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:881) ... 3 more
          Hide
          stack added a comment -

          This is not a CME. Its NPE.

          Here is code:

          328       MapFile.Reader [] readers = this.store.getReaders();
          329       if (this.scanners[HSFS_INDEX] == null && readers != null &&
          330           readers.length > 0) {
          

          Its not the this.store because that is never cleared.

          Its not the readers because stack variable and we just checked it not null.

          This this.scanners is never cleared.

          Can you see what it is?

          Show
          stack added a comment - This is not a CME. Its NPE. Here is code: 328 MapFile.Reader [] readers = this .store.getReaders(); 329 if ( this .scanners[HSFS_INDEX] == null && readers != null && 330 readers.length > 0) { Its not the this.store because that is never cleared. Its not the readers because stack variable and we just checked it not null. This this.scanners is never cleared. Can you see what it is?
          Hide
          stack added a comment -

          I'm trying to put up new RC2 for 0.19.4. Should include this if we can figure it.

          Show
          stack added a comment - I'm trying to put up new RC2 for 0.19.4. Should include this if we can figure it.
          Hide
          Andrew Purtell added a comment - - edited

          Stack: This is not a CME. Its NPE.

          Sorry, next time, coffee first.

          It looks to me like 'readers' became null between the test

          readers != null
          

          and

          readers.length > 0
          

          How reliable are line numbers in stack traces? Just below on line 334 is potentially suspicious this.previousNext.getRow()

          Show
          Andrew Purtell added a comment - - edited Stack: This is not a CME. Its NPE. Sorry, next time, coffee first. It looks to me like 'readers' became null between the test readers != null and readers.length > 0 How reliable are line numbers in stack traces? Just below on line 334 is potentially suspicious this.previousNext.getRow()
          Hide
          stack added a comment -

          np Andrew. On line numbers, I usually apply the 'thereabouts' but IME, they are usually correct (helps to have actual src deployed too – do you have access?). the this.previousNext shouldn't be null either by my reading.

          Show
          stack added a comment - np Andrew. On line numbers, I usually apply the 'thereabouts' but IME, they are usually correct (helps to have actual src deployed too – do you have access?). the this.previousNext shouldn't be null either by my reading.
          Hide
          Andrew Purtell added a comment -

          That team is using jars from the 0.19.1 release tarball.

          Show
          Andrew Purtell added a comment - That team is using jars from the 0.19.1 release tarball.
          Hide
          stack added a comment -

          Line numbers match up same in 0.19.1 (if I tagged it right).

          On 'readers' above changing between test and use, that is not possible? Its local variable so on the executing's thread stack.

          But what do I know? Shall I hold up 0.19.4RC2 till we figure this?

          Show
          stack added a comment - Line numbers match up same in 0.19.1 (if I tagged it right). On 'readers' above changing between test and use, that is not possible? Its local variable so on the executing's thread stack. But what do I know? Shall I hold up 0.19.4RC2 till we figure this?
          Hide
          Andrew Purtell added a comment -

          I went over this a few times. I don't see how to change the code such that the situation might be improved. Thanks for taking a look at it also. I think we are both of the same opinion.

          This strikes me as some kind of JVM bug. I don't know how I would go about reproducing it.

          Show
          Andrew Purtell added a comment - I went over this a few times. I don't see how to change the code such that the situation might be improved. Thanks for taking a look at it also. I think we are both of the same opinion. This strikes me as some kind of JVM bug. I don't know how I would go about reproducing it.
          Hide
          stack added a comment -

          Moving out until someone smarter figures whats up here.

          Show
          stack added a comment - Moving out until someone smarter figures whats up here.
          Hide
          stack added a comment -

          Resolving as no longer pertinent.

          Show
          stack added a comment - Resolving as no longer pertinent.

            People

            • Assignee:
              stack
              Reporter:
              Andrew Purtell
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development