Uploaded image for project: 'Accumulo'
  1. Accumulo
  2. ACCUMULO-1628

NPE on deep copied dumped memory iterator

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.4.0, 1.5.0
    • Fix Version/s: 1.5.2, 1.6.1, 1.7.0
    • Component/s: None
    • Labels:
      None

      Description

      Accumulo may dump memory while a scan is running and transparently switch iterators from memory to file. If an iterator calls deepcopy after this happens, then seek will fail on the deep copy.

      java.lang.NullPointerException
      	at org.apache.accumulo.server.tabletserver.InMemoryMap$MemoryDataSource.iterator(InMemoryMap.java:349)
      	at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.seek(SourceSwitchingIterator.java:163)
      
      1. ACCUMULO-1628-1.5.2-SNAPSHOT-1.patch
        7 kB
        Keith Turner
      2. ACCUMULO-1628-1.5.2-SNAPSHOT-2.patch
        10 kB
        Keith Turner
      3. ACCUMULO-1628-1.5.2-SNAPSHOT-3.patch
        15 kB
        Keith Turner
      4. ACCUMULO-1628-1.patch
        7 kB
        Keith Turner
      5. ACCUMULO-1628-SET_INTR.patch
        0.9 kB
        Keith Turner
      6. bpopp-ACCUMULO-1628.log
        9 kB
        Ben Popp

        Issue Links

          Activity

          Hide
          mdrob Mike Drob added a comment -

          Keith Turner - is this a 1.6.0 change, or something that gets pushed to 1.6.1 & 1.7.0?

          Show
          mdrob Mike Drob added a comment - Keith Turner - is this a 1.6.0 change, or something that gets pushed to 1.6.1 & 1.7.0?
          Hide
          benpopp Ben Popp added a comment -

          I saw a similar issue today and Chris McCubbin & John Vines pointed me at this issue. I was running concurrent reads & writes on a single node dev environment with accumulo 1.5.0. i was using the java IMM, not native.

          full log attached as bpopp-ACCUMULO-1628.log... excerpt below:

          2014-02-11 11:20:26,947 [tabletserver.TabletServer] WARN : exception while doing multi-scan 
          java.lang.NullPointerException
          	at org.apache.accumulo.server.tabletserver.InMemoryMap$MemoryDataSource.iterator(InMemoryMap.java:427)
          	at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.seek(SourceSwitchingIterator.java:163)
          	at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:101)
          	at org.apache.accumulo.core.iterators.SkippingIterator.seek(SkippingIterator.java:37)
          
          Show
          benpopp Ben Popp added a comment - I saw a similar issue today and Chris McCubbin & John Vines pointed me at this issue. I was running concurrent reads & writes on a single node dev environment with accumulo 1.5.0. i was using the java IMM, not native. full log attached as bpopp- ACCUMULO-1628 .log... excerpt below: 2014-02-11 11:20:26,947 [tabletserver.TabletServer] WARN : exception while doing multi-scan java.lang.NullPointerException at org.apache.accumulo.server.tabletserver.InMemoryMap$MemoryDataSource.iterator(InMemoryMap.java:427) at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.seek(SourceSwitchingIterator.java:163) at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:101) at org.apache.accumulo.core.iterators.SkippingIterator.seek(SkippingIterator.java:37)
          Hide
          elserj Josh Elser added a comment -

          What's the plan for this – I'm planning to cut another RC tmrw for 1.5.1. Is this something that's going to tried to be squeezed into 1.5.1? Is 1.5.2 acceptable?

          Show
          elserj Josh Elser added a comment - What's the plan for this – I'm planning to cut another RC tmrw for 1.5.1. Is this something that's going to tried to be squeezed into 1.5.1? Is 1.5.2 acceptable?
          Hide
          ecn Eric Newton added a comment -

          It's not a blocker, I wouldn't wait on it.

          Show
          ecn Eric Newton added a comment - It's not a blocker, I wouldn't wait on it.
          Hide
          vines John Vines added a comment -

          FYI, this still occurs and it presents as itself as an NPE coming up through continueMultiScan.

          Maybe this can be handled by changing the switched flag ot a shared object so taht way all clones know when it flips?

          Show
          vines John Vines added a comment - FYI, this still occurs and it presents as itself as an NPE coming up through continueMultiScan. Maybe this can be handled by changing the switched flag ot a shared object so taht way all clones know when it flips?
          Hide
          vines John Vines added a comment -

          Actually, the common case for this coming up is SourceSwitchingIterator.seek(), which calls InMemorymap.iterator as long as it doesn't already have an iterator.

          It then goes on and calls readNext(true), and the first thing it does there is

          (!onlySwitchAfterRow && switchSource()) || initialSeek

          if onlySwitchAfterRow is false, then switchSource() is guaranteed to be called, which I believe will handle this issue. maybe if we force switchSource() to be called when initialSeek is true and work around the InMemoryMap.iterator call in seek() we can avoid this issue entirely. Or maybe just incorporate a switchSource() call into seek.

          Show
          vines John Vines added a comment - Actually, the common case for this coming up is SourceSwitchingIterator.seek(), which calls InMemorymap.iterator as long as it doesn't already have an iterator. It then goes on and calls readNext(true), and the first thing it does there is (!onlySwitchAfterRow && switchSource()) || initialSeek if onlySwitchAfterRow is false, then switchSource() is guaranteed to be called, which I believe will handle this issue. maybe if we force switchSource() to be called when initialSeek is true and work around the InMemoryMap.iterator call in seek() we can avoid this issue entirely. Or maybe just incorporate a switchSource() call into seek.
          Hide
          elserj Josh Elser added a comment -

          John Vines do you have an updated stack trace you can provide? It looks like the original one has shifted a little.

          Show
          elserj Josh Elser added a comment - John Vines do you have an updated stack trace you can provide? It looks like the original one has shifted a little.
          Hide
          vines John Vines added a comment -

          Bottom of it

          Caused by: java.lang.NullPointerException
          at org.apache.accumulo.tserver.InMemoryMap$MemoryDataSource.iterator(InMemoryMap.java:547)
          at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.seek(SourceSwitchingIterator.java:163)
          at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:101)
          at org.apache.accumulo.core.iterators.SkippingIterator.seek(SkippingIterator.java:37)
          at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:101)
          at org.apache.accumulo.core.iterators.system.MultiIterator.seek(MultiIterator.java:105)
          at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:101)
          at org.apache.accumulo.core.iterators.system.StatsIterator.seek(StatsIterator.java:64)
          at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:101)
          at org.apache.accumulo.core.iterators.system.DeletingIterator.seek(DeletingIterator.java:67)
          at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:101)
          at org.apache.accumulo.core.iterators.SkippingIterator.seek(SkippingIterator.java:37)
          at org.apache.accumulo.core.iterators.system.ColumnFamilySkippingIterator.seek(ColumnFamilySkippingIterator.java:123)
          at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:101)
          at org.apache.accumulo.core.iterators.Filter.seek(Filter.java:64)
          at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:101)
          at org.apache.accumulo.core.iterators.Filter.seek(Filter.java:64)
          at org.apache.accumulo.core.iterators.system.SynchronizedIterator.seek(SynchronizedIterator.java:55)
          Show
          vines John Vines added a comment - Bottom of it Caused by: java.lang.NullPointerException at org.apache.accumulo.tserver.InMemoryMap$MemoryDataSource.iterator(InMemoryMap.java:547) at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.seek(SourceSwitchingIterator.java:163) at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:101) at org.apache.accumulo.core.iterators.SkippingIterator.seek(SkippingIterator.java:37) at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:101) at org.apache.accumulo.core.iterators.system.MultiIterator.seek(MultiIterator.java:105) at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:101) at org.apache.accumulo.core.iterators.system.StatsIterator.seek(StatsIterator.java:64) at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:101) at org.apache.accumulo.core.iterators.system.DeletingIterator.seek(DeletingIterator.java:67) at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:101) at org.apache.accumulo.core.iterators.SkippingIterator.seek(SkippingIterator.java:37) at org.apache.accumulo.core.iterators.system.ColumnFamilySkippingIterator.seek(ColumnFamilySkippingIterator.java:123) at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:101) at org.apache.accumulo.core.iterators.Filter.seek(Filter.java:64) at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:101) at org.apache.accumulo.core.iterators.Filter.seek(Filter.java:64) at org.apache.accumulo.core.iterators.system.SynchronizedIterator.seek(SynchronizedIterator.java:55)
          Hide
          kturner Keith Turner added a comment -

          John Vines would you be able to take this patch for a test drive?

          Show
          kturner Keith Turner added a comment - John Vines would you be able to take this patch for a test drive?
          Hide
          vines John Vines added a comment -

          I'm trying to put together a test bench to recreate it similar to the cases we saw in the wild

          Show
          vines John Vines added a comment - I'm trying to put together a test bench to recreate it similar to the cases we saw in the wild
          Hide
          kturner Keith Turner added a comment -

          uploaded final patch that was posted to RB

          Show
          kturner Keith Turner added a comment - uploaded final patch that was posted to RB
          Hide
          vines John Vines added a comment -
          java.lang.RuntimeException: Calling setInterruptFlag on a deep copy is not supported
          at org.apache.accumulo.core.file.rfile.RFile$Reader.setInterruptFlag(RFile.java:1007)
          at org.apache.accumulo.tserver.MemKeyConversionIterator.setInterruptFlag(InMemoryMap.java:182)
          at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.switchSource(SourceSwitchingIterator.java:148)
          at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.readNext(SourceSwitchingIterator.java:112)
          at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.seek(SourceSwitchingIterator.java:168)
          at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:101)
          at or
          g.apache.accumulo.core.iterators.SkippingIterator.seek(SkippingIterator.java:37)
          

          Jonathan Park has been seeing this issue in a 1.6.0 with this patch backported. I have a strong feeling it's due to this patch

          Show
          vines John Vines added a comment - java.lang.RuntimeException: Calling setInterruptFlag on a deep copy is not supported at org.apache.accumulo.core.file.rfile.RFile$Reader.setInterruptFlag(RFile.java:1007) at org.apache.accumulo.tserver.MemKeyConversionIterator.setInterruptFlag(InMemoryMap.java:182) at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.switchSource(SourceSwitchingIterator.java:148) at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.readNext(SourceSwitchingIterator.java:112) at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.seek(SourceSwitchingIterator.java:168) at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:101) at or g.apache.accumulo.core.iterators.SkippingIterator.seek(SkippingIterator.java:37) Jonathan Park has been seeing this issue in a 1.6.0 with this patch backported. I have a strong feeling it's due to this patch
          Hide
          kturner Keith Turner added a comment -

          John Vines I'll take a look

          Show
          kturner Keith Turner added a comment - John Vines I'll take a look
          Hide
          kturner Keith Turner added a comment -

          John Vines I was able to reproduce the bug you are seeing in a unit test. I attached ACCUMULO-1628-SET_INTR.patch with the changes to the unit test. This definitely related to the changes I made. I am going to look into fixing it.

          Show
          kturner Keith Turner added a comment - John Vines I was able to reproduce the bug you are seeing in a unit test. I attached ACCUMULO-1628 -SET_INTR.patch with the changes to the unit test. This definitely related to the changes I made. I am going to look into fixing it.
          Hide
          kturner Keith Turner added a comment -

          Another attempt ACCUMULO-1628-1.5.2-SNAPSHOT-3.patch is relative to ACCUMULO-1628-1.5.2-SNAPSHOT-2.patch. I am going to merge this patch forward locally and try running the ITs in 1.6.

          John Vines if you have a chance to kick the tires on this patch, it would be appreciated.

          Show
          kturner Keith Turner added a comment - Another attempt ACCUMULO-1628 -1.5.2-SNAPSHOT-3.patch is relative to ACCUMULO-1628 -1.5.2-SNAPSHOT-2.patch. I am going to merge this patch forward locally and try running the ITs in 1.6. John Vines if you have a chance to kick the tires on this patch, it would be appreciated.
          Hide
          elserj Josh Elser added a comment -

          I took at look at the patch and ran some tests with it. I'm content with it.

          Keith Turner, did your ITs finish? Can we commit this and move forward?

          Show
          elserj Josh Elser added a comment - I took at look at the patch and ran some tests with it. I'm content with it. Keith Turner , did your ITs finish? Can we commit this and move forward?
          Hide
          elserj Josh Elser added a comment -

          Keith told me that he did run all ITs in 1.6 with the changes. I ran them myself as well.

          Show
          elserj Josh Elser added a comment - Keith told me that he did run all ITs in 1.6 with the changes. I ran them myself as well.
          Hide
          vines John Vines added a comment -

          I think this has recurred in 1.6.0 in a different with the patch you included.

          2014-11-14 22:08:12,745 [tserver.InMemoryMap] ERROR: Failed to create mem dump file
          java.io.EOFException
                  at java.io.DataInputStream.readByte(DataInputStream.java:267)
                  at org.apache.accumulo.core.file.rfile.RelativeKey.fastSkip(RelativeKey.java:314)
                  at org.apache.accumulo.core.file.rfile.RFile$LocalityGroupReader._seek(RFile.java:748)
                  at org.apache.accumulo.core.file.rfile.RFile$LocalityGroupReader.seek(RFile.java:607)
                  at org.apache.accumulo.core.iterators.system.LocalityGroupIterator.seek(LocalityGroupIterator.java:142)
                  at org.apache.accumulo.core.file.rfile.RFile$Reader.seek(RFile.java:979)
                  at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:101)
                  at org.apache.accumulo.tserver.MemKeyConversionIterator.seek(InMemoryMap.java:168)
                  at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator._switchNow(SourceSwitchingIterator.java:171)
                  at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.switchNow(SourceSwitchingIterator.java:179)
                  at org.apache.accumulo.tserver.InMemoryMap$MemoryIterator.switchNow(InMemoryMap.java:647)
                  at org.apache.accumulo.tserver.InMemoryMap$MemoryIterator.access$900(InMemoryMap.java:601)
                  at org.apache.accumulo.tserver.InMemoryMap.delete(InMemoryMap.java:746)
                  at org.apache.accumulo.tserver.Tablet$TabletMemory.finalizeMinC(Tablet.java:327)
                  at org.apache.accumulo.tserver.Tablet.minorCompact(Tablet.java:2068)
                  at org.apache.accumulo.tserver.Tablet.access$4300(Tablet.java:170)
                  at org.apache.accumulo.tserver.Tablet$MinorCompactionTask.run(Tablet.java:2134)
                  at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
                  at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
                  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
                  at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
                  at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
                  at java.lang.Thread.run(Thread.java:744)

          We have an installation that is large enough with high ingest and query that is triggering it regularly. I'm currently working on acquiring the datanode logs on the other end, but the footprint and conditions for this are near identical to the original cases we were seeing the old stack traces occur in.

          Show
          vines John Vines added a comment - I think this has recurred in 1.6.0 in a different with the patch you included. 2014-11-14 22:08:12,745 [tserver.InMemoryMap] ERROR: Failed to create mem dump file java.io.EOFException at java.io.DataInputStream.readByte(DataInputStream.java:267) at org.apache.accumulo.core.file.rfile.RelativeKey.fastSkip(RelativeKey.java:314) at org.apache.accumulo.core.file.rfile.RFile$LocalityGroupReader._seek(RFile.java:748) at org.apache.accumulo.core.file.rfile.RFile$LocalityGroupReader.seek(RFile.java:607) at org.apache.accumulo.core.iterators.system.LocalityGroupIterator.seek(LocalityGroupIterator.java:142) at org.apache.accumulo.core.file.rfile.RFile$Reader.seek(RFile.java:979) at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:101) at org.apache.accumulo.tserver.MemKeyConversionIterator.seek(InMemoryMap.java:168) at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator._switchNow(SourceSwitchingIterator.java:171) at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.switchNow(SourceSwitchingIterator.java:179) at org.apache.accumulo.tserver.InMemoryMap$MemoryIterator.switchNow(InMemoryMap.java:647) at org.apache.accumulo.tserver.InMemoryMap$MemoryIterator.access$900(InMemoryMap.java:601) at org.apache.accumulo.tserver.InMemoryMap.delete(InMemoryMap.java:746) at org.apache.accumulo.tserver.Tablet$TabletMemory.finalizeMinC(Tablet.java:327) at org.apache.accumulo.tserver.Tablet.minorCompact(Tablet.java:2068) at org.apache.accumulo.tserver.Tablet.access$4300(Tablet.java:170) at org.apache.accumulo.tserver.Tablet$MinorCompactionTask.run(Tablet.java:2134) at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34) at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47) at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34) at java.lang. Thread .run( Thread .java:744) We have an installation that is large enough with high ingest and query that is triggering it regularly. I'm currently working on acquiring the datanode logs on the other end, but the footprint and conditions for this are near identical to the original cases we were seeing the old stack traces occur in.
          Hide
          vines John Vines added a comment - - edited

          Additonal info: no related errors as far as I can tell in the dfs datanode logs.

          The error message is not correct. It says it failed to create the mem dump file, but the exception occurs after it has been created. The exception occurs in the MemoryIterator.switchNow() call for one, where it goes through each MI and switches them. Under the hood, this causes them to seek in the mem dump file.

          Initial thought was when it fails it just prevents the IMM from clearing out. However, we then see exceptions afterward coming out of the full iterator stack. I have yet to determine if it's in the iterator that attempted to switch, iterators which already had switched, or the iterators which had not switched.

          Show
          vines John Vines added a comment - - edited Additonal info: no related errors as far as I can tell in the dfs datanode logs. The error message is not correct. It says it failed to create the mem dump file, but the exception occurs after it has been created. The exception occurs in the MemoryIterator.switchNow() call for one, where it goes through each MI and switches them. Under the hood, this causes them to seek in the mem dump file. Initial thought was when it fails it just prevents the IMM from clearing out. However, we then see exceptions afterward coming out of the full iterator stack. I have yet to determine if it's in the iterator that attempted to switch, iterators which already had switched, or the iterators which had not switched.
          Hide
          ecn Eric Newton added a comment -

          I'm having some difficulty parsing that comment, and I feel it might be very important. Can you edit, or rephrase? Thanks!

          Show
          ecn Eric Newton added a comment - I'm having some difficulty parsing that comment, and I feel it might be very important. Can you edit, or rephrase? Thanks!
          Hide
          vines John Vines added a comment -

          Edited, hopefully for clarity

          Show
          vines John Vines added a comment - Edited, hopefully for clarity
          Hide
          vines John Vines added a comment -

          After looking at the code more deeply, I think the only one left erroring is the one that failed to switch. One way to defend is to get it seeked before updating teh SSI's source and iter fields.

          Show
          vines John Vines added a comment - After looking at the code more deeply, I think the only one left erroring is the one that failed to switch. One way to defend is to get it seeked before updating teh SSI's source and iter fields.
          Hide
          elserj Josh Elser added a comment -

          John Vines since this issue did get released in 1.5.2 and 1.6.1, can we open up a new issue to track things and link it to this one? I don't want the fact that a change did make it into those released version get lost because it happened to be a partial fix.

          Show
          elserj Josh Elser added a comment - John Vines since this issue did get released in 1.5.2 and 1.6.1, can we open up a new issue to track things and link it to this one? I don't want the fact that a change did make it into those released version get lost because it happened to be a partial fix.
          Hide
          vines John Vines added a comment -

          Sure

          Show
          vines John Vines added a comment - Sure

            People

            • Assignee:
              kturner Keith Turner
              Reporter:
              kturner Keith Turner
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 1h
                1h

                  Development