Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-12958

PhantomReference for filesystem statistics can trigger OOM

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.3, 2.6.4
    • Fix Version/s: 2.8.0, 2.7.3, 2.6.5, 3.0.0-alpha1
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      I saw an OOM that appears to have been caused by the phantom references introduced for file system statistics management. I'll post details in a followup comment.

        Issue Links

          Activity

          Hide
          jlowe Jason Lowe added a comment -

          This OOM was hit by Tez during container reuse as it was transitioning between two tasks. The previous task was using a DefaultSorter object that had a SpillThread inner class as a utility thread. That thread was using the filesystem, so there was a phantom reference to that thread. The outer DefaultSorter class had a very large byte buffer (>50% of the heap size). When the previous task completed all references except for the phantom reference were released, and when the new task initialized it tried to instantiate a new DefaultSorter with a similarly-sized large buffer.

          The attempt to allocate another >50% heap object triggered a full GC since the old DefaultSorter object hadn't been collected yet. Since the phantom reference was still referring to the old SpillThread object (and therefore indirectly to the old DefaultSorter object), the garbage collector queued the phantom reference but could not reclaim it. All it could do is enqueue it on the specified queue for further processing by the application code. Therefore after the full GC completed we still had >50% of the heap used from the old DefaultSorter object waiting for final phantom reference cleanup processing, and that resulted in an OOM for the attempt to initialize a new DefaultSorter object of similar size.

          Normally the prior task's DefaultSorter would be collected in a single full GC cycle, since the Tez code makes sure it's no longer referenced. However the phantom reference in the filesystem statistics code is causing some objects that normally would be collected in a single full GC cycle to survive that cycle, and that breaks any use-case where the object is >50% of the heap and a similarly-sized object will be subsequently allocated. Tez container reuse does just that, shutting down one task just before it initializes another.

          Debugging the OOMs caused by this is a bit tricky, since most of the OOM-triggered heap dumps showed plenty of memory available on the heap. It appears that by the time the OOM dump is created, the phantom reference queue was handled by the StatisticsDataReferenceCleaner so the objects in question were all unreachable on the heap in the dump.

          Show
          jlowe Jason Lowe added a comment - This OOM was hit by Tez during container reuse as it was transitioning between two tasks. The previous task was using a DefaultSorter object that had a SpillThread inner class as a utility thread. That thread was using the filesystem, so there was a phantom reference to that thread. The outer DefaultSorter class had a very large byte buffer (>50% of the heap size). When the previous task completed all references except for the phantom reference were released, and when the new task initialized it tried to instantiate a new DefaultSorter with a similarly-sized large buffer. The attempt to allocate another >50% heap object triggered a full GC since the old DefaultSorter object hadn't been collected yet. Since the phantom reference was still referring to the old SpillThread object (and therefore indirectly to the old DefaultSorter object), the garbage collector queued the phantom reference but could not reclaim it . All it could do is enqueue it on the specified queue for further processing by the application code. Therefore after the full GC completed we still had >50% of the heap used from the old DefaultSorter object waiting for final phantom reference cleanup processing, and that resulted in an OOM for the attempt to initialize a new DefaultSorter object of similar size. Normally the prior task's DefaultSorter would be collected in a single full GC cycle, since the Tez code makes sure it's no longer referenced. However the phantom reference in the filesystem statistics code is causing some objects that normally would be collected in a single full GC cycle to survive that cycle, and that breaks any use-case where the object is >50% of the heap and a similarly-sized object will be subsequently allocated. Tez container reuse does just that, shutting down one task just before it initializes another. Debugging the OOMs caused by this is a bit tricky, since most of the OOM-triggered heap dumps showed plenty of memory available on the heap. It appears that by the time the OOM dump is created, the phantom reference queue was handled by the StatisticsDataReferenceCleaner so the objects in question were all unreachable on the heap in the dump.
          Hide
          sjlee0 Sangjin Lee added a comment -

          Thanks for reporting this Jason Lowe. This is rather surprising, and we certainly haven't encountered this (we've been running this code for a while now).

          The javadoc for how a phantom reference behaves says the following:

          Unlike soft and weak references, phantom references are not automatically cleared by the garbage collector as they are enqueued. An object that is reachable via phantom references will remain so until all such references are cleared or themselves become unreachable.

          In our case, the phantom reference itself becomes unreachable as a result of the clean-up. Thus, there should be only one enqueuing/dequeuing, and at the end of the clean-up the reference and the referent should be claimable.

          The javadoc says phantom references are "not automatically" cleared by the garbage collector, but when I tested smaller test scenarios on several JVMs, the full GC itself seems to clear out the phantom reference before the dequeuing runs (I'd be happy to post the test code if that would help).

          Could you please try the following? Would it be possible to reproduce this with more tracing? The flags are

          -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+PrintReferenceGC
          

          The last one will report how the phantom references (and others) are handled by each GC. Also, the JVM/OS version with which you saw this would be helpful. Thanks!

          Show
          sjlee0 Sangjin Lee added a comment - Thanks for reporting this Jason Lowe . This is rather surprising, and we certainly haven't encountered this (we've been running this code for a while now). The javadoc for how a phantom reference behaves says the following: Unlike soft and weak references, phantom references are not automatically cleared by the garbage collector as they are enqueued. An object that is reachable via phantom references will remain so until all such references are cleared or themselves become unreachable. In our case, the phantom reference itself becomes unreachable as a result of the clean-up. Thus, there should be only one enqueuing/dequeuing, and at the end of the clean-up the reference and the referent should be claimable. The javadoc says phantom references are "not automatically" cleared by the garbage collector, but when I tested smaller test scenarios on several JVMs, the full GC itself seems to clear out the phantom reference before the dequeuing runs (I'd be happy to post the test code if that would help). Could you please try the following? Would it be possible to reproduce this with more tracing? The flags are -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+PrintReferenceGC The last one will report how the phantom references (and others) are handled by each GC. Also, the JVM/OS version with which you saw this would be helpful. Thanks!
          Hide
          cmccabe Colin P. McCabe added a comment -

          Hi Jason Lowe,

          Sorry if this is a silly question, but do you think Tez could call System#gc several times when reusing containers? Would that help this situation?

          Show
          cmccabe Colin P. McCabe added a comment - Hi Jason Lowe , Sorry if this is a silly question, but do you think Tez could call System#gc several times when reusing containers? Would that help this situation?
          Hide
          jlowe Jason Lowe added a comment -

          Thanks for chiming in, Sangjin Lee. I ran the job again with the requested JVM options, and here's the last few lines in the GC log. The task ran with JDK 1.8.0_60-b27 on RHEL 6.5:

          2016-03-24T20:10:36.476+0000: 107.002: [GC (Allocation Failure) 2016-03-24T20:10:36.481+0000: 107.007: [SoftReference, 0 refs, 0.0000264 secs]2016-03-24T20:10:36.481+0000: 107.007: [WeakReference, 0 refs, 0.0000130 secs]2016-03-24T20:10:36.481+0000: 107.007: [FinalReference, 0 refs, 0.0000127 secs]2016-03-24T20:10:36.481+0000: 107.007: [PhantomReference, 0 refs, 0 refs, 0.0000140 secs]2016-03-24T20:10:36.481+0000: 107.007: [JNI Weak Reference, 0.0000170 secs][PSYoungGen: 115296K->96K(115712K)] 603566K->488366K(734720K), 0.0061637 secs] [Times: user=0.02 sys=0.00, real=0.00 secs] 
          2016-03-24T20:10:36.960+0000: 107.486: [GC (Allocation Failure) 2016-03-24T20:10:36.965+0000: 107.491: [SoftReference, 0 refs, 0.0000472 secs]2016-03-24T20:10:36.965+0000: 107.491: [WeakReference, 0 refs, 0.0000363 secs]2016-03-24T20:10:36.965+0000: 107.491: [FinalReference, 0 refs, 0.0000108 secs]2016-03-24T20:10:36.965+0000: 107.491: [PhantomReference, 0 refs, 0 refs, 0.0000083 secs]2016-03-24T20:10:36.965+0000: 107.491: [JNI Weak Reference, 0.0000091 secs][PSYoungGen: 115296K->96K(115712K)] 603566K->488366K(734720K), 0.0056482 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 
          2016-03-24T20:10:38.447+0000: 108.973: [GC (Allocation Failure) 2016-03-24T20:10:38.452+0000: 108.978: [SoftReference, 0 refs, 0.0000338 secs]2016-03-24T20:10:38.452+0000: 108.978: [WeakReference, 0 refs, 0.0000123 secs]2016-03-24T20:10:38.452+0000: 108.978: [FinalReference, 3 refs, 0.0000158 secs]2016-03-24T20:10:38.452+0000: 108.978: [PhantomReference, 0 refs, 0 refs, 0.0000113 secs]2016-03-24T20:10:38.452+0000: 108.978: [JNI Weak Reference, 0.0000148 secs][PSYoungGen: 115296K->256K(115712K)] 603566K->488526K(734720K), 0.0058352 secs] [Times: user=0.02 sys=0.00, real=0.00 secs] 
          2016-03-24T20:10:40.917+0000: 111.443: [GC (Allocation Failure) 2016-03-24T20:10:40.922+0000: 111.448: [SoftReference, 0 refs, 0.0000209 secs]2016-03-24T20:10:40.922+0000: 111.448: [WeakReference, 159 refs, 0.0000212 secs]2016-03-24T20:10:40.922+0000: 111.448: [FinalReference, 269 refs, 0.0003309 secs]2016-03-24T20:10:40.923+0000: 111.449: [PhantomReference, 0 refs, 6 refs, 0.0000108 secs]2016-03-24T20:10:40.923+0000: 111.449: [JNI Weak Reference, 0.0000108 secs][PSYoungGen: 115456K->480K(115712K)] 603726K->489305K(734720K), 0.0061493 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 
          2016-03-24T20:10:42.797+0000: 113.323: [GC (Allocation Failure) 2016-03-24T20:10:42.802+0000: 113.328: [SoftReference, 0 refs, 0.0000230 secs]2016-03-24T20:10:42.802+0000: 113.328: [WeakReference, 231 refs, 0.0000225 secs]2016-03-24T20:10:42.802+0000: 113.328: [FinalReference, 350 refs, 0.0004041 secs]2016-03-24T20:10:42.803+0000: 113.329: [PhantomReference, 0 refs, 6 refs, 0.0000097 secs]2016-03-24T20:10:42.803+0000: 113.329: [JNI Weak Reference, 0.0000106 secs][PSYoungGen: 115680K->480K(114688K)] 604505K->489537K(733696K), 0.0062425 secs] [Times: user=0.02 sys=0.00, real=0.00 secs] 
          2016-03-24T20:10:44.626+0000: 115.151: [GC (Allocation Failure) 2016-03-24T20:10:44.630+0000: 115.156: [SoftReference, 0 refs, 0.0000247 secs]2016-03-24T20:10:44.630+0000: 115.156: [WeakReference, 230 refs, 0.0000234 secs]2016-03-24T20:10:44.630+0000: 115.156: [FinalReference, 350 refs, 0.0003741 secs]2016-03-24T20:10:44.631+0000: 115.157: [PhantomReference, 0 refs, 6 refs, 0.0000095 secs]2016-03-24T20:10:44.631+0000: 115.157: [JNI Weak Reference, 0.0000104 secs][PSYoungGen: 114656K->896K(115200K)] 603713K->489961K(734208K), 0.0059075 secs] [Times: user=0.02 sys=0.00, real=0.00 secs] 
          2016-03-24T20:10:46.449+0000: 116.975: [GC (Allocation Failure) 2016-03-24T20:10:46.455+0000: 116.981: [SoftReference, 0 refs, 0.0000285 secs]2016-03-24T20:10:46.455+0000: 116.981: [WeakReference, 235 refs, 0.0000293 secs]2016-03-24T20:10:46.455+0000: 116.981: [FinalReference, 356 refs, 0.0004249 secs]2016-03-24T20:10:46.455+0000: 116.981: [PhantomReference, 0 refs, 7 refs, 0.0000106 secs]2016-03-24T20:10:46.455+0000: 116.981: [JNI Weak Reference, 0.0000105 secs][PSYoungGen: 115072K->864K(114176K)] 604137K->489929K(733184K), 0.0069392 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 
          2016-03-24T20:10:48.741+0000: 119.267: [GC (Allocation Failure) 2016-03-24T20:10:48.749+0000: 119.275: [SoftReference, 0 refs, 0.0000592 secs]2016-03-24T20:10:48.749+0000: 119.275: [WeakReference, 639 refs, 0.0000830 secs]2016-03-24T20:10:48.749+0000: 119.275: [FinalReference, 177 refs, 0.0002006 secs]2016-03-24T20:10:48.749+0000: 119.275: [PhantomReference, 0 refs, 12 refs, 0.0000099 secs]2016-03-24T20:10:48.749+0000: 119.275: [JNI Weak Reference, 0.0000109 secs][PSYoungGen: 114016K->1514K(114688K)] 603081K->497627K(733696K), 0.0088279 secs] [Times: user=0.04 sys=0.00, real=0.01 secs] 
          2016-03-24T20:10:48.837+0000: 119.362: [GC (Allocation Failure) 2016-03-24T20:10:48.844+0000: 119.369: [SoftReference, 0 refs, 0.0000405 secs]2016-03-24T20:10:48.844+0000: 119.370: [WeakReference, 281 refs, 0.0000290 secs]2016-03-24T20:10:48.844+0000: 119.370: [FinalReference, 3 refs, 0.0000159 secs]2016-03-24T20:10:48.844+0000: 119.370: [PhantomReference, 0 refs, 1 refs, 0.0000076 secs]2016-03-24T20:10:48.844+0000: 119.370: [JNI Weak Reference, 0.0000102 secs][PSYoungGen: 10455K->1530K(109568K)] 506569K->497707K(1041920K), 0.0077403 secs] [Times: user=0.04 sys=0.00, real=0.00 secs] 
          2016-03-24T20:10:48.844+0000: 119.370: [GC (Allocation Failure) 2016-03-24T20:10:48.849+0000: 119.375: [SoftReference, 0 refs, 0.0000377 secs]2016-03-24T20:10:48.849+0000: 119.375: [WeakReference, 279 refs, 0.0000288 secs]2016-03-24T20:10:48.849+0000: 119.375: [FinalReference, 0 refs, 0.0000209 secs]2016-03-24T20:10:48.849+0000: 119.375: [PhantomReference, 0 refs, 1 refs, 0.0000113 secs]2016-03-24T20:10:48.849+0000: 119.375: [JNI Weak Reference, 0.0000132 secs][PSYoungGen: 1530K->1562K(112128K)] 497707K->497739K(1044480K), 0.0056076 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 
          2016-03-24T20:10:48.850+0000: 119.376: [Full GC (Allocation Failure) 2016-03-24T20:10:48.857+0000: 119.383: [SoftReference, 0 refs, 0.0000476 secs]2016-03-24T20:10:48.857+0000: 119.383: [WeakReference, 1345 refs, 0.0000814 secs]2016-03-24T20:10:48.857+0000: 119.383: [FinalReference, 1006 refs, 0.0002261 secs]2016-03-24T20:10:48.857+0000: 119.383: [PhantomReference, 2 refs, 20 refs, 0.0000691 secs]2016-03-24T20:10:48.858+0000: 119.383: [JNI Weak Reference, 0.0000115 secs][PSYoungGen: 1562K->0K(112128K)] [ParOldGen: 496177K->490120K(627200K)] 497739K->490120K(739328K), [Metaspace: 29185K->29185K(1075200K)], 0.1824223 secs] [Times: user=0.64 sys=0.00, real=0.18 secs] 
          2016-03-24T20:10:49.033+0000: 119.559: [GC (Allocation Failure) 2016-03-24T20:10:49.038+0000: 119.564: [SoftReference, 0 refs, 0.0000332 secs]2016-03-24T20:10:49.038+0000: 119.564: [WeakReference, 0 refs, 0.0000123 secs]2016-03-24T20:10:49.038+0000: 119.564: [FinalReference, 0 refs, 0.0000103 secs]2016-03-24T20:10:49.038+0000: 119.564: [PhantomReference, 0 refs, 0 refs, 0.0000110 secs]2016-03-24T20:10:49.038+0000: 119.564: [JNI Weak Reference, 0.0000152 secs][PSYoungGen: 0K->0K(112128K)] 490120K->490120K(1044480K), 0.0057650 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 
          2016-03-24T20:10:49.039+0000: 119.565: [Full GC (Allocation Failure) 2016-03-24T20:10:49.046+0000: 119.572: [SoftReference, 591 refs, 0.0001611 secs]2016-03-24T20:10:49.046+0000: 119.572: [WeakReference, 1218 refs, 0.0000570 secs]2016-03-24T20:10:49.046+0000: 119.572: [FinalReference, 691 refs, 0.0000371 secs]2016-03-24T20:10:49.046+0000: 119.572: [PhantomReference, 0 refs, 15 refs, 0.0000083 secs]2016-03-24T20:10:49.046+0000: 119.572: [JNI Weak Reference, 0.0000102 secs][PSYoungGen: 0K->0K(112128K)] [ParOldGen: 490120K->489358K(651776K)] 490120K->489358K(763904K), [Metaspace: 29185K->29022K(1075200K)], 0.1861130 secs] [Times: user=0.64 sys=0.01, real=0.18 secs] 
          Heap
           PSYoungGen      total 112128K, used 8888K [0x00000000f8e80000, 0x0000000100000000, 0x0000000100000000)
            eden space 108032K, 8% used [0x00000000f8e80000,0x00000000f972e390,0x00000000ff800000)
            from space 4096K, 0% used [0x00000000ffc00000,0x00000000ffc00000,0x0000000100000000)
            to   space 4096K, 0% used [0x00000000ff800000,0x00000000ff800000,0x00000000ffc00000)
           ParOldGen       total 932352K, used 489358K [0x00000000c0000000, 0x00000000f8e80000, 0x00000000f8e80000)
            object space 932352K, 52% used [0x00000000c0000000,0x00000000ddde3a80,0x00000000f8e80000)
           Metaspace       used 29118K, capacity 29346K, committed 29872K, reserved 1075200K
            class space    used 3320K, capacity 3374K, committed 3504K, reserved 1048576K
          

          Colin P. McCabe's suggestion would probably fix the issue, since it hopefully triggers a GC cycle before the big buffer comes along to force one. Then we won't be in the middle of trying to satisfy a huge allocation when we notice the phantom references are available for collection for the first time and still need to do app-level processing to free it. My main concern with relying on that approach is tracking down other frameworks or apps that will need to do similar things, since these kinds of jobs ran fine before.

          Show
          jlowe Jason Lowe added a comment - Thanks for chiming in, Sangjin Lee . I ran the job again with the requested JVM options, and here's the last few lines in the GC log. The task ran with JDK 1.8.0_60-b27 on RHEL 6.5: 2016-03-24T20:10:36.476+0000: 107.002: [GC (Allocation Failure) 2016-03-24T20:10:36.481+0000: 107.007: [SoftReference, 0 refs, 0.0000264 secs]2016-03-24T20:10:36.481+0000: 107.007: [WeakReference, 0 refs, 0.0000130 secs]2016-03-24T20:10:36.481+0000: 107.007: [FinalReference, 0 refs, 0.0000127 secs]2016-03-24T20:10:36.481+0000: 107.007: [PhantomReference, 0 refs, 0 refs, 0.0000140 secs]2016-03-24T20:10:36.481+0000: 107.007: [JNI Weak Reference, 0.0000170 secs][PSYoungGen: 115296K->96K(115712K)] 603566K->488366K(734720K), 0.0061637 secs] [Times: user=0.02 sys=0.00, real=0.00 secs] 2016-03-24T20:10:36.960+0000: 107.486: [GC (Allocation Failure) 2016-03-24T20:10:36.965+0000: 107.491: [SoftReference, 0 refs, 0.0000472 secs]2016-03-24T20:10:36.965+0000: 107.491: [WeakReference, 0 refs, 0.0000363 secs]2016-03-24T20:10:36.965+0000: 107.491: [FinalReference, 0 refs, 0.0000108 secs]2016-03-24T20:10:36.965+0000: 107.491: [PhantomReference, 0 refs, 0 refs, 0.0000083 secs]2016-03-24T20:10:36.965+0000: 107.491: [JNI Weak Reference, 0.0000091 secs][PSYoungGen: 115296K->96K(115712K)] 603566K->488366K(734720K), 0.0056482 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 2016-03-24T20:10:38.447+0000: 108.973: [GC (Allocation Failure) 2016-03-24T20:10:38.452+0000: 108.978: [SoftReference, 0 refs, 0.0000338 secs]2016-03-24T20:10:38.452+0000: 108.978: [WeakReference, 0 refs, 0.0000123 secs]2016-03-24T20:10:38.452+0000: 108.978: [FinalReference, 3 refs, 0.0000158 secs]2016-03-24T20:10:38.452+0000: 108.978: [PhantomReference, 0 refs, 0 refs, 0.0000113 secs]2016-03-24T20:10:38.452+0000: 108.978: [JNI Weak Reference, 0.0000148 secs][PSYoungGen: 115296K->256K(115712K)] 603566K->488526K(734720K), 0.0058352 secs] [Times: user=0.02 sys=0.00, real=0.00 secs] 2016-03-24T20:10:40.917+0000: 111.443: [GC (Allocation Failure) 2016-03-24T20:10:40.922+0000: 111.448: [SoftReference, 0 refs, 0.0000209 secs]2016-03-24T20:10:40.922+0000: 111.448: [WeakReference, 159 refs, 0.0000212 secs]2016-03-24T20:10:40.922+0000: 111.448: [FinalReference, 269 refs, 0.0003309 secs]2016-03-24T20:10:40.923+0000: 111.449: [PhantomReference, 0 refs, 6 refs, 0.0000108 secs]2016-03-24T20:10:40.923+0000: 111.449: [JNI Weak Reference, 0.0000108 secs][PSYoungGen: 115456K->480K(115712K)] 603726K->489305K(734720K), 0.0061493 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 2016-03-24T20:10:42.797+0000: 113.323: [GC (Allocation Failure) 2016-03-24T20:10:42.802+0000: 113.328: [SoftReference, 0 refs, 0.0000230 secs]2016-03-24T20:10:42.802+0000: 113.328: [WeakReference, 231 refs, 0.0000225 secs]2016-03-24T20:10:42.802+0000: 113.328: [FinalReference, 350 refs, 0.0004041 secs]2016-03-24T20:10:42.803+0000: 113.329: [PhantomReference, 0 refs, 6 refs, 0.0000097 secs]2016-03-24T20:10:42.803+0000: 113.329: [JNI Weak Reference, 0.0000106 secs][PSYoungGen: 115680K->480K(114688K)] 604505K->489537K(733696K), 0.0062425 secs] [Times: user=0.02 sys=0.00, real=0.00 secs] 2016-03-24T20:10:44.626+0000: 115.151: [GC (Allocation Failure) 2016-03-24T20:10:44.630+0000: 115.156: [SoftReference, 0 refs, 0.0000247 secs]2016-03-24T20:10:44.630+0000: 115.156: [WeakReference, 230 refs, 0.0000234 secs]2016-03-24T20:10:44.630+0000: 115.156: [FinalReference, 350 refs, 0.0003741 secs]2016-03-24T20:10:44.631+0000: 115.157: [PhantomReference, 0 refs, 6 refs, 0.0000095 secs]2016-03-24T20:10:44.631+0000: 115.157: [JNI Weak Reference, 0.0000104 secs][PSYoungGen: 114656K->896K(115200K)] 603713K->489961K(734208K), 0.0059075 secs] [Times: user=0.02 sys=0.00, real=0.00 secs] 2016-03-24T20:10:46.449+0000: 116.975: [GC (Allocation Failure) 2016-03-24T20:10:46.455+0000: 116.981: [SoftReference, 0 refs, 0.0000285 secs]2016-03-24T20:10:46.455+0000: 116.981: [WeakReference, 235 refs, 0.0000293 secs]2016-03-24T20:10:46.455+0000: 116.981: [FinalReference, 356 refs, 0.0004249 secs]2016-03-24T20:10:46.455+0000: 116.981: [PhantomReference, 0 refs, 7 refs, 0.0000106 secs]2016-03-24T20:10:46.455+0000: 116.981: [JNI Weak Reference, 0.0000105 secs][PSYoungGen: 115072K->864K(114176K)] 604137K->489929K(733184K), 0.0069392 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 2016-03-24T20:10:48.741+0000: 119.267: [GC (Allocation Failure) 2016-03-24T20:10:48.749+0000: 119.275: [SoftReference, 0 refs, 0.0000592 secs]2016-03-24T20:10:48.749+0000: 119.275: [WeakReference, 639 refs, 0.0000830 secs]2016-03-24T20:10:48.749+0000: 119.275: [FinalReference, 177 refs, 0.0002006 secs]2016-03-24T20:10:48.749+0000: 119.275: [PhantomReference, 0 refs, 12 refs, 0.0000099 secs]2016-03-24T20:10:48.749+0000: 119.275: [JNI Weak Reference, 0.0000109 secs][PSYoungGen: 114016K->1514K(114688K)] 603081K->497627K(733696K), 0.0088279 secs] [Times: user=0.04 sys=0.00, real=0.01 secs] 2016-03-24T20:10:48.837+0000: 119.362: [GC (Allocation Failure) 2016-03-24T20:10:48.844+0000: 119.369: [SoftReference, 0 refs, 0.0000405 secs]2016-03-24T20:10:48.844+0000: 119.370: [WeakReference, 281 refs, 0.0000290 secs]2016-03-24T20:10:48.844+0000: 119.370: [FinalReference, 3 refs, 0.0000159 secs]2016-03-24T20:10:48.844+0000: 119.370: [PhantomReference, 0 refs, 1 refs, 0.0000076 secs]2016-03-24T20:10:48.844+0000: 119.370: [JNI Weak Reference, 0.0000102 secs][PSYoungGen: 10455K->1530K(109568K)] 506569K->497707K(1041920K), 0.0077403 secs] [Times: user=0.04 sys=0.00, real=0.00 secs] 2016-03-24T20:10:48.844+0000: 119.370: [GC (Allocation Failure) 2016-03-24T20:10:48.849+0000: 119.375: [SoftReference, 0 refs, 0.0000377 secs]2016-03-24T20:10:48.849+0000: 119.375: [WeakReference, 279 refs, 0.0000288 secs]2016-03-24T20:10:48.849+0000: 119.375: [FinalReference, 0 refs, 0.0000209 secs]2016-03-24T20:10:48.849+0000: 119.375: [PhantomReference, 0 refs, 1 refs, 0.0000113 secs]2016-03-24T20:10:48.849+0000: 119.375: [JNI Weak Reference, 0.0000132 secs][PSYoungGen: 1530K->1562K(112128K)] 497707K->497739K(1044480K), 0.0056076 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 2016-03-24T20:10:48.850+0000: 119.376: [Full GC (Allocation Failure) 2016-03-24T20:10:48.857+0000: 119.383: [SoftReference, 0 refs, 0.0000476 secs]2016-03-24T20:10:48.857+0000: 119.383: [WeakReference, 1345 refs, 0.0000814 secs]2016-03-24T20:10:48.857+0000: 119.383: [FinalReference, 1006 refs, 0.0002261 secs]2016-03-24T20:10:48.857+0000: 119.383: [PhantomReference, 2 refs, 20 refs, 0.0000691 secs]2016-03-24T20:10:48.858+0000: 119.383: [JNI Weak Reference, 0.0000115 secs][PSYoungGen: 1562K->0K(112128K)] [ParOldGen: 496177K->490120K(627200K)] 497739K->490120K(739328K), [Metaspace: 29185K->29185K(1075200K)], 0.1824223 secs] [Times: user=0.64 sys=0.00, real=0.18 secs] 2016-03-24T20:10:49.033+0000: 119.559: [GC (Allocation Failure) 2016-03-24T20:10:49.038+0000: 119.564: [SoftReference, 0 refs, 0.0000332 secs]2016-03-24T20:10:49.038+0000: 119.564: [WeakReference, 0 refs, 0.0000123 secs]2016-03-24T20:10:49.038+0000: 119.564: [FinalReference, 0 refs, 0.0000103 secs]2016-03-24T20:10:49.038+0000: 119.564: [PhantomReference, 0 refs, 0 refs, 0.0000110 secs]2016-03-24T20:10:49.038+0000: 119.564: [JNI Weak Reference, 0.0000152 secs][PSYoungGen: 0K->0K(112128K)] 490120K->490120K(1044480K), 0.0057650 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 2016-03-24T20:10:49.039+0000: 119.565: [Full GC (Allocation Failure) 2016-03-24T20:10:49.046+0000: 119.572: [SoftReference, 591 refs, 0.0001611 secs]2016-03-24T20:10:49.046+0000: 119.572: [WeakReference, 1218 refs, 0.0000570 secs]2016-03-24T20:10:49.046+0000: 119.572: [FinalReference, 691 refs, 0.0000371 secs]2016-03-24T20:10:49.046+0000: 119.572: [PhantomReference, 0 refs, 15 refs, 0.0000083 secs]2016-03-24T20:10:49.046+0000: 119.572: [JNI Weak Reference, 0.0000102 secs][PSYoungGen: 0K->0K(112128K)] [ParOldGen: 490120K->489358K(651776K)] 490120K->489358K(763904K), [Metaspace: 29185K->29022K(1075200K)], 0.1861130 secs] [Times: user=0.64 sys=0.01, real=0.18 secs] Heap PSYoungGen total 112128K, used 8888K [0x00000000f8e80000, 0x0000000100000000, 0x0000000100000000) eden space 108032K, 8% used [0x00000000f8e80000,0x00000000f972e390,0x00000000ff800000) from space 4096K, 0% used [0x00000000ffc00000,0x00000000ffc00000,0x0000000100000000) to space 4096K, 0% used [0x00000000ff800000,0x00000000ff800000,0x00000000ffc00000) ParOldGen total 932352K, used 489358K [0x00000000c0000000, 0x00000000f8e80000, 0x00000000f8e80000) object space 932352K, 52% used [0x00000000c0000000,0x00000000ddde3a80,0x00000000f8e80000) Metaspace used 29118K, capacity 29346K, committed 29872K, reserved 1075200K class space used 3320K, capacity 3374K, committed 3504K, reserved 1048576K Colin P. McCabe 's suggestion would probably fix the issue, since it hopefully triggers a GC cycle before the big buffer comes along to force one. Then we won't be in the middle of trying to satisfy a huge allocation when we notice the phantom references are available for collection for the first time and still need to do app-level processing to free it. My main concern with relying on that approach is tracking down other frameworks or apps that will need to do similar things, since these kinds of jobs ran fine before.
          Hide
          sjlee0 Sangjin Lee added a comment -

          OK, I am now able to reproduce this with a simple test and a possible solution. Let me update it soon.

          Show
          sjlee0 Sangjin Lee added a comment - OK, I am now able to reproduce this with a simple test and a possible solution. Let me update it soon.
          Hide
          sjlee0 Sangjin Lee added a comment -

          Posted patch v.1.

          After looking at this some more, it seems that WeakReference is a better choice than PhantomReference after all. We are able to do clean-up based on the reference queue with either type, but the hold on the referent seems tighter (ironically) with the phantom reference in this case. Weak references will be cleared out by the garbage collector, unlike the phantom references. We never access the actual referent in the clean-up logic, so it is basically a matter of substituting the reference type.

          I distilled this use case and the FileSystem code to reproduce this case, and the following is the verbose output of that test with the phantom reference:

          $ java -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+PrintReferenceGC -Xms1600m -Xmx1600m Test
          allocating 1 G
          creating a phantom reference to this thread
          sleeping for 10 seconds
          done sleeping
          allocating another 1 G
          10.528: [GC (Allocation Failure) 10.550: [SoftReference, 0 refs, 0.0000145 secs]10.550: [WeakReference, 9 refs, 0.0000054 secs]10.550: [FinalReference, 16 refs, 0.0000157 secs]10.550: [PhantomReference, 1 refs, 0 refs, 0.0000060 secs]10.550: [JNI Weak Reference, 0.0000061 secs][PSYoungGen: 24576K->416K(477696K)] 1073152K->1049000K(1570304K), 0.0222061 secs] [Times: user=0.17 sys=0.00, real=0.02 secs] 
          10.550: [Full GC (Ergonomics) 10.550: [SoftReference, 0 refs, 0.0000250 secs]10.550: [WeakReference, 2 refs, 0.0000048 secs]10.550: [FinalReference, 4 refs, 0.0000044 secs]10.550: [PhantomReference, 0 refs, 0 refs, 0.0000049 secs]10.550: [JNI Weak Reference, 0.0000031 secs][PSYoungGen: 416K->0K(477696K)] [ParOldGen: 1048584K->1048857K(1092608K)] 1049000K->1048857K(1570304K), [Metaspace: 2614K->2614K(1056768K)], 0.0216855 secs] [Times: user=0.02 sys=0.01, real=0.02 secs] 
          10.572: [GC (Allocation Failure) 10.584: [SoftReference, 0 refs, 0.0000340 secs]10.584: [WeakReference, 0 refs, 0.0000049 secs]10.584: [FinalReference, 0 refs, 0.0000043 secs]10.584: [PhantomReference, 0 refs, 0 refs, 0.0000055 secs]10.584: [JNI Weak Reference, 0.0000037 secs][PSYoungGen: 0K->0K(477696K)] 1048857K->1048857K(1570304K), 0.0125265 secs] [Times: user=0.07 sys=0.00, real=0.02 secs] 
          10.584: [Full GC (Allocation Failure) 10.585: [SoftReference, 37 refs, 0.0000157 secs]10.585: [WeakReference, 2 refs, 0.0000050 secs]10.585: [FinalReference, 4 refs, 0.0000047 secs]10.585: [PhantomReference, 0 refs, 0 refs, 0.0000053 secs]10.585: [JNI Weak Reference, 0.0000032 secs][PSYoungGen: 0K->0K(477696K)] [ParOldGen: 1048857K->1048845K(1092608K)] 1048857K->1048845K(1570304K), [Metaspace: 2614K->2614K(1056768K)], 0.0114447 secs] [Times: user=0.02 sys=0.01, real=0.01 secs] 
          phantom reference dequeued
          Exception in thread "main" 
          java.lang.OutOfMemoryError: Java heap space
          	at Test$BigBufferContainer.<init>(Test.java:26)
          	at Test.main(Test.java:18)
          Heap
           PSYoungGen      total 477696K, used 20480K [0x000000079eb00000, 0x00000007c0000000, 0x00000007c0000000)
            eden space 409600K, 5% used [0x000000079eb00000,0x000000079ff001a8,0x00000007b7b00000)
            from space 68096K, 0% used [0x00000007bbd80000,0x00000007bbd80000,0x00000007c0000000)
            to   space 68096K, 0% used [0x00000007b7b00000,0x00000007b7b00000,0x00000007bbd80000)
           ParOldGen       total 1092608K, used 1048845K [0x000000075c000000, 0x000000079eb00000, 0x000000079eb00000)
            object space 1092608K, 95% used [0x000000075c000000,0x000000079c043718,0x000000079eb00000)
           Metaspace       used 2646K, capacity 4496K, committed 4864K, reserved 1056768K
            class space    used 291K, capacity 388K, committed 512K, reserved 1048576K
          

          With the weak reference:

          $ java -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+PrintReferenceGC -Xms1600m -Xmx1600m Test
          allocating 1 G
          creating a weak reference to this thread
          sleeping for 10 seconds
          done sleeping
          allocating another 1 G
          10.673: [GC (Allocation Failure) 10.695: [SoftReference, 0 refs, 0.0000244 secs]10.695: [WeakReference, 10 refs, 0.0000064 secs]10.695: [FinalReference, 16 refs, 0.0000300 secs]10.695: [PhantomReference, 0 refs, 0 refs, 0.0000051 secs]10.695: [JNI Weak Reference, 0.0000073 secs][PSYoungGen: 24576K->416K(477696K)] 1073152K->1049000K(1570304K), 0.0229661 secs] [Times: user=0.17 sys=0.00, real=0.02 secs] 
          10.696: [Full GC (Ergonomics) 10.696: [SoftReference, 0 refs, 0.0000143 secs]10.696: [WeakReference, 2 refs, 0.0000047 secs]10.696: [FinalReference, 0 refs, 0.0000042 secs]10.696: [PhantomReference, 0 refs, 0 refs, 0.0000049 secs]10.696: [JNI Weak Reference, 0.0000032 secs][PSYoungGen: 416K->0K(477696K)] [ParOldGen: 1048584K->280K(1092608K)] 1049000K->280K(1570304K), [Metaspace: 2614K->2614K(1056768K)], 0.0209466 secs] [Times: user=0.02 sys=0.01, real=0.02 secs] 
          weak reference dequeued
          completed normally
          Heap
           PSYoungGen      total 477696K, used 20480K [0x000000079eb00000, 0x00000007c0000000, 0x00000007c0000000)
            eden space 409600K, 5% used [0x000000079eb00000,0x000000079ff001a8,0x00000007b7b00000)
            from space 68096K, 0% used [0x00000007b7b00000,0x00000007b7b00000,0x00000007bbd80000)
            to   space 68096K, 0% used [0x00000007bbd80000,0x00000007bbd80000,0x00000007c0000000)
           ParOldGen       total 1092608K, used 1048856K [0x000000075c000000, 0x000000079eb00000, 0x000000079eb00000)
            object space 1092608K, 95% used [0x000000075c000000,0x000000079c046378,0x000000079eb00000)
           Metaspace       used 2621K, capacity 4496K, committed 4864K, reserved 1056768K
            class space    used 288K, capacity 388K, committed 512K, reserved 1048576K
          
          Show
          sjlee0 Sangjin Lee added a comment - Posted patch v.1. After looking at this some more, it seems that WeakReference is a better choice than PhantomReference after all. We are able to do clean-up based on the reference queue with either type, but the hold on the referent seems tighter (ironically) with the phantom reference in this case. Weak references will be cleared out by the garbage collector, unlike the phantom references. We never access the actual referent in the clean-up logic, so it is basically a matter of substituting the reference type. I distilled this use case and the FileSystem code to reproduce this case, and the following is the verbose output of that test with the phantom reference: $ java -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+PrintReferenceGC -Xms1600m -Xmx1600m Test allocating 1 G creating a phantom reference to this thread sleeping for 10 seconds done sleeping allocating another 1 G 10.528: [GC (Allocation Failure) 10.550: [SoftReference, 0 refs, 0.0000145 secs]10.550: [WeakReference, 9 refs, 0.0000054 secs]10.550: [FinalReference, 16 refs, 0.0000157 secs]10.550: [PhantomReference, 1 refs, 0 refs, 0.0000060 secs]10.550: [JNI Weak Reference, 0.0000061 secs][PSYoungGen: 24576K->416K(477696K)] 1073152K->1049000K(1570304K), 0.0222061 secs] [Times: user=0.17 sys=0.00, real=0.02 secs] 10.550: [Full GC (Ergonomics) 10.550: [SoftReference, 0 refs, 0.0000250 secs]10.550: [WeakReference, 2 refs, 0.0000048 secs]10.550: [FinalReference, 4 refs, 0.0000044 secs]10.550: [PhantomReference, 0 refs, 0 refs, 0.0000049 secs]10.550: [JNI Weak Reference, 0.0000031 secs][PSYoungGen: 416K->0K(477696K)] [ParOldGen: 1048584K->1048857K(1092608K)] 1049000K->1048857K(1570304K), [Metaspace: 2614K->2614K(1056768K)], 0.0216855 secs] [Times: user=0.02 sys=0.01, real=0.02 secs] 10.572: [GC (Allocation Failure) 10.584: [SoftReference, 0 refs, 0.0000340 secs]10.584: [WeakReference, 0 refs, 0.0000049 secs]10.584: [FinalReference, 0 refs, 0.0000043 secs]10.584: [PhantomReference, 0 refs, 0 refs, 0.0000055 secs]10.584: [JNI Weak Reference, 0.0000037 secs][PSYoungGen: 0K->0K(477696K)] 1048857K->1048857K(1570304K), 0.0125265 secs] [Times: user=0.07 sys=0.00, real=0.02 secs] 10.584: [Full GC (Allocation Failure) 10.585: [SoftReference, 37 refs, 0.0000157 secs]10.585: [WeakReference, 2 refs, 0.0000050 secs]10.585: [FinalReference, 4 refs, 0.0000047 secs]10.585: [PhantomReference, 0 refs, 0 refs, 0.0000053 secs]10.585: [JNI Weak Reference, 0.0000032 secs][PSYoungGen: 0K->0K(477696K)] [ParOldGen: 1048857K->1048845K(1092608K)] 1048857K->1048845K(1570304K), [Metaspace: 2614K->2614K(1056768K)], 0.0114447 secs] [Times: user=0.02 sys=0.01, real=0.01 secs] phantom reference dequeued Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at Test$BigBufferContainer.<init>(Test.java:26) at Test.main(Test.java:18) Heap PSYoungGen total 477696K, used 20480K [0x000000079eb00000, 0x00000007c0000000, 0x00000007c0000000) eden space 409600K, 5% used [0x000000079eb00000,0x000000079ff001a8,0x00000007b7b00000) from space 68096K, 0% used [0x00000007bbd80000,0x00000007bbd80000,0x00000007c0000000) to space 68096K, 0% used [0x00000007b7b00000,0x00000007b7b00000,0x00000007bbd80000) ParOldGen total 1092608K, used 1048845K [0x000000075c000000, 0x000000079eb00000, 0x000000079eb00000) object space 1092608K, 95% used [0x000000075c000000,0x000000079c043718,0x000000079eb00000) Metaspace used 2646K, capacity 4496K, committed 4864K, reserved 1056768K class space used 291K, capacity 388K, committed 512K, reserved 1048576K With the weak reference: $ java -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+PrintReferenceGC -Xms1600m -Xmx1600m Test allocating 1 G creating a weak reference to this thread sleeping for 10 seconds done sleeping allocating another 1 G 10.673: [GC (Allocation Failure) 10.695: [SoftReference, 0 refs, 0.0000244 secs]10.695: [WeakReference, 10 refs, 0.0000064 secs]10.695: [FinalReference, 16 refs, 0.0000300 secs]10.695: [PhantomReference, 0 refs, 0 refs, 0.0000051 secs]10.695: [JNI Weak Reference, 0.0000073 secs][PSYoungGen: 24576K->416K(477696K)] 1073152K->1049000K(1570304K), 0.0229661 secs] [Times: user=0.17 sys=0.00, real=0.02 secs] 10.696: [Full GC (Ergonomics) 10.696: [SoftReference, 0 refs, 0.0000143 secs]10.696: [WeakReference, 2 refs, 0.0000047 secs]10.696: [FinalReference, 0 refs, 0.0000042 secs]10.696: [PhantomReference, 0 refs, 0 refs, 0.0000049 secs]10.696: [JNI Weak Reference, 0.0000032 secs][PSYoungGen: 416K->0K(477696K)] [ParOldGen: 1048584K->280K(1092608K)] 1049000K->280K(1570304K), [Metaspace: 2614K->2614K(1056768K)], 0.0209466 secs] [Times: user=0.02 sys=0.01, real=0.02 secs] weak reference dequeued completed normally Heap PSYoungGen total 477696K, used 20480K [0x000000079eb00000, 0x00000007c0000000, 0x00000007c0000000) eden space 409600K, 5% used [0x000000079eb00000,0x000000079ff001a8,0x00000007b7b00000) from space 68096K, 0% used [0x00000007b7b00000,0x00000007b7b00000,0x00000007bbd80000) to space 68096K, 0% used [0x00000007bbd80000,0x00000007bbd80000,0x00000007c0000000) ParOldGen total 1092608K, used 1048856K [0x000000075c000000, 0x000000079eb00000, 0x000000079eb00000) object space 1092608K, 95% used [0x000000075c000000,0x000000079c046378,0x000000079eb00000) Metaspace used 2621K, capacity 4496K, committed 4864K, reserved 1056768K class space used 288K, capacity 388K, committed 512K, reserved 1048576K
          Hide
          sjlee0 Sangjin Lee added a comment -

          Jason Lowe, it would be great if you could apply the patch with the problem app and see if it fixes the problem. Thanks!

          Show
          sjlee0 Sangjin Lee added a comment - Jason Lowe , it would be great if you could apply the patch with the problem app and see if it fixes the problem. Thanks!
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 14s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 mvninstall 6m 33s trunk passed
          +1 compile 6m 24s trunk passed with JDK v1.8.0_74
          +1 compile 7m 21s trunk passed with JDK v1.7.0_95
          +1 checkstyle 0m 21s trunk passed
          +1 mvnsite 0m 58s trunk passed
          +1 mvneclipse 0m 14s trunk passed
          +1 findbugs 1m 35s trunk passed
          +1 javadoc 0m 54s trunk passed with JDK v1.8.0_74
          +1 javadoc 1m 6s trunk passed with JDK v1.7.0_95
          +1 mvninstall 0m 41s the patch passed
          +1 compile 6m 21s the patch passed with JDK v1.8.0_74
          +1 javac 6m 21s the patch passed
          +1 compile 7m 16s the patch passed with JDK v1.7.0_95
          +1 javac 7m 16s the patch passed
          +1 checkstyle 0m 22s the patch passed
          +1 mvnsite 0m 57s the patch passed
          +1 mvneclipse 0m 14s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 1m 50s the patch passed
          +1 javadoc 0m 58s the patch passed with JDK v1.8.0_74
          +1 javadoc 1m 5s the patch passed with JDK v1.7.0_95
          -1 unit 6m 57s hadoop-common in the patch failed with JDK v1.8.0_74.
          -1 unit 7m 30s hadoop-common in the patch failed with JDK v1.7.0_95.
          -1 asflicense 0m 22s Patch generated 2 ASF License warnings.
          61m 20s



          Reason Tests
          JDK v1.8.0_74 Timed out junit tests org.apache.hadoop.util.TestNativeLibraryChecker
          JDK v1.7.0_95 Timed out junit tests org.apache.hadoop.util.TestNativeLibraryChecker



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:fbe3e86
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12795324/HADOOP-12958.01.patch
          JIRA Issue HADOOP-12958
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 6385ac1576f0 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 2e1d0ff
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_74 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HADOOP-Build/8921/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.8.0_74.txt
          unit https://builds.apache.org/job/PreCommit-HADOOP-Build/8921/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.7.0_95.txt
          unit test logs https://builds.apache.org/job/PreCommit-HADOOP-Build/8921/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.8.0_74.txt https://builds.apache.org/job/PreCommit-HADOOP-Build/8921/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.7.0_95.txt
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/8921/testReport/
          asflicense https://builds.apache.org/job/PreCommit-HADOOP-Build/8921/artifact/patchprocess/patch-asflicense-problems.txt
          modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/8921/console
          Powered by Apache Yetus 0.2.0 http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 14s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 mvninstall 6m 33s trunk passed +1 compile 6m 24s trunk passed with JDK v1.8.0_74 +1 compile 7m 21s trunk passed with JDK v1.7.0_95 +1 checkstyle 0m 21s trunk passed +1 mvnsite 0m 58s trunk passed +1 mvneclipse 0m 14s trunk passed +1 findbugs 1m 35s trunk passed +1 javadoc 0m 54s trunk passed with JDK v1.8.0_74 +1 javadoc 1m 6s trunk passed with JDK v1.7.0_95 +1 mvninstall 0m 41s the patch passed +1 compile 6m 21s the patch passed with JDK v1.8.0_74 +1 javac 6m 21s the patch passed +1 compile 7m 16s the patch passed with JDK v1.7.0_95 +1 javac 7m 16s the patch passed +1 checkstyle 0m 22s the patch passed +1 mvnsite 0m 57s the patch passed +1 mvneclipse 0m 14s the patch passed +1 whitespace 0m 0s Patch has no whitespace issues. +1 findbugs 1m 50s the patch passed +1 javadoc 0m 58s the patch passed with JDK v1.8.0_74 +1 javadoc 1m 5s the patch passed with JDK v1.7.0_95 -1 unit 6m 57s hadoop-common in the patch failed with JDK v1.8.0_74. -1 unit 7m 30s hadoop-common in the patch failed with JDK v1.7.0_95. -1 asflicense 0m 22s Patch generated 2 ASF License warnings. 61m 20s Reason Tests JDK v1.8.0_74 Timed out junit tests org.apache.hadoop.util.TestNativeLibraryChecker JDK v1.7.0_95 Timed out junit tests org.apache.hadoop.util.TestNativeLibraryChecker Subsystem Report/Notes Docker Image:yetus/hadoop:fbe3e86 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12795324/HADOOP-12958.01.patch JIRA Issue HADOOP-12958 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 6385ac1576f0 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 2e1d0ff Default Java 1.7.0_95 Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_74 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95 findbugs v3.0.0 unit https://builds.apache.org/job/PreCommit-HADOOP-Build/8921/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.8.0_74.txt unit https://builds.apache.org/job/PreCommit-HADOOP-Build/8921/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.7.0_95.txt unit test logs https://builds.apache.org/job/PreCommit-HADOOP-Build/8921/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.8.0_74.txt https://builds.apache.org/job/PreCommit-HADOOP-Build/8921/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.7.0_95.txt JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/8921/testReport/ asflicense https://builds.apache.org/job/PreCommit-HADOOP-Build/8921/artifact/patchprocess/patch-asflicense-problems.txt modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/8921/console Powered by Apache Yetus 0.2.0 http://yetus.apache.org This message was automatically generated.
          Hide
          jlowe Jason Lowe added a comment -

          Thanks, Sangjin! I verified this patch allows the job to run and it fails without it.

          I'm +1 for the patch and would like to commit it later today if there are no objections.

          Show
          jlowe Jason Lowe added a comment - Thanks, Sangjin! I verified this patch allows the job to run and it fails without it. I'm +1 for the patch and would like to commit it later today if there are no objections.
          Hide
          sjlee0 Sangjin Lee added a comment -

          Thanks Jason!

          One interesting side note: initially I was not able to reproduce it because I was creating a Runnable (almost by instinct) and hanging the huge object there and then created a thread with that Runnable. It turns out when a thread exits it nulls out the Runnable, so that gets claimed no matter what. When I switched to a thread without the Runnable, I was able to reproduce it.

          Yet another reason a Runnable is better than a raw thread creation.

          Show
          sjlee0 Sangjin Lee added a comment - Thanks Jason! One interesting side note: initially I was not able to reproduce it because I was creating a Runnable (almost by instinct) and hanging the huge object there and then created a thread with that Runnable. It turns out when a thread exits it nulls out the Runnable, so that gets claimed no matter what. When I switched to a thread without the Runnable, I was able to reproduce it. Yet another reason a Runnable is better than a raw thread creation.
          Hide
          jlowe Jason Lowe added a comment -

          Thanks Sangjin Lee! I committed this to trunk, branch-2, branch-2.8, branch-2.7, and branch-2.6.

          Show
          jlowe Jason Lowe added a comment - Thanks Sangjin Lee ! I committed this to trunk, branch-2, branch-2.8, branch-2.7, and branch-2.6.
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-trunk-Commit #9501 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9501/)
          HADOOP-12958. PhantomReference for filesystem statistics can trigger (jlowe: rev 9a09200a1f5f752e266d4fb8e0c808073080bde8)

          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Hadoop-trunk-Commit #9501 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9501/ ) HADOOP-12958 . PhantomReference for filesystem statistics can trigger (jlowe: rev 9a09200a1f5f752e266d4fb8e0c808073080bde8) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java
          Hide
          vinodkv Vinod Kumar Vavilapalli added a comment -

          Closing the JIRA as part of 2.7.3 release.

          Show
          vinodkv Vinod Kumar Vavilapalli added a comment - Closing the JIRA as part of 2.7.3 release.

            People

            • Assignee:
              sjlee0 Sangjin Lee
              Reporter:
              jlowe Jason Lowe
            • Votes:
              0 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development