ACCUMULO-4341

ServiceLoader deadlock with classes loaded from HDFS

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.7.0, 1.8.0
    • Fix Version/s: 1.7.2, 1.8.0
    • Component/s: client
    • Labels:
      None

      Description

      With Accumulo set up to use general.vfs.classpaths to load classes from HDFS, running `accumulo help` will hang.

      A jstack of the process shows the IPC Client thread at:

         java.lang.Thread.State: BLOCKED (on object monitor)
      	at java.lang.Class.forName0(Native Method)
      	at java.lang.Class.forName(Class.java:348)
      	at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2051)
      	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:91)
      	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
      	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
      	at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1086)
      	at org.apache.hadoop.ipc.Client$Connection.run(Client.java:966)
      

      and the main thread at:

         java.lang.Thread.State: WAITING (on object monitor)
      	at java.lang.Object.wait(Native Method)
      	at java.lang.Object.wait(Object.java:502)
      	at org.apache.hadoop.ipc.Client.call(Client.java:1454)
      	- locked <0x00000000f09a2898> (a org.apache.hadoop.ipc.Client$Call)
      	at org.apache.hadoop.ipc.Client.call(Client.java:1399)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
      	at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
      	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
      	at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:497)
      	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
      	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
      	at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
      	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1982)
      	at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1128)
      	at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1124)
      	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
      	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1124)
      	at org.apache.commons.vfs2.provider.hdfs.HdfsFileObject.doAttach(HdfsFileObject.java:85)
      	at org.apache.commons.vfs2.provider.AbstractFileObject.attach(AbstractFileObject.java:173)
      	- locked <0x00000000f57fd008> (a org.apache.commons.vfs2.provider.hdfs.HdfsFileSystem)
      	at org.apache.commons.vfs2.provider.AbstractFileObject.getContent(AbstractFileObject.java:1236)
      	- locked <0x00000000f57fd008> (a org.apache.commons.vfs2.provider.hdfs.HdfsFileSystem)
      	at org.apache.commons.vfs2.impl.VFSClassLoader.getPermissions(VFSClassLoader.java:300)
      	at java.security.SecureClassLoader.getProtectionDomain(SecureClassLoader.java:206)
      	- locked <0x00000000f5ad9138> (a java.util.HashMap)
      	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
      	at org.apache.commons.vfs2.impl.VFSClassLoader.defineClass(VFSClassLoader.java:226)
      	at org.apache.commons.vfs2.impl.VFSClassLoader.findClass(VFSClassLoader.java:180)
      	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
      	- locked <0x00000000f5af3b88> (a org.apache.commons.vfs2.impl.VFSClassLoader)
      	at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
      	- locked <0x00000000f6f5c2f8> (a org.apache.commons.vfs2.impl.VFSClassLoader)
      	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
      	at java.lang.Class.forName0(Native Method)
      	at java.lang.Class.forName(Class.java:348)
      	at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:370)
      	at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
      	at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
      	at org.apache.accumulo.start.Main.checkDuplicates(Main.java:196)
      	at org.apache.accumulo.start.Main.getExecutables(Main.java:188)
      	at org.apache.accumulo.start.Main.main(Main.java:52)
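
      One plausible reading of the two traces: the main thread holds the VFSClassLoader monitor while it waits for an HDFS RPC response, and the IPC Client thread that must deliver that response is blocked in Class.forName. The sketch below only simulates that shape with a plain lock object and latches. LoaderLockHang, LOADER_LOCK and responseArrives are hypothetical names, not Accumulo, VFS or Hadoop code, and a timeout stands in for the indefinite hang.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class LoaderLockHang {
    // Stand-in for the class loader's monitor; hypothetical, not VFS or Accumulo code.
    private static final Object LOADER_LOCK = new Object();

    /**
     * Simulates a caller that sends a request and waits for the response.
     * Returns true if the response arrived within the timeout.
     */
    static boolean responseArrives(boolean holdLoaderLock, long timeoutMillis) {
        CountDownLatch requestSent = new CountDownLatch(1);
        CountDownLatch response = new CountDownLatch(1);

        // Simulated IPC thread: it needs the loader lock before it can deliver the response,
        // much as the real IPC thread needs to load a class first.
        Thread ipc = new Thread(() -> {
            try {
                requestSent.await();
            } catch (InterruptedException e) {
                return;
            }
            synchronized (LOADER_LOCK) {
                response.countDown();
            }
        }, "ipc-client-sim");
        ipc.setDaemon(true);
        ipc.start();

        try {
            boolean arrived;
            if (holdLoaderLock) {
                // Waiting on the latch does not release LOADER_LOCK, so the IPC thread stays blocked.
                synchronized (LOADER_LOCK) {
                    requestSent.countDown();
                    arrived = response.await(timeoutMillis, TimeUnit.MILLISECONDS);
                }
            } else {
                requestSent.countDown();
                arrived = response.await(timeoutMillis, TimeUnit.MILLISECONDS);
            }
            ipc.join(); // the lock is free here, so the simulated IPC thread can finish
            return arrived;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("waiting while holding the lock: " + responseArrives(true, 200));
        System.out.println("waiting without the lock: " + responseArrives(false, 5000));
    }
}
```

      With the lock held, the simulated response cannot arrive before the timeout. In the real process there is no timeout, so this pattern would show up as a hang.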
      

          Activity

          dlmarion Dave Marion added a comment -

          Backported ACCUMULO-3923 to 1.7.2-SNAPSHOT and applied this fix to the same branch. Tested using the following process:

          1. tar zxf accumulo-1.7.2-SNAPSHOT-bin.tar.gz
          2. cd accumulo-1.7.2-SNAPSHOT/bin
          3. ./build_native_library.sh
          4. ./bootstrap_config.sh
          5. Update accumulo-env.sh
          6. In accumulo-site.xml, set instance.volumes and add:
          
            <property>
              <name>general.vfs.classpaths</name>
              <value>hdfs://<host>:<port>/accumulo-1.7.2-SNAPSHOT-system-classpath/.*.jar</value>
            </property>
          
          7. ./bootstrap_hdfs.sh
          8. accumulo init
          9. start-all.sh
          
          dlmarion Dave Marion added a comment -

          I think I have this resolved. It's all working now locally, including running bootstrap_hdfs.sh and running Accumulo with the jars out of HDFS. I'm going to put up a small patch against 1.8.0. I didn't have time to test against 1.7. If someone gets to it before I do, feel free to apply the patch and close my PR.

          dlmarion Dave Marion added a comment -

          I believe the VFS code, without the ServiceLoader, will work. It works in 1.6.

          mdrob Mike Drob added a comment -

          Could we consider reverting some of the VFS changes? Do we believe that the VFS code in 1.7.1 would have worked?

          dlmarion Dave Marion added a comment -

          I'm wondering if this is really due to a change in the ClassLoader object between Java 6 and 7. If you look at the javadoc for the two-arg form of ClassLoader.loadClass(), Java 7 introduced a lock and parallel-capable class loaders.
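
          To make the Java 7 change concrete, here is a minimal sketch of the difference between a parallel-capable loader and a default one. It uses only ClassLoader.registerAsParallelCapable() and getClassLoadingLock(String) from the JDK. The class names (ParallelCapableSketch, ParallelLoader, SerialLoader) and the lockFor helper are made up for illustration and say nothing about how VFSClassLoader is implemented.

```java
final class ParallelCapableSketch {

    /** Opts in to per-class-name locking (available since Java 7). */
    static final class ParallelLoader extends ClassLoader {
        static {
            // Must run before any instance is constructed; a static initializer guarantees that.
            ClassLoader.registerAsParallelCapable();
        }

        // Exposes the protected lock lookup for demonstration only.
        Object lockFor(String className) {
            return getClassLoadingLock(className);
        }
    }

    /** Does not register, so loadClass synchronizes on the loader instance itself. */
    static final class SerialLoader extends ClassLoader {
        Object lockFor(String className) {
            return getClassLoadingLock(className);
        }
    }

    /** True if the parallel-capable loader hands out a lock other than the loader itself. */
    static boolean parallelUsesPerNameLock() {
        ParallelLoader loader = new ParallelLoader();
        return loader.lockFor("example.Foo") != loader;
    }

    /** True if the default loader uses itself as the class loading lock. */
    static boolean serialLocksOnLoader() {
        SerialLoader loader = new SerialLoader();
        return loader.lockFor("example.Foo") == loader;
    }

    public static void main(String[] args) {
        System.out.println("parallel-capable loader uses per-name lock: " + parallelUsesPerNameLock());
        System.out.println("default loader locks on itself: " + serialLocksOnLoader());
    }
}
```

          A loader that does not register holds its own monitor for the whole of loadClass(), which matches the "locked ... VFSClassLoader" frames in the main thread's trace above.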

          elserj Josh Elser added a comment -

          Mike Drob made this a blocker against 1.7.2 until we figure out what we're going to do (doc it or fix it).

          elserj Josh Elser added a comment -

          I didn't realize it until now, but I am loading jars out of HDFS with 1.7.0 with the ServiceLoader. In this case, where it is working, I am only putting my application jars into HDFS and setting the context name on tables. Pushing the Accumulo jars into HDFS is likely the case where it will not work.

          Ok, that's a reasonable workaround (easily doc'ed anyways). Will defer to Christopher about fixing the ServiceLoader stuff.

          dlmarion Dave Marion added a comment -

          Loading classes out of HDFS has been a feature supported from 1.5 with the new classloader. The integration of the ServiceLoader was introduced in 1.7.0.

          I didn't realize it until now, but I am loading jars out of HDFS with 1.7.0 with the ServiceLoader. In this case, where it is working, I am only putting my application jars into HDFS and setting the context name on tables. Pushing the Accumulo jars into HDFS is likely the case where it will not work.

          elserj Josh Elser added a comment -

          Christopher Tubbs, FYI.

          If we don't figure out what is broken here before 1.8.0, I'm going to recommend that we very publicly state that HDFS classloading is not a supported feature for 1.8.0. I really don't want to send someone down a rabbit hole to find out that this doesn't work.

          elserj Josh Elser added a comment -

          No, the service loader feature is busted.

          The service loader feature is used to start all Accumulo processes...
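
          For context, the startup path in the main thread's trace (Main.getExecutables, then Main.checkDuplicates) iterates a java.util.ServiceLoader, and each step of that lazy iteration calls Class.forName through the supplied class loader. The sketch below shows the general pattern only. The Command interface, KeywordDiscovery class and the choice to drop ambiguous keywords are assumptions for illustration, not Accumulo's actual code.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.ServiceLoader;
import java.util.Set;
import java.util.TreeMap;

final class KeywordDiscovery {

    /** Hypothetical service interface; each provider names the command it handles. */
    interface Command {
        String keyword();
    }

    /** Discovers providers through the given loader. Iteration loads each provider class lazily. */
    static Map<String, Command> discover(ClassLoader loader) {
        return indexByKeyword(ServiceLoader.load(Command.class, loader));
    }

    /** Indexes commands by keyword, dropping any keyword claimed by more than one provider. */
    static Map<String, Command> indexByKeyword(Iterable<Command> commands) {
        Map<String, Command> byKeyword = new TreeMap<>();
        Set<String> duplicates = new HashSet<>();
        for (Command command : commands) {
            if (byKeyword.putIfAbsent(command.keyword(), command) != null) {
                duplicates.add(command.keyword());
            }
        }
        byKeyword.keySet().removeAll(duplicates);
        return byKeyword;
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        System.out.println("discovered keywords: " + discover(loader).keySet());
    }
}
```

          Because the iteration itself triggers class loading, a loader that has to reach HDFS to define a class does that remote I/O in the middle of the ServiceLoader loop.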

          dlmarion Dave Marion added a comment -

          I can't even `accumulo init` with the jars in HDFS.

          dlmarion Dave Marion added a comment -

          No, the service loader feature is busted.

          elserj Josh Elser added a comment -

          So, VFS classloading from HDFS is still busted? That's the takeaway I see here...


            People

            • Assignee: Dave Marion (dlmarion)
            • Reporter: Dave Marion (dlmarion)
            • Votes: 0
            • Watchers: 3

                Time Tracking

                • Original Estimate: Not Specified
                • Remaining Estimate: 0h
                • Time Spent: 1h