  1. Accumulo
  2. ACCUMULO-3923

bootstrap_hdfs.sh does not copy correct jars to hdfs

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.7.2, 1.8.0
    • Component/s: None
    • Labels: None

      Description

      Trying to make the VFS classloading stuff work and it doesn't seem like ServiceLoader is finding any of the KeywordExecutable implementations.

      Best I can tell after looking into this, VFSClassLoader (created by AccumuloVFSClassLoader) has all of the jars listed as resources, but when ServiceLoader tries to find the META-INF/services definitions, it returns nothing, and thus we think the keyword must be a class name. Seems like a commons-vfs bug.

          Activity

          ctubbsii Christopher Tubbs added a comment -

          Not sure this issue should be "Critical". VFS classloader is an entirely optional feature which is not necessary for Accumulo to run.

          elserj Josh Elser added a comment -

          Marked it as critical because it's a thing we advertise to work. We need to address it in some shape/fashion, even if that is just "document that it's busted and warn people to not use it".

          ctubbsii Christopher Tubbs added a comment -

          That's fine... but we kind of expect everything to work... not everything is critical when it doesn't. Your call. I just think people overuse critical.

          dlmarion Dave Marion added a comment -

          Loading of resource files on the classpath is a bug in vfs 2.0, fixed in 2.1. I have been trying to get them to release vfs 2.1 for months. We replaced the vfs 2.0 jar with the 2.1 snapshot to get this capability. It works at runtime, but does not compile with 2.1 snapshot due to api differences.

          dlmarion Dave Marion added a comment - https://issues.apache.org/jira/plugins/servlet/mobile#issue/VFS-500
          elserj Josh Elser added a comment -

          Thanks for the link, Dave Marion. I did a quick search earlier but couldn't find anything. You may also be interested in the patch I put up on ACCUMULO-3783

          dlmarion Dave Marion added a comment -

          Ugh, went back and looked and I tried to start the conversation a year ago. I'll look at your patch on the other issue.

          elserj Josh Elser added a comment -

          For historical purposes, commons-vfs-2.1 did fix this issue for me. Not much we can do until they actually release it though.

          ctubbsii Christopher Tubbs added a comment -

          Do we anticipate bumping the dependency version in 1.7.1 if we can get this fixed, or just wait until 1.8.0?

          dlmarion Dave Marion added a comment -

          My plan was to complete ACCUMULO-3470 once VFS 2.1 is released. I have to remove the hdfs vfs objects that are in Accumulo, bump the dependency in the pom, and fix imports. None of this is client facing, so we should be able to backport where appropriate and if people need it.

          mdrob Mike Drob added a comment -

          Can we close this out now that Commons VFS 2.1 is actually released?

          elserj Josh Elser added a comment -

          Can we close this out now that Commons VFS 2.1 is actually released?

          Given my 2015/07/06 comment, it seems so. VFS-500 was just re-opened due to issues on IBM JDKs, but I asked for clarification if the fix doesn't actually work or if the tests for the fix just don't work.

          Either way, I suppose, we don't have a good multi-vendor-JDK compatibility/testing matrix, so it's probably fine to just close it out.

          ctubbsii Christopher Tubbs added a comment -

          If anybody can confirm it works for at least OpenJDK / Oracle JDK after the bump to VFS 2.1, I'd be happy with closing this.

          mdrob Mike Drob added a comment -

          What does confirming entail? I ran all the VFS tests in accumulo-start with Oracle Java 8 and they passed.

          elserj Josh Elser added a comment -

          I ran all the VFS tests in accumulo-start with Oracle Java 8 and they passed

          Yeah, I don't think unit tests actually cover this.

          What does confirming entail?

          It would require setting up an installation using the VFS classloader and then verifying that one of the classes implementing the KeywordExecutable interface can be executed via accumulo <keyword>, e.g. accumulo jar <jarfile> or accumulo classpath.

          mdrob Mike Drob added a comment -

          Can you point me to some more detailed instructions on this? I'm not very familiar with how the VFS stuff operates. I found a blog post from two years ago, but it doesn't cover anything about KeywordExecutable.

          ctubbsii Christopher Tubbs added a comment -

          The accumulo-gc jar contains a KeywordExecutable to provide the "gc" keyword. If you follow the instructions in the blog post to put that jar in HDFS, and make sure it's not anywhere else in the classpath locally, you'd just need to run bin/accumulo and make sure "gc" is still in the list of keywords described in the usage/help.

          elserj Josh Elser added a comment -

          I found a blog post from two years ago, but it doesn't cover anything about KeywordExecutable.

          KeywordExecutable is the Google AutoService stuff that Christopher Tubbs wired up for most (all?) of the `accumulo <foo>` commands. The interface defines the mapping from "<foo>" to some class "org.apache.accumulo.core.Foo" to run. IIRC, you don't need to know anything else about how it works. Just do the minimal installation using the local filesystem (accumulo-start.jar), put everything else into hdfs (or any other VFS compatible filesystem), and running the aforementioned command (accumulo classpath) should work. The problem was that ServiceLoader (which reads the service definition files that Google's AutoService generates) wasn't finding those files in the jars being accessed via VFS. Does that help?
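
The service lookup described in this comment depends on each jar carrying a provider file under META-INF/services/. As a rough way to rule out packaging problems before suspecting the classloader, the following Python sketch lists the provider files in a jar. It only inspects the archive; it does not exercise VFS or ServiceLoader, and the function name service_providers is illustrative rather than anything from Accumulo.

```python
import zipfile

SERVICES_PREFIX = "META-INF/services/"


def service_providers(jar_path):
    """Map each service interface declared in the jar to its provider class names."""
    providers = {}
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            # Skip everything outside META-INF/services/ and the directory entry itself
            if not name.startswith(SERVICES_PREFIX) or name.endswith("/"):
                continue
            interface = name[len(SERVICES_PREFIX):]
            lines = jar.read(name).decode("utf-8").splitlines()
            # Provider files may contain '#' comments and blank lines
            classes = [line.split("#", 1)[0].strip() for line in lines]
            providers[interface] = [c for c in classes if c]
    return providers
```

For a jar that provides a keyword, one would expect an entry keyed by the fully qualified name of the KeywordExecutable interface, listing the implementing class.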

          dlmarion Dave Marion added a comment -

          FWIW, bootstrap_hdfs.sh should help you get the environment set up correctly.

          b.eckenfels Bernd Eckenfels added a comment -

          Is there still (after VFS 2.1 release including some VFS-500 fixes) a problem with this?

          mdrob Mike Drob added a comment -

          Bernd Eckenfels - probably not, but I haven't had a chance to test it yet.

          mdrob Mike Drob added a comment -

          Tried testing this today, had a lot of trouble getting things working.

          I was using CDH5.7

          Some issues that I noticed:

          • It is not clear at all what general.vfs.classpaths value is supposed to be. A directory? A java classpath style wildcard? A regex for jars?
            • I tried this as a directory and it didn't seem to pick up the jars.
            • I tried this as a regex or single jar and the process became unresponsive.
          • I tested this by moving accumulo-shell jar to hdfs and then running bin/accumulo help and observing that it did not include the shell command.
          • When running accumulo classpath I got "Level 5: Mystery Classloader (someone probably added a classloader and didn't update the switch statement in org.apache.accumulo.start.classloader.vfs.AccumuloVFSClassLoader) VFS classpaths items are:" with no classpath items listed after it. Maybe the updates to the VFS loader slipped in something new. Are these App Context ClassLoaders?

          Lots of these probably need to be broken out into their own issues, but overall this is screaming like something that is too big to fix in 1.7.2. Assigning this to you, Dave Marion to triage as appropriate.

          dlmarion Dave Marion added a comment - - edited

          It is not clear at all what general.vfs.classpaths value is supposed to be. A directory? A java classpath style wildcard? A regex for jars?

          From the "Running Accumulo From HDFS" section in the blog

          <property>
              <name>general.vfs.classpaths</name>
              <value>hdfs://localhost:8020/accumulo/system-classpath</value>
              <description>Configuration for a system level vfs classloader. Accumulo jars can be configured here and loaded out of HDFS.</description>
          </property>
          

          You should be able to do the following:
          1. Untar an Accumulo distribution somewhere
          2. Create the configuration files (bootstrap_config.sh)
          3. Make appropriate changes to accumulo-env.sh and accumulo-site.xml, to include the property above
          4. Run bootstrap_hdfs.sh; this will push most of the jars into the location specified in the general.vfs.classpaths property
          5. Do what Christopher suggested above

          elserj Josh Elser added a comment -

          From the "Running Accumulo From HDFS" section in the blog

          I'll also vouch for Mike's confusion – I remember wondering the same exact thing when I filed this issue. Does that mean you provide a directory? A CSV of directories? If you can clarify here, we can update the docs which would be great.

          dlmarion Dave Marion added a comment - - edited

          I have some changes locally to bootstrap_hdfs.sh so that it will support the following property values in general.vfs.classpaths:

          hdfs://host:port/accumulo/classpath/
          hdfs://host:port/accumulo/classpath/.*.jar
          hdfs://host:port/accumulo/classpath/.*.jar,hdfs://host:port/accumulo/classpath2/.*.jar

          In all cases, it will create the /accumulo/classpath directory if it does not exist and push the jars from the lib directory into the specified directory. I also found in my testing that the slf4j jars need to be kept on the local server. Additionally, when trying to run `accumulo help` to test these changes, I found that the client ends up in a deadlock. Ticket to follow.
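
To make the three accepted forms concrete, here is a minimal Python sketch of one way to reduce a general.vfs.classpaths value to the HDFS directories that jars would be pushed to. It is not the logic in bootstrap_hdfs.sh; the function name classpath_directories and the rule of treating a final path component ending in .jar as a file pattern are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def classpath_directories(value):
    """Return the directory path for each comma-separated classpath entry."""
    directories = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        path = urlsplit(entry).path
        head, _, last = path.rpartition("/")
        # Assumption: a final component ending in ".jar" is a file pattern, not a directory
        if last.endswith(".jar"):
            path = head
        # Normalize away any trailing slash, keeping "/" for the root
        directories.append(path.rstrip("/") or "/")
    return directories
```

Under these assumptions, each of the three example values resolves to /accumulo/classpath, and the comma-separated form additionally yields /accumulo/classpath2.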

          elserj Josh Elser added a comment -

          I have some changes locally to bootstrap_hdfs.sh so that it will support the following property values in general.vfs.classpaths:

          Oh, is this property not also used by the Java ClassLoader? Maybe that was part of my confusion.

          Additionally, when trying to run `accumulo help` to test these changes, I found that the client is in a deadlock situation. Ticket to follow.

          Thanks for helping out, Dave.

          dlmarion Dave Marion added a comment -

          Oh, is this property not also used by the Java ClassLoader? Maybe that was part of my confusion.

          The property is used by both the bootstrap_hdfs.sh script and the VFS classloader. My changes to bootstrap_hdfs.sh are in the parsing of the property from accumulo-site.xml.
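
For illustration, reading the property out of accumulo-site.xml might look like the Python sketch below. bootstrap_hdfs.sh is a shell script and may do this differently; the read_property helper and the <configuration> wrapper element in the sample are assumptions.

```python
import xml.etree.ElementTree as ET

# Sample site file; the <property> block mirrors the one quoted earlier in this thread
SAMPLE_SITE_XML = """
<configuration>
  <property>
    <name>general.vfs.classpaths</name>
    <value>hdfs://localhost:8020/accumulo/system-classpath</value>
  </property>
</configuration>
"""


def read_property(site_xml_text, name):
    """Return the value of the named <property>, or None if it is absent."""
    root = ET.fromstring(site_xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None
```

Calling read_property(SAMPLE_SITE_XML, "general.vfs.classpaths") returns the configured HDFS location, which both the script and the classloader would then interpret.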

          dlmarion Dave Marion added a comment -

          I pushed my changes to the script.

          elserj Josh Elser added a comment -

          Gotcha. I was a little confused that you could change the accepted text for general.vfs.classpaths by the bootstrap_hdfs.sh script without also changing the VFS-based ClassLoader.

          Does that mean that the CL could already handle directories and regexes?

          dlmarion Dave Marion added a comment -

          Yes, it should.

          dlmarion Dave Marion added a comment -

          Based on my testing, I would say that the issue with VFS-500 is resolved. There is a new issue with the ServiceLoader (ACCUMULO-4341), but I believe the problems with this JIRA are resolved.

          elserj Josh Elser added a comment -

          Closing per Dave's assessment.

          mdrob Mike Drob added a comment -

          Dropping 1.7.2 fix version from this JIRA since the fix was only committed to 1.8

          elserj Josh Elser added a comment -

          Dropping 1.7.2 fix version from this JIRA since the fix was only committed to 1.8

          Didn't commons-vfs-2.1 land on 1.7 as well though (which is what would have 'fixed' the issue)?

          the fix was only committed to 1.8

          Any reason you didn't commit your change to 1.7 as well, Dave Marion? (1.6 too even?)

          mdrob Mike Drob added a comment -

          Renamed this issue to reflect the fix made. ACCUMULO-4341 can be used to track the rest of the work related to the VFS classloader + KeywordExecutable

          dlmarion Dave Marion added a comment -

          Any reason you didn't commit your change to 1.7 as well, Dave Marion? (1.6 too even?)

          Oversight on my part. It can be cherry-picked back to any release that uses slf4j.

          dlmarion Dave Marion added a comment -

          When did we move to slf4j?

          elserj Josh Elser added a comment -

          When did we move to slf4j?

          1.7.0 iirc.

          dlmarion Dave Marion added a comment -

          I will cherry-pick this back to 1.7.2 and test it with ACCUMULO-4341 tomorrow.

          ctubbsii Christopher Tubbs added a comment -

          Did you get a chance to test this? Does it work? I see it's closed, but no follow-up comment and just wanted to verify.

          dlmarion Dave Marion added a comment -

          Yes, with ACCUMULO-4341


            People

            • Assignee: dlmarion Dave Marion
            • Reporter: elserj Josh Elser

            Time Tracking

            • Original Estimate: Not Specified
            • Remaining Estimate: 0h
            • Time Spent: 20m