Mahout / MAHOUT-2023

Drivers broken, scopt classes not found

    Details

    • Type: Bug
    • Status: Open
    • Priority: Blocker
    • Resolution: Unresolved
    • Affects Version/s: 0.13.1
    • Fix Version/s: 0.13.1
    • Component/s: build
    • Labels:
      None
    • Environment:

      any

      Description

      After Mahout is installed properly, running `mahout spark-itemsimilarity` fails with a fatal exception because the scopt classes are missing.

      This is probably a build issue: the build may be looking for the wrong version of scopt.


          Activity

          Pat Ferrel added a comment:

          Yep, the mahout...dependency-reduced.jar excludes anything with scala.compat.version in its name.

          Pat Ferrel added a comment:

          Whoa, I think that is a big clue. Everything without scala.compat.version in its name is included in mahout-spark_2.10-0.13.1-SNAPSHOT-dependency-reduced.jar (or whatever file is generated for the Scala version in use), but none of the classes from artifacts that use scala.compat.version to resolve their names are.

          It's a big clue, but I'm not sure where it leads. Trevor Grant, any idea where to look from here?
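A quick way to confirm exactly which packages ended up in the jar is to list its entries. The sketch below is a minimal Python illustration, not part of Mahout's tooling; the helper names `top_level_packages` and `contains_package` are hypothetical, and it assumes scopt's classes live under the `scopt/` package directory inside the jar.

```python
import sys
import zipfile
from collections import Counter


def top_level_packages(jar, depth=2):
    """Count .class entries in a jar, grouped by their first `depth` directories.

    `jar` may be a path or a binary file-like object (zipfile accepts both).
    """
    counts = Counter()
    with zipfile.ZipFile(jar) as zf:
        for name in zf.namelist():
            if not name.endswith(".class"):
                continue  # skip META-INF files, resources and directory entries
            dirs = name.split("/")[:-1]  # drop the class file name itself
            counts["/".join(dirs[:depth]) or "<root>"] += 1
    return counts


def contains_package(jar, package_dir):
    """Return True if any .class entry lives under `package_dir`, e.g. "scopt/"."""
    with zipfile.ZipFile(jar) as zf:
        return any(n.startswith(package_dir) and n.endswith(".class")
                   for n in zf.namelist())


# Only runs when invoked from the command line with a jar path argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    jar_path = sys.argv[1]
    for prefix, n in sorted(top_level_packages(jar_path).items()):
        print(f"{n:6d}  {prefix}")
    print("scopt classes present:", contains_package(jar_path, "scopt/"))
```

Saved under any file name (e.g. a hypothetical `inspect_jar.py`) and run against the generated mahout-spark_2.10-0.13.1-SNAPSHOT-dependency-reduced.jar, it prints a per-package class count, which makes it easy to see whether only Guava, Apache Commons and fastutil made it in.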

          Pat Ferrel added a comment (edited):

          OK, now that MAHOUT-2020 is resolved, I looked at the scopt issue and found:

          • All the correct scopt artifacts exist in remote repos for all Scala versions, and the Mahout build finds them.
          • The artifact IDs etc. are correct, as per the above.
          • I checked all the tagged versions of Mahout back to 12.0. I'm not sure when the drivers stopped working, but no reference to scopt in any POM has changed. And since people have been using the drivers and asking about them on the mailing list, I will assume the drivers worked up until the recent build changes.
          • The ViennaCL and Java-to-C bindings are in the assembly POM, so those classes are reaching the Spark executors.
          • I've checked compute-classpath.sh and the mahout script; the changes there were small and not relevant.
          • I've looked at the contents of mahout*dependency-reduced.jar. It should contain everything listed below, but it does not; it only had Guava, Apache Commons and fastutil. It is supposed to have:
          
            <dependencySets>
              <dependencySet>
                <unpack>true</unpack>
                <unpackOptions>
                  <!-- MAHOUT-1126 -->
                  <excludes>
                    <exclude>META-INF/LICENSE</exclude>
                  </excludes>
                </unpackOptions>
                <scope>runtime</scope>
                <outputDirectory>/</outputDirectory>
                <useTransitiveFiltering>true</useTransitiveFiltering>
                <includes>
                  <!-- guava only included to get Preconditions in mahout-math and mahout-hdfs -->
                  <include>com.google.guava:guava</include>
                  <include>com.github.scopt_${scala.compat.version}</include>
                  <include>com.tdunning:t-digest</include>
                  <include>org.apache.commons:commons-math3</include>
                  <include>it.unimi.dsi:fastutil</include>
                  <include>org.apache.mahout:mahout-native-viennacl_${scala.compat.version}</include>
                  <include>org.apache.mahout:mahout-native-viennacl-omp_${scala.compat.version}</include>
                  <include>org.bytedeco:javacpp</include>
                </includes>
              </dependencySet>
              <test>
                <another tag="attribute"/>
              </test>
            </dependencySets>
          

          This all leads me to believe that something in the build no longer makes the dependency-reduced.jar available to the Java driver code, since the other libraries in the assembly are probably all Hadoop or Spark executor code that the Mahout driver does not need. This is likely a side effect of the build refactoring.
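To test the "no longer available to the driver" theory directly, one could capture the classpath that the launcher scripts (mahout, compute-classpath.sh) build and search it for the jar. The sketch below assumes you have obtained that classpath string yourself, for example by temporarily echoing it from the script; `find_on_classpath` is a hypothetical helper. Entries ending in `*` are expanded to the `.jar` files in that directory, which approximates how the JVM treats classpath wildcards.

```python
import fnmatch
import glob
import os


def find_on_classpath(classpath, pattern, sep=os.pathsep):
    """Return classpath entries whose file name matches the glob `pattern`.

    Entries ending in "*" are expanded to the .jar files in that directory,
    approximating the JVM's classpath wildcard handling.
    """
    hits = []
    for entry in classpath.split(sep):
        if not entry:
            continue  # tolerate empty segments such as a trailing separator
        if entry.endswith("*"):
            candidates = sorted(glob.glob(os.path.join(entry[:-1], "*.jar")))
        else:
            candidates = [entry]
        for path in candidates:
            if fnmatch.fnmatch(os.path.basename(path), pattern):
                hits.append(path)
    return hits
```

Calling `find_on_classpath(cp, "mahout-*dependency-reduced.jar")` returns an empty list if the jar never reaches the driver. A hit only shows that the jar is on the classpath, not what it contains.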

          Trevor Grant, does the dependency-reduced.jar that contains scopt get its scala.compat.version filled in? It seems like the jar is missing everything with scala.compat.version in its name, but this may be a red herring.
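One way to probe the scala.compat.version question is to interpolate the `<include>` patterns from the descriptor quoted above and compare them with real Maven coordinates. scopt is published as `com.github.scopt:scopt_<Scala binary version>` (e.g. `scopt_2.10`), whereas the scopt include is written as `com.github.scopt_${scala.compat.version}`, without the `groupId:artifactId` colon that the other entries use. The sketch below is a simplified Python model, not the maven-assembly-plugin's actual filter code. It assumes positional `groupId:artifactId` matching with `*` wildcards and treats a pattern with no colon as a groupId-only pattern. `resolve` and `matches` are hypothetical helpers, and whether the real plugin behaves this way should be checked against its documentation.

```python
import fnmatch
import re


def resolve(template, props):
    """Replace ${name} placeholders with values from `props` (simplified interpolation)."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: props[m.group(1)], template)


def matches(pattern, group_id, artifact_id):
    """Simplified include matching on groupId:artifactId with '*' wildcards.

    Assumption for illustration: fields are compared positionally, so a pattern
    with no ':' is compared against the groupId alone. This is NOT the
    maven-assembly-plugin's real implementation.
    """
    fields = pattern.split(":")
    if not fnmatch.fnmatchcase(group_id, fields[0]):
        return False
    return len(fields) < 2 or fnmatch.fnmatchcase(artifact_id, fields[1])


props = {"scala.compat.version": "2.10"}  # Scala version from this ticket's jar name
includes = [
    "com.google.guava:guava",
    "com.github.scopt_${scala.compat.version}",
    "org.apache.mahout:mahout-native-viennacl_${scala.compat.version}",
]
# Coordinates each include is meant to pick up (scopt's are its published ones).
expected = [
    ("com.google.guava", "guava"),
    ("com.github.scopt", "scopt_2.10"),
    ("org.apache.mahout", "mahout-native-viennacl_2.10"),
]
for template, (g, a) in zip(includes, expected):
    pattern = resolve(template, props)
    print(f"{pattern:55s} -> {g}:{a}  match={matches(pattern, g, a)}")
```

Under these assumptions, only the scopt entry reports `match=False`. In the same model, a pattern written as `com.github.scopt:scopt_${scala.compat.version}` would match.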

          Pat Ferrel added a comment:

          To be clear, we have to fix MAHOUT-2020 before I want to touch this.

          Pat Ferrel added a comment:

          I suspect the PR listed above was mistakenly attached to this ticket.

          Pat Ferrel added a comment:

          While I agree this is a bug, I don't understand the relevance of the PR above or of the referenced JIRA.

          The suspicion that scopt has no Scala 2.11 build is wrong. Scopt is popular and under active development, with builds for many Scala versions, from 2.9.3 up to 2.12.

          There is a problem with the current build, tracked here: https://issues.apache.org/jira/browse/MAHOUT-2020. Until it is resolved, this bug may be hard to fix, since fixing it involves modifying the POMs, as MAHOUT-2020 does.

          ASF GitHub Bot added a comment:

          GitHub user rawkintrevo commented on the issue:

          https://github.com/apache/mahout/pull/346

          Hi @BruceKuiLiu, thanks for the contribution, and welcome to the community.

          Could you please join dev@mahout.apache.org and bring this up for discussion, or at least start a JIRA issue where we can discuss it?

          https://issues.apache.org/jira/projects/MAHOUT/issues/MAHOUT-2023?filter=allopenissues


            People

            • Assignee:
              pferrel Pat Ferrel
              Reporter:
              pferrel Pat Ferrel
            • Votes:
              0
              Watchers:
              2
