Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.0
    • Component/s: spark
    • Labels: None

      Description

      The Apache Spark package includes the shaded assembly jar, but files for Spark's built-on modules are missing.
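
      For context, the overlap is easy to see by listing the contents of the shaded assembly jar; a minimal check (the jar path below is an assumption for illustration, not taken from this issue):

      # list built-on module classes bundled into the shaded assembly jar;
      # the jar location is illustrative -- adjust to wherever the package installs it
      jar tf /usr/lib/spark/lib/spark-assembly-*.jar \
        | grep -E 'org/apache/spark/(sql|mllib|graphx|streaming)/' | head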

      1. BIGTOP-1640.1.patch
        7 kB
        YoungWoo Kim
      2. BIGTOP-1640.2.patch
        7 kB
        YoungWoo Kim
      3. BIGTOP-1640.3.patch
        7 kB
        YoungWoo Kim

        Activity

        warwithin YoungWoo Kim added a comment - edited

        jay vyas Hmm, right now the package's name is 'spark-core', but Spark's shaded jar includes its core plus the modules built on Spark.

        Should it be modularized into subpackages, or should we just add the missing files?

        jayunit100 jay vyas added a comment - edited

        Hi my old buddy YoungWoo Kim!

        I can dig into this some now... in the meantime, let me know some more context if you get a chance:

        • What files are missing? and
        • What smoke test would you suggest to confirm that the missing pieces are correctly packaged?
        jayunit100 jay vyas added a comment

        I'd prefer to just add the missing files; no need for subpackages, IMO.

        warwithin YoungWoo Kim added a comment - edited

        First patch (see the layout sketch below):

        • Add bin/{run-example,spark-sql,beeline}
        • Move jars into lib/ to follow convention
        • Move examples to DOC_DIR
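
        A minimal sketch of the resulting layout, assuming it mirrors the upstream binary distribution; the exact paths are inferred from this thread, not copied from the patch:

        # hypothetical layout after the patch (paths inferred from this thread)
        ls /usr/lib/spark/bin   # run-example, spark-sql, beeline, spark-submit, ...
        ls /usr/lib/spark/lib   # spark-assembly-*.jar and other bundled jars
        ls /usr/share/doc/spark-1.1.0/examples/src/main/python   # pi.py, ...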

        Running tests now:

        # cd $SPARK_HOME
        # ./bin/run-example SparkPi 10
        15/02/02 17:56:42 INFO spark.SecurityManager: Changing view acls to: root,
        15/02/02 17:56:42 INFO spark.SecurityManager: Changing modify acls to: root,
        15/02/02 17:56:42 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, ); users with modify permissions: Set(root, )
        15/02/02 17:56:43 INFO slf4j.Slf4jLogger: Slf4jLogger started
        15/02/02 17:56:43 INFO Remoting: Starting remoting
        ...
        Pi is roughly 3.139964
        ...
        15/02/02 17:56:46 INFO spark.SparkContext: Successfully stopped SparkContext
        
        
        # ./bin/spark-submit /usr/share/doc/spark-1.1.0/examples/src/main/python/pi.py 10
        15/02/02 18:01:09 INFO spark.SecurityManager: Changing view acls to: root,
        15/02/02 18:01:09 INFO spark.SecurityManager: Changing modify acls to: root,
        15/02/02 18:01:09 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, ); users with modify permissions: Set(root, )
        15/02/02 18:01:10 INFO slf4j.Slf4jLogger: Slf4jLogger started
        15/02/02 18:01:10 INFO Remoting: Starting remoting
        ...
        Pi is roughly 3.141824
        ...
        15/02/02 18:01:15 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
        
        

        I tried to follow the structure of the official Spark binary distribution and stay in sync with the official documentation. 'bin/run-example' should just work for newcomers to Spark.

        jayunit100 jay vyas added a comment - edited

        Looks great, and I like the wildcard approach. Can you also update the RPM packages to include those wildcard dirs? Then I think this important update is complete!
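
        For reference, in an RPM spec those wildcard dirs get picked up in the %files section; a hypothetical excerpt (the paths and version macro are assumptions, not lines from the actual spec file):

        # hypothetical %files excerpt -- the real spark.spec entries may differ
        %files -n spark-core
        %defattr(-,root,root,755)
        /usr/lib/spark/bin/*
        /usr/lib/spark/lib/*
        %doc /usr/share/doc/spark-%{spark_version}/*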

        warwithin YoungWoo Kim added a comment

        Off topic!

        Spark's packaging is strange to me. I don't know exactly, but because of the YARN-enabled assembly jar, spark-core contains jars from modules built on Spark such as Spark SQL, MLlib, GraphX, Streaming, and Bagel.

        Is anyone using the 'Spark External' modules for Spark Streaming (https://github.com/apache/spark/tree/branch-1.1/external)? Would it be good to have these in Bigtop?

        warwithin YoungWoo Kim added a comment

        Updated patch: addressed jay vyas's comment.

        jayunit100 jay vyas added a comment

        +1 ... looks good to me now. Assuming this has been tested on deb, the rpm version should also work, now that the changes look equivalent.

        I don't have my Apache creds with me (I'm on PTO), but maybe some other kind soul can push it for me.

        evans_ye Evans Ye added a comment

        Hey, I'm glad to help with committing. But I did a test build of Spark and it failed:

        $ ./gradlew spark-clean spark-rpm
        ...
        [INFO] Spark Project External ZeroMQ ..................... SUCCESS [53.036s]
        [INFO] Spark Project External MQTT ....................... SUCCESS [1:00.006s]
        [INFO] Spark Project Examples ............................ SUCCESS [3:11.147s]
        [INFO] ------------------------------------------------------------------------
        [INFO] BUILD SUCCESS
        [INFO] ------------------------------------------------------------------------
        [INFO] Total time: 57:09.235s
        [INFO] Finished at: Mon Feb 02 10:52:07 EST 2015
        [INFO] Final Memory: 103M/767M
        [INFO] ------------------------------------------------------------------------
        + exit 0
        Executing(%install): /bin/sh -e /var/tmp/rpm-tmp.kI1DmT
        + umask 022
        + cd /host/bigtop-commit/build/spark/rpm//BUILD
        + '[' /host/bigtop-commit/build/spark/rpm/BUILDROOT/spark-core-1.1.0-1.el6.x86_64 '!=' / ']'
        + rm -rf /host/bigtop-commit/build/spark/rpm/BUILDROOT/spark-core-1.1.0-1.el6.x86_64
        ++ dirname /host/bigtop-commit/build/spark/rpm/BUILDROOT/spark-core-1.1.0-1.el6.x86_64
        + mkdir -p /host/bigtop-commit/build/spark/rpm/BUILDROOT
        + mkdir /host/bigtop-commit/build/spark/rpm/BUILDROOT/spark-core-1.1.0-1.el6.x86_64
        + cd spark-1.1.0
        + LANG=C
        + export LANG
        + unset DISPLAY
        + /bin/rm -rf /host/bigtop-commit/build/spark/rpm/BUILDROOT/spark-core-1.1.0-1.el6.x86_64
        + /usr/bin/install -d -m 0755 /host/bigtop-commit/build/spark/rpm/BUILDROOT/spark-core-1.1.0-1.el6.x86_64//etc/rc.d/init.d/
        ++ pwd
        + bash /host/bigtop-commit/build/spark/rpm//SOURCES/install_spark.sh --build-dir=/host/bigtop-commit/build/spark/rpm/BUILD/spark-1.1.0 --source-dir=/host/bigtop-commit/build/spark/rpm//SOURCES --prefix=/host/bigtop-commit/build/spark/rpm/BUILDROOT/spark-core-1.1.0-1.el6.x86_64 --doc-dir=/usr/share/doc/spark-1.1.0 --pyspark-python=python
        tar: datanucleus*: Not found in archive
        tar: Exiting with failure status due to previous errors
        
        
        RPM build errors:
        error: Bad exit status from /var/tmp/rpm-tmp.kI1DmT (%install)
            Bad exit status from /var/tmp/rpm-tmp.kI1DmT (%install)
        :spark-rpm FAILED
        

        Most of the build steps succeeded; it looks like a tar error is what's failing the build.
        YoungWoo Kim, would you mind taking a look? And feel free to ping me when you need it committed.
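
        For what it's worth, 'Not found in archive' is what GNU tar prints when an extraction argument matches no member. A minimal sketch of the failure mode (the archive and member names here are hypothetical, not taken from install_spark.sh):

        # GNU tar treats member arguments literally unless --wildcards is given,
        # and an extraction pattern is anchored at the start of the member path:
        tar xzf spark-dist.tar.gz datanucleus*                 # fails: no member literally named 'datanucleus*'
        tar xzf spark-dist.tar.gz --wildcards '*datanucleus*'  # matches e.g. lib/datanucleus-core-*.jar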

        warwithin YoungWoo Kim added a comment

        My bad! Evans Ye, I'll update the patch.

        warwithin YoungWoo Kim added a comment

        Updated patch (BIGTOP-1640.3.patch):

        • Fix the packaging failure

        Evans Ye Ping!

        evans_ye Evans Ye added a comment - edited

        Tested, and patch 3 works.
        I fixed a trailing-newline warning in bigtop-packages/src/deb/spark/spark-core.install and committed.
        Thanks, YoungWoo Kim!

        warwithin YoungWoo Kim added a comment

        jay vyas, Evans Ye Thanks for reviewing and committing this!

        jayunit100 jay vyas added a comment

        Hi guys, awesome that this is now fixed... sorry, I'm travelling so I may have missed something, but I don't see a +1 here? So here's my retroactive +1. Evans Ye, in general make sure to leave your +1 on something when we commit it.

        Thanks, guys!

        jayunit100 jay vyas added a comment

        PS: Comparing the patches, it looks like the fix was simply adding wildcards (*) in front of the component names?

        evans_ye Evans Ye added a comment

        Ah, my bad. I'll do that next time.

        mgrover Mark Grover added a comment

        Sorry for jumping in a little late. Overall it looks good, but I did notice one thing.

        We are sed'ing run-example in install_spark.sh; I don't think it's a good idea to do that long term. In fact, I'd vouch for making our Spark layout more in line with the upstream layout. That would eliminate the need for our forked spark-classpath.sh and also allow us to move away from sed'ing specific things in the binary scripts (e.g. run-example).
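
        To illustrate the pattern being discussed, a hypothetical example of this kind of in-place rewrite (the sed expression and paths are made up for illustration; the real install_spark.sh differs):

        # hypothetical illustration only -- not the actual install_spark.sh sed;
        # rewrite a hard-coded location inside the installed launcher script
        sed -i -e 's|SPARK_HOME=.*|SPARK_HOME=/usr/lib/spark|' \
            $PREFIX/usr/lib/spark/bin/run-example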

        jayunit100 jay vyas added a comment

        Good catch; TBH, I wasn't sure what the sed was for, just noticed that it worked when I built it, etc... But I'm all for cleaning it up some more.

        warwithin YoungWoo Kim added a comment

        Mark Grover Thanks for the comment.

        You are right; the current sed'ing is a workaround. A Spark build using the Maven profile '-Pbigtop' does not produce the same layout as the official binary distribution. Anyway, I'll file a JIRA for the cleanup.
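
        For reference, the official binary layout is produced by Spark's make-distribution.sh rather than a plain Maven build; roughly (the profiles below are illustrative for the Spark 1.1 era, not taken from this issue):

        # official binary distribution: make-distribution.sh assembles the dist/ layout
        ./make-distribution.sh --tgz -Pyarn -Phive
        # Bigtop's build instead goes through Maven with its own profile (per this thread)
        mvn -Pbigtop -DskipTests package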

        warwithin YoungWoo Kim added a comment

        Filed BIGTOP-1667.


  People

  • Assignee: YoungWoo Kim (warwithin)
  • Reporter: YoungWoo Kim (warwithin)
  • Votes: 0
  • Watchers: 5
