Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels: None

      Activity

      Sean Mackrory created issue -
      Sean Mackrory made changes -
      Field Original Value New Value
      Attachment 0001-BIGTOP-1181.-Add-pyspark-to-spark-package.patch [ 12622986 ]
      Sean Mackrory made changes -
      Status Open [ 1 ] Patch Available [ 10002 ]
      Sean Mackrory added a comment - edited

      pyspark is a python shell for spark. A couple of quick examples that I tested:

      sc.parallelize([1,2,3]).sum()

      And assuming you have a dictionary at hdfs:///words:

      sc.textFile("/words").filter(lambda w: w.startswith("spar")).take(5)
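
      For anyone without a cluster handy, here is a plain-Python sketch of what those two snippets compute. It is only an illustration: the word list is made up, and take() is a hypothetical local helper standing in for the RDD method of the same name, not Spark's implementation.

```python
from itertools import islice


def take(iterable, n):
    """Return the first n items as a list (local stand-in for RDD.take)."""
    return list(islice(iterable, n))


# Local equivalent of sc.parallelize([1,2,3]).sum()
total = sum([1, 2, 3])

# Made-up sample standing in for the dictionary at hdfs:///words
words = ["spar", "spare", "spark", "sparkle", "sparrow", "sparse", "speak"]

# Local equivalent of sc.textFile("/words").filter(...).take(5)
matches = take((w for w in words if w.startswith("spar")), 5)

print(total)
print(matches)
```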
      Roman Shaposhnik added a comment -

      This is useful functionality, but it is going to make the spark package depend on Python. The usual questions apply: which versions of Python is it compatible with, and which Python libraries does it require?

      And it also reminds me: have we gotten around to renaming spark -> spark-common?

      Sean Mackrory added a comment -

      Good point. A quick attempt to run my tests on RHEL 5 didn't show any immediate incompatibilities with Python 2.4, but that server is screwed up in other ways and I didn't get the whole test run to complete, so I should do more testing there. Spark's docs state that "PySpark requires Python 2.6 or higher." (http://spark.incubator.apache.org/docs/0.8.0/python-programming-guide.html), so even if it works it isn't officially supported. I'll add Python dependencies to the packages in my patch. Given the lack of Python 2.4 support, I think we should break this out into a separate package (so it's not required for every Spark install) and require "python26" on EL5 (available from EPEL). It's not ideal, but it's probably the best we can do if we want to at least provide the option of pyspark.
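
      To make the version requirement concrete, here is a minimal sketch of the kind of guard a launcher could run before starting pyspark. The names python_is_supported and MIN_PYSPARK_PYTHON are hypothetical and not part of the patch; only the 2.6 minimum comes from the Spark docs quoted above.

```python
import sys

# Minimum version taken from the Spark 0.8.0 docs quoted in the comment.
MIN_PYSPARK_PYTHON = (2, 6)


def python_is_supported(version_info=None):
    """Return True if the interpreter meets PySpark's documented minimum."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); micro releases do not matter here.
    return tuple(version_info[:2]) >= MIN_PYSPARK_PYTHON


if __name__ == "__main__":
    if not python_is_supported():
        sys.exit("PySpark requires Python 2.6 or higher")
```

      On a stock RHEL 5 interpreter (2.4) the check fails, which is the case the python26 dependency is meant to cover.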

      Sean Mackrory added a comment -

      An alternative to requiring python26 on EL5 is just "Recommends"-ing it. That way the packages don't stand in the user's way if they choose to install Python 2.6 some other way, but personally I think EPEL is ubiquitous enough.

      Mark Grover added a comment -

      And, I am working on the rename. Done with code, just need to build it a little and post a patch. Stay tuned!

      Sean Mackrory added a comment -

      So this is the same thing, but as a separate 'python-spark' package. It pulls in Python on all distros, but specifically Python 2.6 on RHEL 5. I've tested it on OpenSUSE, SLES 11, RHEL 5 and RHEL 6. That's a new dependency for Bigtop, I think, but short of packaging Python ourselves I think it's the only way to provide PySpark. As I said, EPEL is ubiquitous enough on RHEL 5, in my opinion. Anybody disagree?
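
      The dependency rule described here boils down to a lookup with one exception. The sketch below only illustrates that logic; the distro tags and the function name are hypothetical, the generic package name "python" is an assumption, and the actual patch does this in the packaging files rather than in Python.

```python
# Hypothetical distro tags mapped to the Python package they would need.
PYTHON_PACKAGE_OVERRIDES = {
    "rhel5": "python26",  # stock Python is 2.4; python26 comes from EPEL
}


def python_requirement(distro):
    """Return the Python package name a python-spark package would require."""
    # Every distro without an override falls back to its default Python.
    return PYTHON_PACKAGE_OVERRIDES.get(distro, "python")


for distro in ("opensuse", "sles11", "rhel5", "rhel6"):
    print(distro, "->", python_requirement(distro))
```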

      Sean Mackrory made changes -
      Sean Mackrory added a comment -

      ...aaand I forgot about Fedora. Will need to tweak the SPEC file logic....

      Mark Grover added a comment -

      Looking good to me. Yeah, once you fix the Python detection logic to include Fedora, etc., I will re-review.

      Sean Mackrory added a comment -

      I hadn't tested on Debian / Ubuntu systems either (although apparently I'm still the first to do so - see BIGTOP-1185). Updated the patch to work on Debian machines.

      I'll see about including Mageia in the SPEC file logic too as we used to support that and most of Bigtop would probably still work. The problem is RHEL 5 seems to be the only RPM distro we've ever supported that doesn't make it easy for you to identify it, and it's the only one we need to identify!

      Sean Mackrory made changes -
      Roman Shaposhnik added a comment -

      We've talked about dropping CentOS5/RHEL5 completely in Bigtop 0.8.0. Perhaps we should proceed with that in mind (which reminds me: I need to restart the Bigtop 0.8.0 thread).

      Thoughts?

      Sean Mackrory added a comment -

      Well I won't deny that if we're going to drop RHEL 5 support anyway, doing it now would be very convenient timing for me! We'll need a very similar mechanism to this when we move to Hue 3, as it uses a version of Django that also dropped support for Python 2.4, and consequently stock RHEL 5. Exactly the same issue.

      Sean Mackrory added a comment - edited

      Attaching a patch that's rebased on top of the rename from spark -> spark-core and drops the RHEL 5-specific stuff (but leaves the basic mechanism for specifying a different Python in place). When we get our OS support for future versions nailed down, if we do decide to continue supporting RHEL 5, I'll add that back, along with macros for all the other RPM-based distros we plan to support.

      Sean Mackrory made changes -
      Sean Mackrory added a comment -

      Also added the new package to the testing manifest...

      Sean Mackrory made changes -
      Mark Grover added a comment -

      Hey Sean, it would be easier for me to review if this were on reviewboard. Do you mind posting it for review there please? Thanks!

      Sean Mackrory added a comment -

      https://reviews.apache.org/r/17151/

      Mark Grover added a comment -

      Thanks, Sean. +1 from me.

      Sean Mackrory added a comment -

      Committed and pushed.

      Sean Mackrory made changes -
      Status Patch Available [ 10002 ] Resolved [ 5 ]
      Resolution Fixed [ 1 ]
      Roman Shaposhnik added a comment -

      A belated +1 from me as well. Thanks a bunch, Sean!

      Konstantin Boudnik added a comment -

      Guys, could you please update the "Fix version" field for this? I presume it will be 0.8.0, but I'm not sure.

      Bruno Mahé made changes -
      Fix Version/s 0.8.0 [ 12324841 ]
      Bruno Mahé added a comment -

      Done

      Bruno Mahé made changes -
      Status Resolved [ 5 ] Closed [ 6 ]
      Transition                  | Time In Source Status | Execution Times | Last Executer | Last Execution Date
      Open -> Patch Available     | 1m 47s                | 1               | Sean Mackrory | 14/Jan/14 21:38
      Patch Available -> Resolved | 6d 22h 53m            | 1               | Sean Mackrory | 21/Jan/14 20:31
      Resolved -> Closed          | 42d 10h 52m           | 1               | Bruno Mahé    | 05/Mar/14 07:23

        People

        • Assignee: Sean Mackrory
        • Reporter: Sean Mackrory
        • Votes: 0
        • Watchers: 5