Mahout: MAHOUT-37

Tarball for Mahout-ified Taste code

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.1
    • Component/s: None
    • Labels:
      None
    • Environment:

      All

      Description

      I will attach to this issue a tarball containing my proposed contribution of the Taste project to Mahout. I think it's been scrubbed, repackaged, reformatted, cleaned up, and generally prepared for easy integration into Mahout. In particular, I made many optional dependencies truly optional, so the core distro is quite lean.

      The MD5 hash of the tarball is db78114f9799905c7d4a5a1907a03e6f
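      A recipient can check a downloaded copy against the published hash. The sketch below is an illustration only, not the tool used in this thread (that was /sbin/md5 on Mac OS X, per the comments). It assumes Java 7 or later and uses the standard java.security.MessageDigest API; the class name Md5Check and the default file name taste.tar.gz are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class for checking a tarball against a published MD5.
public class Md5Check {

    // Digest as published in this issue's description.
    static final String EXPECTED = "db78114f9799905c7d4a5a1907a03e6f";

    // Every Java platform is required to support MD5, so this should not fail.
    private static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Returns the lowercase hex MD5 digest of an in-memory byte array. */
    static String md5Hex(byte[] data) {
        return toHex(newMd5().digest(data));
    }

    /** Streams a file through MD5 so a large tarball need not fit in memory. */
    static String md5Hex(Path file) throws IOException {
        MessageDigest md = newMd5();
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return toHex(md.digest());
    }

    // Formats each byte as two lowercase hex digits, matching md5/md5sum output.
    private static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default file name; pass the real path as the first argument.
        Path tarball = Paths.get(args.length > 0 ? args[0] : "taste.tar.gz");
        String actual = md5Hex(tarball);
        System.out.println(actual + (actual.equals(EXPECTED) ? "  OK" : "  MISMATCH"));
    }
}
```

      Run it as `java Md5Check path/to/tarball`. On Linux, `md5sum` prints the same lowercase hex digest, so either tool can be used to confirm the value above.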

        Activity

        Sean Owen added a comment -

        MD5 is db78114f9799905c7d4a5a1907a03e6f

        Karl Wettin added a comment -

        Cool, I'll review in the week.

        Grant Ingersoll added a comment -

        FYI, this can't be committed yet. I am working through the Software Grant with Sean. I will mark it when it is cleared by the Incubator.

        The MD5 hash of the tarball is db78114f9799905c7d4a5a1907a03e6f

        Sean, can you specify what tool and version was used to generate the MD5 hash?

        Thanks.

        Sean Owen added a comment -

        It's /sbin/md5 on Mac OS X 10.4.11, on a PowerPC G5. It doesn't report a version, but ssh reports it's using OpenSSL 0.9.7l, if that helps. Uh oh, is it not matching? I could try on another machine.

        Grant Ingersoll added a comment -

        No, it's fine, it just needs to be documented for the software grant.

        Grant Ingersoll added a comment -

        Hey Sean, looks like the grant is on file.

        Do you have the license for the EJB file? I think that just needs to go in the lib there too. Before a Mahout release, we will need to put together a NOTICES.txt, but we don't need that yet.

        I don't think that license is a showstopper at the moment.

        Sean Owen added a comment -

        http://java.sun.com/products/ejb/license/ejb-2_1-fr-spec-license.html

        ... is the license I could find for the EJB 2.1 spec. IANAL, so I can't be sure I understand whether there are any complications in these terms.

        As a layman, I am all but certain redistributing these API classes (this is all interface/spec, no implementation) is OK. I note that Tomcat redistributes the servlet and JSP API classes freely. I don't see additional license materials included in that distro.

        If there's concern about it, it's possible to separate out this code easily and either toss it (is anyone really using EJBs anymore?) or put it in contrib with the proviso that it only compiles if you go find the EJB classes yourself. I would be surprised if that were deemed necessary, though, because then wouldn't Tomcat have a problem? I am not sure whether the fact that Tomcat itself implements those specs makes a difference, whether there is some blanket agreement between Apache and Sun on this point, or whether any such agreement extends past Tomcat.

        Grant Ingersoll added a comment -

        I think we just need to figure it out by the time we release. I'm guessing it is OK, since it is just the API and I would think they would want to encourage adoption. Besides, the container provides the implementation, right? I personally don't have any interest in EJB, but others might, so I don't care whether it stays or goes.

        Sean Owen added a comment -

        That's right: this is just the API, which lets you compile an EJB against the interfaces and then run it in a container that provides the implementation. The .jar here contains only the interfaces that are part of the public spec.

        Grant, you may know better than I do whom to ask within Apache. I would hold up Tomcat as an example of an Apache project that redistributes Sun's J2EE API .jars, and see whether whatever reasoning or agreement covered that also covers us.

        If there is any pushback, I am not too upset about removing the EJB API. It's EJB 2.x, not EJB 3.x, and I am not sure anyone uses it. It'd be a shame to remove perfectly working code over licensing issues that likely aren't issues, but not a big shame.

        Grant Ingersoll added a comment -

        OK, looks like we are good to go! Whew. See http://mail-archives.apache.org/mod_mbox/incubator-general/200805.mbox/ajax/%3c9EBB5BAC-BAA9-4486-BBAF-FB071F075669@apache.org%3e

        Sean, I think you mentioned you have some initial parallelizations of Taste, so I guess I would suggest you just put them up as a patch either on this issue or a new issue and then we can commit them and start digging in deeper.

        Sean Owen added a comment -

        Rock n roll. Yes, I have some stuff in bits and pieces over here. Once the code is committed, I'll integrate it properly and refine it and post as a new issue / patch.

        Sean Owen added a comment -

        Sorry, are y'all waiting on me to commit the tarball? Happy to; I was just wondering whether someone more experienced would like to take a look and put the pieces in the right places.

        Well, I suppose there aren't that many questions. The main one is that I have my own build.xml and build.properties files. I can integrate them into the main ones or leave them as a separate build file. Either works fine; I prefer more integration, though I have a lot of targets. I can rename them as "taste-build" etc. and hook them into the common "build" targets.

        My lib/ directory maps onto the existing one, as do my src/main and src/test dirs. I have a src/examples; could I make a new root for that?

        Grant Ingersoll added a comment -

        I think you can go ahead and do it; I'm pretty swamped for the next week or more. As for builds, etc., I think ideally it's all just integrated under core and uses that build. I think there is an examples section under core/src (or somewhere like that). I wonder if this will be the time that people want separate jars.

        I also think that once this is in and Naive Bayes is in, we should start thinking about releasing 0.1.

        Sean Owen added a comment -

        OK, done! Committed. I will follow up with a general e-mail to the list about some additional work needed to complete the integration, but it's all in there, compiles, and so on.


          People

          • Assignee: Grant Ingersoll
          • Reporter: Sean Owen
          • Votes: 0
          • Watchers: 0
