Mahout / MAHOUT-811

Mahout examples try to write to examples/bin/work, which may not be writeable by current user

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.5
    • Fix Version/s: 0.6
    • Component/s: Examples
    • Labels:
      None

      Description

      The examples in examples/bin create subdirectories (either work or mahout-work) in that directory and write to them. This works if the current user has write access to examples/bin. If not (for example, in the package generated by Bigtop, where the files are installed to /usr/lib/mahout and owned by root), the examples can't run. This is causing BIGTOP-96, but needs to be fixed in Mahout. The patch I'm attaching changes all references to work, examples/bin/work, and mahout-work to use /tmp/mahout-work-${USER} instead, which will be writeable.
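A minimal sketch of the idea in the description, not the actual patch: the work directory is placed under /tmp and keyed by user name. The fallback to `id -un` when USER is unset and the USER_NAME variable are assumptions added for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: derive a per-user work dir under /tmp.
# USER may be unset in some environments (e.g. cron), so fall back to `id -un`.
USER_NAME="${USER:-$(id -un)}"
WORK_DIR="/tmp/mahout-work-${USER_NAME}"

# Create the directory if it does not exist yet; -p makes this idempotent.
mkdir -p "${WORK_DIR}"
echo "Using work dir: ${WORK_DIR}"
```

Because the path no longer depends on where the examples are installed, a root-owned install directory such as /usr/lib/mahout should not matter for writes.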

        Activity

        Sean Owen added a comment -

        Sounds great to me, thanks. At the least this centralizes the definition of 'work' dir, and I see no reason it can't live in temp. In fact it sounds like it should.

        Hudson added a comment -

        Integrated in Mahout-Quality #1038 (See https://builds.apache.org/job/Mahout-Quality/1038/)
        MAHOUT-811 move work dir to /tmp

        srowen : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1170702
        Files :

        • /mahout/trunk/examples/bin/build-20news-bayes.sh
        • /mahout/trunk/examples/bin/build-cluster-syntheticcontrol.sh
        • /mahout/trunk/examples/bin/build-reuters.sh
        • /mahout/trunk/examples/bin/factorize-movielens-1M.sh
        Drew Farris added a comment -

        This patch introduces another problem, specifically with the following line:

        cd ${WORK_DIR}/reuters-sgm && tar xzf ../reuters21578.tar.gz && cd .. && cd ..
        

        Here the script makes assumptions about where ${WORK_DIR} is located, cd-ing relative to it to get back to the Mahout examples bin directory in order to later execute ../../bin/mahout.

        As a result, the script will fail any time it needs to download and untar the reuters data.

        One fix would be to make the script a little smarter about where the mahout driver script is located by taking advantage of ${SCRIPT_PATH}.

        Also, it would be nice if the work dir could be read from an environment variable. If the env variable is not set, it could default to /tmp/mahout-work-${USER}.
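The two suggestions above could be sketched roughly as follows. This is an illustration under assumptions, not the committed fix: MAHOUT_WORK_DIR is a hypothetical variable name, resolve_work_dir is a hypothetical helper, and the ../../bin/mahout layout is taken from the comment above.

```shell
#!/bin/sh
# Hypothetical helper: prefer a caller-supplied work dir, else a per-user default.
# MAHOUT_WORK_DIR is an assumed name; the thread does not name the variable.
resolve_work_dir() {
  default_dir="/tmp/mahout-work-${USER:-$(id -un)}"
  echo "${MAHOUT_WORK_DIR:-$default_dir}"
}

WORK_DIR="$(resolve_work_dir)"

# Resolve the directory containing this script, independent of the caller's cwd.
SCRIPT_PATH="$(cd "$(dirname "$0")" && pwd)"

# Locate the driver relative to the script rather than relative to WORK_DIR.
MAHOUT="${SCRIPT_PATH}/../../bin/mahout"

echo "work dir: ${WORK_DIR}"
echo "driver:   ${MAHOUT}"
```

With this shape, the script never needs to cd back out of the work directory to find the driver, so moving the work directory cannot break the relative path.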

        Sean Owen added a comment -

        Should be easy enough to do this without any cd-ing anywhere:

        tar xzf ${WORK_DIR}/reuters21578.tar.gz -C ${WORK_DIR}/reuters-sgm

        I'll patch it.
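A small self-contained sketch of the `tar -C` approach, assuming a throwaway directory from `mktemp -d` and a stand-in archive built locally instead of the real Reuters download. The file name sample.sgm and the START_DIR variable are made up for the demonstration. Note that the directory passed to `-C` has to exist before extraction.

```shell
#!/bin/sh
# Demonstration only: a temporary WORK_DIR and a stand-in archive.
WORK_DIR="$(mktemp -d)"
mkdir -p "${WORK_DIR}/src"
echo "sample" > "${WORK_DIR}/src/sample.sgm"
tar czf "${WORK_DIR}/reuters21578.tar.gz" -C "${WORK_DIR}/src" sample.sgm

# Remember where we started, to show that extraction does not move us.
START_DIR="$(pwd)"

# tar -C changes into the target directory itself, so it must already exist.
mkdir -p "${WORK_DIR}/reuters-sgm"
tar xzf "${WORK_DIR}/reuters21578.tar.gz" -C "${WORK_DIR}/reuters-sgm"

ls "${WORK_DIR}/reuters-sgm"
```

Since the shell's working directory never changes, any later relative path in the script keeps its original meaning.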

        Drew Farris added a comment -

        Sean Owen wrote:

            Should be easy enough to do this without any cd-ing anywhere:

            tar xzf ${WORK_DIR}/reuters21578.tar.gz -C ${WORK_DIR}/reuters-sgm

            I'll patch it.

        Great Sean, thanks.

        This script also does rm -rf ${WORK_DIR} when it's done, and I don't think it should. It is helpful to leave the work directories around so that you don't have to re-download the Reuters tarfile and re-vectorize it should you want to run lda after trying kmeans. It's also helpful to leave these files around for those curious to inspect the output files.

        This behavior wasn't introduced in the patch, but I just noticed it while testing.
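One way to benefit from a work directory that is kept between runs is to skip the download when the archive is already present. The sketch below is an assumption about how that could look, not the script's actual logic: fetch_archive and ensure_archive are hypothetical helpers, the download is replaced by a placeholder that creates an empty file, and a temporary directory stands in for the real work dir.

```shell
#!/bin/sh
# Demonstration only: a temporary directory stands in for the kept work dir.
WORK_DIR="$(mktemp -d)"
ARCHIVE="${WORK_DIR}/reuters21578.tar.gz"
DOWNLOADS=0

# Placeholder for the real download step; here it only creates an empty file.
fetch_archive() {
  : > "${ARCHIVE}"
  DOWNLOADS=$((DOWNLOADS + 1))
}

# Download only when the archive is not already in the work dir.
ensure_archive() {
  if [ ! -f "${ARCHIVE}" ]; then
    fetch_archive
  fi
}

ensure_archive   # first run: archive is missing, so it is fetched
ensure_archive   # second run: archive is reused, nothing is fetched

echo "downloads performed: ${DOWNLOADS}"
# No rm -rf "${WORK_DIR}" here: the files stay available for later runs.
```

The same existence check could guard the vectorization step, so that running lda after kmeans reuses the earlier output.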

        Andrew Bayer added a comment -

        Yeah, I kept the rm -rf for consistency, but changed my mind after submitting the patch. =)

        Hudson added a comment -

        Integrated in Mahout-Quality #1042 (See https://builds.apache.org/job/Mahout-Quality/1042/)
        MAHOUT-811 fix working directory issue with WORK_DIR and extracting archive

        srowen : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1171636
        Files :

        • /mahout/trunk/examples/bin/build-reuters.sh
        Hudson added a comment -

        Integrated in Mahout-Quality #1043 (See https://builds.apache.org/job/Mahout-Quality/1043/)
        MAHOUT-811 Don't delete work dir

        srowen : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1171706
        Files :

        • /mahout/trunk/examples/bin/build-reuters.sh

          People

          • Assignee: Drew Farris
          • Reporter: Andrew Bayer
          • Votes: 0
          • Watchers: 1
