Mahout: MAHOUT-535

mahout seqdirectory reads only from the local filesystem, even when running over Hadoop


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.5
    • Fix Version/s: 0.5
    • Component/s: classic
    • Labels: None
    • Environment: local and hadoop

    Description

      seqdirectory appears to read its input only from the local filesystem, even though it writes its output correctly to HDFS.

      Consider two input directories: 'myurls-local', which exists in the local working directory, and 'myurls-dfs', which exists in the user's home directory on HDFS.
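      To make the suspected mechanism concrete, here is a minimal Python model (not Mahout or Hadoop code) of how a relative input path such as myurls-dfs gets qualified. It assumes, as Hadoop's FileSystem does, that a relative path is resolved against the working directory of whichever filesystem opens it: the process's current directory for the local filesystem, and /user/<username> for HDFS. The function name qualify, the namenode address and the user "alice" are hypothetical.

```python
import posixpath

def qualify(path: str, fs_uri: str, working_dir: str) -> str:
    """Qualify `path` against a filesystem, Hadoop-style (illustrative model).

    Relative paths are joined to the filesystem's working directory; the
    result is prefixed with the filesystem URI (trailing slashes removed).
    """
    absolute = path if path.startswith("/") else posixpath.join(working_dir, path)
    return fs_uri.rstrip("/") + posixpath.normpath(absolute)

# Hypothetical filesystems, as (URI, working directory) pairs.
LOCAL_FS = ("file://", "/home/alice/mahout")     # local FS: process working directory
HDFS = ("hdfs://namenode:8020", "/user/alice")   # HDFS: /user/<username>

for name in ("myurls-local", "myurls-dfs"):
    print(f"{name}: local={qualify(name, *LOCAL_FS)} hdfs={qualify(name, *HDFS)}")
```

      If the input is opened through the local filesystem, myurls-dfs is looked up under the local working directory, where it does not exist, while the output path is still qualified against HDFS.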

      Running:
      MAHOUT_HOME=. ./bin/mahout seqdirectory -i myurls-local -o myurls-seqdir -c UTF-8 -chunk

      behaves as expected: myurls-seqdir is created on the local filesystem.

      Running:
      MAHOUT_HOME=. HADOOP_HOME=/usr/lib/hadoop-0.20 HADOOP_CONF_DIR=/etc/hadoop-0.20/conf ./bin/mahout seqdirectory -i myurls-dfs -o myurls-seqdir -c UTF-8 -chunk

      creates a 12 KB myurls-seqdir directory on HDFS. Presumably seqdirectory could not read myurls-dfs from HDFS and ended up creating a nearly empty sequence directory.

      Running:
      MAHOUT_HOME=. HADOOP_HOME=/usr/lib/hadoop-0.20 HADOOP_CONF_DIR=/etc/hadoop-0.20/conf ./bin/mahout seqdirectory -i myurls-local -o myurls-seqdir -c UTF-8 -chunk

      acts as expected, creating a substantial myurls-seqdir on HDFS; in other words, the input is read from the local filesystem even with Hadoop configured.
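      The three runs can be checked against the hypothesis in the title with a small Python model. It assumes that each filesystem can be represented as the set of input paths present on it, and that a missing input is what produces the nearly empty output (the reporter's presumption, not something confirmed here). The names LOCAL_FILES, HDFS_FILES, input_found and RUNS are hypothetical.

```python
# Each filesystem is modelled as the set of input paths that exist on it.
LOCAL_FILES = {"myurls-local"}  # in the local working directory
HDFS_FILES = {"myurls-dfs"}     # in the user's HDFS home directory

def input_found(path: str, hadoop_configured: bool, read_local_only: bool) -> bool:
    """Return True if seqdirectory would find `path` under this model.

    hadoop_configured: HADOOP_HOME/HADOOP_CONF_DIR are set, so HDFS is the
    default filesystem (otherwise everything runs on the local filesystem).
    read_local_only=True models the suspected bug (input always read locally);
    False models the expected behaviour (input read from the default filesystem).
    """
    if read_local_only or not hadoop_configured:
        return path in LOCAL_FILES
    return path in HDFS_FILES

RUNS = [
    ("myurls-local", False),  # run 1: no Hadoop configuration
    ("myurls-dfs", True),     # run 2: Hadoop configured, input on HDFS
    ("myurls-local", True),   # run 3: Hadoop configured, input local
]

suspected = [input_found(p, h, read_local_only=True) for p, h in RUNS]
expected = [input_found(p, h, read_local_only=False) for p, h in RUNS]
print("suspected bug:", suspected)
print("expected:     ", expected)
```

      Only the suspected-bug model reproduces all three observations (input found, missing, found). Under the expected behaviour, run 2 would succeed and run 3 would fail to find myurls-local on HDFS.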


    People

      Assignee: Isabel Drost-Fromm (isabel)
      Reporter: Matt Spitz (mspitz)
      Votes: 0
      Watchers: 0
