Hadoop Map/Reduce / MAPREDUCE-3276

hadoop dfs -copyToLocal/copyFromLocal called within Hadoop Streaming returns early

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Later
    • Affects Version/s: 0.20.2
    • Fix Version/s: None
    • Component/s: contrib/streaming
    • Environment:

      Red Hat Enterprise Linux 5.
      31 node cluster with 1 as JobTracker and NameNode, and 30 as TaskTracker and DataNode.

      Description

      I'm using the Cloudera Hadoop release 0.20.2.+737 to parallelize bash scripts with Hadoop Streaming.

      Below is an example script that I've been running, which simply copies a file from HDFS to a local node.

      SampleMapper.sh
       hadoop dfs -copyToLocal /path/to/some/large/file/myFile myFile
       # Spin until the file is fully copied.
       while [ ! -f myFile ]
       do 
        echo "spin"
        sleep 1 
       done
      

      Surprisingly, when the file is sufficiently large, the copy call returns before the file has been copied, and the while loop spins for several iterations. I'm seeing similar behavior with copyFromLocal.
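
      One way to narrow this down is to record the exit status of the copy command and check for the destination file immediately afterwards, instead of spinning. The following is a minimal sketch, not a fix: run_copy_checked is a hypothetical helper name, and the script assumes bash is available on the task nodes. Diagnostics go to stderr, because in Hadoop Streaming a mapper's stdout is collected as job output (so the "spin" lines echoed above would end up in the output as well).

```shell
#!/usr/bin/env bash

# run_copy_checked DEST CMD [ARGS...]
# Hypothetical helper: runs CMD with ARGS, then reports whether it exited
# cleanly and whether DEST exists as a regular file at the moment CMD returns.
run_copy_checked() {
  local dest="$1"
  shift

  # Capture the exit status without tripping `set -e` in the caller.
  local status=0
  "$@" || status=$?

  if [ "$status" -ne 0 ]; then
    # The copy command itself reported a failure.
    echo "copy command exited with status $status" >&2
    return 1
  fi

  if [ ! -f "$dest" ]; then
    # This is the case described in the report: a clean return, but no file.
    echo "copy command returned 0 but $dest is not present" >&2
    return 1
  fi

  return 0
}

# Example use inside the mapper (same paths as in SampleMapper.sh):
#   run_copy_checked myFile \
#     hadoop dfs -copyToLocal /path/to/some/large/file/myFile myFile
```

      If the helper reports a non-zero status, the copy failed outright; if it reports a clean return with no file, that matches the early-return behavior described here and the message in the task's stderr log records it.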

      I've asked about this issue on other forums and no one else seems to have had the problem, although I don't know how many people are attempting this particular task.

      Has this been fixed in more recent versions of Hadoop?

            People

            • Assignee:
              Unassigned
              Reporter:
              fozziethebeat Keith Stevens
            • Votes:
              0
              Watchers:
              2
