Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-1698

WHEN USING PAREDUCE WITH PIG THE FILE TO LOAD IS NOT FOUND "ERROR: <FILE> DOES NOT EXIST"

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 0.7.0
    • None
    • None
    • None
    • RUNNING IN UBUNTU 10.4, JAVA INSTALLED, HAD0OP VERSION IS 0.20.2

    • PIG ERROR MAPREDUCE

    Description

      Hello,
      I am having an error that is driving me crazy. Any help will be appreciated.

      First, I have configured hadoop and hdfs according to this tutorial (I
      did not created an account hadoop, used mine instead)
      http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Single-Node_Cluster%29
      and I do not have any problem, I could run the wordcount. In other
      words, I could run the following command without problems: $ bin/hadoop
      jar hadoop-0.20.2-examples.jar wordcount gutenberg gutenberg-output

      I have also followed the Pig tutorial here
      http://pig.apache.org/docs/r0.7.0/setup.html#Sample+Code and
      http://pig.apache.org/docs/r0.7.0/tutorial.html .

      For both cases, I can run local pig scripts without problems.
      Nevertheless, with using HADOOP and PIG to run Mapreduce jobs, I have
      the same error...it can not detect the file that is being loaded... I
      have put that file into the hdfs directory (the same used in the
      wordcount directory), I have plases the file to be load everywhere and I
      still have the error that the file to be loaded "does not exist." For
      some reason, when I am using PIG it seems to me that it tries to detect
      files from a unknown directory (for me). Could someone please help me
      with this issue??
      The error that I receive for the example
      http://pig.apache.org/docs/r0.7.0/setup.html#Sample+Code when using :
      $ java -cp pig.jar:.:$HADOOPDIR idmapreduce is:
      /
      $ java -cp pig.jar:.:$HADOOPDIR idmapreduce
      10/10/25 17:10:01 INFO executionengine.HExecutionEngine: Connecting to
      hadoop file system at: file:///
      10/10/25 17:10:01 INFO jvm.JvmMetrics: Initializing JVM Metrics with
      processName=JobTracker, sessionId=
      10/10/25 17:10:03 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics
      with processName=JobTracker, sessionId= - already initialized
      10/10/25 17:10:03 WARN mapred.JobClient: Use GenericOptionsParser for
      parsing the arguments. Applications should implement Tool for the same.
      *10/10/25 17:10:08 INFO mapReduceLayer.MapReduceLauncher: 0% complete
      10/10/25 17:10:08 ERROR mapReduceLayer.MapReduceLauncher: Map reduce job
      failed*
      10/10/25 17:10:08 *ERROR mapReduceLayer.MapReduceLauncher:
      java.io.IOException: passwd does not exist*
      at
      org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:115)
      at
      org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
      at
      org.apache.pig.impl.io.ValidatingInputFileSpec.<init>(ValidatingInputFileSpec.java:44)
      at
      org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:200)
      at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
      at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:370)
      at
      org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
      at
      org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
      at java.lang.Thread.run(Thread.java:619)

      /When doing the same with
      http://pig.apache.org/docs/r0.7.0/tutorial.html using :
      $ java -cp $HOME/pigtmp/pig.jar:$HADOOP_CONF_DIR org.apache.pig.Main
      $HOME/pigtmp/script1-hadooptest.pig

      2010-10-25 17:14:32,651 [main] INFO
      org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
      Connecting to hadoop file system at: file:///
      2010-10-25 17:14:32,815 [main] INFO
      org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with
      processName=JobTracker, sessionId=
      2010-10-25 17:14:34,312 [main] INFO
      org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
      with processName=JobTracker, sessionId= - already initialized
      2010-10-25 17:14:34,314 [Thread-4] WARN
      org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for
      parsing the arguments. Applications should implement Tool for the same.
      *2010-10-25 17:14:39,312 [main] INFO
      org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher

      • 0% complete
        2010-10-25 17:14:39,313 [main] ERROR
        org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
      • Map reduce job failed
        2010-10-25 17:14:39,313 [main] ERROR
        org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
      • java.io.IOException: excite.log.bz2 does not exist*
        at
        org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:115)
        at
        org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
        at
        org.apache.pig.impl.io.ValidatingInputFileSpec.<init>(ValidatingInputFileSpec.java:44)
        at
        org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:200)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
        at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:370)
        at
        org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
        at
        org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
        at java.lang.Thread.run(Thread.java:619)

      Please, can someone help me??

      Ruth

      Attachments

        Activity

          People

            Unassigned Unassigned
            ruthygarcia RUTH
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:

              Time Tracking

                Estimated:
                Original Estimate - 96h
                96h
                Remaining:
                Remaining Estimate - 96h
                96h
                Logged:
                Time Spent - Not Specified
                Not Specified