Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-569

Inconsistency with Hadoop in Pig load statements involving globs with subdirectories

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Duplicate
    • 0.2.0
    • 0.2.0
    • impl
    • None
    • FC Linux x86/64, Pig revision 724576

    Description

      Pig cannot handle LOAD statements with Hadoop globs where the globs have subdirectories. For example,

      A = LOAD 'dir/

      {dir1/subdir1,dir2/subdir2,dir3/subdir3}' USING ...

      A similar statement in Hadoop, hadoop dfs -ls dir/{dir1/subdir1,dir2/subdir2,dir3/subdir3}

      , does work correctly.

      The output of running the above load statement in pig, built from svn revision 724576, is:

      2008-12-17 12:02:28,480 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
      2008-12-17 12:02:28,480 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Map reduce job failed
      2008-12-17 12:02:28,480 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - java.io.IOException: Unable to get collect for pattern dir/{dir1/subdir1,dir2/subdir2,dir3/subdir3}} [Failed to obtain glob for dir/

      {dir1/subdir1,dir2/subdir2,dir3/subdir3}]
      at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asCollection(HDataStorage.java:231)
      at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asCollection(HDataStorage.java:40)
      at org.apache.pig.impl.io.FileLocalizer.globMatchesFiles(FileLocalizer.java:486)
      at org.apache.pig.impl.io.FileLocalizer.fileExists(FileLocalizer.java:455)
      at org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:108)
      at org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
      at org.apache.pig.impl.io.ValidatingInputFileSpec.<init>(ValidatingInputFileSpec.java:44)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:200)
      at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
      at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:370)
      at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
      at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
      at java.lang.Thread.run(Thread.java:619)
      Caused by: org.apache.pig.backend.datastorage.DataStorageException: Failed to obtain glob for dir/{dir1/subdir1,dir2/subdir2,dir3/subdir3}

      ... 13 more
      Caused by: java.io.IOException: Illegal file pattern: Expecting set closure character or end of range, or } for glob {dir1 at 5
      at org.apache.hadoop.fs.FileSystem$GlobFilter.error(FileSystem.java:1084)
      at org.apache.hadoop.fs.FileSystem$GlobFilter.setRegex(FileSystem.java:1069)
      at org.apache.hadoop.fs.FileSystem$GlobFilter.<init>(FileSystem.java:987)
      at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:953)
      at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:962)
      at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:962)
      at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:962)
      at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:902)
      at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:862)
      at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asCollection(HDataStorage.java:215)
      ... 12 more

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              kevinweil Kevin Weil
              Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: