Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-2707

Range globs do not work

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Minor
    • Resolution: Won't Fix
    • 0.9.1
    • None
    • None
    • None
    • Amazon Elastic MapReduce. Hadoop 0.20.205

    Description

      Using e.g. 's3://foo/

      {14,15,16}' to load files works like a charm but neither 's3://foo/{14..16}' nor 's3://foo/{14...16}' works (I am not sure if it is two or three dots since both fail). Anyway, I'm getting errors like this when using ranges (no matter if it is two or three dots):

      Failed Jobs:
      JobId Alias Feature Message Outputs
      N/A A MAP_ONLY Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input Pattern s3://foo/{14...16} matches 0 files
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:282)
      at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:999)
      at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1016)
      at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:172)
      at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:934)
      at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:887)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:396)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
      at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:887)
      at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:861)
      at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
      at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
      at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
      at java.lang.Thread.run(Thread.java:662)
      Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input Pattern s3://foo/{14...16} matches 0 files
      at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
      at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:270)
      ... 14 more
      hdfs://10.53.9.207:9000/tmp/temp-783548169/tmp1508748976,

      Input(s):
      Failed to read data from "s3://foo/{14...16}"

      Output(s):
      Failed to produce result in "hdfs://10.53.9.207:9000/tmp/temp-783548169/tmp1508748976"

      Counters:
      Total records written : 0
      Total bytes written : 0
      Spillable Memory Manager spill count : 0
      Total bags proactively spilled: 0
      Total records proactively spilled: 0

      Job DAG:
      null



      I would expect {14...16} to work just like {14,15,16}

      :

      2012-05-17 18:29:59,098 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
      2012-05-17 18:29:59,164 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
      2012-05-17 18:29:59,165 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
      2012-05-17 18:29:59,165 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
      2012-05-17 18:29:59,182 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
      2012-05-17 18:29:59,182 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
      2012-05-17 18:31:14,493 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
      2012-05-17 18:31:14,567 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
      2012-05-17 18:31:14,582 [Thread-30] INFO org.apache.hadoop.mapred.JobClient - Default number of map tasks: null
      2012-05-17 18:31:14,583 [Thread-30] INFO org.apache.hadoop.mapred.JobClient - Setting default number of map tasks based on cluster size to : 8
      2012-05-17 18:31:14,583 [Thread-30] INFO org.apache.hadoop.mapred.JobClient - Default number of reduce tasks: 0
      2012-05-17 18:31:15,072 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
      2012-05-17 18:31:16,870 [Thread-30] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
      2012-05-17 18:31:16,870 [Thread-30] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
      2012-05-17 18:31:18,099 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201205171523_0033
      2012-05-17 18:31:18,099 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://10.53.9.207:9100/jobdetails.jsp?jobid=job_201205171523_0033
      2012-05-17 18:31:58,609 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
      2012-05-17 18:32:08,186 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
      2012-05-17 18:32:08,187 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

      HadoopVersion PigVersion UserId StartedAt FinishedAt Features
      0.20.205 0.9.1-amzn hadoop 2012-05-17 18:29:59 2012-05-17 18:32:08 UNKNOWN

      Success!

      Job Stats (time in seconds):
      JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs
      job_201205171523_0033 1 0 12 12 12 0 0 0 A MAP_ONLY hdfs://10.53.9.207:9000/tmp/temp-783548169/tmp1447928118,

      Input(s):
      Successfully read 3 records (410 bytes) from: "s3://foo/

      {14,15,16}

      "

      Output(s):
      Successfully stored 3 records (1405 bytes) in: "hdfs://10.53.9.207:9000/tmp/temp-783548169/tmp1447928118"

      Counters:
      Total records written : 3
      Total bytes written : 1405
      Spillable Memory Manager spill count : 0
      Total bags proactively spilled: 0
      Total records proactively spilled: 0

      Job DAG:
      job_201205171523_0033

      I am not sure if this is a Pig/Hadoop-issue or an Amazon EMR/S3-issue.

      Attachments

        Activity

          People

            Unassigned Unassigned
            ca8 Christian Birch
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: