Details
Description
See https://forums.aws.amazon.com/thread.jspa?messageID=323063
According to this thread, only Amazon's proprietary hadoop-core.jar enables S3 to work on with Pig. Apache Pig does not work.
Example:
Apache Pig branch-0.9 as of today:
hadoop@ip-10-195-159-114:~$ pig/bin/pig
grunt> cd s3://elasticmapreduce/samples/pig-apache/input/
2012-02-29 05:45:22,282 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. This file system object (hdfs://10.195.159.114:9000) does not support access to the request path 's3://elasticmapreduce/samples/pig-apache/input' You possibly called FileSystem.get(conf) when you should have called FileSystem.get(uri, conf) to obtain a file system supporting your path.
Details at logfile: /home/hadoop/pig_1330494091268.log
grunt> quit
EMR's Pig as of today:
hadoop@ip-10-195-159-114:~$ pig
2012-02-29 05:45:35,626 [main] INFO org.apache.pig.Main - Logging error messages to: /home/hadoop/pig_1330494335621.log
2012-02-29 05:45:35,841 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://10.195.159.114:9000
2012-02-29 05:45:36,200 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: 10.195.159.114:9001
grunt> cd s3://elasticmapreduce/samples/pig-apache/input/