Hadoop Common / HADOOP-9328

INSERT INTO an S3 external table with no reduce phase results in FileNotFoundException


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Cannot Reproduce
    • Affects Version/s: 2.0.2-alpha
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels: None
    • Environment: YARN, Hadoop 2.0.2-alpha, Ubuntu

    Description

      Running YARN with Hadoop 2.0.2-alpha and Hive 0.9.0.

      The destination is the S3-backed external table below; the source for the query is a small Hive-managed table.

      CREATE EXTERNAL TABLE payout_state_product (
      state STRING,
      product_id STRING,
      element_id INT,
      element_value DOUBLE,
      number_of_fields INT)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      STORED AS TEXTFILE
      LOCATION 's3://com.weatherbill.foo/bar/payout_state_product/';
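
      For completeness, here is a minimal sketch of the Hive-managed source table used below. Its DDL is not part of the original report; the schema is assumed to mirror the destination, since the query is a plain SELECT *.

      -- Hypothetical source-table DDL (not in the original report); the
      -- columns are assumed to match payout_state_product exactly.
      CREATE TABLE payout_state_product_cached (
      state STRING,
      product_id STRING,
      element_id INT,
      element_value DOUBLE,
      number_of_fields INT)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      STORED AS TEXTFILE;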

      A simple query copies the results from the Hive-managed table into the S3 table:

      hive> INSERT OVERWRITE TABLE payout_state_product
      SELECT * FROM payout_state_product_cached;

      Total MapReduce jobs = 2
      Launching Job 1 out of 2
      Number of reduce tasks is set to 0 since there's no reduce operator
      Starting Job = job_1360884012490_0014, Tracking URL = http://i-9ff9e9ef.us-east-1.production.climatedna.net:8088/proxy/application_1360884012490_0014/
      Kill Command = /usr/lib/hadoop/bin/hadoop job -Dmapred.job.tracker=i-9ff9e9ef.us-east-1.production.climatedna.net:8032 -kill job_1360884012490_0014
      Hadoop job information for Stage-1: number of mappers: 100; number of reducers: 0
      2013-02-22 19:15:46,709 Stage-1 map = 0%, reduce = 0%
      ...snip...
      2013-02-22 19:17:02,374 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 427.13 sec
      MapReduce Total cumulative CPU time: 7 minutes 7 seconds 130 msec
      Ended Job = job_1360884012490_0014
      Ended Job = -1776780875, job is filtered out (removed at runtime).
      Launching Job 2 out of 2
      Number of reduce tasks is set to 0 since there's no reduce operator
      java.io.FileNotFoundException: File does not exist: /tmp/hive-marc/hive_2013-02-22_19-15-31_691_7365912335285010827/-ext-10002/000000_0
      at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:782)
      at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:493)
      at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:284)
      at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:244)
      at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:69)
      at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:386)
      at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:352)
      at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.processPaths(CombineHiveInputFormat.java:419)
      at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:390)
      at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:479)
      at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:471)
      at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:366)
      at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218)
      at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:396)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1367)
      at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215)
      at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:617)
      at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:612)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:396)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1367)
      at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:612)
      at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:435)
      at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:137)
      at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:134)
      at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
      at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1326)
      at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1118)
      at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951)
      at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
      at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
      at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
      at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:689)
      at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
      Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: /tmp/hive-marc/hive_2013-02-22_19-15-31_691_7365912335285010827/-ext-10002/000000_0)'
      FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
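
      The failing second stage looks like Hive's conditional small-file merge job: the first job is map-only (100 mappers, 0 reducers), and "job is filtered out (removed at runtime)" indicates a conditional task resolved at runtime. A possible workaround sketch, assuming that diagnosis is correct, is to disable merging of map-only output so the INSERT stays a single job:

      -- Workaround sketch (assumption: the failing Stage-2 is the conditional
      -- merge of small map outputs; hive.merge.mapfiles defaults to true).
      SET hive.merge.mapfiles=false;
      INSERT OVERWRITE TABLE payout_state_product
      SELECT * FROM payout_state_product_cached;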

          People

            Assignee: Unassigned
            Reporter: Marc Limotte (mlimotte)
            Votes: 0
            Watchers: 4
