Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
Hive version: 2.3.7, Tez version: 0.9.2.
The following error occurs when the SQL below is executed in Hive on Tez:
[CF-100001]execute sql failed:org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=File Merge, vertexId=vertex_1631161845409_0980_2_00, diagnostics=[Task failed, taskId=task_1631161845409_0980_2_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1631161845409_0980_2_00_000000_0:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Multiple partitions for one merge mapper: hdfs://ns1/hive/warehouse/ns_dadev.db/dws_air_qlt_stat_20211029152731035/.hive-staging_hive_2021-11-25_16-25-07_185_3959810947578240944-5/-ext-10002/time_type=hour/time_col=2021102600/space_type=station/etl_script_id=air_sta_aqi_pp_1h/1 NOT EQUAL TO hdfs://ns1/hive/warehouse/ns_dadev.db/dws_air_qlt_stat_20211029152731035/.hive-staging_hive_2021-11-25_16-25-07_185_3959810947578240944-5/-ext-10002/time_type=hour/time_col=2021102600/space_type=station/etl_script_id=air_sta_aqi_pp_1h/2
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
at org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Multiple partitions for one merge mapper: hdfs://ns1/hive/warehouse/ns_dadev.db/dws_air_qlt_stat_20211029152731035/.hive-staging_hive_2021-11-25_16-25-07_185_3959810947578240944-5/-ext-10002/time_type=hour/time_col=2021102600/space_type=station/etl_script_id=air_sta_aqi_pp_1h/1 NOT EQUAL TO hdfs://ns1/hive/warehouse/ns_dadev.db/dws_air_qlt_stat_20211029152731035/.hive-staging_hive_2021-11-25_16-25-07_185_3959810947578240944-5/-ext-10002/time_type=hour/time_col=2021102600/space_type=station/etl_script_id=air_sta_aqi_pp_1h/2
at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.processRow(MergeFileRecordProcessor.java:221)
at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.run(MergeFileRecordProcessor.java:154)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Multiple partitions for one merge mapper: hdfs://ns1/hive/warehouse/ns_dadev.db/dws_air_qlt_stat_20211029152731035/.hive-staging_hive_2021-11-25_16-25-07_185_3959810947578240944-5/-ext-10002/time_type=hour/time_col=2021102600/space_type=station/etl_script_id=air_sta_aqi_pp_1h/1 NOT EQUAL TO hdfs://ns1/hive/warehouse/ns_dadev.db/dws_air_qlt_stat_20211029152731035/.hive-staging_hive_2021-11-25_16-25-07_185_3959810947578240944-5/-ext-10002/time_type=hour/time_col=2021102600/space_type=station/etl_script_id=air_sta_aqi_pp_1h/2
at org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.processKeyValuePairs(OrcFileMergeOperator.java:169)
at org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.process(OrcFileMergeOperator.java:72)
at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.processRow(MergeFileRecordProcessor.java:212)
... 16 more
Executed SQL:
insert overwrite table dws_air_qlt_stat partition (time_type, time_col, space_type, etl_script_id)
select
    a.space_num as space_num,
    a.space_name as space_name,
    '1h_aqi' as pltt_item,
    '1AQI' as pltt_item_desc,
    cast(floor(a.stat_rslt) as string) as stat_rslt,
    a.stat_rslt_valid_ind as stat_rslt_valid_ind,
    current_timestamp as insert_time,
    'hour' as time_type,
    a.time_col as time_col,
    'station' as space_type,
    'air_sta_aqi_pp_1h' as etl_script_id
from (
    select
        space_num,
        space_name,
        stat_rslt_valid_ind,
        time_col,
        max(stat_rslt) as stat_rslt
    from dws_air_qlt_stat
    where space_type = 'station' and time_type = 'hour'
      and pltt_item in ('1h_avg_iaqi_co','1h_avg_iaqi_no2','1h_avg_iaqi_o3','8h_mavg_iaqi_o3','24h_mavg_iaqi_pm10','24h_mavg_iaqi_pm2_5','1h_avg_iaqi_so2')
      and time_col = '${phour}'
      and stat_rslt_valid_ind = '1'
    group by space_num, space_name, stat_rslt_valid_ind, time_col
) a
union all
select
    b.space_num as space_num,
    b.space_name as space_name,
    '1h_pp' as pltt_item,
    '1xxxx' as pltt_item_desc,
    concat_ws(',', collect_set(cast(b.pltt_item as string))) as stat_rslt,
    b.stat_rslt_valid_ind as stat_rslt_valid_ind,
    current_timestamp as insert_time,
    'hour' as time_type,
    b.time_col as time_col,
    'station' as space_type,
    'air_sta_aqi_pp_1h' as etl_script_id
from (
    select
        space_num,
        max(stat_rslt) as stat_rslt
    from dws_air_qlt_stat
    where space_type = 'station' and time_type = 'hour'
      and pltt_item in ('1h_avg_iaqi_co','1h_avg_iaqi_no2','1h_avg_iaqi_o3','8h_mavg_iaqi_o3','24h_mavg_iaqi_pm10','24h_mavg_iaqi_pm2_5','1h_avg_iaqi_so2')
      and time_col = '${phour}'
      and stat_rslt_valid_ind = '1'
    group by space_type, time_type, time_col, space_num
) a
join dws_air_qlt_stat b
  on a.space_num = b.space_num
 and a.stat_rslt = b.stat_rslt
 and b.space_type = 'station'
 and b.time_type = 'hour'
 and b.pltt_item in ('1h_avg_iaqi_co','1h_avg_iaqi_no2','1h_avg_iaqi_o3','8h_mavg_iaqi_o3','24h_mavg_iaqi_pm10','24h_mavg_iaqi_pm2_5','1h_avg_iaqi_so2')
 and b.time_col = '${phour}'
 and b.stat_rslt_valid_ind = '1'
group by b.space_num, b.space_name, b.stat_rslt_valid_ind, b.time_col
The same SQL runs successfully on Hive on MapReduce; the error occurs only on Hive on Tez.
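The failing vertex is "File Merge", the small-file merge stage that Hive on Tez adds after the insert when merging is enabled, and the two paths in the error differ only in their trailing /1 and /2 subdirectories. These appear to be the per-branch output directories Hive on Tez writes for each UNION ALL branch, while the ORC merge operator (OrcFileMergeOperator in the stack trace) expects every file handled by one merge task to come from a single partition directory. A possible session-level workaround, assuming the merge stage is enabled through `hive.merge.tezfiles` and not verified against this exact query, is to turn that stage off before re-running the statement:

```sql
-- Workaround sketch (assumption, untested for this query):
-- skip the Tez "File Merge" stage, so no merge task has to read the
-- UNION ALL branch subdirectories (.../1 and .../2) of one partition.
set hive.merge.tezfiles=false;

-- Then re-run the original INSERT OVERWRITE ... UNION ALL statement unchanged.
```

The trade-off is that the output files of this insert are no longer merged, so the target partitions may end up with more small files than before.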