Apache Tez / TEZ-4360

IOException: Multiple partitions for one merge mapper


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Hive version 2.3.7, Tez version 0.9.2.

      The following error occurs when the SQL below is executed in Hive:

      [CF-100001]execute sql failed:org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=File Merge, vertexId=vertex_1631161845409_0980_2_00, diagnostics=[Task failed, taskId=task_1631161845409_0980_2_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1631161845409_0980_2_00_000000_0:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Multiple partitions for one merge mapper: hdfs://ns1/hive/warehouse/ns_dadev.db/dws_air_qlt_stat_20211029152731035/.hive-staging_hive_2021-11-25_16-25-07_185_3959810947578240944-5/-ext-10002/time_type=hour/time_col=2021102600/space_type=station/etl_script_id=air_sta_aqi_pp_1h/1 NOT EQUAL TO hdfs://ns1/hive/warehouse/ns_dadev.db/dws_air_qlt_stat_20211029152731035/.hive-staging_hive_2021-11-25_16-25-07_185_3959810947578240944-5/-ext-10002/time_type=hour/time_col=2021102600/space_type=station/etl_script_id=air_sta_aqi_pp_1h/2
      at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
      at org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42)
      at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
      at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
      at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:422)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
      at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
      at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
      at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
      Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Multiple partitions for one merge mapper: hdfs://ns1/hive/warehouse/ns_dadev.db/dws_air_qlt_stat_20211029152731035/.hive-staging_hive_2021-11-25_16-25-07_185_3959810947578240944-5/-ext-10002/time_type=hour/time_col=2021102600/space_type=station/etl_script_id=air_sta_aqi_pp_1h/1 NOT EQUAL TO hdfs://ns1/hive/warehouse/ns_dadev.db/dws_air_qlt_stat_20211029152731035/.hive-staging_hive_2021-11-25_16-25-07_185_3959810947578240944-5/-ext-10002/time_type=hour/time_col=2021102600/space_type=station/etl_script_id=air_sta_aqi_pp_1h/2
      at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.processRow(MergeFileRecordProcessor.java:221)
      at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.run(MergeFileRecordProcessor.java:154)
      at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
      ... 14 more
      Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Multiple partitions for one merge mapper: hdfs://ns1/hive/warehouse/ns_dadev.db/dws_air_qlt_stat_20211029152731035/.hive-staging_hive_2021-11-25_16-25-07_185_3959810947578240944-5/-ext-10002/time_type=hour/time_col=2021102600/space_type=station/etl_script_id=air_sta_aqi_pp_1h/1 NOT EQUAL TO hdfs://ns1/hive/warehouse/ns_dadev.db/dws_air_qlt_stat_20211029152731035/.hive-staging_hive_2021-11-25_16-25-07_185_3959810947578240944-5/-ext-10002/time_type=hour/time_col=2021102600/space_type=station/etl_script_id=air_sta_aqi_pp_1h/2
      at org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.processKeyValuePairs(OrcFileMergeOperator.java:169)
      at org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.process(OrcFileMergeOperator.java:72)
      at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.processRow(MergeFileRecordProcessor.java:212)
      ... 16 more

      SQL executed:

      insert overwrite table dws_air_qlt_stat partition (time_type,time_col,space_type,etl_script_id) 
      select 
          a.space_num as space_num,             
          a.space_name as space_name,            
          '1h_aqi' as pltt_item,                 
          '1AQI' as pltt_item_desc,          
          cast(floor(a.stat_rslt) as string) as stat_rslt,
          a.stat_rslt_valid_ind as stat_rslt_valid_ind,  
          current_timestamp as insert_time,       
          'hour' as time_type,                   
          a.time_col as time_col,               
          'station' as space_type,               
          'air_sta_aqi_pp_1h' as etl_script_id   
      from 
          (select
              space_num,
              space_name,
              stat_rslt_valid_ind,
              time_col,
              max(stat_rslt) as stat_rslt        
          from dws_air_qlt_stat
          where space_type = 'station' and time_type = 'hour' 
              and pltt_item in ('1h_avg_iaqi_co','1h_avg_iaqi_no2','1h_avg_iaqi_o3','8h_mavg_iaqi_o3','24h_mavg_iaqi_pm10','24h_mavg_iaqi_pm2_5','1h_avg_iaqi_so2')
              and time_col='${phour}'
              and stat_rslt_valid_ind='1'
          group by space_num,space_name,stat_rslt_valid_ind,time_col
          ) a 
      union all
      select 
          b.space_num as space_num,             
          b.space_name as space_name,            
          '1h_pp' as pltt_item,                  
          '1xxxx' as pltt_item_desc,    
          concat_ws(',',collect_set(cast (b.pltt_item as string))) as stat_rslt, 
          b.stat_rslt_valid_ind as stat_rslt_valid_ind,   
          current_timestamp as insert_time,       
          'hour' as time_type,                    
          b.time_col as time_col,                
          'station' as space_type,              
          'air_sta_aqi_pp_1h' as etl_script_id   
      from 
          (select
              space_num,
              max(stat_rslt) as stat_rslt         
          from dws_air_qlt_stat
          where space_type = 'station' and time_type = 'hour' 
              and pltt_item in ('1h_avg_iaqi_co','1h_avg_iaqi_no2','1h_avg_iaqi_o3','8h_mavg_iaqi_o3','24h_mavg_iaqi_pm10','24h_mavg_iaqi_pm2_5','1h_avg_iaqi_so2')
              and time_col='${phour}'
              and stat_rslt_valid_ind='1'
          group by space_type,time_type,time_col,space_num
          ) a 
      join dws_air_qlt_stat b
          on a.space_num = b.space_num  
          and a.stat_rslt = b.stat_rslt and b.space_type = 'station' 
          and b.time_type = 'hour' and b.pltt_item in ('1h_avg_iaqi_co','1h_avg_iaqi_no2','1h_avg_iaqi_o3','8h_mavg_iaqi_o3','24h_mavg_iaqi_pm10','24h_mavg_iaqi_pm2_5','1h_avg_iaqi_so2')
          and b.time_col='${phour}'
          and b.stat_rslt_valid_ind='1'
      group by b.space_num,b.space_name,b.stat_rslt_valid_ind,b.time_col

      The above SQL runs without problems in Hive on MR, but fails with this error in Hive on Tez.
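
      In the trace, the two paths in the message differ only in their last component (/1 versus /2), and the exception is raised from OrcFileMergeOperator inside the File Merge vertex. As a possible way to narrow this down, the session settings below switch off the pieces involved, one at a time. This is an untested sketch: the property names are standard Hive settings, but whether any of them avoids the error on Hive 2.3.7 with Tez 0.9.2 is an assumption and is not confirmed anywhere in this report.

      ```sql
      -- Untested diagnostic sketch: apply ONE option at a time,
      -- then re-run the INSERT OVERWRITE statement above.

      -- Option 1 (assumption): skip the post-job small-file merge on Tez,
      -- so that no File Merge vertex is scheduled.
      set hive.merge.tezfiles=false;

      -- Option 2 (assumption): keep merging, but disable the ORC
      -- stripe-level fast merge, which is the code path that goes
      -- through OrcFileMergeOperator.
      set hive.merge.orcfile.stripe.level=false;

      -- Option 3: run this statement on MapReduce instead of Tez,
      -- since the query is reported to work in Hive on MR.
      set hive.execution.engine=mr;
      ```

      If only one of these options makes the statement succeed, that would help show whether the problem lies in the merge step itself or in how the union all branches write their output.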

       

       

          People

            Assignee: Unassigned
            Reporter: zhouzy (sundy_baba_zhouzy)