HIVE-8207: Add .q tests for multi-table insertion [Spark Branch]

Details

    • Type: Test
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: Spark
    • Labels: None

    Description

      Now that multi-table insertion is committed to the branch, we should enable the related qtests.

      Here is a list of qfiles that should be activated (some of them may already be activated).
      The list may not be comprehensive.

      add_part_multiple.q
      auto_smb_mapjoin_14.q
      bucket5.q
      column_access_stats.q
      date_udf.q
      groupby10.q
      groupby11.q
      groupby3_map_multi_distinct.q
      groupby3_map.q
      groupby3_map_skew.q
      groupby3_noskew_multi_distinct.q
      groupby3_noskew.q
      groupby7_map_multi_single_reducer.q
      groupby7_map.q
      groupby7_map_skew.q
      groupby7_noskew_multi_single_reducer.q
      groupby7_noskew.q
      groupby7.q
      groupby8_map.q
      groupby8_map_skew.q
      groupby8_noskew.q
      groupby8.q
      groupby9.q
      groupby_complex_types_multi_single_reducer.q
      groupby_complex_types.q
      groupby_cube1.q
      groupby_map_ppr_multi_distinct.q
      groupby_map_ppr.q
      groupby_multi_insert_common_distinct.q
      groupby_multi_single_reducer2.q
      groupby_multi_single_reducer3.q
      groupby_multi_single_reducer.q
      groupby_position.q
      groupby_ppr.q
      groupby_rollup1.q
      groupby_sort_1_23.q
      groupby_sort_1.q
      groupby_sort_skew_1_23.q
      infer_bucket_sort_multi_insert.q
      innerjoin.q
      input12_hadoop20.q
      input12.q
      input13.q
      input14.q
      input17.q
      input18.q
      input1_limit.q
      input_part2.q
      insert_into3.q
      join_nullsafe.q
      load_dyn_part8.q
      metadata_only_queries_with_filters.q
      multigroupby_singlemr.q
      multi_insert_gby2.q
      multi_insert_gby3.q
      multi_insert_gby.q
      multi_insert_lateral_view.q
      multi_insert_move_tasks_share_dependencies.q
      multi_insert.q
      parallel.q
      partition_date2.q
      pcr.q
      ppd_multi_insert.q
      ppd_transform.q
      smb_mapjoin_11.q
      smb_mapjoin_12.q
      smb_mapjoin_13.q
      smb_mapjoin_15.q
      smb_mapjoin_16.q
      stats4.q
      subquery_multiinsert.q
      table_access_keys_stats.q
      tez_dml.q
      udaf_percentile_approx_20.q
      udaf_percentile_approx_23.q
      union17.q
      union18.q
      union19.q
      

      Some tests cannot be enabled right now, for various reasons:

      1. ForwardOperator Issue, including

      groupby7_noskew_multi_single_reducer.q
      groupby8_map.q
      groupby8_map_skew.q
      groupby8_noskew.q
      groupby8.q
      groupby9.q
      groupby10.q
      groupby_multi_insert_common_distinct.q 
      union17.q
      

      Reason: currently, if the node at which to break the operator tree is a ForwardOperator, we simply do nothing. However, we may have the following case:

            ...
            RS_0
             |
            FOR
             |
           /   \
         GBY_1  GBY_2
          |     |
         ...   ...
          |     |
         RS_1  RS_2
          |     |
         ...   ...
          |     |
         FS_1  FS_2
      

      which may result in:

                RW
               /  \
             RW    RW
      

      and because of the issues in HIVE-7731 and HIVE-8118, both downstream branches will receive the same, duplicated input.
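      To make the shape above concrete, here is a toy model of cutting an operator tree into works at ReduceSink boundaries while leaving the ForwardOperator untouched. It is only a sketch: the Op class, split_into_works and build_example_tree are hypothetical names, the elided "..." operators are left out, and this is not Hive's planner code.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """A node in a toy operator tree (hypothetical, not a Hive class)."""
    name: str
    children: list = field(default_factory=list)


def split_into_works(root):
    """Cut the tree after every operator whose name starts with 'RS'.

    Returns (works, edges): works is a list of operator-name lists,
    edges is a list of (parent_work_index, child_work_index) pairs.
    A node with several children (such as FOR) is NOT split.
    """
    works = [[]]
    edges = []

    def visit(op, work_idx):
        works[work_idx].append(op.name)
        for child in op.children:
            if op.name.startswith("RS"):
                # A ReduceSink ends the current work; its child starts a new one.
                works.append([])
                child_idx = len(works) - 1
                edges.append((work_idx, child_idx))
                visit(child, child_idx)
            else:
                visit(child, work_idx)

    visit(root, 0)
    return works, edges


def build_example_tree():
    """RS_0 -> FOR -> two GBY/RS/FS branches, as in the diagram above."""
    branch_1 = Op("GBY_1", [Op("RS_1", [Op("FS_1")])])
    branch_2 = Op("GBY_2", [Op("RS_2", [Op("FS_2")])])
    return Op("RS_0", [Op("FOR", [branch_1, branch_2])])


works, edges = split_into_works(build_example_tree())
for index, ops in enumerate(works):
    print(index, ops)
print(edges)
```

      In this model, work 1 holds FOR together with both GBY/RS branches and has two outgoing edges, which corresponds to the single RW with two RW children drawn above.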

      2. Stats issue, including:

      bucket5.q
      infer_bucket_sort_multi_insert.q
      stats4.q
      smb_mapjoin_13.q
      smb_mapjoin_15.q
      

      Reason: In these tests, I get a diff error because numRows and rawDataSize are -1, whereas positive values are expected. I don't think this is related to multi-insertion.

      3. Join/SMB Join Issue, including

      auto_smb_mapjoin_14.q
      auto_sortmerge_join_13.q
      smb_mapjoin_11.q
      smb_mapjoin_12.q
      smb_mapjoin_13.q
      smb_mapjoin_15.q
      smb_mapjoin_16.q
      

      Reason: These tests failed either with an exception or with a diff. I think this is because SMB Join (HIVE-8202) isn't supported right now.

      4. Results don't match, including

      groupby3_map_skew.q
      groupby_map_ppr_multi_distinct.q
      groupby_complex_types_multi_single_reducer.q
      groupby_map_ppr.q
      partition_date2.q
      udaf_percentile_approx_23.q
      

      Reason: The results of these tests differ from MR's. For instance, groupby3_map_skew.q failed with the following diff:

      < 130091.0      260.182 256.10355987055016      98.0    0.0     142.92680950752379      143.06995106518903      20428.07288     20469.0109
      ---
      > 130091.0      260.182 256.10355987055016      98.0    0.0     142.9268095075238       143.06995106518906      20428.07288     20469.0109
      

      I don't know why this happens, but I think these differences may not be related to multi-insertion.
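      The two rows in the diff above differ only in the trailing digits of two columns. The following is a minimal sketch that compares them field by field, first as text and then with a relative tolerance. It assumes the mismatch is floating-point rounding noise, which the issue does not confirm. The helper names and the 1e-12 tolerance are illustrative choices and are not part of Hive's qtest framework.

```python
import math

# The two rows quoted in the diff above ("<" side and ">" side).
LEFT_ROW = ("130091.0 260.182 256.10355987055016 98.0 0.0 "
            "142.92680950752379 143.06995106518903 20428.07288 20469.0109")
RIGHT_ROW = ("130091.0 260.182 256.10355987055016 98.0 0.0 "
             "142.9268095075238 143.06995106518906 20428.07288 20469.0109")


def differing_fields(left, right):
    """Return indices of whitespace-separated fields whose text differs."""
    a, b = left.split(), right.split()
    if len(a) != len(b):
        raise ValueError("rows have different field counts")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def rows_match(left, right, rel_tol=1e-12):
    """Compare two rows numerically, field by field, within rel_tol."""
    a, b = left.split(), right.split()
    return len(a) == len(b) and all(
        math.isclose(float(x), float(y), rel_tol=rel_tol)
        for x, y in zip(a, b)
    )


print(differing_fields(LEFT_ROW, RIGHT_ROW))  # fields that differ as text
print(rows_match(LEFT_ROW, RIGHT_ROW))        # equal within the tolerance?
```

      A textual diff flags the sixth and seventh columns, while the tolerance-based comparison treats the rows as equal. This only shows that the gap is tiny; it does not identify its cause.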

      Attachments

        1. HIVE-8207.1-spark.patch
          1.57 MB
          Chao Sun
        2. HIVE-8207.2-spark.patch
          1.58 MB
          Chao Sun
        3. HIVE-8207.3-spark.patch
          1.61 MB
          Chao Sun

            People

              Assignee: Chao Sun (csun)
              Reporter: Chao Sun (csun)