Hive
  1. Hive
  2. HIVE-3652

Join optimization for star schema

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels:
      None

      Description

      Currently, if we join one fact table with multiple dimension tables, it results in multiple mapreduce jobs for each join with dimension table, because join would be on different keys for each dimension.
      Usually all the dimension tables will be small and can fit into memory and so map-side join can used to join with fact table.

      In this issue I want to look at optimizing such query to generate single mapreduce job sothat mapper loads dimension tables into memory and joins with fact table on different keys as well.

      1. HIVE-3652-tests.patch
        23 kB
        Amareshwari Sriramadasu
      2. HIVE-3652-tests.patch
        17 kB
        Amareshwari Sriramadasu

        Issue Links

          Activity

          Hide
          Amareshwari Sriramadasu added a comment -

          I'm thinking about some options for implementing this

          1. Use ChainMapper and ChainReducer for queries having series of Mapper. Raised HIVE-3655 for exploring this option independently.
          2. Define a new StarJoinOperator which can work on such a join, because it very common join that people would do.
          3. Extend MapJoinOperator to work on multiple join keys

          Let me know your thoughts and suggestions

          Show
          Amareshwari Sriramadasu added a comment - I'm thinking about some options for implementing this 1. Use ChainMapper and ChainReducer for queries having series of Mapper. Raised HIVE-3655 for exploring this option independently. 2. Define a new StarJoinOperator which can work on such a join, because it very common join that people would do. 3. Extend MapJoinOperator to work on multiple join keys Let me know your thoughts and suggestions
          Hide
          Mark Grover added a comment -

          Amareshwari, what are your thoughts on how the user can specify which is a fact table and which is a dimension table? Or, are you using storage based statistics to infer that information?

          Show
          Mark Grover added a comment - Amareshwari, what are your thoughts on how the user can specify which is a fact table and which is a dimension table? Or, are you using storage based statistics to infer that information?
          Hide
          Namit Jain added a comment -

          Amareshwari Sriramadasu, do you think it would be possible to get a cheap implementation with a single mapper performing
          multiple dimension joins one after the other ? I think, we should start with a few assumptions:

          1. The user will not give any map-join hints. Let us derive thin info. from the statistics, as Mark suggested.
          2. This should be a single map-only job (no reducer).
          3. As a first cut, it might be useful to optimize the query:

          select /*+ MAPJOIN(b,c) */ from FACT a join DIM1 b on a.k1=b.k1 JOIN DIM2 c on a.k2=c.k2

          The above query should not have a back-up task in case the map-join fails, and it should be run
          as a single map-only job. Once that is done, using cost to convert join into map-join can be explored independently.

          Show
          Namit Jain added a comment - Amareshwari Sriramadasu , do you think it would be possible to get a cheap implementation with a single mapper performing multiple dimension joins one after the other ? I think, we should start with a few assumptions: 1. The user will not give any map-join hints. Let us derive thin info. from the statistics, as Mark suggested. 2. This should be a single map-only job (no reducer). 3. As a first cut, it might be useful to optimize the query: select /*+ MAPJOIN(b,c) */ from FACT a join DIM1 b on a.k1=b.k1 JOIN DIM2 c on a.k2=c.k2 The above query should not have a back-up task in case the map-join fails, and it should be run as a single map-only job. Once that is done, using cost to convert join into map-join can be explored independently.
          Hide
          Mark Grover added a comment -

          Namit, can you please elaborate on why there shouldn't be a back-up task? In case where a dimension table (for some reason) is bigger than what can fit in in mapper's memory, do we want to fail or fallback to a regular join?

          Show
          Mark Grover added a comment - Namit, can you please elaborate on why there shouldn't be a back-up task? In case where a dimension table (for some reason) is bigger than what can fit in in mapper's memory, do we want to fail or fallback to a regular join?
          Hide
          Namit Jain added a comment -

          I was thinking more from the point of the current implementation.
          A backup task is per join operation currently.
          Thinking more about it, we can have a backup task (which can be a tree of tasks).

          It would be very difficult to fit the following in the current architecture.
          There are 10 dimension tables, 9 of them fit into memory and one of them dont.
          Perform a map-only join for the first 9, and then a regular backup join for the last one.
          I am not sure, if we want to optimize that.

          Show
          Namit Jain added a comment - I was thinking more from the point of the current implementation. A backup task is per join operation currently. Thinking more about it, we can have a backup task (which can be a tree of tasks). It would be very difficult to fit the following in the current architecture. There are 10 dimension tables, 9 of them fit into memory and one of them dont. Perform a map-only join for the first 9, and then a regular backup join for the last one. I am not sure, if we want to optimize that.
          Hide
          Amareshwari Sriramadasu added a comment -

          Amareshwari, what are your thoughts on how the user can specify which is a fact table and which is a dimension table? Or, are you using storage based statistics to infer that information?

          If the query involves join of one big table with multiple small tables and join is on different keys, we can always generate single map-only job. I'm thinking we need not have any other ways to specify which is fact and which is dimension table.

          do you think it would be possible to get a cheap implementation with a single mapper performing

          multiple dimension joins one after the other?

          Yes. I will do for this first.

          Show
          Amareshwari Sriramadasu added a comment - Amareshwari, what are your thoughts on how the user can specify which is a fact table and which is a dimension table? Or, are you using storage based statistics to infer that information? If the query involves join of one big table with multiple small tables and join is on different keys, we can always generate single map-only job. I'm thinking we need not have any other ways to specify which is fact and which is dimension table. do you think it would be possible to get a cheap implementation with a single mapper performing multiple dimension joins one after the other? Yes. I will do for this first.
          Hide
          Mark Grover added a comment -

          Namit Jain Agreed. I would be fine with no backup join, at least in the first pass.

          Amareshwari Sriramadasu Thanks for the response.

          If we are going figure out fact and dimension tables by just looking at the join keys and relative sizes of tables, there could be existing Hive queries out there that might start failing (albeit by flipping of a property or inclusion of a query hint) when they upgrade Hive. That's where I think having a backup task would be nice because the queries will still continue to pass if we have a backup task.

          Show
          Mark Grover added a comment - Namit Jain Agreed. I would be fine with no backup join, at least in the first pass. Amareshwari Sriramadasu Thanks for the response. If we are going figure out fact and dimension tables by just looking at the join keys and relative sizes of tables, there could be existing Hive queries out there that might start failing (albeit by flipping of a property or inclusion of a query hint) when they upgrade Hive. That's where I think having a backup task would be nice because the queries will still continue to pass if we have a backup task.
          Hide
          Amareshwari Sriramadasu added a comment -

          "If we have series of MapJoinOperators, and the Operator tree has MapJoin followed by MapJoin", then we can run all the map joins in single query. Some of this already solved by HIVE-1246. I think we need to do some more changes to accept queries of the following form also :

          select /*+ MAPJOIN(b,c) */ from FACT a join DIM1 b on a.k1=b.k1 JOIN DIM2 c on a.k2=c.k2

          or

          select /*+ MAPJOIN(b,c) */ from FACT a join DIM1 b on a.k1=b.k1 JOIN DIM2 c on b.k2=c.k2

          Show
          Amareshwari Sriramadasu added a comment - "If we have series of MapJoinOperators, and the Operator tree has MapJoin followed by MapJoin", then we can run all the map joins in single query. Some of this already solved by HIVE-1246 . I think we need to do some more changes to accept queries of the following form also : select /*+ MAPJOIN(b,c) */ from FACT a join DIM1 b on a.k1=b.k1 JOIN DIM2 c on a.k2=c.k2 or select /*+ MAPJOIN(b,c) */ from FACT a join DIM1 b on a.k1=b.k1 JOIN DIM2 c on b.k2=c.k2
          Hide
          Amareshwari Sriramadasu added a comment -

          "If we have series of MapJoinOperators, and the Operator tree has MapJoin followed by MapJoin", then we can run all the map joins in single query. Some of this already solved by HIVE-1246.

          Sorry, was too quick here. HIVE-1246 solves case of all join keys being same.

          If we have MapJoin followed by MapJoin, can we make the second operator child of first instead of a sink in between? I'm thinking that should just working for the cases of joining on different keys. Let me know if I'm wrong.

          Show
          Amareshwari Sriramadasu added a comment - "If we have series of MapJoinOperators, and the Operator tree has MapJoin followed by MapJoin", then we can run all the map joins in single query. Some of this already solved by HIVE-1246 . Sorry, was too quick here. HIVE-1246 solves case of all join keys being same. If we have MapJoin followed by MapJoin, can we make the second operator child of first instead of a sink in between? I'm thinking that should just working for the cases of joining on different keys. Let me know if I'm wrong.
          Hide
          Amareshwari Sriramadasu added a comment -

          select /*+ MAPJOIN(b,c) */ from FACT a join DIM1 b on a.k1=b.k1 JOIN DIM2 c on a.k2=c.k2

          I modified the above query to be the following (with a subquery) :

          SELECT /+ MAPJOIN(dim2) */ subq.m1, subq.m2 FROM (SELECT /+ MAPJOIN(dim1) */ m1, m2, k2 FROM fact JOIN dim1 ON (fact.k1 = dim1.k1)) subq JOIN dim2 ON (subq.k2 = dim2.k2);

          And it is already launching a single map reduce job for both the joins.

          Show
          Amareshwari Sriramadasu added a comment - select /*+ MAPJOIN(b,c) */ from FACT a join DIM1 b on a.k1=b.k1 JOIN DIM2 c on a.k2=c.k2 I modified the above query to be the following (with a subquery) : SELECT / + MAPJOIN(dim2) */ subq.m1, subq.m2 FROM (SELECT / + MAPJOIN(dim1) */ m1, m2, k2 FROM fact JOIN dim1 ON (fact.k1 = dim1.k1)) subq JOIN dim2 ON (subq.k2 = dim2.k2); And it is already launching a single map reduce job for both the joins.
          Hide
          Namit Jain added a comment -

          I think this will launch 2 map-only jobs in the presence of auto join conversion.
          We do want to support auto join conversion.

          Show
          Namit Jain added a comment - I think this will launch 2 map-only jobs in the presence of auto join conversion. We do want to support auto join conversion.
          Hide
          Amareshwari Sriramadasu added a comment -

          Namit, Are you saying if we enhance MapJoinProcessor to do single map for MapJoin followed by MapJoin (as in HIVE-1246) may not work in auto join conversion?

          I thought whether it is by providing hint or auto join conversion, it does not matter to MapJoinProcessor optimizer. Correct me if I'm wrong.

          Show
          Amareshwari Sriramadasu added a comment - Namit, Are you saying if we enhance MapJoinProcessor to do single map for MapJoin followed by MapJoin (as in HIVE-1246 ) may not work in auto join conversion? I thought whether it is by providing hint or auto join conversion, it does not matter to MapJoinProcessor optimizer. Correct me if I'm wrong.
          Hide
          Namit Jain added a comment -

          Yes, it will not work automatically.

          You need to change the auto join conversion logic, where you will create a task tree as a backup task for multiple mapjoins.
          It is do-able, and maybe the cleanest way of fitting it into the current architecture.

          Show
          Namit Jain added a comment - Yes, it will not work automatically. You need to change the auto join conversion logic, where you will create a task tree as a backup task for multiple mapjoins. It is do-able, and maybe the cleanest way of fitting it into the current architecture.
          Hide
          Vikram Dixit K added a comment -

          Amareshwari Sriramadasu I am quite interested in this jira and was wondering what phase you are in with respect to design/implementation. I would like to collaborate with you on this if possible. Please let me know.

          Thanks
          Vikram.

          Show
          Vikram Dixit K added a comment - Amareshwari Sriramadasu I am quite interested in this jira and was wondering what phase you are in with respect to design/implementation. I would like to collaborate with you on this if possible. Please let me know. Thanks Vikram.
          Hide
          Amareshwari Sriramadasu added a comment -

          Vikram Dixit KI'm not working on it right now. May not get time in next one month also. Please feel free work on it, if interested.

          Show
          Amareshwari Sriramadasu added a comment - Vikram Dixit K I'm not working on it right now. May not get time in next one month also. Please feel free work on it, if interested.
          Hide
          Vikram Dixit K added a comment -

          The work required for this jira is fixed as part of de-emphasizing of map-join work done in HIVE-3784. The query

          {format}select /*+ MAPJOIN(b,c) */ from FACT a join DIM1 b on a.k1=b.k1 JOIN DIM2 c on b.k2=c.k2{format}

          runs in 1 MR job (based on the noConditionalTask.size).

          Show
          Vikram Dixit K added a comment - The work required for this jira is fixed as part of de-emphasizing of map-join work done in HIVE-3784 . The query {format}select /*+ MAPJOIN(b,c) */ from FACT a join DIM1 b on a.k1=b.k1 JOIN DIM2 c on b.k2=c.k2{format} runs in 1 MR job (based on the noConditionalTask.size).
          Hide
          Amareshwari Sriramadasu added a comment -

          When I ran the same query on the latest trunk with HIVE-3784 fixed, I see the following :

          explain select /*+ MAPJOIN(b,c) */ * from fact a join dim1 b on a.k1=b.k1 JOIN  dim2 c on a.k2=c.k2;
          FAILED: SemanticException [Error 10227]: Not all clauses are supported with mapjoin hint. Please remove mapjoin hint. 
          

          When I set hive.auto.convert.join=true; and run the following :

          explain select * from fact a join dim1 b on a.k1=b.k1 JOIN  dim2 c on a.k2=c.k2;
          
          STAGE DEPENDENCIES:
            Stage-10 is a root stage , consists of Stage-13, Stage-14, Stage-1
            Stage-13 has a backup stage: Stage-1
            Stage-8 depends on stages: Stage-13
            Stage-7 depends on stages: Stage-1, Stage-8, Stage-9 , consists of Stage-11, Stage-12, Stage-2
            Stage-11 has a backup stage: Stage-2
            Stage-5 depends on stages: Stage-11
            Stage-12 has a backup stage: Stage-2
            Stage-6 depends on stages: Stage-12
            Stage-2
            Stage-14 has a backup stage: Stage-1
            Stage-9 depends on stages: Stage-14
            Stage-1
            Stage-0 is a root stage
          
          

          And the above query launches two MR jobs. Correct me if i am doing anything wrong.

          Namit, Can you confirm if this is fixed in HIVE-3784 and is there any other way to run this?

          Vikram, If you are seeing this fixed, can you please add tests if no code changes are required?

          Show
          Amareshwari Sriramadasu added a comment - When I ran the same query on the latest trunk with HIVE-3784 fixed, I see the following : explain select /*+ MAPJOIN(b,c) */ * from fact a join dim1 b on a.k1=b.k1 JOIN dim2 c on a.k2=c.k2; FAILED: SemanticException [Error 10227]: Not all clauses are supported with mapjoin hint. Please remove mapjoin hint. When I set hive.auto.convert.join=true; and run the following : explain select * from fact a join dim1 b on a.k1=b.k1 JOIN dim2 c on a.k2=c.k2; STAGE DEPENDENCIES: Stage-10 is a root stage , consists of Stage-13, Stage-14, Stage-1 Stage-13 has a backup stage: Stage-1 Stage-8 depends on stages: Stage-13 Stage-7 depends on stages: Stage-1, Stage-8, Stage-9 , consists of Stage-11, Stage-12, Stage-2 Stage-11 has a backup stage: Stage-2 Stage-5 depends on stages: Stage-11 Stage-12 has a backup stage: Stage-2 Stage-6 depends on stages: Stage-12 Stage-2 Stage-14 has a backup stage: Stage-1 Stage-9 depends on stages: Stage-14 Stage-1 Stage-0 is a root stage And the above query launches two MR jobs. Correct me if i am doing anything wrong. Namit, Can you confirm if this is fixed in HIVE-3784 and is there any other way to run this? Vikram, If you are seeing this fixed, can you please add tests if no code changes are required?
          Hide
          Amareshwari Sriramadasu added a comment -

          Even with hive.auto.convert.join.noconditionaltask set to true, I'm seeing two MR jobs getting launched.

          Show
          Amareshwari Sriramadasu added a comment - Even with hive.auto.convert.join.noconditionaltask set to true, I'm seeing two MR jobs getting launched.
          Hide
          Namit Jain added a comment -

          Is your size threshold correct – hive.auto.convert.join.noconditionaltask.size ?

          Show
          Namit Jain added a comment - Is your size threshold correct – hive.auto.convert.join.noconditionaltask.size ?
          Hide
          Amareshwari Sriramadasu added a comment -

          Is your size threshold correct – hive.auto.convert.join.noconditionaltask.size ?

          Yes. The tables are very small. I tested with empty tables as well. I'm seeing the same behavior.

          Show
          Amareshwari Sriramadasu added a comment - Is your size threshold correct – hive.auto.convert.join.noconditionaltask.size ? Yes. The tables are very small. I tested with empty tables as well. I'm seeing the same behavior.
          Hide
          Vikram Dixit K added a comment -

          Hi Amareshwari,

          If you look at test case join32.q, it is almost the same as the one you had posted. It launches only one MR task (http://svn.apache.org/viewvc/hive/trunk/ql/src/test/results/clientpositive/join32.q.out?view=markup) I tried this with a fully installed cluster as well and I can see only one task. Another issue to consider would be HIVE-3996 and see if that makes a difference. Kindly correct me if I am wrong.

          Thanks
          Vikram.

          Show
          Vikram Dixit K added a comment - Hi Amareshwari, If you look at test case join32.q, it is almost the same as the one you had posted. It launches only one MR task ( http://svn.apache.org/viewvc/hive/trunk/ql/src/test/results/clientpositive/join32.q.out?view=markup ) I tried this with a fully installed cluster as well and I can see only one task. Another issue to consider would be HIVE-3996 and see if that makes a difference. Kindly correct me if I am wrong. Thanks Vikram.
          Hide
          Amareshwari Sriramadasu added a comment -

          Attaching test with .q and .out files, which is launching two MR jobs for star join queries.

          Show
          Amareshwari Sriramadasu added a comment - Attaching test with .q and .out files, which is launching two MR jobs for star join queries.
          Hide
          Amareshwari Sriramadasu added a comment -

          Seems I figured it out. The hive.auto.convert.join.noconditionaltask.size is not the number of rows. When i changed hive.auto.convert.join.noconditionaltask.size value in the attached tests, it is launching one MR job. Will upload the patch again to add tests.

          Show
          Amareshwari Sriramadasu added a comment - Seems I figured it out. The hive.auto.convert.join.noconditionaltask.size is not the number of rows. When i changed hive.auto.convert.join.noconditionaltask.size value in the attached tests, it is launching one MR job. Will upload the patch again to add tests.
          Hide
          Amareshwari Sriramadasu added a comment -

          Attaching the tests again. With hive.auto.convert.join.noconditionaltask.size increased, it launches single MR job for the queries.

          Show
          Amareshwari Sriramadasu added a comment - Attaching the tests again. With hive.auto.convert.join.noconditionaltask.size increased, it launches single MR job for the queries.

            People

            • Assignee:
              Vikram Dixit K
              Reporter:
              Amareshwari Sriramadasu
            • Votes:
              0 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development