Mahasa asked me the following:
In the following query:
SELECT col_list FROM A JOIN B ON (A.col1 = B.col1)
1. Since all tables are buffered except for the last one which is streamed, it is only table B that can make use of the index, am I right?
>>>>> No, an index can be used for either table, provided there is a filter on that table. The index would be applied right after the table scan (TS) for A or B.
2. In order to do this, in the mapper, the TS should be done on the index table rather than the base table; what about the reduce stage? Don't you need to have access to base table/index table in the reduce phase too? For applying SEL?
>>>>> Yes, the TS would be on the index table for either A or B. There would be no change after that – no change in the reduce phase.
3. As far as I know, filter pushdown and group by use indexes to accelerate the query. Filter pushdown recompiles the rewritten query, whereas GB only replaces the appropriate operators in the operator tree. Which one is the better model to follow for implementing HIVE-2845?
>>>>> Neither is enough on its own; HIVE-2845 requires new changes. Essentially, one of the tables, say A, would be read completely, and the other one, B, would be probed through its index for each key of A, or vice versa.
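To make the probe idea concrete, here is a minimal sketch in Python. It is not Hive code: the names build_index and index_probe_join are hypothetical, and an in-memory dict stands in for the index table on B.col1. It only shows the access pattern, scanning A fully and looking up B by key.

```python
from collections import defaultdict


def build_index(rows, key):
    """Hypothetical stand-in for an index on B: key value -> list of matching rows."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def index_probe_join(a_rows, b_index, key):
    """Read A completely; for each row of A, probe B's index for that row's key."""
    for a in a_rows:
        # .get() avoids creating empty entries in the defaultdict on a miss
        for b in b_index.get(a[key], []):
            yield (a, b)


# Toy data (made up for illustration only)
table_a = [{"col1": 1, "x": "a"}, {"col1": 2, "x": "b"}, {"col1": 3, "x": "c"}]
table_b = [{"col1": 2, "y": "p"}, {"col1": 2, "y": "q"}, {"col1": 4, "y": "r"}]

b_index = build_index(table_b, "col1")
joined = list(index_probe_join(table_a, b_index, "col1"))
```

With this shape, rows of B whose key never appears in A are never touched, which is the point of probing instead of scanning both tables.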
4. May I ask to assign this ticket to me?
>>>>> Yes, I don't think anyone is working on it right now.