Description
In MR, all small tables for a particular MJ operator share the same instance of HashTableSinkOperator, while in the Spark branch, each small table corresponds to a different HashTableSinkOperator instance. This difference causes some issues.
For instance, HashTableSinkOperator#processOp uses a tag to look up information in various data structures, such as joinKeys, filterMaps, joinValues, etc. Those data structures store the information as it was BEFORE the MJ operator is split from its parents. But since we later use a separate HashTableSinkOperator for each small table, that information is no longer valid, and thus this method will fail.
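To illustrate how a tag-indexed lookup can go stale, here is a minimal toy sketch. It does not use Hive's real classes: TagLookupSketch, JOIN_KEYS_BY_TAG, and both lookup methods are hypothetical names, and the assumption that a per-table sink sees its only parent at position 0 is mine, not something stated above. The actual failure in processOp may differ in detail.

```java
import java.util.Arrays;
import java.util.List;

public class TagLookupSketch {

    // Hypothetical per-tag metadata, recorded for the whole map join
    // BEFORE the plan is split. Index = tag of the input table.
    static final List<List<String>> JOIN_KEYS_BY_TAG = Arrays.asList(
        Arrays.asList("big.id"),      // tag 0: big table
        Arrays.asList("small_a.id"),  // tag 1: first small table
        Arrays.asList("small_b.id")); // tag 2: second small table

    // Shared sink (MR-style): one sink serves every small table, so rows
    // arrive with the tag that was assigned in the original plan.
    public static List<String> sharedSinkLookup(int originalTag) {
        return JOIN_KEYS_BY_TAG.get(originalTag);
    }

    // Per-table sink (Spark-style, ASSUMED behavior): the sink has a single
    // parent, so the tag it sees is the local position 0. The original tag
    // is deliberately ignored here to model that mismatch.
    public static List<String> perTableSinkLookup(int originalTag) {
        int localTag = 0; // assumption: only one parent, always position 0
        return JOIN_KEYS_BY_TAG.get(localTag);
    }

    public static void main(String[] args) {
        System.out.println(sharedSinkLookup(2));   // prints [small_b.id]
        System.out.println(perTableSinkLookup(2)); // prints [big.id]
    }
}
```

In this sketch the per-table sink silently reads the big table's keys. In a real operator the same mismatch could equally show up as an out-of-range index or a null entry, depending on how the structures are populated.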
This JIRA is to track and solve these related issues.