- Type: Sub-task
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: None
- Fix Version/s: None
- Component/s: None
- Labels: None
Once a mapper in the MR index job succeeds, it should not need to be redone if one of the other mappers fails. The initial population of an index is based on a snapshot in time, so new rows written after the index build has started and/or failed do not affect it.
Also, there's a 1:1 correspondence between index rows and table rows, so there's really no need to dedup. However, the index rows will have a different row key than the data table rows, so I'm not sure how the HFiles are split. Could their key ranges overlap, and if so, is that an issue?
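
To make the question concrete, here is a small Python sketch of the situation, not of the actual MR job. It assumes each mapper reads a contiguous range of data rows and emits one index key per row, built as the indexed value followed by the data row key. The key layout, the `\x00` separator, and all function names are hypothetical and chosen only for illustration. Under those assumptions, the key ranges written by different mappers can overlap, but because of the 1:1 correspondence no index key is ever produced twice.

```python
from typing import List, Tuple

Row = Tuple[str, str]  # (data row key, indexed column value)


def index_key(row: Row) -> str:
    """Build a hypothetical index row key: indexed value, separator, data row key."""
    data_key, value = row
    return f"{value}\x00{data_key}"


def split_for_mappers(rows: List[Row], n: int) -> List[List[Row]]:
    """Split rows (already ordered by data row key) into n contiguous chunks."""
    size = max(1, -(-len(rows) // n))  # ceiling division, at least 1
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def mapper_output(chunk: List[Row]) -> List[str]:
    """Index keys one mapper would write, sorted as they would be within a file."""
    return sorted(index_key(r) for r in chunk)


def ranges_overlap(a: List[str], b: List[str]) -> bool:
    """True if the [first, last] key ranges of two sorted outputs intersect."""
    return a[0] <= b[-1] and b[0] <= a[-1]


# Data rows ordered by data row key; the indexed values are in no particular order.
rows = [("r1", "zebra"), ("r2", "apple"), ("r3", "mango"), ("r4", "berry")]

outputs = [mapper_output(chunk) for chunk in split_for_mappers(rows, 2)]
all_keys = [key for out in outputs for key in out]

overlapping = ranges_overlap(outputs[0], outputs[1])
has_duplicates = len(set(all_keys)) != len(all_keys)
```

In this toy input the two mappers' key ranges interleave (`overlapping` is true) while `has_duplicates` stays false. Whether overlapping ranges matter in practice depends on how the job partitions its output before the HFiles are written, which this sketch does not model.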