[PHOENIX-6334] All map tasks should operate on the same restored snapshot - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 5.0.0, 4.14.3
Fix Version/s: None
Component/s: core
Labels: None

Description

Recently we switched an MR application from scanning live tables to scanning snapshots (~~PHOENIX-3744~~). We ran into a severe performance issue, which turned out to be a correctness issue caused by the generation of overlapping scan splits. After some debugging, we found that it had already been fixed by ~~PHOENIX-4997~~.

In addition, the snapshot does not need to be restored once per map task; the purpose of this Jira is to correct that behavior. Currently, each map task restores the snapshot into its own temp directory. For large tables on big clusters, this creates a storm of NameNode (NN) RPCs. Instead, we can restore the snapshot once per job and let all the map tasks operate on the same restored copy. HBase already did this in ~~HBASE-18806~~; we can do something similar.
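To illustrate why per-task restores become an RPC storm at scale, here is a minimal Python model. It is not Phoenix or HBase code: `FakeNameNode`, `per_task_restore`, and `per_job_restore` are hypothetical names, and the assumption that a restore costs one NN metadata RPC per file is a simplification for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeNameNode:
    """Hypothetical stand-in for the HDFS NameNode that only counts metadata RPCs."""
    rpc_count: int = 0

    def restore_snapshot(self, snapshot: str, num_files: int, tmp_dir: str) -> str:
        # Simplifying assumption: restoring a snapshot costs one metadata RPC
        # per file/link created under tmp_dir. Real costs will differ.
        self.rpc_count += num_files
        return tmp_dir


def per_task_restore(nn: FakeNameNode, snapshot: str, num_files: int, num_tasks: int) -> List[str]:
    # Current behavior described in the issue: every map task restores
    # its own copy of the snapshot into a separate temp directory.
    return [
        nn.restore_snapshot(snapshot, num_files, f"/tmp/{snapshot}-task-{i}")
        for i in range(num_tasks)
    ]


def per_job_restore(nn: FakeNameNode, snapshot: str, num_files: int, num_tasks: int) -> List[str]:
    # Proposed behavior: restore once at job setup and hand the same
    # directory to every map task.
    shared_dir = nn.restore_snapshot(snapshot, num_files, f"/tmp/{snapshot}-job")
    return [shared_dir] * num_tasks


if __name__ == "__main__":
    for strategy in (per_task_restore, per_job_restore):
        nn = FakeNameNode()
        dirs = strategy(nn, "my_snapshot", num_files=1000, num_tasks=500)
        print(f"{strategy.__name__}: {nn.rpc_count} RPCs, {len(set(dirs))} restore dir(s)")
```

Under these assumptions, with 1,000 files and 500 map tasks the per-task model issues 500,000 RPCs and creates 500 restore directories, while the per-job model issues 1,000 RPCs and creates a single shared directory. The per-task cost grows linearly with the number of map tasks; restoring once per job keeps it independent of the task count.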

Other performance suggestions are tracked in: https://issues.apache.org/jira/browse/PHOENIX-6081

People

Assignee: Rushabh Shah

Reporter: Saksham Gangwar

Votes: 0

Watchers: 1

Dates

Created: 21/Jan/21 10:15

Updated: 25/Feb/21 02:33