Phoenix / PHOENIX-6334

All map tasks should operate on the same restored snapshot


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 5.0.0, 4.14.3
    • Fix Version/s: None
    • Component/s: core
    • Labels: None

      Description

      Recently we switched an MR application from scanning live tables to scanning snapshots (PHOENIX-3744). We ran into a severe performance issue, which turned out to be a correctness issue caused by the generation of overlapping scan splits. After some debugging we found that this had already been fixed by PHOENIX-4997.

      Separately, there is no need to restore the snapshot for every map task, and the purpose of this Jira is to correct that behavior. Currently, each map task restores the snapshot into its own temp directory. For large tables on big clusters, this creates a storm of NameNode (NN) RPCs. We can instead restore the snapshot once per job and let all the map tasks operate on the same restored snapshot. HBase already did this in HBASE-18806; we can do something similar.
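
      The restore-once-per-job idea can be illustrated with a minimal, dependency-free sketch. Everything in it is hypothetical and for illustration only: the class name, the configuration key, the directory layout, and the simulated restore are assumptions, not Phoenix or HBase APIs. A plain Map stands in for the job configuration. Job setup restores the snapshot once and publishes the restore directory; each map task only reads that location.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: none of these names are Phoenix or HBase APIs.
final class SharedSnapshotRestoreSketch {
    // Hypothetical configuration key carrying the shared restore directory.
    static final String RESTORE_DIR_KEY = "sketch.snapshot.restore.dir";

    // Counts simulated restores; each one stands for a burst of NameNode RPCs.
    static final AtomicInteger restoreCount = new AtomicInteger();

    // Simulated restore: a real job would materialize the snapshot files here.
    static String restoreSnapshot(String snapshotName, String jobId) {
        restoreCount.incrementAndGet();
        return "/tmp/restore/" + jobId + "/" + snapshotName;
    }

    // Job setup: restore exactly once and publish the location to all tasks.
    static Map<String, String> setUpJob(String snapshotName, String jobId) {
        Map<String, String> conf = new HashMap<>();
        conf.put(RESTORE_DIR_KEY, restoreSnapshot(snapshotName, jobId));
        return conf;
    }

    // Map task: reuse the published directory instead of restoring again.
    static String openForMapTask(Map<String, String> conf) {
        String dir = conf.get(RESTORE_DIR_KEY);
        if (dir == null) {
            throw new IllegalStateException("snapshot was not restored at job setup");
        }
        return dir;
    }

    public static void main(String[] args) {
        Map<String, String> conf = setUpJob("snap1", "job_001");
        for (int task = 0; task < 100; task++) {
            openForMapTask(conf); // no restore happens here
        }
        System.out.println("restores performed: " + restoreCount.get());
    }
}
```

      In this sketch the number of restores stays at one regardless of how many map tasks run, which is the property the proposal is after. A real change would also need to clean up the shared directory when the job finishes.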

      Other performance suggestions are tracked in PHOENIX-6081: https://issues.apache.org/jira/browse/PHOENIX-6081

            People

            • Assignee: Rushabh Shah (shahrs87)
            • Reporter: Saksham Gangwar (saksham.gangwar)
