Phoenix / PHOENIX-6273

Add support to handle MR Snapshot restore externally


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.0.0, 4.14.3
    • Fix Version/s: 5.1.0, 4.16.0
    • Component/s: core
    • Labels:
      None
    • Release Note:
      Adds the MapReduce configuration parameter "phoenix.mapreduce.external.snapshot.restore". When set to true, snapshot-based MapReduce jobs do not try to restore the snapshot themselves and instead assume an external application has already done so.
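
As a rough illustration of the flag described in the release note, the sketch below shows how a job could decide whether to restore the snapshot itself. It uses `java.util.Properties` as a stand-in for the job configuration so that it runs without Hadoop on the classpath. The class and helper names are hypothetical, and the default of false when the property is unset is an assumption, not something the release note states.

```java
import java.util.Properties;

public class ExternalRestoreFlagSketch {
    // Property name taken verbatim from the release note.
    static final String EXTERNAL_RESTORE_KEY = "phoenix.mapreduce.external.snapshot.restore";

    // Hypothetical helper: returns true when the job itself should restore the snapshot.
    // Assumption: the flag is treated as false when it is not set.
    static boolean jobShouldRestoreSnapshot(Properties conf) {
        boolean restoredExternally =
                Boolean.parseBoolean(conf.getProperty(EXTERNAL_RESTORE_KEY, "false"));
        return !restoredExternally;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Flag unset: the job restores the snapshot internally.
        System.out.println(jobShouldRestoreSnapshot(conf)); // prints "true"

        // Flag set: the job assumes an external application already restored it.
        conf.setProperty(EXTERNAL_RESTORE_KEY, "true");
        System.out.println(jobShouldRestoreSnapshot(conf)); // prints "false"
    }
}
```

In a real job the same key would presumably be set on the Hadoop `Configuration` object passed to the job, for example with `conf.setBoolean(...)`.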

      Description

      Recently we switched an MR application from scanning live tables to scanning snapshots (PHOENIX-3744). We ran into a severe performance issue, which turned out to be a correctness issue caused by the generation of overlapping scan splits. After some debugging, we found that this had already been fixed via PHOENIX-4997.

      We also do not need to restore the snapshot for every map task. Currently, the snapshot is restored once per map task into a temp directory. For large tables on big clusters, this creates a storm of NameNode (NN) RPCs. Instead, we can restore once per job and let all the map tasks operate on the same restored snapshot. HBase already did this via HBASE-18806, and we can do something similar. The Jira to correct this behavior is https://issues.apache.org/jira/browse/PHOENIX-6334
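
To make the difference in restore counts concrete, here is a small simulation. It is only a sketch and does not use Phoenix or HBase code. The class, method names, snapshot name, and directory paths are all hypothetical, and `restoreSnapshot` is a stand-in that just counts calls, where a real restore would issue many NameNode RPCs.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SnapshotRestoreCountSketch {
    // Counts simulated restore operations.
    static final AtomicInteger restores = new AtomicInteger();

    // Stand-in for restoring a snapshot into a directory; returns that directory.
    static String restoreSnapshot(String snapshotName, String targetDir) {
        restores.incrementAndGet();
        return targetDir;
    }

    // Placeholder for a map task scanning its split from the restored files.
    static void readSplit(String restoreDir, int taskId) {
        // no-op in this sketch
    }

    // Current behavior: every map task restores into its own temp directory.
    static int restoresPerTaskMode(String snapshotName, int numMapTasks) {
        restores.set(0);
        for (int task = 0; task < numMapTasks; task++) {
            String dir = restoreSnapshot(snapshotName, "/tmp/restore-" + task);
            readSplit(dir, task);
        }
        return restores.get();
    }

    // Proposed behavior: restore once per job, then share the directory across tasks.
    static int restoresPerJobMode(String snapshotName, int numMapTasks) {
        restores.set(0);
        String sharedDir = restoreSnapshot(snapshotName, "/tmp/restore-shared");
        for (int task = 0; task < numMapTasks; task++) {
            readSplit(sharedDir, task);
        }
        return restores.get();
    }

    public static void main(String[] args) {
        System.out.println(restoresPerTaskMode("my_snapshot", 1000)); // prints 1000
        System.out.println(restoresPerJobMode("my_snapshot", 1000));  // prints 1
    }
}
```

The number of restores grows linearly with the number of map tasks in the first mode and stays at one in the second, which is why the per-task approach hurts most on large tables with many splits.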

      The purpose of this Jira is to resolve this issue immediately by letting the caller decide whether the snapshot restore is handled externally or internally on the Phoenix side (the buggy approach).

      All other performance suggestions are listed here: https://issues.apache.org/jira/browse/PHOENIX-6081

            People

            • Assignee: Saksham Gangwar (saksham.gangwar)
            • Reporter: Saksham Gangwar (saksham.gangwar)
