Giraph (Retired) / GIRAPH-307

InputSplit list can be long with many workers (and locality info) and should not be re-created every time a worker calls reserveInputSplit()

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.0.0
    • Component/s: bsp, graph
    • Labels: None

    Description

      While instrumenting the INPUT_SUPERSTEP and watching various runs, I see that the input split list, which is regenerated every time a worker calls reserveInputSplit(), is effectively immutable for the duration of a job. We can therefore save a fair amount of memory by building the list once and reusing it, rather than re-creating it and re-querying ZooKeeper on each pass to claim another split. Only the reserved and finished children lists are ever mutated during the input phase of the job.
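
The idea above can be sketched roughly as follows. This is a minimal illustration, not the code from the attached patches: `SplitStore`, `CachedSplitReserver`, `InMemorySplitStore`, and `tryReserve` are hypothetical names standing in for the ZooKeeper calls a worker makes, and the sketch assumes the split list really is fixed for the lifetime of the job.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the ZooKeeper access layer; not Giraph's real API. */
interface SplitStore {
  /** Expensive call: lists every input split path for the job. */
  List<String> listSplitPaths();

  /** Marks a split as reserved; returns false if it was already taken. */
  boolean tryReserve(String splitPath);
}

/** Fetches the split list once and reuses it on every reservation attempt. */
class CachedSplitReserver {
  private final SplitStore store;
  private List<String> splitPaths; // assumed immutable per job, so cached

  CachedSplitReserver(SplitStore store) {
    this.store = store;
  }

  /** Returns the path of a newly reserved split, or null if none remain. */
  String reserveInputSplit() {
    if (splitPaths == null) {
      // Only the first call pays for the full listing.
      splitPaths =
          Collections.unmodifiableList(new ArrayList<>(store.listSplitPaths()));
    }
    for (String path : splitPaths) {
      // The reserved state is the part that changes, so it is always
      // checked against the store rather than cached.
      if (store.tryReserve(path)) {
        return path;
      }
    }
    return null;
  }
}

/** In-memory fake used only to exercise the sketch. */
class InMemorySplitStore implements SplitStore {
  int listCalls = 0; // counts how often the expensive listing runs
  private final List<String> paths;
  private final Set<String> reserved = new HashSet<>();

  InMemorySplitStore(List<String> paths) {
    this.paths = paths;
  }

  @Override
  public List<String> listSplitPaths() {
    listCalls++;
    return paths;
  }

  @Override
  public boolean tryReserve(String splitPath) {
    return reserved.add(splitPath);
  }
}
```

With the in-memory fake, repeated calls to `reserveInputSplit()` hand out each split once while `listCalls` stays at 1, which is the saving the description is after. The reserved and finished state is deliberately left uncached, since that is what changes during the input phase.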

      Attachments

        1. GIRAPH-307-3.patch
          19 kB
          Eli Reisman
        2. GIRAPH-307-2.patch
          20 kB
          Eli Reisman
        3. GIRAPH-307-1.patch
          6 kB
          Eli Reisman

          People

            Assignee: Eli Reisman (initialcontext)
            Reporter: Eli Reisman (initialcontext)
            Votes: 0
            Watchers: 3
