Giraph (Retired) / GIRAPH-1136

Giraph no longer checkpoints after loading input

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.2.0
    • Fix Version/s: None
    • Component/s: bsp
    • Labels: None

    Description

      From the Worker Failure section in Chapter 6 of Practical Analytics with Apache Giraph:

      If checkpoints are enabled, you are guaranteed to have the initial graph data safely stored in HDFS.

      Judging by the current code in o.a.g.master.BspServiceMaster:getCheckpointStatus, this no longer appears to be the case. After a little digging, the change seems to have been introduced as part of GIRAPH-933: https://github.com/apache/giraph/commit/5adca63deca25d84f4fdea053c35a85efc8bbb3d#diff-e16fdec9e3f573eba64cfe6eab512e19L657

      Was this change intentional? It seems like an initial checkpoint would be desirable (I have a job where 1/3 of the runtime is spent loading input splits).

      The simplest fix would be to just add a special case for Superstep 0. If that's acceptable, I'd be happy to submit a PR.
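
      A minimal sketch of what such a special case could look like. This is a stand-alone illustration, not the actual BspServiceMaster code: the class name, the method signature, the firstPeriodicCheckpoint parameter, and the periodic rule itself are all assumptions made for the example.

```java
/**
 * Hypothetical, self-contained illustration of a checkpoint decision with a
 * special case for superstep 0. It does not reproduce Giraph's real logic.
 */
final class CheckpointPolicySketch {
    private CheckpointPolicySketch() { }

    /**
     * @param superstep               current superstep; 0 is assumed to be the
     *                                first superstep after input loading
     * @param checkpointFrequency     supersteps between checkpoints; 0 disables
     * @param firstPeriodicCheckpoint assumed superstep at which the periodic
     *                                schedule starts (hypothetical parameter)
     */
    static boolean shouldCheckpoint(long superstep,
                                    int checkpointFrequency,
                                    long firstPeriodicCheckpoint) {
        // Checkpointing disabled: never checkpoint.
        if (checkpointFrequency <= 0) {
            return false;
        }
        // Proposed special case: always persist the freshly loaded graph.
        if (superstep == 0) {
            return true;
        }
        // Assumed periodic rule: nothing before the first scheduled checkpoint.
        if (superstep < firstPeriodicCheckpoint) {
            return false;
        }
        // Assumed periodic rule: every checkpointFrequency supersteps after it.
        return (superstep - firstPeriodicCheckpoint) % checkpointFrequency == 0;
    }

    public static void main(String[] args) {
        // Print the decision for the first few supersteps (frequency 3, start 2).
        for (long s = 0; s <= 6; s++) {
            System.out.println("superstep " + s + ": " + shouldCheckpoint(s, 3, 2));
        }
    }
}
```

      With a guard like this, a worker failure in the early supersteps would restart from the stored graph instead of repeating input loading, provided checkpointing is enabled at all.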

          People

            Assignee: Unassigned
            Reporter: Nic Eggert (nseggert)