Giraph (Retired) / GIRAPH-684

Improve Writable API


Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved

    Description

      While working on GIRAPH-683 I realized something: the Python code the user has to write is fairly cumbersome, because they can't just say setValue(4); they have to say setValue(IntWritable(4)). This is incredibly ugly in my opinion.

      The problem is that we have a tight coupling between user types and their serialization, so the "everything must be Writable" requirement spreads throughout the codebase.

      I think we need to change e.g. Vertex<I extends WritableComparable, V extends Writable, E extends Writable> to just Vertex<I, V, E>.

      For each type we would store a SerDe (serializer/deserializer) that knows how to serialize and deserialize that type. If the user passes us a Writable, we use a WritableSerDe, so existing code requires no changes.
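
      A minimal sketch of what such a SerDe and its Writable adapter might look like. This is not the code from the review; the `SerDe` method names, `IntBox`, and `SerDeRoundTrip` are hypothetical, and `Writable` is redeclared locally (mirroring Hadoop's `write`/`readFields` pair) only so the block compiles without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in for org.apache.hadoop.io.Writable (assumption: same two methods).
interface Writable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Hypothetical SerDe contract. readFields fills an existing mutable instance
// instead of returning a new value, which is why a type like Long would not fit.
interface SerDe<T> {
  void write(DataOutput out, T value) throws IOException;
  void readFields(DataInput in, T value) throws IOException;
}

// Hypothetical adapter: a Writable already knows how to serialize itself,
// so the SerDe simply delegates to it.
final class WritableSerDe<T extends Writable> implements SerDe<T> {
  @Override
  public void write(DataOutput out, T value) throws IOException {
    value.write(out);
  }

  @Override
  public void readFields(DataInput in, T value) throws IOException {
    value.readFields(in);
  }
}

// Small mutable value type standing in for Hadoop's IntWritable.
final class IntBox implements Writable {
  private int value;

  IntBox() {}

  IntBox(int value) {
    this.value = value;
  }

  int get() {
    return value;
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeInt(value);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    value = in.readInt();
  }
}

// Helpers that push a value through a SerDe and back via an in-memory buffer.
final class SerDeRoundTrip {
  static <T> byte[] toBytes(SerDe<T> serDe, T value) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      serDe.write(out, value);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  static <T> void fromBytes(SerDe<T> serDe, byte[] data, T target) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      serDe.readFields(in, target);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

      Under these assumptions, calling `SerDeRoundTrip.toBytes(new WritableSerDe<>(), new IntBox(4))` and then `fromBytes` into a fresh `IntBox` restores the value, and the framework never needs the type parameter itself to be bounded by `Writable`.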

      Note that the SerDe interface does not allow for using a type like Long directly. This is by design since immutable types don't work with Giraph.

      The I,V,E,M parameters, in order to get serialized, would need to adhere to one of the following:
      1) Be a type we know how to serialize, e.g. LongWritable.
      2) Be Writable. The key is we don't require it on the generic parameter, but we check if it is and if so we use their code. This makes everything backwards compatible.
      3) Have a serializer that the user has registered. This lets them serialize completely new types, for example a fastutil map, without having to subclass that type to make it Writable.
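
      The three cases above could be resolved by a lookup along these lines. This is a sketch under assumptions, not the patch itself: `SerDeRegistry`, `register`, `lookup`, and `StringBuilderSerDe` are hypothetical names, the precedence (registered entries first, then the Writable check) is a guess, and built-in types are assumed to be pre-registered in the same map. `Writable` is a local stand-in for Hadoop's interface and `SerDe` is a hypothetical contract, both declared here so the block compiles on its own.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Local stand-in for org.apache.hadoop.io.Writable.
interface Writable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Hypothetical SerDe contract operating on mutable instances.
interface SerDe<T> {
  void write(DataOutput out, T value) throws IOException;
  void readFields(DataInput in, T value) throws IOException;
}

// Case 2: adapter that delegates to the Writable's own code.
final class WritableSerDe<T extends Writable> implements SerDe<T> {
  @Override
  public void write(DataOutput out, T value) throws IOException {
    value.write(out);
  }

  @Override
  public void readFields(DataInput in, T value) throws IOException {
    value.readFields(in);
  }
}

// Case 3: a user-supplied serializer for a mutable type that is not Writable.
final class StringBuilderSerDe implements SerDe<StringBuilder> {
  @Override
  public void write(DataOutput out, StringBuilder value) throws IOException {
    out.writeUTF(value.toString());
  }

  @Override
  public void readFields(DataInput in, StringBuilder value) throws IOException {
    value.setLength(0);
    value.append(in.readUTF());
  }
}

// Hypothetical registry that picks a SerDe for a given I, V, E, or M class.
final class SerDeRegistry {
  // Cases 1 and 3: built-in types (assumed pre-registered) and user registrations.
  private final Map<Class<?>, SerDe<?>> registered = new HashMap<>();

  <T> void register(Class<T> type, SerDe<T> serDe) {
    registered.put(type, serDe);
  }

  @SuppressWarnings("unchecked")
  <T> SerDe<T> lookup(Class<T> type) {
    SerDe<?> found = registered.get(type);
    if (found != null) {
      return (SerDe<T>) found;
    }
    // Case 2: checked at runtime instead of on the generic parameter.
    if (Writable.class.isAssignableFrom(type)) {
      SerDe<?> adapter = new WritableSerDe<Writable>();
      return (SerDe<T>) adapter;
    }
    throw new IllegalArgumentException("No SerDe available for " + type.getName());
  }
}
```

      Because the Writable check happens inside `lookup` rather than in a type bound, existing Writable types keep working unchanged, while a type with neither a registration nor a Writable implementation is rejected up front.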

      With this improved API in place, all computation code (and user code in general) would be much cleaner and simpler. It will also make things like Jython much more intuitive.

      I ran PageRankBenchmark with this diff using 100M vertices, 10B edges, and 10 workers. The difference is insignificant: 319 seconds total time without the change vs. 311 with it. The new version is actually faster, but I think that is mostly run-to-run variance.

      Here is the code: https://reviews.apache.org/r/13306/

          People

            nitay Nitay Joffe