Giraph
  1. Giraph
  2. GIRAPH-613

Remove Writable from the interfaces implemented by Vertex

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Currently, Vertex implements Writable, meaning that we depend on the user (except for the default implementation) on the way the graph is serialised and deserialized. This means we cannot do certain optimisations, and the user can actually add additional stuff to Vertex more than Value, Id, Edges and halt state. By removing this possibility (or allowing them to extend a default implementation) we take control of this.

      1. NonWritable.diff
        14 kB
        Claudio Martella
      2. GIRAPH-613.diff
        21 kB
        Claudio Martella
      3. GIRAPH-613.diff
        22 kB
        Claudio Martella

        Issue Links

          Activity

          Hide
          Maja Kabiljo added a comment - - edited

          Probably I'm missing something here - why is this better than the current state? Especially if we still allow user to create his own VertexWritable. And I don't see how this makes it clearer to the user that he shouldn't add fields to vertex.

          Ultimately we might want to remove Vertex completely, and have user implement some Computation interface, which would have a method like:
          compute(vertexId, vertexValue, edges, messages)
          This will really prevent user from adding additional fields directly to the vertex.
          Also, in many applications we have different kind of computation going on during different supersteps - have a lot of

           if (superstep == ..) { ... } else if (superstep == ..) ... 

          We use different messages during different supersteps too, and we have to hack it by putting a message type inside of MessageWritable. Having something like Computation instead of Vertex, and providing an option to switch Computation during application would make all of this much easier. But definitely not for the next release

          Show
          Maja Kabiljo added a comment - - edited Probably I'm missing something here - why is this better than the current state? Especially if we still allow user to create his own VertexWritable. And I don't see how this makes it clearer to the user that he shouldn't add fields to vertex. Ultimately we might want to remove Vertex completely, and have user implement some Computation interface, which would have a method like: compute(vertexId, vertexValue, edges, messages) This will really prevent user from adding additional fields directly to the vertex. Also, in many applications we have different kind of computation going on during different supersteps - have a lot of if (superstep == ..) { ... } else if (superstep == ..) ... We use different messages during different supersteps too, and we have to hack it by putting a message type inside of MessageWritable. Having something like Computation instead of Vertex, and providing an option to switch Computation during application would make all of this much easier. But definitely not for the next release
          Hide
          Claudio Martella added a comment -

          As a matter of fact, you're right, we should not allow at all the user to touch that part. I started it like that to make it a softer transition, but the more I think of it (and the more of you agree), the more I think it should not be possible completely.

          One example why I want to introduce this is that it would allow us to take control of the serialization and avoid writing back edges in the out-of-core partitions for computations where the adjacencies don't change, with support of a user parameter (giraph.StaticGraph). As most of the algorithms and analytics out there do not mutate the graph, it would be a big win.

          Does this make more sense now? If I have your +1, I'll remove the possibility for the user to implement his own VertexWritable class.

          I like what you propose, it would also save A LOT OF MEMORY.

          Show
          Claudio Martella added a comment - As a matter of fact, you're right, we should not allow at all the user to touch that part. I started it like that to make it a softer transition, but the more I think of it (and the more of you agree), the more I think it should not be possible completely. One example why I want to introduce this is that it would allow us to take control of the serialization and avoid writing back edges in the out-of-core partitions for computations where the adjacencies don't change, with support of a user parameter (giraph.StaticGraph). As most of the algorithms and analytics out there do not mutate the graph, it would be a big win. Does this make more sense now? If I have your +1, I'll remove the possibility for the user to implement his own VertexWritable class. I like what you propose, it would also save A LOT OF MEMORY.
          Hide
          Maja Kabiljo added a comment -

          That way it makes much more sense. And for now we should at least add something to Vertex javadoc to state that only id, value and edges will be serialized, and that user should put all the data he needs in vertexValue.

          Are you already working on this out-of-core improvements and will have them ready for release or not?

          Show
          Maja Kabiljo added a comment - That way it makes much more sense. And for now we should at least add something to Vertex javadoc to state that only id, value and edges will be serialized, and that user should put all the data he needs in vertexValue. Are you already working on this out-of-core improvements and will have them ready for release or not?
          Hide
          Claudio Martella added a comment -

          I'm doing one thing at a time, but that should not take too much time. (1) de-couple storage of edges and vertex value (2) ignore writing back to disk edges if parameter is set. Hopefully I'll make it on time for the release.

          Show
          Claudio Martella added a comment - I'm doing one thing at a time, but that should not take too much time. (1) de-couple storage of edges and vertex value (2) ignore writing back to disk edges if parameter is set. Hopefully I'll make it on time for the release.
          Hide
          Maja Kabiljo added a comment -

          Ok then, unless somebody else has arguments against, I'm +1 on the idea.

          Show
          Maja Kabiljo added a comment - Ok then, unless somebody else has arguments against, I'm +1 on the idea.
          Hide
          Claudio Martella added a comment -

          passes verify and text both local and destributed.

          Show
          Claudio Martella added a comment - passes verify and text both local and destributed.
          Hide
          Claudio Martella added a comment -

          Actually, I would have like to test it with OOC and checkpointing, but I don't seam to understand how to pass these parameters to PageRankBenchmark since its refactor. Any ideas?

          Show
          Claudio Martella added a comment - Actually, I would have like to test it with OOC and checkpointing, but I don't seam to understand how to pass these parameters to PageRankBenchmark since its refactor. Any ideas?
          Hide
          Maja Kabiljo added a comment -

          Refactor was really just a refactor, so it's done the same way as before - first pass the -D arguments and then the benchmark specific arguments (-w, -V ...)

          Show
          Maja Kabiljo added a comment - Refactor was really just a refactor, so it's done the same way as before - first pass the -D arguments and then the benchmark specific arguments (-w, -V ...)
          Hide
          Claudio Martella added a comment -

          Makes sense. Tested also with PageRankBenchmark with both checkpointing and OOC and it's safe.
          I'm working on the rest of the idea now.

          Show
          Claudio Martella added a comment - Makes sense. Tested also with PageRankBenchmark with both checkpointing and OOC and it's safe. I'm working on the rest of the idea now.
          Hide
          Maja Kabiljo added a comment -

          Looks good overall, apart from one concern - with ByteArrayPartition we are not able to reuse objects (vertex id, value, edges) anymore. One way to fix this would be to add additional read method to VertexWritable which will reuse them, or have the current one check some argument from the configuration to decide whether or not to reuse them, I think that should work.

          Show
          Maja Kabiljo added a comment - Looks good overall, apart from one concern - with ByteArrayPartition we are not able to reuse objects (vertex id, value, edges) anymore. One way to fix this would be to add additional read method to VertexWritable which will reuse them, or have the current one check some argument from the configuration to decide whether or not to reuse them, I think that should work.
          Hide
          Claudio Martella added a comment -

          I'm not sure where we reuse objects in the older code, and were I broke it. Can you indicate it?

          Show
          Claudio Martella added a comment - I'm not sure where we reuse objects in the older code, and were I broke it. Can you indicate it?
          Hide
          Maja Kabiljo added a comment -

          It's not exactly that you broke it, but earlier we could have our vertex implementation which reuses objects in readFields, which we can't do anymore. At least I think that was the original idea. Avery Ching?

          Show
          Maja Kabiljo added a comment - It's not exactly that you broke it, but earlier we could have our vertex implementation which reuses objects in readFields, which we can't do anymore. At least I think that was the original idea. Avery Ching ?
          Hide
          Claudio Martella added a comment -

          Ok, I see. Nonetheless, it would be nice to have a Factory/Cache that does it under the hood, instead on a per-implementation.

          Can I commit this?

          Show
          Claudio Martella added a comment - Ok, I see. Nonetheless, it would be nice to have a Factory/Cache that does it under the hood, instead on a per-implementation. Can I commit this?
          Hide
          Avery Ching added a comment -

          Maja Kabiljo, you have a great memory. As you mentioned, when we moved to VertexEdges, we lost the reusability of objects, which costs us gc time. Would be great to have this back Claudio.

          Show
          Avery Ching added a comment - Maja Kabiljo , you have a great memory. As you mentioned, when we moved to VertexEdges, we lost the reusability of objects, which costs us gc time. Would be great to have this back Claudio.
          Hide
          Claudio Martella added a comment -

          So, it's not this patch that broke it, and it's not directly connected . So I'd commit this, as it is a blocker for GIRAPH-616, and we can consider the issue later?

          Show
          Claudio Martella added a comment - So, it's not this patch that broke it, and it's not directly connected . So I'd commit this, as it is a blocker for GIRAPH-616 , and we can consider the issue later?
          Hide
          Maja Kabiljo added a comment -

          The patch makes it impossible to reuse objects when using ByteArrayPartition, and that is possible with the current state. I'm not for committing this until that's fixed.
          Easy thing to do is just not use VertexWritable inside of ByteArrayPartition, and reuse object directly there. Also, the name VertexWritable is a bit misleading, since that's not a vertex class, it only holds utility methods.

          Show
          Maja Kabiljo added a comment - The patch makes it impossible to reuse objects when using ByteArrayPartition, and that is possible with the current state. I'm not for committing this until that's fixed. Easy thing to do is just not use VertexWritable inside of ByteArrayPartition, and reuse object directly there. Also, the name VertexWritable is a bit misleading, since that's not a vertex class, it only holds utility methods.
          Hide
          Maja Kabiljo added a comment -

          I thought our distributed tests are broken?

          Show
          Maja Kabiljo added a comment - I thought our distributed tests are broken?
          Hide
          Claudio Martella added a comment -

          Honestly, I do not understand your reasoning. The feature you mention was used only at FB, as it is not in the Giraph code currently, and it was broken by another patch (which ironically was contributed by FB itself). So it is in no way connected to this patch.
          Now, you ask me to introduce a fix for that issue in this patch, and most importantly you ask me to introduce the fix for a feature I don't see around in the code because it's only in the code you have at FB. So, I do not see how I can help you even if it made sense it to be in this JIRA, as I don't have code that is using that feature to work with, so how could I implement it?

          The only thing that makes sense to me, in particular a couple of days before the release of 1.0, is that this code goes in and the other code is contributed in another issue by somebody who actually uses it and has a clear idea of what is needed. Don't you think? Moreover, if this code does not go in, you remain without your feature anyway, right?

          Btw, the distributed test now does not have the issues it had before that I experienced a few months ago, apparently it was fixed by one of the issues contributed so far. What is still broken is testContinue in TextJsonBase64Format, which this code did not pass either, but it is known to be broken for a long time (in fact it is broken also in trunk, I checked this morning).

          As far as VertexWritable, it does not mean that it's a vertex (it would be WritableVertex), but it was called like that to emphasize the Writable interface. But I agree, it's not a great name. What I'm going to do is take those two methods and move them to WritableUtils.

          Show
          Claudio Martella added a comment - Honestly, I do not understand your reasoning. The feature you mention was used only at FB, as it is not in the Giraph code currently, and it was broken by another patch (which ironically was contributed by FB itself). So it is in no way connected to this patch. Now, you ask me to introduce a fix for that issue in this patch, and most importantly you ask me to introduce the fix for a feature I don't see around in the code because it's only in the code you have at FB. So, I do not see how I can help you even if it made sense it to be in this JIRA, as I don't have code that is using that feature to work with, so how could I implement it? The only thing that makes sense to me, in particular a couple of days before the release of 1.0, is that this code goes in and the other code is contributed in another issue by somebody who actually uses it and has a clear idea of what is needed. Don't you think? Moreover, if this code does not go in, you remain without your feature anyway, right? Btw, the distributed test now does not have the issues it had before that I experienced a few months ago, apparently it was fixed by one of the issues contributed so far. What is still broken is testContinue in TextJsonBase64Format, which this code did not pass either, but it is known to be broken for a long time (in fact it is broken also in trunk, I checked this morning). As far as VertexWritable, it does not mean that it's a vertex (it would be WritableVertex), but it was called like that to emphasize the Writable interface. But I agree, it's not a great name. What I'm going to do is take those two methods and move them to WritableUtils.
          Hide
          Maja Kabiljo added a comment -

          I am not sure which patch you are referring to, and I wouldn't say that the feature is currently broken, it just needs to be implemented in the Vertex implementation (from readFields method). Maybe I am missing something, but I don't see how it can be implemented from outside of giraph core with this change. But sure, one of us can implement it, I think it's a really short change and was trying to give you pointers about how to do it. Reusing Vertex object is already there, we just need to reuse id, value and edges as well.

          Moving these methods to WritableUtils sounds good.

          I remember one of the checkpoint distributed tests was also broken, glad to hear it's fixed

          Show
          Maja Kabiljo added a comment - I am not sure which patch you are referring to, and I wouldn't say that the feature is currently broken, it just needs to be implemented in the Vertex implementation (from readFields method). Maybe I am missing something, but I don't see how it can be implemented from outside of giraph core with this change. But sure, one of us can implement it, I think it's a really short change and was trying to give you pointers about how to do it. Reusing Vertex object is already there, we just need to reuse id, value and edges as well. Moving these methods to WritableUtils sounds good. I remember one of the checkpoint distributed tests was also broken, glad to hear it's fixed
          Hide
          Claudio Martella added a comment -

          Then I must have misunderstood what you meant when you wrote that earlier you had vertex implementation that could reuse objects and that you can't reuse it anymore, and when Avery mentioned that since VertexEdges you cannot (that's the patch I was talking about), and that you have a good memory. I interpret it as you cannot do that right now.

          Again, it shows that I don't really understood what you need and how to help you get it done. I'd be happy to include a small fix, that is really not my point. My point is that re-reading your suggestion about adding another method to re-use objects (or extend the current one with configuration), I do not really understand what kind of code I'd have to add precisely. Can you be more specific and maybe I can get done this fix? What objects exactly do you want to be reused, and reused between what instances?

          Show
          Claudio Martella added a comment - Then I must have misunderstood what you meant when you wrote that earlier you had vertex implementation that could reuse objects and that you can't reuse it anymore, and when Avery mentioned that since VertexEdges you cannot (that's the patch I was talking about), and that you have a good memory. I interpret it as you cannot do that right now. Again, it shows that I don't really understood what you need and how to help you get it done. I'd be happy to include a small fix, that is really not my point. My point is that re-reading your suggestion about adding another method to re-use objects (or extend the current one with configuration), I do not really understand what kind of code I'd have to add precisely. Can you be more specific and maybe I can get done this fix? What objects exactly do you want to be reused, and reused between what instances?
          Hide
          Maja Kabiljo added a comment -

          I'm sorry for the confusion. There was ByteArrayVertex (or some similar name) which was reusing objects by itself, so we could just extend that class and not worry about it. I agree that with VertexEdges patch we should have put something that has this functionality, but I wasn't the one reviewing that code Anyway, even with that patch as it is, it is possible to reuse all objects, just not as easy as it was before (instead of just extending ByteArrayVertex we now have to implement readFields method).

          What I'm proposing is inside of ByteArrayPartition, wherever we call readFieldsFromByteArrayWithSize, to use another method, where we would replace

           writableObject.readFields(extendedDataInput); 

          with something like

          vertex.getId().readFields(extendedDataInput);
          vertex.getValue().readFields(extendedDataInput);
          ((VertexEdges<I, E>) vertex.getEdges()).readFields(extendedDataInput);
          if (extendedDataInput.readBoolean()) {
            vertex.wakeUp();
          } else {
            vertex.voteToHalt();
          }
          

          And we'll have to create all those objects inside of ByteArrayPartition.initialize. I think this should work, I am not sure is there any test which uses ByteArrayPartition, if not you can just try it out on some of your applications by setting giraph.partitionClass.

          Show
          Maja Kabiljo added a comment - I'm sorry for the confusion. There was ByteArrayVertex (or some similar name) which was reusing objects by itself, so we could just extend that class and not worry about it. I agree that with VertexEdges patch we should have put something that has this functionality, but I wasn't the one reviewing that code Anyway, even with that patch as it is, it is possible to reuse all objects, just not as easy as it was before (instead of just extending ByteArrayVertex we now have to implement readFields method). What I'm proposing is inside of ByteArrayPartition, wherever we call readFieldsFromByteArrayWithSize, to use another method, where we would replace writableObject.readFields(extendedDataInput); with something like vertex.getId().readFields(extendedDataInput); vertex.getValue().readFields(extendedDataInput); ((VertexEdges<I, E>) vertex.getEdges()).readFields(extendedDataInput); if (extendedDataInput.readBoolean()) { vertex.wakeUp(); } else { vertex.voteToHalt(); } And we'll have to create all those objects inside of ByteArrayPartition.initialize. I think this should work, I am not sure is there any test which uses ByteArrayPartition, if not you can just try it out on some of your applications by setting giraph.partitionClass.
          Hide
          Claudio Martella added a comment -

          Finally I understand it, I totally get your point now (which is a good one). I'll see what I can do. Thanks.

          Show
          Claudio Martella added a comment - Finally I understand it, I totally get your point now (which is a good one). I'll see what I can do. Thanks.
          Hide
          Claudio Martella added a comment -

          passes mvn verify and distributed tests.
          ran PR with ByteArrayPartition.

          Show
          Claudio Martella added a comment - passes mvn verify and distributed tests. ran PR with ByteArrayPartition.
          Hide
          Maja Kabiljo added a comment -

          Awesome, thanks! One question: where are the objects for id, value and edges of representativeVertex in ByteArrayParttion created? Don't we have to do something like:

           representativeVertex.initialize(getConf().createVertexId(), getConf().createVertexValue()); 

          And I'd replace the 'checkpoint' word from Vertex javadoc with 'store' or something like that, because checkpoint is not the only thing which will break if user tries to use fields from Vertex (vertices are also serialized during input superstep, if ByteArrayPartition or out-of-core graph is used).

          Other than that, +1.

          Show
          Maja Kabiljo added a comment - Awesome, thanks! One question: where are the objects for id, value and edges of representativeVertex in ByteArrayParttion created? Don't we have to do something like: representativeVertex.initialize(getConf().createVertexId(), getConf().createVertexValue()); And I'd replace the 'checkpoint' word from Vertex javadoc with 'store' or something like that, because checkpoint is not the only thing which will break if user tries to use fields from Vertex (vertices are also serialized during input superstep, if ByteArrayPartition or out-of-core graph is used). Other than that, +1.
          Hide
          Claudio Martella added a comment -

          Ooops, I must have uploaded the wrong version of my patch. Sorry about that. I'll fix this and commit. Thanks.

          Show
          Claudio Martella added a comment - Ooops, I must have uploaded the wrong version of my patch. Sorry about that. I'll fix this and commit. Thanks.
          Hide
          Hudson added a comment -

          Integrated in Giraph-trunk-Commit #911 (See https://builds.apache.org/job/Giraph-trunk-Commit/911/)
          GIRAPH-613 (Revision 6cf79dbe2c4397732820c435df723fe50e9f3daf)
          GIRAPH-613 (Revision 67e0f11d2bd98e0614b26d4ea6dd707e6f65a105)
          GIRAPH-613 (Revision ff8d98e20a4b2475f5890ade5ee78dc8d804420c)

          Result = SUCCESS
          claudio : http://git-wip-us.apache.org/repos/asf?p=giraph.git&a=commit&h=6cf79dbe2c4397732820c435df723fe50e9f3daf
          Files :

          • giraph-core/src/main/java/org/apache/giraph/partition/DiskBackedPartitionStore.java
          • giraph-core/src/main/java/org/apache/giraph/conf/GiraphClasses.java
          • CHANGELOG
          • giraph-core/src/main/java/org/apache/giraph/partition/SimplePartition.java
          • giraph-core/src/test/java/org/apache/giraph/graph/TestVertexAndEdges.java
          • giraph-core/src/main/java/org/apache/giraph/graph/Vertex.java
          • giraph-core/src/main/java/org/apache/giraph/partition/ByteArrayPartition.java
          • giraph-core/src/main/java/org/apache/giraph/graph/VertexMutations.java
          • giraph-core/src/main/java/org/apache/giraph/utils/WritableUtils.java

          claudio : http://git-wip-us.apache.org/repos/asf?p=giraph.git&a=commit&h=67e0f11d2bd98e0614b26d4ea6dd707e6f65a105
          Files :

          • giraph-core/src/main/java/org/apache/giraph/partition/ByteArrayPartition.java
          • giraph-core/src/main/java/org/apache/giraph/graph/Vertex.java

          claudio : http://git-wip-us.apache.org/repos/asf?p=giraph.git&a=commit&h=ff8d98e20a4b2475f5890ade5ee78dc8d804420c
          Files :

          • giraph-core/src/main/java/org/apache/giraph/utils/WritableUtils.java
          • giraph-core/src/main/java/org/apache/giraph/partition/ByteArrayPartition.java
          Show
          Hudson added a comment - Integrated in Giraph-trunk-Commit #911 (See https://builds.apache.org/job/Giraph-trunk-Commit/911/ ) GIRAPH-613 (Revision 6cf79dbe2c4397732820c435df723fe50e9f3daf) GIRAPH-613 (Revision 67e0f11d2bd98e0614b26d4ea6dd707e6f65a105) GIRAPH-613 (Revision ff8d98e20a4b2475f5890ade5ee78dc8d804420c) Result = SUCCESS claudio : http://git-wip-us.apache.org/repos/asf?p=giraph.git&a=commit&h=6cf79dbe2c4397732820c435df723fe50e9f3daf Files : giraph-core/src/main/java/org/apache/giraph/partition/DiskBackedPartitionStore.java giraph-core/src/main/java/org/apache/giraph/conf/GiraphClasses.java CHANGELOG giraph-core/src/main/java/org/apache/giraph/partition/SimplePartition.java giraph-core/src/test/java/org/apache/giraph/graph/TestVertexAndEdges.java giraph-core/src/main/java/org/apache/giraph/graph/Vertex.java giraph-core/src/main/java/org/apache/giraph/partition/ByteArrayPartition.java giraph-core/src/main/java/org/apache/giraph/graph/VertexMutations.java giraph-core/src/main/java/org/apache/giraph/utils/WritableUtils.java claudio : http://git-wip-us.apache.org/repos/asf?p=giraph.git&a=commit&h=67e0f11d2bd98e0614b26d4ea6dd707e6f65a105 Files : giraph-core/src/main/java/org/apache/giraph/partition/ByteArrayPartition.java giraph-core/src/main/java/org/apache/giraph/graph/Vertex.java claudio : http://git-wip-us.apache.org/repos/asf?p=giraph.git&a=commit&h=ff8d98e20a4b2475f5890ade5ee78dc8d804420c Files : giraph-core/src/main/java/org/apache/giraph/utils/WritableUtils.java giraph-core/src/main/java/org/apache/giraph/partition/ByteArrayPartition.java

            People

            • Assignee:
              Claudio Martella
              Reporter:
              Claudio Martella
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development