Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: 3.2.4
    • Fix Version/s: 3.2.6
    • Component/s: io
    • Labels:

      Description

      Not sure what will happen here. Just wanted an issue where I could experiment a bit with and gather feedback from others for collaboration in case others had ideas. Right now, GraphSON reads are considerably slower than Gryo and writes appear to be faster, so the focus will be on how to get reads more in line with what Gryo can do.

        Issue Links

          Activity

          Hide
          spmallette stephen mallette added a comment -

          This was largely completed on 3.2.5 but I found one additional improvement that could be made which I CTR'd in. If there is more work to do here, we can create new more specific tickets.

          Show
          spmallette stephen mallette added a comment - This was largely completed on 3.2.5 but I found one additional improvement that could be made which I CTR'd in. If there is more work to do here, we can create new more specific tickets.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/tinkerpop/pull/613

          Show
          githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/tinkerpop/pull/613
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/tinkerpop/pull/614

          Show
          githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/tinkerpop/pull/614
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user okram commented on the issue:

          https://github.com/apache/tinkerpop/pull/614

          VOTE +1

          Show
          githubbot ASF GitHub Bot added a comment - Github user okram commented on the issue: https://github.com/apache/tinkerpop/pull/614 VOTE +1
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user okram commented on the issue:

          https://github.com/apache/tinkerpop/pull/613

          VOTE +1

          Show
          githubbot ASF GitHub Bot added a comment - Github user okram commented on the issue: https://github.com/apache/tinkerpop/pull/613 VOTE +1
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user spmallette commented on the issue:

          https://github.com/apache/tinkerpop/pull/613

          yeah - never sure what to do in those situations. i'm having some pause as i think about this PR this week. let's consider it and merge it as-is and i'll see what i might do to allow the "old way" to still hang around so that if someone is depending on it, they can still do it.

          Show
          githubbot ASF GitHub Bot added a comment - Github user spmallette commented on the issue: https://github.com/apache/tinkerpop/pull/613 yeah - never sure what to do in those situations. i'm having some pause as i think about this PR this week. let's consider it and merge it as-is and i'll see what i might do to allow the "old way" to still hang around so that if someone is depending on it, they can still do it.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dkuppitz commented on the issue:

          https://github.com/apache/tinkerpop/pull/614

          VOTE: +1

          Show
          githubbot ASF GitHub Bot added a comment - Github user dkuppitz commented on the issue: https://github.com/apache/tinkerpop/pull/614 VOTE: +1
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dkuppitz commented on the issue:

          https://github.com/apache/tinkerpop/pull/613

          Not sure if this should really go into `tp32` as it's kinda breaking (though it breaks something that was broken ), but if others agree, then

          VOTE: +1

          Show
          githubbot ASF GitHub Bot added a comment - Github user dkuppitz commented on the issue: https://github.com/apache/tinkerpop/pull/613 Not sure if this should really go into `tp32` as it's kinda breaking (though it breaks something that was broken ), but if others agree, then VOTE: +1
          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user spmallette opened a pull request:

          https://github.com/apache/tinkerpop/pull/614

          TINKERPOP-1676 (master) GraphSON Performance

          https://issues.apache.org/jira/browse/TINKERPOP-1676

          Had to do a separate PR for this as we've made changes in testing IO on master - see #613 for more details on what this PR is about.

          Running the full build now - will edit this PR when it's done successfully.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/apache/tinkerpop TINKERPOP-1676-master

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/tinkerpop/pull/614.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #614


          commit 46e6e9767b6818f33b56366ee15d80cec5a908e0
          Author: Stephen Mallette <spmva@genoprime.com>
          Date: 2017-05-23T20:03:17Z

          TINKERPOP-1676 Got rid of stream() usage

          Can't believe we still had stream() in here. Will it ever all be gone from these performance sensitive places?! Anyway, removed that and deprecated a constructor on DetachedEdge that was using Pair for no really good reason. No need to create extra Pair objects for that. They just sorta get thrown away after usage.

          commit 251f5b7e34e8ebd9a8bc36802e633be8c91eeb5e
          Author: Stephen Mallette <spmva@genoprime.com>
          Date: 2017-05-23T20:57:12Z

          TINKERPOP-1676 Improved speed and memory usage of GraphSON

          This change is specific to TinkerGraph and the serialization of vertices/edges/properties. Removed a "middle layer" of JSON to Object serialization which was coercing JSON to Map first and then converting that to the graph elements. Also made deserializers cacheable.

          commit 02b007366b434918fe1181f99d80689c1c03684b
          Author: Stephen Mallette <spmva@genoprime.com>
          Date: 2017-05-24T00:13:46Z

          TINKERPOP-1676 Performance enhancement to graphson serialization

          Focuses on speeding up serialization of graph elements. Prevent use of generic maps and stream graphson data directly into "detached" elements. Not using intermediate maps cut down on memory usage and a bunch of jackson reflection calls (still not sure what they were for and why they were not made to be more efficient). It did mean some ugly changes to "detached" stuff. Will need to maybe refactor some more, but the basic premise seems to be proven.

          commit e5d2d2bc000654fd026521219cf30bac265139a3
          Author: Stephen Mallette <spmva@genoprime.com>
          Date: 2017-05-24T17:38:16Z

          TINKERPOP-1676 Removed properties from graphson serialization of Path

          Path should not have properties on elements (if they are present). That is inconsistent with gryo and was unintentially allowed.

          commit 049c979720a84ea9744725fda5f8d2b371ab1751
          Author: Stephen Mallette <spmva@genoprime.com>
          Date: 2017-05-25T17:29:45Z

          TINKERPOP-1676 More optimizations to GraphSON serializers

          Added some tests where there were previously gaps.

          commit f32d725a080d39ef7bf93c68d8080939f622cb37
          Author: Stephen Mallette <spmva@genoprime.com>
          Date: 2017-05-25T18:03:29Z

          TINKERPOP-1676 Cleaned up serialization with detached

          The DetachedUtil was just there to test out if the performance changes would help and it did it's job nicely, but it was kinda ugly and hung some methods out there in weird way. Cleaned that up with some builder pattern on the detached classes themselves.

          commit bb7ff8290962e30dd4d78c4bd904eef31a3f2478
          Author: Stephen Mallette <spmva@genoprime.com>
          Date: 2017-05-25T19:25:32Z

          Merge branch 'TINKERPOP-1676' into TINKERPOP-1676-master

          Conflicts:
          docs/src/dev/io/graphson.asciidoc


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user spmallette opened a pull request: https://github.com/apache/tinkerpop/pull/614 TINKERPOP-1676 (master) GraphSON Performance https://issues.apache.org/jira/browse/TINKERPOP-1676 Had to do a separate PR for this as we've made changes in testing IO on master - see #613 for more details on what this PR is about. Running the full build now - will edit this PR when it's done successfully. You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/tinkerpop TINKERPOP-1676 -master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/tinkerpop/pull/614.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #614 commit 46e6e9767b6818f33b56366ee15d80cec5a908e0 Author: Stephen Mallette <spmva@genoprime.com> Date: 2017-05-23T20:03:17Z TINKERPOP-1676 Got rid of stream() usage Can't believe we still had stream() in here. Will it ever all be gone from these performance sensitive places?! Anyway, removed that and deprecated a constructor on DetachedEdge that was using Pair for no really good reason. No need to create extra Pair objects for that. They just sorta get thrown away after usage. commit 251f5b7e34e8ebd9a8bc36802e633be8c91eeb5e Author: Stephen Mallette <spmva@genoprime.com> Date: 2017-05-23T20:57:12Z TINKERPOP-1676 Improved speed and memory usage of GraphSON This change is specific to TinkerGraph and the serialization of vertices/edges/properties. Removed a "middle layer" of JSON to Object serialization which was coercing JSON to Map first and then converting that to the graph elements. Also made deserializers cacheable. commit 02b007366b434918fe1181f99d80689c1c03684b Author: Stephen Mallette <spmva@genoprime.com> Date: 2017-05-24T00:13:46Z TINKERPOP-1676 Performance enhancement to graphson serialization Focuses on speeding up serialization of graph elements. Prevent use of generic maps and stream graphson data directly into "detached" elements. Not using intermediate maps cut down on memory usage and a bunch of jackson reflection calls (still not sure what they were for and why they were not made to be more efficient). It did mean some ugly changes to "detached" stuff. Will need to maybe refactor some more, but the basic premise seems to be proven. commit e5d2d2bc000654fd026521219cf30bac265139a3 Author: Stephen Mallette <spmva@genoprime.com> Date: 2017-05-24T17:38:16Z TINKERPOP-1676 Removed properties from graphson serialization of Path Path should not have properties on elements (if they are present). That is inconsistent with gryo and was unintentially allowed. commit 049c979720a84ea9744725fda5f8d2b371ab1751 Author: Stephen Mallette <spmva@genoprime.com> Date: 2017-05-25T17:29:45Z TINKERPOP-1676 More optimizations to GraphSON serializers Added some tests where there were previously gaps. commit f32d725a080d39ef7bf93c68d8080939f622cb37 Author: Stephen Mallette <spmva@genoprime.com> Date: 2017-05-25T18:03:29Z TINKERPOP-1676 Cleaned up serialization with detached The DetachedUtil was just there to test out if the performance changes would help and it did it's job nicely, but it was kinda ugly and hung some methods out there in weird way. Cleaned that up with some builder pattern on the detached classes themselves. commit bb7ff8290962e30dd4d78c4bd904eef31a3f2478 Author: Stephen Mallette <spmva@genoprime.com> Date: 2017-05-25T19:25:32Z Merge branch ' TINKERPOP-1676 ' into TINKERPOP-1676 -master Conflicts: docs/src/dev/io/graphson.asciidoc
          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user spmallette opened a pull request:

          https://github.com/apache/tinkerpop/pull/613

          TINKERPOP-1676 (tp32) GraphSON Performance

          https://issues.apache.org/jira/browse/TINKERPOP-1676

          Made some pretty good improvements to performance. I was mostly focused on the serialization of complete graph elements so using TinkerGraph as the basis for benchmark seemed like a good way to do that. I subgraphed a portion of the Grateful Dead graph:

          ```text
          gremlin> subgraph = g.E().hasLabel('followedBy').subgraph('subGraph').cap('subGraph').next()
          ==>tinkergraph[vertices:338 edges:7047]
          ```

          I wanted a comparison with Gryo so I got that first:

          ```text
          gremlin> gryo = graph.io(IoCore.gryo()).mapper().create().createMapper()
          ==>org.apache.tinkerpop.shaded.kryo.Kryo@5edf2821
          gremlin> clock

          { ......1> def s = new java.io.ByteArrayOutputStream() ......2> def output = new org.apache.tinkerpop.shaded.kryo.io.Output(s) ......3> gryo.writeObject(output, subgraph) ......4> output.flush() ......5> def input = new org.apache.tinkerpop.shaded.kryo.io.Input(s.toByteArray()) ......6> gryo.readObject(input, TinkerGraph.class) ......7> }

          ==>55.96791679999999
          ```

          then compared head of tp32:

          ```text
          gremlin> graphson = GraphSONMapper.build().version(GraphSONVersion.V2_0).addCustomModule(GraphSONXModuleV2d0.build().create(false)).addRegistry(TinkerIoRegistryV2d0.instance()).create().createMapper()
          ==>org.apache.tinkerpop.shaded.jackson.databind.ObjectMapper@185f7840
          gremlin> clock

          { ......1> def s = new java.io.ByteArrayOutputStream() ......2> graphson.writeValue(s, subgraph) ......3> graphson.readValue(new java.io.ByteArrayInputStream(s.toByteArray()), TinkerGraph.class) ......4> }
          ==>723.1702541799999
          ```

          pretty bad - now GraphSON with changes from this branch:

          ```text
          gremlin> graphson = GraphSONMapper.build().version(GraphSONVersion.V2_0).addCustomModule(GraphSONXModuleV2d0.build().create(false)).addRegistry(TinkerIoRegistryV2d0.instance()).create().createMapper()
          ==>org.apache.tinkerpop.shaded.jackson.databind.ObjectMapper@563a89b5
          gremlin> clock { ......1> def s = new java.io.ByteArrayOutputStream() ......2> graphson.writeValue(s, subgraph) ......3> graphson.readValue(new java.io.ByteArrayInputStream(s.toByteArray()), TinkerGraph.class) ......4> }

          ==>67.0219281
          ```

          Not too far behind gryo now!

          Running full integration tests now and will update this PR when it's done successfully.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/apache/tinkerpop TINKERPOP-1676

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/tinkerpop/pull/613.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #613


          commit 46e6e9767b6818f33b56366ee15d80cec5a908e0
          Author: Stephen Mallette <spmva@genoprime.com>
          Date: 2017-05-23T20:03:17Z

          TINKERPOP-1676 Got rid of stream() usage

          Can't believe we still had stream() in here. Will it ever all be gone from these performance sensitive places?! Anyway, removed that and deprecated a constructor on DetachedEdge that was using Pair for no really good reason. No need to create extra Pair objects for that. They just sorta get thrown away after usage.

          commit 251f5b7e34e8ebd9a8bc36802e633be8c91eeb5e
          Author: Stephen Mallette <spmva@genoprime.com>
          Date: 2017-05-23T20:57:12Z

          TINKERPOP-1676 Improved speed and memory usage of GraphSON

          This change is specific to TinkerGraph and the serialization of vertices/edges/properties. Removed a "middle layer" of JSON to Object serialization which was coercing JSON to Map first and then converting that to the graph elements. Also made deserializers cacheable.

          commit 02b007366b434918fe1181f99d80689c1c03684b
          Author: Stephen Mallette <spmva@genoprime.com>
          Date: 2017-05-24T00:13:46Z

          TINKERPOP-1676 Performance enhancement to graphson serialization

          Focuses on speeding up serialization of graph elements. Prevent use of generic maps and stream graphson data directly into "detached" elements. Not using intermediate maps cut down on memory usage and a bunch of jackson reflection calls (still not sure what they were for and why they were not made to be more efficient). It did mean some ugly changes to "detached" stuff. Will need to maybe refactor some more, but the basic premise seems to be proven.

          commit e5d2d2bc000654fd026521219cf30bac265139a3
          Author: Stephen Mallette <spmva@genoprime.com>
          Date: 2017-05-24T17:38:16Z

          TINKERPOP-1676 Removed properties from graphson serialization of Path

          Path should not have properties on elements (if they are present). That is inconsistent with gryo and was unintentially allowed.

          commit 049c979720a84ea9744725fda5f8d2b371ab1751
          Author: Stephen Mallette <spmva@genoprime.com>
          Date: 2017-05-25T17:29:45Z

          TINKERPOP-1676 More optimizations to GraphSON serializers

          Added some tests where there were previously gaps.

          commit f32d725a080d39ef7bf93c68d8080939f622cb37
          Author: Stephen Mallette <spmva@genoprime.com>
          Date: 2017-05-25T18:03:29Z

          TINKERPOP-1676 Cleaned up serialization with detached

          The DetachedUtil was just there to test out if the performance changes would help and it did it's job nicely, but it was kinda ugly and hung some methods out there in weird way. Cleaned that up with some builder pattern on the detached classes themselves.


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user spmallette opened a pull request: https://github.com/apache/tinkerpop/pull/613 TINKERPOP-1676 (tp32) GraphSON Performance https://issues.apache.org/jira/browse/TINKERPOP-1676 Made some pretty good improvements to performance. I was mostly focused on the serialization of complete graph elements so using TinkerGraph as the basis for benchmark seemed like a good way to do that. I subgraphed a portion of the Grateful Dead graph: ```text gremlin> subgraph = g.E().hasLabel('followedBy').subgraph('subGraph').cap('subGraph').next() ==>tinkergraph [vertices:338 edges:7047] ``` I wanted a comparison with Gryo so I got that first: ```text gremlin> gryo = graph.io(IoCore.gryo()).mapper().create().createMapper() ==>org.apache.tinkerpop.shaded.kryo.Kryo@5edf2821 gremlin> clock { ......1> def s = new java.io.ByteArrayOutputStream() ......2> def output = new org.apache.tinkerpop.shaded.kryo.io.Output(s) ......3> gryo.writeObject(output, subgraph) ......4> output.flush() ......5> def input = new org.apache.tinkerpop.shaded.kryo.io.Input(s.toByteArray()) ......6> gryo.readObject(input, TinkerGraph.class) ......7> } ==>55.96791679999999 ``` then compared head of tp32: ```text gremlin> graphson = GraphSONMapper.build().version(GraphSONVersion.V2_0).addCustomModule(GraphSONXModuleV2d0.build().create(false)).addRegistry(TinkerIoRegistryV2d0.instance()).create().createMapper() ==>org.apache.tinkerpop.shaded.jackson.databind.ObjectMapper@185f7840 gremlin> clock { ......1> def s = new java.io.ByteArrayOutputStream() ......2> graphson.writeValue(s, subgraph) ......3> graphson.readValue(new java.io.ByteArrayInputStream(s.toByteArray()), TinkerGraph.class) ......4> } ==>723.1702541799999 ``` pretty bad - now GraphSON with changes from this branch: ```text gremlin> graphson = GraphSONMapper.build().version(GraphSONVersion.V2_0).addCustomModule(GraphSONXModuleV2d0.build().create(false)).addRegistry(TinkerIoRegistryV2d0.instance()).create().createMapper() ==>org.apache.tinkerpop.shaded.jackson.databind.ObjectMapper@563a89b5 gremlin> clock { ......1> def s = new java.io.ByteArrayOutputStream() ......2> graphson.writeValue(s, subgraph) ......3> graphson.readValue(new java.io.ByteArrayInputStream(s.toByteArray()), TinkerGraph.class) ......4> } ==>67.0219281 ``` Not too far behind gryo now! Running full integration tests now and will update this PR when it's done successfully. You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/tinkerpop TINKERPOP-1676 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/tinkerpop/pull/613.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #613 commit 46e6e9767b6818f33b56366ee15d80cec5a908e0 Author: Stephen Mallette <spmva@genoprime.com> Date: 2017-05-23T20:03:17Z TINKERPOP-1676 Got rid of stream() usage Can't believe we still had stream() in here. Will it ever all be gone from these performance sensitive places?! Anyway, removed that and deprecated a constructor on DetachedEdge that was using Pair for no really good reason. No need to create extra Pair objects for that. They just sorta get thrown away after usage. commit 251f5b7e34e8ebd9a8bc36802e633be8c91eeb5e Author: Stephen Mallette <spmva@genoprime.com> Date: 2017-05-23T20:57:12Z TINKERPOP-1676 Improved speed and memory usage of GraphSON This change is specific to TinkerGraph and the serialization of vertices/edges/properties. Removed a "middle layer" of JSON to Object serialization which was coercing JSON to Map first and then converting that to the graph elements. Also made deserializers cacheable. commit 02b007366b434918fe1181f99d80689c1c03684b Author: Stephen Mallette <spmva@genoprime.com> Date: 2017-05-24T00:13:46Z TINKERPOP-1676 Performance enhancement to graphson serialization Focuses on speeding up serialization of graph elements. Prevent use of generic maps and stream graphson data directly into "detached" elements. Not using intermediate maps cut down on memory usage and a bunch of jackson reflection calls (still not sure what they were for and why they were not made to be more efficient). It did mean some ugly changes to "detached" stuff. Will need to maybe refactor some more, but the basic premise seems to be proven. commit e5d2d2bc000654fd026521219cf30bac265139a3 Author: Stephen Mallette <spmva@genoprime.com> Date: 2017-05-24T17:38:16Z TINKERPOP-1676 Removed properties from graphson serialization of Path Path should not have properties on elements (if they are present). That is inconsistent with gryo and was unintentially allowed. commit 049c979720a84ea9744725fda5f8d2b371ab1751 Author: Stephen Mallette <spmva@genoprime.com> Date: 2017-05-25T17:29:45Z TINKERPOP-1676 More optimizations to GraphSON serializers Added some tests where there were previously gaps. commit f32d725a080d39ef7bf93c68d8080939f622cb37 Author: Stephen Mallette <spmva@genoprime.com> Date: 2017-05-25T18:03:29Z TINKERPOP-1676 Cleaned up serialization with detached The DetachedUtil was just there to test out if the performance changes would help and it did it's job nicely, but it was kinda ugly and hung some methods out there in weird way. Cleaned that up with some builder pattern on the detached classes themselves.
          Hide
          spmallette stephen mallette added a comment -

          Just pushed the change that removes properties from a Path if they were present. Along with some other changes, Path serialization is much faster now and largely on par with Gryo. Note the DISCUSS thread here

          Show
          spmallette stephen mallette added a comment - Just pushed the change that removes properties from a Path if they were present. Along with some other changes, Path serialization is much faster now and largely on par with Gryo. Note the DISCUSS thread here
          Hide
          spmallette stephen mallette added a comment -

          Already pushed this little change to make better use of memory:

          https://github.com/apache/tinkerpop/commit/7ea38751871f6eddd4c4bcfe8ac82300a08f5a7d

          It basically reads vertices/edges from a stream in TinkerGraph serialization (when passing subgraphs back from Gremlin Server) rather than building up a big List of vertices/edges in memory to later iterate through. That change seemed to reduce the number and frequency of GC operations.

          Show
          spmallette stephen mallette added a comment - Already pushed this little change to make better use of memory: https://github.com/apache/tinkerpop/commit/7ea38751871f6eddd4c4bcfe8ac82300a08f5a7d It basically reads vertices/edges from a stream in TinkerGraph serialization (when passing subgraphs back from Gremlin Server) rather than building up a big List of vertices/edges in memory to later iterate through. That change seemed to reduce the number and frequency of GC operations.

            People

            • Assignee:
              spmallette stephen mallette
              Reporter:
              spmallette stephen mallette
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development