Uploaded image for project: 'TinkerPop'
  1. TinkerPop
  2. TINKERPOP-1427

GraphSON 3.0 needs collection types and consistent number typing.

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: 3.2.1
    • Fix Version/s: 3.3.0
    • Component/s: io
    • Labels:
      None

      Description

      Before 3.3.0, we need to get the story around collections straight. Currently we are relying on JSON collections to represent our collections, but this isn't always sound – e.g. JSON maps can only have string keys.

      Thus, we need:

      g:Map
      g:List
      g:Set
      

      Note that all of these would just be JSON lists:

      {@type:"g:Map", @value:[key1,value1,key2,value2,key3,value3,...]}
      {@type:"g:List", @value:[value1, value2, value3,...]}
      {@type:"g:Set", @value:[value1, value2, value3,...]}
      

      Next, these data structures are exactly what play into aggregateTo in sideEffects for RemoteConnection. Thus, we should use these types and, as well, get rid of none as the aggregate would be a real type like g:Int32.

      Also, I think we should abandon this hybrid physical machine naming convention and programming language type convention.

      g:Int32 -> g:Integer
      g:Int64 -> g:Long
      g:Double -> g:Double (no change)
      g:Float -> g:Float (no change)
      

      If we want to be consistent either do the above or do the below, though I think the above is cleaner.

      g:Int32 -> g:Int32
      g:Int64 -> g:Int64
      g:Float -> g:Float32
      g:Double -> g:Float64
      

      Again, using programming lexicon vs. physical machine lexicon is best and thus, just gut "int32" and "int64."

        Issue Links

          Activity

          Hide
          rdale Robert Dale added a comment -

          But therein lies the problem. Unlike float, double where we can point to an IEEE spec, integer and long are not defined consistently in programming languages unless you mean just Java. They are nothing more than weak labels for physical storage size. So either storage size is concisely defined or it should be ignored altogether by calling it a number. After all, the distinction between a short, integer, and long is really just a memory optimization. This was part of my original argument in favor of using JSON native number type.

          Let's work through an example. Python is a great example of a weakly typed system that will run into this issue. Python2 has only two data types for integers, but only one is applicable here for this exercise. Its 'int' can be 32 bits or up to 64 bits depending on platform. Python3 now only has one data type for integers which can be arbitrarily long. How does python differentiate between a short, int, long? Will there be logic in the GLV to handle number storage size based on value, which seems dangerous, will it be schema-aware, or will the user have to handle this? If the user is responsible, how is it coerced so that the GLV can interpret it? What would this actually look like?

          I'm still of the opinion that for integers a number is a number is a number. Anything less would be uncivilized.

          Show
          rdale Robert Dale added a comment - But therein lies the problem. Unlike float, double where we can point to an IEEE spec, integer and long are not defined consistently in programming languages unless you mean just Java. They are nothing more than weak labels for physical storage size. So either storage size is concisely defined or it should be ignored altogether by calling it a number. After all, the distinction between a short, integer, and long is really just a memory optimization. This was part of my original argument in favor of using JSON native number type. Let's work through an example. Python is a great example of a weakly typed system that will run into this issue. Python2 has only two data types for integers, but only one is applicable here for this exercise. Its 'int' can be 32 bits or up to 64 bits depending on platform. Python3 now only has one data type for integers which can be arbitrarily long. How does python differentiate between a short, int, long? Will there be logic in the GLV to handle number storage size based on value, which seems dangerous, will it be schema-aware, or will the user have to handle this? If the user is responsible, how is it coerced so that the GLV can interpret it? What would this actually look like? I'm still of the opinion that for integers a number is a number is a number. Anything less would be uncivilized.
          Hide
          dmill Dylan Millikin added a comment - - edited

          The problem though is that typing will be done on the underlying graph layers so we can't just handle "numbers" or we will run into conflicts (2 <> 2L). Which is why we went with the current int32/int64.

          Also I agree with changing things around for maps. We will need to keep Tree serialization in mind though. As it would produce structurally different results depending on whether or not typing is turned on.
          Tree is already a JSON Map<Object, Object> serialization in GRAPHSON1

          Show
          dmill Dylan Millikin added a comment - - edited The problem though is that typing will be done on the underlying graph layers so we can't just handle "numbers" or we will run into conflicts (2 <> 2L). Which is why we went with the current int32 / int64 . Also I agree with changing things around for maps. We will need to keep Tree serialization in mind though. As it would produce structurally different results depending on whether or not typing is turned on. Tree is already a JSON Map<Object, Object> serialization in GRAPHSON1
          Hide
          rdale Robert Dale added a comment -

          Technically, that's not correct. 2 == 2L. I think you mean Integer(2) != Long(2). But we all know you should never use operators on objects, so what you really means is !Integer(2).equals(Long(2)). However, numbers should never be compared using equals() at least not without being promoted to like types. The JVM automatically promotes primitives to like types but unfortunately not for objects. This is left as an exercise to the user. If numbers are not converted to primitives or promoted to like types, then your comparison is fundamentally broken.

          It seems that the only underlying graph that's holding this back is TinkerGraph where Modern and TheCrew don't implement proper IdManagers for whatever reason. Are there any real-world implementations that don't sanely promote and compare numbers? I'm not aware of any.

          Consider the user perspective, what makes sense? What user expects 2 != 2? What other languages exhibit this behavior? Is there any precedent at all that supports this behavior outside of this java idiosyncrasy?

          Maybe storage size should be extended to strings. There should be a String32 and String64. That way it would be consistent with numbers such that "gremlin" != "gremlin". Yes, I'm being facetious but trying to make a point on how ludicrous this is.

          All that aside, assuming that gremlin does not strive to be language-agnostic and continue shove java idiosyncrasies down everyone's throat, the question still stands on how will this actually be supported on weakly-typed clients? Will you create new types TinkerInt, TinkerLong, etc? g.V(TinkerInt(2))? Or will the driver need to reference a schema to build the correct types? or is there some other option?

          Show
          rdale Robert Dale added a comment - Technically, that's not correct. 2 == 2L. I think you mean Integer(2) != Long(2). But we all know you should never use operators on objects, so what you really means is !Integer(2).equals(Long(2)). However, numbers should never be compared using equals() at least not without being promoted to like types. The JVM automatically promotes primitives to like types but unfortunately not for objects. This is left as an exercise to the user. If numbers are not converted to primitives or promoted to like types, then your comparison is fundamentally broken. It seems that the only underlying graph that's holding this back is TinkerGraph where Modern and TheCrew don't implement proper IdManagers for whatever reason. Are there any real-world implementations that don't sanely promote and compare numbers? I'm not aware of any. Consider the user perspective, what makes sense? What user expects 2 != 2? What other languages exhibit this behavior? Is there any precedent at all that supports this behavior outside of this java idiosyncrasy? Maybe storage size should be extended to strings. There should be a String32 and String64. That way it would be consistent with numbers such that "gremlin" != "gremlin". Yes, I'm being facetious but trying to make a point on how ludicrous this is. All that aside, assuming that gremlin does not strive to be language-agnostic and continue shove java idiosyncrasies down everyone's throat, the question still stands on how will this actually be supported on weakly-typed clients? Will you create new types TinkerInt, TinkerLong, etc? g.V(TinkerInt(2))? Or will the driver need to reference a schema to build the correct types? or is there some other option?
          Hide
          dmill Dylan Millikin added a comment -

          I hear you, trust me. I'm just pointing out some potential conflicts. I'm not sure if this could be another but what of the Property typing for some DBs (like Titan). If you have a schema expecting Long are we going to have conflicts?

          Show
          dmill Dylan Millikin added a comment - I hear you, trust me. I'm just pointing out some potential conflicts. I'm not sure if this could be another but what of the Property typing for some DBs (like Titan). If you have a schema expecting Long are we going to have conflicts?
          Hide
          spmallette stephen mallette added a comment -

          I removed the "breaking" label as this change will go in for a new version of GraphSON so previous versions will be unaffected.

          Show
          spmallette stephen mallette added a comment - I removed the "breaking" label as this change will go in for a new version of GraphSON so previous versions will be unaffected.
          Hide
          spmallette stephen mallette added a comment -

          Not sure if anyone cares, but I may break off the aspect of this ticket that deals with numbering. It's a bit outside of the scope of what I'd like evaluated in a PR for the "collection stuff". Plus the number issue seems like it it is a problem generally held across TinkerPop and not something specific to GraphSON.

          I did think TINKERPOP-1523 offered an interesting position on that number stuff fwiw. Robert Dale not sure if that appeals to you or not.

          Show
          spmallette stephen mallette added a comment - Not sure if anyone cares, but I may break off the aspect of this ticket that deals with numbering. It's a bit outside of the scope of what I'd like evaluated in a PR for the "collection stuff". Plus the number issue seems like it it is a problem generally held across TinkerPop and not something specific to GraphSON. I did think TINKERPOP-1523 offered an interesting position on that number stuff fwiw. Robert Dale not sure if that appeals to you or not.
          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user spmallette opened a pull request:

          https://github.com/apache/tinkerpop/pull/672

          TINKERPOP-1427 GraphSON 3.0

          https://issues.apache.org/jira/browse/TINKERPOP-1427
          https://issues.apache.org/jira/browse/TINKERPOP-1574
          https://issues.apache.org/jira/browse/TINKERPOP-1414

          GraphSON 3.0 is basically 2.0 with the additional types of `g;Map`, `g:List` and `g:Set`. Includes data file renaming for consistency and versioning. Packaged data files that are part of zips are now only include 3.0 instances. Defaulted to GraphSON and Gryo 3.0 in all places. IO docs have been updated.

          Upgrade docs needs some work but I'll do that later.

          All tests pass with `docker/build.sh -t -n -i`

          VOTE +1

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/apache/tinkerpop TINKERPOP-1427

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/tinkerpop/pull/672.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #672


          commit 911d777c113f72324782dd9307a764099f0ccc22
          Author: Stephen Mallette <spmva@genoprime.com>
          Date: 2017-06-30T16:34:52Z

          TINKERPOP-1427 Make GraphSON 3.0 serializers equivalent to 2.0

          There were some internal changes to GraphSON 2.0 deserializers that improved performance. Migrated all that to 3.0.

          commit c7a81531b1ed04a5ae6357544038f9cc77ec9cc3
          Author: Stephen Mallette <spmva@genoprime.com>
          Date: 2017-06-30T18:00:13Z

          TINKERPOP-1427 Should use GraphSON 3.0 in the serializer provider

          commit 548baab384d8c667f953503171fbf2e3edf83f5f
          Author: Stephen Mallette <spmva@genoprime.com>
          Date: 2017-07-01T18:16:57Z

          TINKERPOP-1427 Added g:Map for GraphSON 3.0

          commit 65598953a0fb4f7c3d72e4f1c6d792d370fd780b
          Author: Stephen Mallette <spmva@genoprime.com>
          Date: 2017-07-01T19:08:53Z

          TINKERPOP-1427 Migrate GraphSON v2 performance enhancement to v3

          This was for TinkerGraph whole graph serialization.

          commit 9949c38da14a77ddf99e017f0f7c8c91d127c020
          Author: Stephen Mallette <spmva@genoprime.com>
          Date: 2017-07-02T21:51:18Z

          TINKERPOP-1427 Added g:Set and g:List for GraphSON 3.0

          commit caec00199e0ab595ea3e36656f1c311f90547c43
          Author: Marko A. Rodriguez <okrammarko@gmail.com>
          Date: 2017-07-06T21:56:46Z

          99 percent complete GraphSON 3.0 working with Gremlin-Python. Extended test suite to support both GraphSON 2.0 and GraphSON 3.0 testing. There is one last requirement to do on the code – a version flag on GraphSONMessageSerializer. However, for some reason application/gremlin-v3.0 mime type stuff doesn't work for the Python->GremlinServer tests... Committing what I have so far as I think the issue might be an easy .yaml file fix or something. Dunno.

          commit c06d9feee4b16c39e41a2067692d8d487f25cb62
          Author: Stephen Mallette <spmva@genoprime.com>
          Date: 2017-07-07T15:35:34Z

          TINKERPOP-1427 Fixed py server config

          Should have used maching TinkerGraph IORegistry for the version.

          commit ff89b2a8fcdd1042b5cc2f172ca69015c83d3cd5
          Author: Stephen Mallette <spmva@genoprime.com>
          Date: 2017-07-07T17:52:15Z

          TINKERPOP-1427 Fixed problems with v2 vs v3 in gremlin-python

          There were several problems. In protocol.py there was an expectation as the to the format of the ResponseMessage that changed between v2 and v3. I added a bit of a sketchy hack to deal with that detects v2/v3 and then parses accordingly. That should be nicer. There were some issues with tests as well that bound assertions to v2 so that when the default serializer swapped from v2 to v3 those assertions started failing. I forced those specific tests to v2 to get them to pass. Ultimately, we need to more generally test v2 and v3, but at least gremlin-python is defaulted to v3 at this point and all tests are passing.

          commit 3d793ac50911cece1abc6a72b0f6661717cb7d4f
          Author: Marko A. Rodriguez <okrammarko@gmail.com>
          Date: 2017-07-07T20:52:30Z

          added GraphSONReader tests for set/list/map to test_graphsonV3d0.py. Already tested in the Jython infrastruture, but why not have it also localized in the pure Python tests.

          commit 3dc0b03f0d17ab7bbcbf013c82f32e59d27f29c6
          Author: Stephen Mallette <spmva@genoprime.com>
          Date: 2017-07-10T16:50:28Z

          TINKERPOP-1427 Coerced Map.Entry to a Map with single entry

          commit 228e578f89b82c633424ff2e59a068c58adc6971
          Author: Marko A. Rodriguez <okrammarko@gmail.com>
          Date: 2017-07-10T19:42:44Z

          GraphSONRecordReader/Writer can now be configured to use either GraphSON V2 or V3. Added a test case to verfify proper behavior of parser from GraphSON V3 on HDFS.

          commit 4df07fd795b6a4c80b077dcc05bc313dfaca7019
          Author: Stephen Mallette <spmva@genoprime.com>
          Date: 2017-07-10T21:25:07Z

          TINKERPOP-1427 Fixed a bad variable naming after a rebase

          commit 06553498adb16a31e35d8579ad1e7cc207c39148
          Author: Stephen Mallette <spmva@genoprime.com>
          Date: 2017-07-11T19:02:57Z

          TINKERPOP-1427 Dropped graphson configs prior to 3.0

          commit 5b5f850f15e00bd9ec81458c61f428b4abbb1588
          Author: Stephen Mallette <spmva@genoprime.com>
          Date: 2017-07-12T11:12:25Z

          TINKERPOP-1427 Regenerated/renamed all data files

          Added versions to file names. Dropped packaged data files that were not 3.0.

          commit 8d973c396a8d9dde6927f2f2c2f4bf0fdab08a16
          Author: Stephen Mallette <spmva@genoprime.com>
          Date: 2017-07-12T13:57:24Z

          TINKERPOP-1427 Specified gryo versions in tests


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user spmallette opened a pull request: https://github.com/apache/tinkerpop/pull/672 TINKERPOP-1427 GraphSON 3.0 https://issues.apache.org/jira/browse/TINKERPOP-1427 https://issues.apache.org/jira/browse/TINKERPOP-1574 https://issues.apache.org/jira/browse/TINKERPOP-1414 GraphSON 3.0 is basically 2.0 with the additional types of `g;Map`, `g:List` and `g:Set`. Includes data file renaming for consistency and versioning. Packaged data files that are part of zips are now only include 3.0 instances. Defaulted to GraphSON and Gryo 3.0 in all places. IO docs have been updated. Upgrade docs needs some work but I'll do that later. All tests pass with `docker/build.sh -t -n -i` VOTE +1 You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/tinkerpop TINKERPOP-1427 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/tinkerpop/pull/672.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #672 commit 911d777c113f72324782dd9307a764099f0ccc22 Author: Stephen Mallette <spmva@genoprime.com> Date: 2017-06-30T16:34:52Z TINKERPOP-1427 Make GraphSON 3.0 serializers equivalent to 2.0 There were some internal changes to GraphSON 2.0 deserializers that improved performance. Migrated all that to 3.0. commit c7a81531b1ed04a5ae6357544038f9cc77ec9cc3 Author: Stephen Mallette <spmva@genoprime.com> Date: 2017-06-30T18:00:13Z TINKERPOP-1427 Should use GraphSON 3.0 in the serializer provider commit 548baab384d8c667f953503171fbf2e3edf83f5f Author: Stephen Mallette <spmva@genoprime.com> Date: 2017-07-01T18:16:57Z TINKERPOP-1427 Added g:Map for GraphSON 3.0 commit 65598953a0fb4f7c3d72e4f1c6d792d370fd780b Author: Stephen Mallette <spmva@genoprime.com> Date: 2017-07-01T19:08:53Z TINKERPOP-1427 Migrate GraphSON v2 performance enhancement to v3 This was for TinkerGraph whole graph serialization. commit 9949c38da14a77ddf99e017f0f7c8c91d127c020 Author: Stephen Mallette <spmva@genoprime.com> Date: 2017-07-02T21:51:18Z TINKERPOP-1427 Added g:Set and g:List for GraphSON 3.0 commit caec00199e0ab595ea3e36656f1c311f90547c43 Author: Marko A. Rodriguez <okrammarko@gmail.com> Date: 2017-07-06T21:56:46Z 99 percent complete GraphSON 3.0 working with Gremlin-Python. Extended test suite to support both GraphSON 2.0 and GraphSON 3.0 testing. There is one last requirement to do on the code – a version flag on GraphSONMessageSerializer. However, for some reason application/gremlin-v3.0 mime type stuff doesn't work for the Python->GremlinServer tests... Committing what I have so far as I think the issue might be an easy .yaml file fix or something. Dunno. commit c06d9feee4b16c39e41a2067692d8d487f25cb62 Author: Stephen Mallette <spmva@genoprime.com> Date: 2017-07-07T15:35:34Z TINKERPOP-1427 Fixed py server config Should have used maching TinkerGraph IORegistry for the version. commit ff89b2a8fcdd1042b5cc2f172ca69015c83d3cd5 Author: Stephen Mallette <spmva@genoprime.com> Date: 2017-07-07T17:52:15Z TINKERPOP-1427 Fixed problems with v2 vs v3 in gremlin-python There were several problems. In protocol.py there was an expectation as the to the format of the ResponseMessage that changed between v2 and v3. I added a bit of a sketchy hack to deal with that detects v2/v3 and then parses accordingly. That should be nicer. There were some issues with tests as well that bound assertions to v2 so that when the default serializer swapped from v2 to v3 those assertions started failing. I forced those specific tests to v2 to get them to pass. Ultimately, we need to more generally test v2 and v3, but at least gremlin-python is defaulted to v3 at this point and all tests are passing. commit 3d793ac50911cece1abc6a72b0f6661717cb7d4f Author: Marko A. Rodriguez <okrammarko@gmail.com> Date: 2017-07-07T20:52:30Z added GraphSONReader tests for set/list/map to test_graphsonV3d0.py. Already tested in the Jython infrastruture, but why not have it also localized in the pure Python tests. commit 3dc0b03f0d17ab7bbcbf013c82f32e59d27f29c6 Author: Stephen Mallette <spmva@genoprime.com> Date: 2017-07-10T16:50:28Z TINKERPOP-1427 Coerced Map.Entry to a Map with single entry commit 228e578f89b82c633424ff2e59a068c58adc6971 Author: Marko A. Rodriguez <okrammarko@gmail.com> Date: 2017-07-10T19:42:44Z GraphSONRecordReader/Writer can now be configured to use either GraphSON V2 or V3. Added a test case to verfify proper behavior of parser from GraphSON V3 on HDFS. commit 4df07fd795b6a4c80b077dcc05bc313dfaca7019 Author: Stephen Mallette <spmva@genoprime.com> Date: 2017-07-10T21:25:07Z TINKERPOP-1427 Fixed a bad variable naming after a rebase commit 06553498adb16a31e35d8579ad1e7cc207c39148 Author: Stephen Mallette <spmva@genoprime.com> Date: 2017-07-11T19:02:57Z TINKERPOP-1427 Dropped graphson configs prior to 3.0 commit 5b5f850f15e00bd9ec81458c61f428b4abbb1588 Author: Stephen Mallette <spmva@genoprime.com> Date: 2017-07-12T11:12:25Z TINKERPOP-1427 Regenerated/renamed all data files Added versions to file names. Dropped packaged data files that were not 3.0. commit 8d973c396a8d9dde6927f2f2c2f4bf0fdab08a16 Author: Stephen Mallette <spmva@genoprime.com> Date: 2017-07-12T13:57:24Z TINKERPOP-1427 Specified gryo versions in tests
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user okram commented on the issue:

          https://github.com/apache/tinkerpop/pull/672

          VOTE +1

          Show
          githubbot ASF GitHub Bot added a comment - Github user okram commented on the issue: https://github.com/apache/tinkerpop/pull/672 VOTE +1
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dkuppitz commented on the issue:

          https://github.com/apache/tinkerpop/pull/672

          VOTE: +1

          Show
          githubbot ASF GitHub Bot added a comment - Github user dkuppitz commented on the issue: https://github.com/apache/tinkerpop/pull/672 VOTE: +1
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/tinkerpop/pull/672

          Show
          githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/tinkerpop/pull/672

            People

            • Assignee:
              spmallette stephen mallette
              Reporter:
              okram Marko A. Rodriguez
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development