TinkerPop / TINKERPOP-3087

Panic in gremlin-go driver responseHandler


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.7.1
    • Fix Version/s: None
    • Component/s: go
    • Labels: None
    • Environment: Ubuntu 20.04, x86_64, AWS EC2 image. I think it's 2c2g, but I can't recall offhand.

      Compiled _without_ CGO.

    Description

      Occasionally, I get the following panic in the gremlin-go driver:


      10:22:39.895 <successfully added a vertex to janusgraph>
      10:22:44.487 panic: runtime error: invalid memory address or nil pointer dereference
      10:22:44.487 [signal SIGSEGV: segmentation violation code=0x1 addr=0x78 pc=0x11fdd16]
      10:22:44.487 
      10:22:44.487 goroutine 275 [running]:
      10:22:44.487 github.com/apache/tinkerpop/gremlin-go/v3/driver.(*gremlinServerWSProtocol).responseHandler(0xc0006be6c0, 0xc0001dfc60, {{0x9, 0xf6, 0x33, 0x7f, 0x19, 0x16, 0x4e, 0xc9, ...}, ...})
      10:22:44.487 /go/pkg/mod/github.com/apache/tinkerpop/gremlin-go/v3@v3.7.1/driver/protocol.go:116 +0x7d6
      10:22:44.487 github.com/apache/tinkerpop/gremlin-go/v3/driver.(*gremlinServerWSProtocol).readLoop(0xc0006be6c0, 0xc0001dfc60, 0xc0001dfca0)
      10:22:44.487 /go/pkg/mod/github.com/apache/tinkerpop/gremlin-go/v3@v3.7.1/driver/protocol.go:82 +0x272
      10:22:44.487 created by github.com/apache/tinkerpop/gremlin-go/v3/driver.newGremlinServerWSProtocol in goroutine 272
      10:22:44.487 /go/pkg/mod/github.com/apache/tinkerpop/gremlin-go/v3@v3.7.1/driver/protocol.go:197 +0x23a

      I can't reproduce this reliably, and it has never happened locally. In the log above it occurred just after doing some work, but it has also happened after a period of inactivity.


      At first glance, it looks like a nil is being returned from the `resultSets` synchronizedMap: something gets in and removes the entry, which should only happen when the `channelResultSet` itself is closed. That looks like a race condition to me.
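      To make the suspected interleaving concrete, here is a minimal, self-contained Go model. It does not use the real gremlin-go internals: `syncMap`, `resultSet` and `handleResponse` are hypothetical stand-ins, and the model assumes (as the first suggested fix below implies) that the handler looks the result set up in the map more than once per response. The `afterFirstLoad` hook deterministically injects a removal between the two lookups, which a concurrent close would only do occasionally.

```go
package main

import (
	"fmt"
	"sync"
)

// resultSet is a hypothetical stand-in for the driver's channelResultSet.
type resultSet struct {
	id string
}

// syncMap is a hypothetical stand-in for the driver's synchronizedMap:
// every single operation is locked, but a sequence of operations is not atomic.
type syncMap struct {
	mu sync.Mutex
	m  map[string]*resultSet
}

func newSyncMap() *syncMap {
	return &syncMap{m: make(map[string]*resultSet)}
}

func (s *syncMap) store(k string, rs *resultSet) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[k] = rs
}

// load returns nil when the key is absent, like any map lookup of a pointer.
func (s *syncMap) load(k string) *resultSet {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.m[k]
}

func (s *syncMap) remove(k string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.m, k)
}

// handleResponse looks the result set up twice (the assumed bug pattern).
// The panic is recovered into an error only so the demo can report it.
func handleResponse(sets *syncMap, key string, afterFirstLoad func()) (id string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	if sets.load(key) == nil {
		return "", fmt.Errorf("no result set for %q", key)
	}
	afterFirstLoad()    // a concurrent close could remove the entry here
	rs := sets.load(key) // second lookup: may now return nil
	return rs.id, nil    // nil pointer dereference if the entry was removed
}

func main() {
	sets := newSyncMap()
	sets.store("req-1", &resultSet{id: "req-1"})
	// Force the interleaving: remove the entry between the two lookups.
	_, err := handleResponse(sets, "req-1", func() { sets.remove("req-1") })
	fmt.Println(err)
}
```

      Each individual map operation is locked, so the map itself never races; the failure comes from treating two separate lookups as if they were one atomic step.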

      If it is what I suspect (I have no proof yet), I think a sensible fix would be to:

      • Store a reference to the ResultSet when first loading it in responseHandler
      • Make `channelResultSet` discard modifications after close, OR
      • Have responseHandler acquire the `channelMutex` for the duration of its modifications (I think I prefer this, as it makes what's happening more obvious)
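      A minimal sketch of the first two options combined, again using simplified, hypothetical types rather than the driver's real ones. Only the names `channelResultSet`, `channelMutex`, `resultSets` and `responseHandler` come from the report; their fields and methods here are assumptions. The handler loads the result set once and keeps that reference, and the result set refuses writes once it is closed.

```go
package main

import (
	"fmt"
	"sync"
)

// channelResultSet is a hypothetical, simplified stand-in for the driver type.
// channelMutex guards closed and results so that close and add cannot interleave.
type channelResultSet struct {
	channelMutex sync.Mutex
	closed       bool
	results      []string
}

// addResult implements option 2: writes arriving after close are discarded.
func (c *channelResultSet) addResult(r string) bool {
	c.channelMutex.Lock()
	defer c.channelMutex.Unlock()
	if c.closed {
		return false
	}
	c.results = append(c.results, r)
	return true
}

func (c *channelResultSet) Close() {
	c.channelMutex.Lock()
	defer c.channelMutex.Unlock()
	c.closed = true
}

// resultSets is a hypothetical locked map of request ID to result set.
type resultSets struct {
	mu sync.Mutex
	m  map[string]*channelResultSet
}

func (s *resultSets) load(k string) *channelResultSet {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.m[k]
}

func (s *resultSets) remove(k string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.m, k)
}

// responseHandler implements option 1: the result set is loaded exactly once,
// and that same reference is used for the rest of the call, so a concurrent
// remove cannot turn it into nil half-way through.
func responseHandler(sets *resultSets, id string, payload string) error {
	rs := sets.load(id)
	if rs == nil {
		return fmt.Errorf("result set %q unknown or already removed", id)
	}
	if !rs.addResult(payload) {
		return fmt.Errorf("result set %q closed; response dropped", id)
	}
	return nil
}

func main() {
	rs := &channelResultSet{}
	sets := &resultSets{m: map[string]*channelResultSet{"req-1": rs}}

	fmt.Println(responseHandler(sets, "req-1", "v[1]")) // accepted
	rs.Close()
	fmt.Println(responseHandler(sets, "req-1", "v[2]")) // dropped: set already closed
	sets.remove("req-1")
	fmt.Println(responseHandler(sets, "req-1", "v[3]")) // error instead of a nil dereference
}
```

      Here `addResult` holds `channelMutex` only around its own check-and-append. The third option would instead hold the lock across the whole handler body, which makes the critical section more obvious but also widens it.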

      I did try to get the project tests running, but it has been years since I last used Maven, and I can't find a good beginner's guide to working with the project from a non-Java point of view; specifically, I need to know how to build the Docker images the tests require. If someone could help me get started, I'd be more than happy to contribute a PR (although, given the nature of the bug, a reliable test may be hard to write).

      As it stands, I'm probably going to patch my fork, point my module at it with a `replace` directive in go.mod, and see whether a fix actually works in my deployments.


People

    • Assignee: Unassigned
    • Reporter: David Bennington (spiral90210)
    • Votes: 1
    • Watchers: 2
