Details
Description
Occasionally, I get the following panic in the gremlin-go driver:
10:22:39.895 <successfully added a vertex to janusgraph>
10:22:44.487 panic: runtime error: invalid memory address or nil pointer dereference
10:22:44.487 [signal SIGSEGV: segmentation violation code=0x1 addr=0x78 pc=0x11fdd16]
10:22:44.487
10:22:44.487 goroutine 275 [running]:
10:22:44.487 github.com/apache/tinkerpop/gremlin-go/v3/driver.(*gremlinServerWSProtocol).responseHandler(0xc0006be6c0, 0xc0001dfc60, {{0x9, 0xf6, 0x33, 0x7f, 0x19, 0x16, 0x4e, 0xc9, ...}, ...})
10:22:44.487 /go/pkg/mod/github.com/apache/tinkerpop/gremlin-go/v3@v3.7.1/driver/protocol.go:116 +0x7d6
10:22:44.487 github.com/apache/tinkerpop/gremlin-go/v3/driver.(*gremlinServerWSProtocol).readLoop(0xc0006be6c0, 0xc0001dfc60, 0xc0001dfca0)
10:22:44.487 /go/pkg/mod/github.com/apache/tinkerpop/gremlin-go/v3@v3.7.1/driver/protocol.go:82 +0x272
10:22:44.487 created by github.com/apache/tinkerpop/gremlin-go/v3/driver.newGremlinServerWSProtocol in goroutine 272
10:22:44.487 /go/pkg/mod/github.com/apache/tinkerpop/gremlin-go/v3@v3.7.1/driver/protocol.go:197 +0x23a
I can't reliably reproduce it, and it has never happened locally. In the logs above it occurred just after doing some work, but it has also happened after a period of doing nothing.
At first glance, it looks like a nil is being returned from the `resultSets` synchronizedMap: something is getting in and removing the entry, which should only happen when the `channelResultSet` itself is closed. It seems like a race condition to me.
I think the sensible way to handle this, if it is what I suspect (I've no proof yet), would be to
- Store a reference to the ResultSet when first loading it in responseHandler
- Make `channelResultSet` discard modifications after close OR
- responseHandler needs to acquire the `channelMutex` somehow for the duration of the modifications (I think I prefer this; it's more obvious what's happening)
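To make the first two options concrete, here is a minimal sketch of the idea, not the driver's actual code. All of the names (`resultSet`, `registry`, `handleResponse`) are hypothetical stand-ins for `channelResultSet`, the `resultSets` map and `responseHandler`. It assumes the panic comes from a lookup returning nil after the entry was removed: the handler loads the entry once, keeps that reference, nil-checks it, and the result set itself ignores writes after close.

```go
package main

import (
	"fmt"
	"sync"
)

// resultSet is a hypothetical stand-in for the driver's channelResultSet.
type resultSet struct {
	mu     sync.Mutex
	closed bool
	items  []string
}

// add appends an item unless the set is closed; it reports whether the write was applied.
func (r *resultSet) add(item string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.closed {
		return false // discard modifications after close
	}
	r.items = append(r.items, item)
	return true
}

// close marks the set as closed so later writes are ignored.
func (r *resultSet) close() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.closed = true
}

// registry is a hypothetical stand-in for the resultSets synchronized map.
type registry struct {
	mu   sync.Mutex
	sets map[string]*resultSet
}

func newRegistry() *registry {
	return &registry{sets: make(map[string]*resultSet)}
}

func (g *registry) store(id string, rs *resultSet) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.sets[id] = rs
}

// load returns nil when the id is unknown, mirroring the suspected failure mode.
func (g *registry) load(id string) *resultSet {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.sets[id]
}

func (g *registry) remove(id string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	delete(g.sets, id)
}

// handleResponse loads the result set once, keeps that reference, and
// nil-checks it instead of looking it up again for each modification.
func handleResponse(g *registry, id, item string) bool {
	rs := g.load(id)
	if rs == nil {
		return false // entry already removed: drop the response rather than panic
	}
	return rs.add(item)
}

func main() {
	g := newRegistry()
	g.store("req-1", &resultSet{})
	fmt.Println(handleResponse(g, "req-1", "v[1]")) // open set: write applied
	g.load("req-1").close()
	fmt.Println(handleResponse(g, "req-1", "v[2]")) // closed set: write discarded
	g.remove("req-1")
	fmt.Println(handleResponse(g, "req-1", "v[3]")) // removed entry: no nil dereference
}
```

The third option would instead hold a single lock across the lookup and all modifications. That is easier to reason about, at the cost of a wider critical section.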
I did try to get the project tests to run, but it's been years since I've had to use Maven and I can't seem to find a good idiot's guide to working with the project from a non-Java POV. I need to know how to build the Docker images the tests require. If someone could help me get started, I'd be more than happy to contribute a PR (although given the nature of the bug, a reliable test may be hard to write).
As it stands, I'm probably going to modify my fork and use a go.mod `replace` to see if a fix would actually work in my deployments.