TinkerPop / TINKERPOP-1335

OLAP queries potentially fail for certain match()/select() query patterns

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.1
    • Fix Version/s: None
    • Component/s: hadoop, process
    • Labels: None

      Description

      Certain queries return wrong results when executed via SparkGraphComputer. After testing a few queries, I would say that the problematic query pattern is a match() / select() combination.

      For example (Grateful Dead graph):

      gremlin> g.V().hasLabel("song").match(
                   __.as("a").values("name").as("name"),
                   __.as("a").values("performances").as("performances")
                 ).select("name","performances").count()
      ==>0
      

      If count() is replaced by program(), the query throws exceptions. However, if we select("a") instead of "name" and "performances", we get the correct result. Likewise, if we remove the select() or just rewrite the match() part, everything works as expected. The simplest query that reproduces the erroneous behavior is this one:

      g.V().match(__.as("a").values("name").as("name")).select("name").count()
      

      The tests were done using a real Spark Server. I didn't try to use Spark in local mode or Giraph. I did try TinkerGraphComputer, which worked fine.

      Here's an actual stack trace that shows where to find the root of all evil:

      ERROR 2016-06-09 21:24:25,988 Logging.scala:95 - org.apache.spark.executor.Executor: Exception in task 0.2 in stage 119.0 (TID 307)
      java.lang.IllegalStateException: The host of the object is unknown: {a=v[{~label=Comment, member_id=2034, community_id=1676454656}], content=ok, length=2}:java.util.LinkedHashMap
          at org.apache.tinkerpop.gremlin.process.computer.traversal.WorkerExecutor.getHostingVertex(WorkerExecutor.java:242) ~[gremlin-core-3.2.1-20160601-aa673db1.jar:3.2.1-20160601-aa673db1]
          at org.apache.tinkerpop.gremlin.process.computer.traversal.WorkerExecutor.lambda$drainStep$262(WorkerExecutor.java:220) ~[gremlin-core-3.2.1-20160601-aa673db1.jar:3.2.1-20160601-aa673db1]
          at org.apache.tinkerpop.gremlin.process.computer.traversal.WorkerExecutor$$Lambda$113/1202183304.accept(Unknown Source) ~[na:na]
          at java.util.Iterator.forEachRemaining(Iterator.java:116) ~[na:1.8.0_40]
          at org.apache.tinkerpop.gremlin.process.computer.traversal.WorkerExecutor.drainStep(WorkerExecutor.java:215) ~[gremlin-core-3.2.1-20160601-aa673db1.jar:3.2.1-20160601-aa673db1]
          at org.apache.tinkerpop.gremlin.process.computer.traversal.WorkerExecutor.execute(WorkerExecutor.java:146) ~[gremlin-core-3.2.1-20160601-aa673db1.jar:3.2.1-20160601-aa673db1]
          at org.apache.tinkerpop.gremlin.process.computer.traversal.TraversalVertexProgram.execute(TraversalVertexProgram.java:285) ~[gremlin-core-3.2.1-20160601-aa673db1.jar:3.2.1-20160601-aa673db1]
          at org.apache.tinkerpop.gremlin.spark.process.computer.SparkExecutor.lambda$null$9(SparkExecutor.java:111) ~[spark-gremlin-3.2.1-20160601-aa673db1.jar:3.2.1-20160601-aa673db1]
          at org.apache.tinkerpop.gremlin.spark.process.computer.SparkExecutor$$Lambda$92/910806192.apply(Unknown Source) ~[na:na]
          at org.apache.tinkerpop.gremlin.util.iterator.IteratorUtils$3.next(IteratorUtils.java:247) ~[gremlin-core-3.2.1-20160601-aa673db1.jar:3.2.1-20160601-aa673db1]
          at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) ~[scala-library-2.10.6.jar:na]
          at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389) ~[scala-library-2.10.6.jar:na]
          at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) ~[scala-library-2.10.6.jar:na]
          at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:189) ~[spark-core_2.10-1.6.1.2.jar:1.6.1.2]
          at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:64) ~[spark-core_2.10-1.6.1.2.jar:1.6.1.2]
          at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) ~[spark-core_2.10-1.6.1.2.jar:1.6.1.2]
          at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) ~[spark-core_2.10-1.6.1.2.jar:1.6.1.2]
          at org.apache.spark.scheduler.Task.run(Task.scala:89) ~[spark-core_2.10-1.6.1.2.jar:1.6.1.2]
          at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) ~[spark-core_2.10-1.6.1.2.jar:1.6.1.2]
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_40]
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_40]
          at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
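
      The exception message in the trace above ends in java.util.LinkedHashMap, which suggests that the object being routed to a host vertex was a Map of step labels to values (here with the keys a, content and length) rather than a graph element. The following Java sketch models that kind of lookup under simplified assumptions. HostLookupSketch, its nested Vertex, Edge and Property classes, and hostOf() are hypothetical stand-ins for illustration only; they are not the TinkerPop interfaces or the actual WorkerExecutor code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HostLookupSketch {
    // Hypothetical stand-ins for graph elements; NOT the TinkerPop interfaces.
    static final class Vertex {
        final String id;
        Vertex(String id) { this.id = id; }
        @Override public String toString() { return "v[" + id + "]"; }
    }

    static final class Edge {
        final Vertex out;
        Edge(Vertex out) { this.out = out; }
    }

    static final class Property {
        final Object element; // a Vertex or an Edge in this sketch
        Property(Object element) { this.element = element; }
    }

    // Walk from an object towards the vertex that "hosts" it.
    // Anything that is not a vertex, edge or property has no known host.
    static Vertex hostOf(Object object) {
        Object obj = object;
        while (true) {
            if (obj instanceof Vertex) {
                return (Vertex) obj;
            } else if (obj instanceof Edge) {
                obj = ((Edge) obj).out;
            } else if (obj instanceof Property) {
                obj = ((Property) obj).element;
            } else {
                String type = obj == null ? "null" : obj.getClass().getCanonicalName();
                throw new IllegalStateException(
                        "The host of the object is unknown: " + obj + ":" + type);
            }
        }
    }

    public static void main(String[] args) {
        Vertex a = new Vertex("1");

        // A property resolves to the vertex it belongs to.
        System.out.println(hostOf(new Property(a))); // prints v[1]

        // A map of bindings, even one that contains a vertex, has no host.
        Map<String, Object> bindings = new LinkedHashMap<>();
        bindings.put("a", a);
        bindings.put("name", "some name");
        try {
            hostOf(bindings);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

      In this model a vertex, an edge or a property can always be traced back to a vertex, while a map of bindings cannot, even though one of its values is a vertex. That would be consistent with the observation that select("a") works and select("name","performances") does not, but it is only a reading of the error message and not a confirmed diagnosis.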
      

            People

            • Assignee: Marko A. Rodriguez (okram)
            • Reporter: Daniel Kuppitz (dkuppitz)