  1. TinkerPop
  2. TINKERPOP-1335

OLAP queries potentially fail for certain match()/select() query patterns

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • 3.2.1
    • None
    • Component/s: hadoop, process
    • None

    Description

      There are certain queries that return wrong results when executed via SparkGraphComputer. After testing a few queries I would say that the problematic query pattern is a match() / select() combo.

      For example (Grateful Dead graph):

      gremlin> g.V().hasLabel("song").match(
                   __.as("a").values("name").as("name"),
                   __.as("a").values("performances").as("performances")
                 ).select("name","performances").count()
      ==>0
      

      If count() is replaced by program(), the traversal throws exceptions instead of returning a wrong result. However, if we select "a" instead of "name" and "performances", we get the correct result. Likewise, if we remove the select() or rewrite the match() part, everything works as expected. The simplest query to reproduce the erroneous behavior is this one:

      g.V().match(__.as("a").values("name").as("name")).select("name").count()
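
      The working variants described above can be written out as follows. This is only a sketch of the reported behavior: g is assumed to be the same SparkGraphComputer-backed traversal source used for the failing queries, and no output is shown because none was recorded for these variants.

      ```groovy
      // Variant 1: select the start label "a" instead of "name" and "performances".
      g.V().hasLabel("song").match(
          __.as("a").values("name").as("name"),
          __.as("a").values("performances").as("performances")
        ).select("a").count()

      // Variant 2: drop the select() step and count the match() results directly.
      g.V().hasLabel("song").match(
          __.as("a").values("name").as("name"),
          __.as("a").values("performances").as("performances")
        ).count()
      ```

      Both variants keep the match() patterns unchanged, so any difference from the failing query comes down to what select() hands to the next step.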
      

      The tests were done using a real Spark Server. I didn't try to use Spark in local mode or Giraph. I did try TinkerGraphComputer, which worked fine.
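
      For anyone who wants to compare the two computers side by side, a console session along these lines might do. It is a sketch under assumptions: the properties file name and the data file path are placeholders for whatever points at the Grateful Dead data in your installation, and the Spark and TinkerGraph plugins are assumed to be active in the Gremlin Console.

      ```groovy
      // OLAP via Spark. The properties file name is an assumption; use the
      // HadoopGraph configuration that points at the Grateful Dead data set.
      graph = GraphFactory.open('conf/hadoop/hadoop-grateful-gryo.properties')
      g = graph.traversal().withComputer(SparkGraphComputer)
      g.V().match(__.as("a").values("name").as("name")).select("name").count()

      // OLAP via TinkerGraphComputer on an in-memory copy of the same data.
      // The data file path is likewise an assumption.
      tg = TinkerGraph.open()
      tg.io(gryo()).readGraph('data/grateful-dead.kryo')
      gt = tg.traversal().withComputer(TinkerGraphComputer)
      gt.V().match(__.as("a").values("name").as("name")).select("name").count()
      ```

      If the two counts differ, the problem lies on the Spark side rather than in the traversal itself.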

      Here's an actual stack trace that shows where to find the root of all evil:

      ERROR 2016-06-09 21:24:25,988 Logging.scala:95 - org.apache.spark.executor.Executor: Exception in task 0.2 in stage 119.0 (TID 307)
      java.lang.IllegalStateException: The host of the object is unknown: {a=v[{~label=Comment, member_id=2034, community_id=1676454656}], content=ok, length=2}:java.util.LinkedHashMap
          at org.apache.tinkerpop.gremlin.process.computer.traversal.WorkerExecutor.getHostingVertex(WorkerExecutor.java:242) ~[gremlin-core-3.2.1-20160601-aa673db1.jar:3.2.1-20160601-aa673db1]
          at org.apache.tinkerpop.gremlin.process.computer.traversal.WorkerExecutor.lambda$drainStep$262(WorkerExecutor.java:220) ~[gremlin-core-3.2.1-20160601-aa673db1.jar:3.2.1-20160601-aa673db1]
          at org.apache.tinkerpop.gremlin.process.computer.traversal.WorkerExecutor$$Lambda$113/1202183304.accept(Unknown Source) ~[na:na]
          at java.util.Iterator.forEachRemaining(Iterator.java:116) ~[na:1.8.0_40]
          at org.apache.tinkerpop.gremlin.process.computer.traversal.WorkerExecutor.drainStep(WorkerExecutor.java:215) ~[gremlin-core-3.2.1-20160601-aa673db1.jar:3.2.1-20160601-aa673db1]
          at org.apache.tinkerpop.gremlin.process.computer.traversal.WorkerExecutor.execute(WorkerExecutor.java:146) ~[gremlin-core-3.2.1-20160601-aa673db1.jar:3.2.1-20160601-aa673db1]
          at org.apache.tinkerpop.gremlin.process.computer.traversal.TraversalVertexProgram.execute(TraversalVertexProgram.java:285) ~[gremlin-core-3.2.1-20160601-aa673db1.jar:3.2.1-20160601-aa673db1]
          at org.apache.tinkerpop.gremlin.spark.process.computer.SparkExecutor.lambda$null$9(SparkExecutor.java:111) ~[spark-gremlin-3.2.1-20160601-aa673db1.jar:3.2.1-20160601-aa673db1]
          at org.apache.tinkerpop.gremlin.spark.process.computer.SparkExecutor$$Lambda$92/910806192.apply(Unknown Source) ~[na:na]
          at org.apache.tinkerpop.gremlin.util.iterator.IteratorUtils$3.next(IteratorUtils.java:247) ~[gremlin-core-3.2.1-20160601-aa673db1.jar:3.2.1-20160601-aa673db1]
          at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) ~[scala-library-2.10.6.jar:na]
          at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389) ~[scala-library-2.10.6.jar:na]
          at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) ~[scala-library-2.10.6.jar:na]
          at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:189) ~[spark-core_2.10-1.6.1.2.jar:1.6.1.2]
          at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:64) ~[spark-core_2.10-1.6.1.2.jar:1.6.1.2]
          at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) ~[spark-core_2.10-1.6.1.2.jar:1.6.1.2]
          at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) ~[spark-core_2.10-1.6.1.2.jar:1.6.1.2]
          at org.apache.spark.scheduler.Task.run(Task.scala:89) ~[spark-core_2.10-1.6.1.2.jar:1.6.1.2]
          at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) ~[spark-core_2.10-1.6.1.2.jar:1.6.1.2]
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_40]
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_40]
          at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
      

          People

            Assignee: Marko A. Rodriguez (okram)
            Reporter: Daniel Kuppitz (dkuppitz)