TinkerPop / TINKERPOP-1876

I cannot use the bulk loader with Spark


Details

    • Type: Test
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 3.2.6
    • Fix Version/s: None
    • Component/s: plugin
    • Labels: Important

    Description

      Here is the data:

      {"timestamp":"1510335280","name":"sendv54sxu8f12g.ihance.net","type":"a","value":"52.52.81.55"}

      ##

      {"timestamp":"1510338448","name":"*.2925.com.dycdn.com","type":"a","value":"121.201.116.57"}

      ##

      {"timestamp":"1510308398","name":"*.2bask.com","type":"a","value":"176.31.246.156"}

      ##

      {"timestamp":"1510350705","name":"*.5thlegdata.com","type":"a","value":"199.34.228.100"}

      ##

      {"timestamp":"1510350937","name":"*.819.cn","type":"a","value":"118.190.84.164"}

      ##

      {"timestamp":"1510301149","name":"*.acart.iii.com","type":"a","value":"66.171.203.156"}

      ##

      {"timestamp":"1510337980","name":"*.aineistot.lamk.fi","type":"a","value":"193.166.79.79"}

      ##

      {"timestamp":"1510344687","name":"*.amagervvs.dk","type":"a","value":"185.17.52.58"}

      ##

      {"timestamp":"1510350321","name":"*.app-devel.services.actx.com","type":"a","value":"34.209.35.25"}

      ##

      {"timestamp":"1510335280","name":"sendv54sxu8f12g.ihance.net","type":"a","value":"52.52.81.55"}
      {"timestamp":"1510338448","name":"*.2925.com.dycdn.com","type":"a","value":"121.201.116.57"}
      {"timestamp":"1510308398","name":"*.2bask.com","type":"a","value":"176.31.246.156"}
      {"timestamp":"1510350705","name":"*.5thlegdata.com","type":"a","value":"199.34.228.100"}
      {"timestamp":"1510350937","name":"*.819.cn","type":"a","value":"118.190.84.164"}
      {"timestamp":"1510301149","name":"*.acart.iii.com","type":"a","value":"66.171.203.156"}
      {"timestamp":"1510337980","name":"*.aineistot.lamk.fi","type":"a","value":"193.166.79.79"}
      {"timestamp":"1510344687","name":"*.amagervvs.dk","type":"a","value":"185.17.52.58"}
      {"timestamp":"1510350321","name":"*.app-devel.services.actx.com","type":"a","value":"34.209.35.25"}

      The bulk loader loads the vertices successfully, but when I add the edges something goes wrong. The Groovy script is:

      def parse(line, factory) {
          // Reduce {"timestamp":"...","name":"...","type":"a","value":"..."} to
          // timestamp:...,name:...,type:a,value:... and pull out each value
          String noquotes = line.replace("\"", "").replace("{", "").replace("}", "")
          def (timestamp, host, type, ip) = noquotes.split(",")
          String hostname = host.split(":")[1]
          String ipdetail = ip.split(":")[1]
          String time = timestamp.split(":")[1]

          if (line.toString().contains("##")) {
              println "model1"
              // Host record: return the host vertex with an OUT edge to the IP vertex
              def v1 = factory.vertex(hostname, "host")
              def v2 = factory.vertex(ipdetail)
              def edge = factory.edge(v1, v2, "pointto")
              // property(key, value) sets a value; properties(...) only reads them
              edge.property("timestamp", time)
              return v1
          } else {
              println "model2"
              // IP record: return the IP vertex with an IN edge from the host vertex
              def v1 = factory.vertex(ipdetail, "ip")
              def v2 = factory.vertex(hostname)
              def edge = factory.edge(v2, v1, "pointto")
              edge.property("timestamp", time)
              return v1
          }
      }
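      The string handling in parse() can be exercised on its own, without Hadoop or JanusGraph. Below is a minimal Java sketch of the same steps for a single record: strip quotes and braces, split on commas, and keep the text after the ':' in each key:value pair. The class name HostIpRecord is hypothetical and not part of TinkerPop; it only mirrors the Groovy code above.

```java
// Hypothetical helper (not a TinkerPop API): mirrors the string handling in the
// Groovy parse() so the extracted vertex ids can be checked in isolation.
final class HostIpRecord {
    final String time;
    final String hostname;
    final String type;
    final String ip;

    private HostIpRecord(String time, String hostname, String type, String ip) {
        this.time = time;
        this.hostname = hostname;
        this.type = type;
        this.ip = ip;
    }

    static HostIpRecord parse(String line) {
        // Same as line.replace("\"","").replace("{","").replace("}","") in Groovy
        String noquotes = line.replace("\"", "").replace("{", "").replace("}", "");
        String[] parts = noquotes.split(",");
        if (parts.length < 4) {
            throw new IllegalArgumentException("expected 4 fields: " + line);
        }
        return new HostIpRecord(valueOf(parts[0]), valueOf(parts[1]),
                valueOf(parts[2]), valueOf(parts[3]));
    }

    // Same as host.split(":")[1]: the text between the first and second ':'
    private static String valueOf(String pair) {
        String[] kv = pair.split(":");
        if (kv.length < 2) {
            throw new IllegalArgumentException("not a key:value pair: " + pair);
        }
        return kv[1];
    }
}
```

      Characters outside the braces, such as a ## marker, survive the stripping and end up in the first or last field. A marker at the end of a line therefore becomes part of the IP string that the script uses as a vertex id.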


      The stack trace is:

      Opened Graph instance: standardjanusgraph[hbase:[10.9.128.12]]
      java.util.NoSuchElementException
          at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.next(DefaultTraversal.java:204)
          at org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoader.getVertexById(BulkLoader.java:118)
          at org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram.lambda$executeInternal$4(BulkLoaderVertexProgram.java:251)
          at java.util.Iterator.forEachRemaining(Iterator.java:116)
          at org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram.executeInternal(BulkLoaderVertexProgram.java:249)
          at org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram.execute(BulkLoaderVertexProgram.java:197)
          at org.apache.tinkerpop.gremlin.spark.process.computer.SparkExecutor.lambda$null$5(SparkExecutor.java:118)
          at org.apache.tinkerpop.gremlin.util.iterator.IteratorUtils$3.next(IteratorUtils.java:247)
          at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42)
          at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389)
          at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
          at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:189)
          at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:64)
          at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
          at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
          at org.apache.spark.scheduler.Task.run(Task.scala:89)
          at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)
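      In the trace, BulkLoader.getVertexById calls next() on a traversal that found nothing while BulkLoaderVertexProgram is iterating over edges. This suggests that an edge refers to a vertex id that was never loaded as a vertex of its own. The sketch below assumes, as with ScriptInputFormat, that only the vertex returned by parse() becomes an input vertex and that the other end of each edge is just a reference by id. Under that assumption, the hypothetical Java helper EdgeTargetCheck (not a TinkerPop API) replays the script's two branches over the input lines. It lists the ids that edges point to but that no line returns.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical diagnostic (not part of TinkerPop): replays the two branches of the
// reporter's parse() and reports edge endpoints that no input line returns as its vertex.
final class EdgeTargetCheck {

    // Same as host.split(":")[1] in the Groovy script
    private static String valueOf(String pair) {
        return pair.split(":")[1];
    }

    static Set<String> danglingIds(List<String> lines) {
        Set<String> returned = new LinkedHashSet<>();   // ids of the vertices parse() returns
        Set<String> referenced = new LinkedHashSet<>(); // ids reached only through an edge
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String noquotes = line.replace("\"", "").replace("{", "").replace("}", "");
            String[] parts = noquotes.split(",");
            String hostname = valueOf(parts[1]);
            String ip = valueOf(parts[3]);
            if (line.contains("##")) {
                returned.add(hostname); // "model1": host vertex, OUT edge to the ip id
                referenced.add(ip);
            } else {
                returned.add(ip);       // "model2": ip vertex, IN edge from the host id
                referenced.add(hostname);
            }
        }
        // Whatever is left was referenced by an edge but never returned by parse()
        referenced.removeAll(returned);
        return referenced;
    }
}
```

      An empty result means that every edge endpoint is also returned by some line. Any id that the helper reports is one the bulk loader would likely fail to find. A line that holds only ## has no fields and throws here, as it would in the Groovy script.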


      This is the Gremlin script:

      graph = GraphFactory.open("/user/janusgraph-0.2.0-hadoop2/conf/hadoop-graph/hadoop-script.properties")
      graph.configuration().setProperty("gremlin.hadoop.scriptInputFormat.script", "host_ip2.groovy")     
      graph.configuration().setInputLocation("host_ip.json")
      blvp = BulkLoaderVertexProgram.build().writeGraph("/tmp/10.properties").create(graph)
      graph.compute(SparkGraphComputer).workers(1).configure("fs.defaultFS", "hdfs://am4:8020").program(blvp).submit().get()


      Can anyone help me?


          People

            • Assignee: Unassigned
            • Reporter: jx ping
            • Votes: 0
            • Watchers: 2
