  Ignite / IGNITE-8165

Spark Dataset Write intermittent "Failed to map key to node" error


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4
    • Fix Version/s: None
    • Component/s: jdbc, spark
    • Labels: None
    • Environment:

      Spark 2.1.0

      Java 1.8.0_152

      ignite-core-2.4.0.jar

      ignite-spark_2.10-2.4.0.jar

      Scala 2.11.8

      Description

      Inserts partially fail when issuing a Dataset<Row> write() operation. Rerunning the write causes a different set of rows to fail to insert, so the missing rows appear to be random. Not all of the rows listed by dsCity.show() end up in Ignite; every missing row failed with a "Failed to map key to node" exception.

       

      SparkSession spark = SparkSession
          .builder()
          .appName("IgniteSQLDataSource example")
          .master("local[4]")                  // run on a local PC (Winutils on Windows)
          .config("spark.local.dir", "/tmp")
          .getOrCreate();

       ... create about 10 {(int) ID, (string) NAME} tuples and add them to the dsCity dataset ...

      Dataset<Row> dsCity = spark.createDataset(...).toDF("ID","NAME");

      dsCity.show(1000);
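
      (The construction of dsCity is elided above. A minimal sketch of one way to build an equivalent dataset is shown below; the sample city values are illustrative only and are not taken from the report.)

      import java.util.Arrays;
      import java.util.List;
      import org.apache.spark.sql.Dataset;
      import org.apache.spark.sql.Encoders;
      import org.apache.spark.sql.Row;
      import scala.Tuple2;

      // Build a handful of {(int) ID, (string) NAME} tuples; three are shown here for brevity.
      List<Tuple2<Integer, String>> cities = Arrays.asList(
          new Tuple2<>(1, "Detroit"),
          new Tuple2<>(2, "Chicago"),
          new Tuple2<>(3, "Denver"));

      Dataset<Row> dsCity = spark
          .createDataset(cities, Encoders.tuple(Encoders.INT(), Encoders.STRING()))
          .toDF("ID", "NAME");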

      String tblName = "CITY";
      String jdbcURL = "jdbc:ignite:thin://127.0.0.1/";
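
      The report does not show how the CITY table was created on the Ignite side. If it was pre-created over the thin JDBC driver, the DDL may have looked roughly like the sketch below (the column types and the partitioned cache template are assumptions):

      import java.sql.Connection;
      import java.sql.DriverManager;
      import java.sql.Statement;

      // Assumed pre-creation of the target table; not part of the original report.
      try (Connection conn = DriverManager.getConnection(jdbcURL);
           Statement stmt = conn.createStatement()) {
          stmt.executeUpdate(
              "CREATE TABLE IF NOT EXISTS CITY (ID INT PRIMARY KEY, NAME VARCHAR) " +
              "WITH \"template=partitioned\"");
      }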

       

      dsCity.write()
          .format("jdbc")
          .option("primary_key_fields", "ID")
          .option("url", jdbcURL)
          .option("driver", "org.apache.ignite.IgniteJdbcThinDriver")
          .option("batchsize", 1000)
          .option("dbtable", tblName)
          .mode(SaveMode.Append)
          .save();

       

      18/04/06 09:33:23 ERROR Executor: Exception in task 3.0 in stage 2.0 (TID 5)
      java.sql.BatchUpdateException: Failed to map key to node.
      at org.apache.ignite.internal.jdbc.thin.JdbcThinStatement.executeBatch(JdbcThinStatement.java:435)
      at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:597)
      at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:670)
      at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:670)
      at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:925)
      at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:925)
      at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
      at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
      at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
      at org.apache.spark.scheduler.Task.run(Task.scala:99)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
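
      A plain JDBC batch insert over the thin driver against the same table can help narrow down whether the failure is specific to the Spark JDBC writer or also reproduces with the thin driver alone. A rough diagnostic sketch (URL and table name taken from the report; the row values are made up):

      import java.sql.Connection;
      import java.sql.DriverManager;
      import java.sql.PreparedStatement;

      try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1/");
           PreparedStatement ps = conn.prepareStatement("INSERT INTO CITY (ID, NAME) VALUES (?, ?)")) {
          for (int i = 0; i < 10; i++) {
              ps.setInt(1, i);
              ps.setString(2, "name-" + i);
              ps.addBatch();
          }
          // executeBatch() is where the report's BatchUpdateException surfaces.
          int[] counts = ps.executeBatch();
          System.out.println("Inserted " + counts.length + " rows");
      }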


            People

            • Assignee: Unassigned
            • Reporter: p2pxd mark pettovello
            • Votes: 0
            • Watchers: 1
