CarbonData / CARBONDATA-2877

CarbonDataWriterException when loading data to carbon table with large number of rows/columns from Spark-Submit


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.4.1
    • Fix Version/s: None
    • Component/s: data-load
    • Labels: None
    • Environment: Spark 2.1

    Description

      Steps:

      Using spark-submit, the user creates a table with a large number of columns (around 100) and tries to load around 3 lakh (300,000) records into it.
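
The report does not include the schema or the input data, so the following is only a minimal sketch of how comparable test input could be generated. The table name `wide_table`, the column names `c1..c100`, the all-STRING schema, and the presence of a CSV header row are assumptions made for illustration; the statements it prints follow the CarbonData 1.4-era `STORED BY 'carbondata'` and `LOAD DATA INPATH` syntax and may need adjusting for the actual setup.

```python
import csv

NUM_COLUMNS = 100    # "around 100" columns, per the report
NUM_ROWS = 300_000   # "around 3 lakh" records, per the report


def column_names(n=NUM_COLUMNS):
    # Hypothetical column names c1..cN; the real schema is not given.
    return [f"c{i}" for i in range(1, n + 1)]


def create_table_sql(table="wide_table", n=NUM_COLUMNS):
    # Every column is declared STRING purely for illustration.
    cols = ", ".join(f"{name} STRING" for name in column_names(n))
    return f"CREATE TABLE {table} ({cols}) STORED BY 'carbondata'"


def load_data_sql(csv_path, table="wide_table"):
    # Assumes the CSV carries a header row (written by write_rows below).
    return (
        f"LOAD DATA INPATH '{csv_path}' INTO TABLE {table} "
        "OPTIONS('HEADER'='true')"
    )


def write_rows(out, rows=NUM_ROWS, n=NUM_COLUMNS):
    # Writes a header line followed by `rows` lines of synthetic values.
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(column_names(n))
    for r in range(rows):
        writer.writerow([f"r{r}_v{c}" for c in range(n)])
```

To produce the full-size input, open a file with `open("wide_table.csv", "w", newline="")`, pass it to `write_rows`, and run the strings returned by `create_table_sql()` and `load_data_sql(...)` from the application submitted via spark-submit. Whether synthetic data of this shape triggers the same exception is not confirmed by the report.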

      Spark-submit command - spark-submit --master yarn --num-executors 3 --executor-memory 75g --driver-memory 10g --executor-cores 12 --class

      Actual Issue : Data loading fails with CarbonDataWriterException.

      Executor log from the YARN UI:

      org.apache.spark.util.TaskCompletionListenerException: org.apache.carbondata.core.datastore.exception.CarbonDataWriterException

      Previous exception in task: Error while initializing data handler :
      org.apache.carbondata.processing.loading.steps.DataWriterProcessorStepImpl.execute(DataWriterProcessorStepImpl.java:141)
      org.apache.carbondata.processing.loading.DataLoadExecutor.execute(DataLoadExecutor.java:51)
      org.apache.carbondata.spark.rdd.NewCarbonDataLoadRDD$$anon$1.<init>(NewCarbonDataLoadRDD.scala:221)
      org.apache.carbondata.spark.rdd.NewCarbonDataLoadRDD.internalCompute(NewCarbonDataLoadRDD.scala:197)
      org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:78)
      org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
      org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
      org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
      org.apache.spark.scheduler.Task.run(Task.scala:99)
      org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325)
      java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      java.lang.Thread.run(Thread.java:748)
      at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:138)
      at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:116)
      at org.apache.spark.scheduler.Task.run(Task.scala:109)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)

       

      Expected: Data loading should succeed from spark-submit, as it does from Beeline.

          People

            namanrastogi Naman Rastogi
            chetdb Chetan Bhat