CARBONDATA-1790

(Carbon 1.3.0 - Streaming) Data load into the stream segment fails if a batch load is performed in between streaming loads


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.0
    • Fix Version/s: None
    • Component/s: data-query
    • Environment: 3 node ant cluster

    Description

      Steps:
      1. Create a streaming table and perform a batch load.
      2. Set up streaming so that it loads chunks of 1000 records, 20 times.
      3. Perform another batch load on the table.
      4. Start streaming one more time (a reproduction sketch follows these steps).
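      A minimal reproduction sketch of the steps above, modeled on the CarbonData 1.3 structured streaming example. The store location, CSV paths, table schema, and socket host/port are placeholder assumptions; the checkpoint path is taken from the offset-file listing below:

      import org.apache.spark.sql.SparkSession
      import org.apache.spark.sql.CarbonSession._
      import org.apache.spark.sql.streaming.ProcessingTime

      object StreamSegmentRepro {
        def main(args: Array[String]): Unit = {
          val carbon = SparkSession.builder()
            .appName("CARBONDATA-1790-repro")
            .getOrCreateCarbonSession("/user/hive/warehouse/carbon.store") // placeholder store

          // Step 1: streaming table plus an initial batch load.
          carbon.sql(
            """CREATE TABLE stream_table5 (id INT, name STRING)
              |STORED BY 'carbondata'
              |TBLPROPERTIES ('streaming' = 'true')""".stripMargin)
          carbon.sql("LOAD DATA LOCAL INPATH '/tmp/batch1.csv' INTO TABLE stream_table5")

          // Step 2: stream CSV lines from a socket into the streaming segment.
          val socketDF = carbon.readStream
            .format("socket")
            .option("host", "localhost")
            .option("port", 9099)
            .load()
          val query = socketDF.writeStream
            .format("carbondata")
            .trigger(ProcessingTime("5 seconds"))
            .option("checkpointLocation",
              "/user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint")
            .option("dbName", "default")
            .option("tableName", "stream_table5")
            .start()
          query.awaitTermination(120000) // let ~20 chunks of 1000 records flow
          query.stop()

          // Step 3: another batch load while segment 1 is still in Streaming status.
          carbon.sql("LOAD DATA LOCAL INPATH '/tmp/batch2.csv' INTO TABLE stream_table5")

          // Step 4: restarting the same writeStream against the existing checkpoint
          // is where the "Offsets committed out of order" failure below appears.
        }
      }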
      ---------------------------------------------------------------------------------------------
      Segment Id | Status    | Load Start Time         | Load End Time           | File Format | Merged To
      ---------------------------------------------------------------------------------------------
      2          | Success   | 2017-11-21 21:42:36.77  | 2017-11-21 21:42:40.396 | COLUMNAR_V3 | NA
      1          | Streaming | 2017-11-21 21:40:46.2   | NULL                    | ROW_V1      | NA
      0          | Success   | 2017-11-21 21:40:39.782 | 2017-11-21 21:40:43.168 | COLUMNAR_V3 | NA
      ---------------------------------------------------------------------------------------------
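      The listing above matches the output of SHOW SEGMENTS; for reference, it can be produced with the carbon session from the sketch above:

      carbon.sql("SHOW SEGMENTS FOR TABLE default.stream_table5").show(100, false)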

      Expected: Data should be loaded.
      Actual: Data load fails.
      1. An additional offset file is created in the checkpoint directory:
      -rw-r--r-- 2 root users 62 2017-11-21 21:40 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/0
      -rw-r--r-- 2 root users 63 2017-11-21 21:40 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/1
      -rw-r--r-- 2 root users 63 2017-11-21 21:42 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/10
      -rw-r--r-- 2 root users 63 2017-11-21 21:40 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/2
      -rw-r--r-- 2 root users 63 2017-11-21 21:41 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/3
      -rw-r--r-- 2 root users 64 2017-11-21 21:41 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/4
      -rw-r--r-- 2 root users 64 2017-11-21 21:41 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/5
      -rw-r--r-- 2 root users 64 2017-11-21 21:41 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/6
      -rw-r--r-- 2 root users 64 2017-11-21 21:41 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/7
      -rw-r--r-- 2 root users 64 2017-11-21 21:41 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/8
      -rw-r--r-- 2 root users 63 2017-11-21 21:42 /user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets/9
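      Each offsets/N file records the source offsets for micro-batch N, so the extra file appears to belong to the one batch the restarted query managed to construct before failing. For reference, a sketch for inspecting the directory programmatically (the path is taken from the listing above; adjust for your cluster):

      import org.apache.hadoop.conf.Configuration
      import org.apache.hadoop.fs.{FileSystem, Path}

      object ListCheckpointOffsets {
        def main(args: Array[String]): Unit = {
          val dir = new Path(
            "/user/hive/warehouse/Ram/default/stream_table5/.streaming/checkpoint/offsets")
          val fs = FileSystem.get(dir.toUri, new Configuration())
          fs.listStatus(dir)
            .sortBy(_.getPath.getName.toInt) // offset files are named 0, 1, 2, ...
            .foreach(s => println(s"${s.getPath.getName}  ${s.getLen} bytes  ${s.getModificationTime}"))
        }
      }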
      2. The following error is thrown:
      === Streaming Query ===
      Identifier: [id = 3a5334bc-d471-4676-b6ce-f21105d491d1, runId = b2be9f97-8141-46be-89db-9a0f98d13369]
      Current Offsets:

      {org.apache.spark.sql.execution.streaming.TextSocketSource@14c45193: 1000}

      Current State: ACTIVE
      Thread State: RUNNABLE

      Logical Plan:
      org.apache.spark.sql.execution.streaming.TextSocketSource@14c45193

      at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:284)
      at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:177)
      Caused by: java.lang.RuntimeException: Offsets committed out of order: 20019 followed by 1000
      at scala.sys.package$.error(package.scala:27)
      at org.apache.spark.sql.execution.streaming.TextSocketSource.commit(socket.scala:151)
      at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$constructNextBatch$2$$anonfun$apply$mcV$sp$4.apply(StreamExecution.scala:421)
      at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$constructNextBatch$2$$anonfun$apply$mcV$sp$4.apply(StreamExecution.scala:420)
      at scala.collection.Iterator$class.foreach(Iterator.scala:893)
      at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
      at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
      at org.apache.spark.sql.execution.streaming.StreamProgress.foreach(StreamProgress.scala:25)
      at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$constructNextBatch$2.apply$mcV$sp(StreamExecution.scala:420)
      at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$constructNextBatch$2.apply(StreamExecution.scala:404)
      at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$constructNextBatch$2.apply(StreamExecution.scala:404)
      at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:262)
      at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:46)
      at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$constructNextBatch(StreamExecution.scala:404)
      at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$1.apply$mcV$sp(StreamExecution.scala:250)
      at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$1.apply(StreamExecution.scala:244)
      at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$1.apply(StreamExecution.scala:244)
      at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:262)
      at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:46)
      at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1.apply$mcZ$sp(StreamExecution.scala:244)
      at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:43)
      at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:239)
      ... 1 more
      Done reading and writing streaming data
      Socket closed
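      The "Offsets committed out of order: 20019 followed by 1000" failure at socket.scala:151 comes from Spark's TextSocketSource, which keeps its offsets only in memory and is not designed for recovery: when the restarted query is pointed at the checkpoint left by the first run, the recovered offset (20019) is committed to a freshly created socket source that has only read 1000 new lines, and the source's out-of-order guard fires. A minimal model of that guard (a behavioral sketch, not the actual Spark source):

      object CommitGuardDemo {
        // Mirrors the socket source's in-memory commit cursor, which starts over
        // on every restart while the checkpoint remembers the old position.
        var lastOffsetCommitted: Long = -1L

        def commit(end: Long): Unit = {
          if (end < lastOffsetCommitted)
            sys.error(s"Offsets committed out of order: $lastOffsetCommitted followed by $end")
          lastOffsetCommitted = end
        }

        def main(args: Array[String]): Unit = {
          commit(20019L) // offset recovered from the first run's checkpoint
          commit(1000L)  // restarted source has only read 1000 lines -> throws
        }
      }

      With a replayable source such as Kafka, or a fresh checkpoint location per socket-based test run, this particular guard would not fire.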


            People

              Assignee: Bhavya Aggarwal (bhavya411)
              Reporter: Ramakrishna S (Ram@huawei)
