CarbonData / CARBONDATA-1032

NumberFormatException and NegativeArraySizeException for select with in clause filter limit for unsafe true configuration


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.3.0
    • Component/s: data-query
    • Labels: None
    • Environment: 3 node cluster SUSE 11 SP4

    Description

      carbon.properties is configured as below:
      carbon.allowed.compaction.days = 2
      carbon.enable.auto.load.merge = false
      carbon.compaction.level.threshold = 3,2
      carbon.timestamp.format = yyyy-MM-dd
      carbon.badRecords.location = /tmp/carbon
      carbon.numberof.preserve.segments = 2
      carbon.sort.file.buffer.size = 20
      max.query.execution.time = 60
      carbon.number.of.cores.while.loading = 8
      carbon.storelocation =hdfs://hacluster/opt/CarbonStore
      enable.data.loading.statistics = true
      enable.unsafe.sort = true
      offheap.sort.chunk.size.inmb = 128
      sort.inmemory.size.inmb = 30720
      carbon.enable.vector.reader=true
      enable.unsafe.in.query.processing=true
      enable.query.statistics=true
      carbon.blockletgroup.size.in.mb=128
      high.cardinality.identify.enable=TRUE
      high.cardinality.threshold=10000
      high.cardinality.value=1000
      high.cardinality.row.count.percentage=40
      carbon.data.file.version=2
      carbon.major.compaction.size=2
      carbon.enable.auto.load.merge=FALSE
      carbon.numberof.preserve.segments=1
      carbon.allowed.compaction.days=1

      The user creates a table, loads 1,535,088 records, and executes a select with an IN clause filter and LIMIT.
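
      The original DDL and load statement are not included in this report. A minimal reproduction sketch might look like the following; the column list, column types, and CSV path are assumptions inferred from the failing queries (the measure column bal is assumed to be DECIMAL, since the failure occurs while reading unsafe big decimal chunks):

      -- Hypothetical schema: only the columns referenced by the failing queries are assumed here.
      CREATE TABLE flow_carbon_test4 (
        cus_ac STRING,
        opp_bk STRING,
        txn_bk STRING,
        own_bk STRING,
        dt STRING,
        txn_dte STRING,
        bal DECIMAL(30,10)
      ) STORED BY 'carbondata';

      -- Assumed load statement; the actual source file path is not given in the report.
      LOAD DATA INPATH 'hdfs://hacluster/tmp/flow_carbon_test4.csv' INTO TABLE flow_carbon_test4;

      -- One of the failing queries from the report, run with enable.unsafe.in.query.processing=true.
      SELECT * FROM flow_carbon_test4
      WHERE opp_bk IN ('1491999999158','1491999999116','1491999999022','1491999999031')
        AND dt >= '20140101' AND dt <= '20160101'
      ORDER BY bal ASC LIMIT 1000;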

      Actual Result:
      The select with IN clause filter and LIMIT throws NegativeArraySizeException and NumberFormatException when the unsafe configuration is enabled.
      0: jdbc:hive2://172.168.100.199:23040> select * from flow_carbon_test4 where opp_bk in ('1491999999158','1491999999116','1491999999022','1491999999031') and dt>='20140101' and dt <= '20160101' order by bal asc limit 1000;
      Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 2109.0 failed 4 times, most recent failure: Lost task 1.3 in stage 2109.0 (TID 75120, linux-49, executor 2): java.lang.NegativeArraySizeException
      at org.apache.carbondata.core.datastore.chunk.store.impl.unsafe.UnsafeBigDecimalMeasureChunkStore.getBigDecimal(UnsafeBigDecimalMeasureChunkStore.java:132)
      at org.apache.carbondata.core.datastore.compression.decimal.CompressByteArray.getBigDecimalValue(CompressByteArray.java:94)
      at org.apache.carbondata.core.datastore.dataholder.CarbonReadDataHolder.getReadableBigDecimalValueByIndex(CarbonReadDataHolder.java:38)
      at org.apache.carbondata.core.scan.result.vector.MeasureDataVectorProcessor$DecimalMeasureVectorFiller.fillMeasureVectorForFilter(MeasureDataVectorProcessor.java:253)
      at org.apache.carbondata.core.scan.result.impl.FilterQueryScannedResult.fillColumnarMeasureBatch(FilterQueryScannedResult.java:119)
      at org.apache.carbondata.core.scan.collector.impl.DictionaryBasedVectorResultCollector.scanAndFillResult(DictionaryBasedVectorResultCollector.java:145)
      at org.apache.carbondata.core.scan.collector.impl.DictionaryBasedVectorResultCollector.collectVectorBatch(DictionaryBasedVectorResultCollector.java:137)
      at org.apache.carbondata.core.scan.processor.impl.DataBlockIteratorImpl.processNextBatch(DataBlockIteratorImpl.java:65)
      at org.apache.carbondata.core.scan.result.iterator.VectorDetailQueryResultIterator.processNextBatch(VectorDetailQueryResultIterator.java:46)
      at org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.nextBatch(VectorizedCarbonRecordReader.java:251)
      at org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.nextKeyValue(VectorizedCarbonRecordReader.java:141)
      at org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.hasNext(CarbonScanRDD.scala:221)
      at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source)
      at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
      at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
      at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
      at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
      at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
      at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:628)
      at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
      at org.apache.spark.sql.execution.TakeOrderedAndProjectExec$$anonfun$5.apply(limit.scala:148)
      at org.apache.spark.sql.execution.TakeOrderedAndProjectExec$$anonfun$5.apply(limit.scala:147)
      at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:796)
      at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:796)
      at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
      at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
      at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
      at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
      at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
      at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
      at org.apache.spark.scheduler.Task.run(Task.scala:99)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)

      Driver stacktrace: (state=,code=0)
      0: jdbc:hive2://172.168.100.199:23040> select * from flow_carbon_test4 where cus_ac like '622262135067246539%' and (txn_dte>='20150101' and txn_dte<='20160101') and txn_bk IN ('00000000000', '00000000001','00000000002') OR own_bk IN ('00000000424','00000001383','00000001942','00000001262') limit 1000;
      Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 131.0 failed 4 times, most recent failure: Lost task 0.3 in stage 131.0 (TID 240, linux-51, executor 1): java.lang.NumberFormatException: Zero length BigInteger
      at java.math.BigInteger.<init>(BigInteger.java:293)
      at org.apache.carbondata.core.util.DataTypeUtil.byteToBigDecimal(DataTypeUtil.java:189)
      at org.apache.carbondata.core.datastore.chunk.store.impl.unsafe.UnsafeBigDecimalMeasureChunkStore.getBigDecimal(UnsafeBigDecimalMeasureChunkStore.java:136)
      at org.apache.carbondata.core.datastore.compression.decimal.CompressByteArray.getBigDecimalValue(CompressByteArray.java:94)
      at org.apache.carbondata.core.datastore.dataholder.CarbonReadDataHolder.getReadableBigDecimalValueByIndex(CarbonReadDataHolder.java:38)
      at org.apache.carbondata.core.scan.collector.impl.AbstractScannedResultCollector.getMeasureData(AbstractScannedResultCollector.java:104)
      at org.apache.carbondata.core.scan.collector.impl.AbstractScannedResultCollector.fillMeasureData(AbstractScannedResultCollector.java:78)
      at org.apache.carbondata.core.scan.collector.impl.DictionaryBasedResultCollector.fillMeasureData(DictionaryBasedResultCollector.java:158)
      at org.apache.carbondata.core.scan.collector.impl.DictionaryBasedResultCollector.collectData(DictionaryBasedResultCollector.java:115)
      at org.apache.carbondata.core.scan.processor.impl.DataBlockIteratorImpl.next(DataBlockIteratorImpl.java:51)
      at org.apache.carbondata.core.scan.processor.impl.DataBlockIteratorImpl.next(DataBlockIteratorImpl.java:32)
      at org.apache.carbondata.core.scan.result.iterator.DetailQueryResultIterator.getBatchResult(DetailQueryResultIterator.java:50)
      at org.apache.carbondata.core.scan.result.iterator.DetailQueryResultIterator.next(DetailQueryResultIterator.java:41)
      at org.apache.carbondata.core.scan.result.iterator.DetailQueryResultIterator.next(DetailQueryResultIterator.java:31)
      at org.apache.carbondata.core.scan.result.iterator.ChunkRowIterator.<init>(ChunkRowIterator.java:41)
      at org.apache.carbondata.hadoop.CarbonRecordReader.initialize(CarbonRecordReader.java:78)
      at org.apache.carbondata.spark.rdd.CarbonScanRDD.compute(CarbonScanRDD.scala:204)
      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
      at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
      at org.apache.spark.sql.CarbonDecoderRDD.compute(CarbonDictionaryDecoder.scala:538)
      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
      at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
      at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
      at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
      at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
      at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
      at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
      at org.apache.spark.scheduler.Task.run(Task.scala:99)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)

      Driver stacktrace: (state=,code=0)

      Expected Result: The select with IN clause filter and LIMIT should execute successfully under the unsafe configuration, returning the correct result set without exceptions.

          People

            Assignee: Unassigned
            Reporter: Chetan Bhat (chetdb)
            Votes: 0
            Watchers: 3


              Time Tracking

                Original Estimate: 504h
                Remaining Estimate: 504h
                Time Spent: Not Specified