  1. CarbonData
  2. CARBONDATA-1839

Data load failed when using compressed sort temp file

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0
    • Component/s: None
    • Labels: None

    Description

      CarbonData provides an option to optimize the data load process by compressing the intermediate sort temp files.

      The option is `carbon.is.sort.temp.file.compression.enabled` and its default value is `false`. When disk space is tight, users can turn this feature on by setting the option to `true`; the file content is then compressed before it is written to disk.

      However, I have found bugs in the related code: data load failed after this feature was turned on.

      This bug can be reproduced easily. I used the example at line 98 of `TestLoadDataFrame`.

      1. Create a DataFrame (e.g. 320000 rows with 3 columns).
      2. Set `carbon.is.sort.temp.file.compression.enabled=true` in `CarbonProperties`.
      3. Write the DataFrame to a CarbonData table through `DataFrameWriter`.

      The error messages are shown below:

      ```
      17/11/29 18:04:12 ERROR SortDataRows: SortDataRowPool:test1
      java.lang.ClassCastException: [B cannot be cast to [Ljava.lang.Integer;
      at org.apache.carbondata.core.util.NonDictionaryUtil.getDimension(NonDictionaryUtil.java:93)
      at org.apache.carbondata.processing.sort.sortdata.UnCompressedTempSortFileWriter.writeDataOutputStream(UnCompressedTempSortFileWriter.java:52)
      at org.apache.carbondata.processing.sort.sortdata.CompressedTempSortFileWriter.writeSortTempFile(CompressedTempSortFileWriter.java:65)
      at org.apache.carbondata.processing.sort.sortdata.SortTempFileChunkWriter.writeSortTempFile(SortTempFileChunkWriter.java:72)
      at org.apache.carbondata.processing.sort.sortdata.SortDataRows.writeSortTempFile(SortDataRows.java:245)
      at org.apache.carbondata.processing.sort.sortdata.SortDataRows.writeDataTofile(SortDataRows.java:232)
      at org.apache.carbondata.processing.sort.sortdata.SortDataRows.access$300(SortDataRows.java:45)
      at org.apache.carbondata.processing.sort.sortdata.SortDataRows$DataSorterAndWriter.run(SortDataRows.java:426)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
      ```
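
      In the trace above, `[B` is the JVM name for `byte[]` and `[Ljava.lang.Integer;` is the name for `Integer[]`, so the writer received a `byte[]` where it cast to `Integer[]`. The following is a minimal sketch of that failure mode only. It is not CarbonData code: the class `ByteArrayCastDemo`, the helper `readFirstDimension`, and the assumption that slot 0 of the row holds the dimension data are all hypothetical.

      ```java
      // Hypothetical illustration of "[B cannot be cast to [Ljava.lang.Integer;".
      // None of these names come from CarbonData.
      final class ByteArrayCastDemo {

          // Stand-in for a helper that assumes slot 0 of the row holds an Integer[].
          static Integer readFirstDimension(Object[] row) {
              Integer[] dims = (Integer[]) row[0]; // fails at runtime if row[0] is a byte[]
              return dims[0];
          }

          // Wraps the call so both outcomes can be observed without crashing.
          static String tryRead(Object[] row) {
              try {
                  return "ok:" + readFirstDimension(row);
              } catch (ClassCastException e) {
                  return "ClassCastException";
              }
          }

          public static void main(String[] args) {
              Object[] expectedLayout = { new Integer[] {7, 8} }; // what the helper assumes
              Object[] actualLayout = { new byte[] {7, 8} };      // what it actually received
              System.out.println(tryRead(expectedLayout)); // prints "ok:7"
              System.out.println(tryRead(actualLayout));   // prints "ClassCastException"
          }
      }
      ```

      The cast compiles because the row is an `Object[]`, so a mismatch between the row layout and the writer's expectation only shows up at runtime.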
      ```
      17/11/29 18:04:13 ERROR SortDataRows: SafeParallelSorterPool:test1 exception occurred while trying to acquire a semaphore lock: Task org.apache.carbondata.processing.sort.sortdata.SortDataRows$DataSorterAndWriter@3d413b40 rejected from java.util.concurrent.ThreadPoolExecutor@cb56011[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1]
      17/11/29 18:04:13 ERROR ParallelReadMergeSorterImpl: SafeParallelSorterPool:test1
      org.apache.carbondata.processing.sort.exception.CarbonSortKeyAndGroupByException:
      at org.apache.carbondata.processing.sort.sortdata.SortDataRows.addRowBatch(SortDataRows.java:173)
      at org.apache.carbondata.processing.loading.sort.impl.ParallelReadMergeSorterImpl$SortIteratorThread.run(ParallelReadMergeSorterImpl.java:227)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: java.util.concurrent.RejectedExecutionException: Task org.apache.carbondata.processing.sort.sortdata.SortDataRows$DataSorterAndWriter@3d413b40 rejected from java.util.concurrent.ThreadPoolExecutor@cb56011[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1]
      at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
      at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
      at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
      at org.apache.carbondata.processing.sort.sortdata.SortDataRows.addRowBatch(SortDataRows.java:169)
      ... 4 more
      ```
      ```
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: org.apache.carbondata.processing.sort.exception.CarbonSortKeyAndGroupByException:
      at org.apache.carbondata.processing.sort.sortdata.SortDataRows.addRowBatch(SortDataRows.java:173)
      at org.apache.carbondata.processing.loading.sort.impl.ParallelReadMergeSorterImpl$SortIteratorThread.run(ParallelReadMergeSorterImpl.java:227)
      ... 3 more
      Caused by: java.util.concurrent.RejectedExecutionException: Task org.apache.carbondata.processing.sort.sortdata.SortDataRows$DataSorterAndWriter@3d413b40 rejected from java.util.concurrent.ThreadPoolExecutor@cb56011[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1]
      at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
      at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
      at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
      at org.apache.carbondata.processing.sort.sortdata.SortDataRows.addRowBatch(SortDataRows.java:169)
      ... 4 more
      ```
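
      The `RejectedExecutionException` in the later traces reports the pool state as `Terminated`, which is what a `ThreadPoolExecutor` with the default abort policy does when a task is submitted after shutdown. It is therefore presumably a follow-on error of the first failure rather than a separate bug, although the traces alone do not prove this. A minimal sketch of that behavior using only `java.util.concurrent` (the class name `RejectedAfterShutdownDemo` is hypothetical):

      ```java
      import java.util.concurrent.ExecutorService;
      import java.util.concurrent.Executors;
      import java.util.concurrent.RejectedExecutionException;

      // Hypothetical illustration: submitting work to an executor that was already shut down.
      final class RejectedAfterShutdownDemo {

          // Returns true if the submission after shutdown was rejected.
          static boolean submitAfterShutdown() {
              ExecutorService pool = Executors.newFixedThreadPool(1);
              pool.shutdownNow(); // e.g. triggered by an earlier task failure
              try {
                  pool.execute(() -> { }); // default AbortPolicy rejects this task
                  return false;
              } catch (RejectedExecutionException e) {
                  return true;
              }
          }

          public static void main(String[] args) {
              System.out.println(submitAfterShutdown()); // prints "true"
          }
      }
      ```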

          People

            Assignee: xuchuanyin Chuanyin Xu
            Reporter: xuchuanyin Chuanyin Xu

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 19h 40m