Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Fix Version: 1.2.0
Description
My testing environment and configuration are as follows:
Env:
6 executors, each with 9 GB memory and 6 cores
Configs:
SINGLE_PASS=true
SORT_SCOPE=GLOBAL_SORT
spark.memory.fraction=0.5
In the method 'org.apache.carbondata.spark.load.DataLoadProcessBuilderOnSpark.loadDataUsingGlobalSort', using 'convertRDD.persist(StorageLevel.MEMORY_AND_DISK_SER)' takes about 7.2 min to load 144,136,697 rows (10.9 GB of Parquet files), whereas 'convertRDD.persist(StorageLevel.MEMORY_AND_DISK)' takes about 9.5 min for the same data.
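The change amounts to picking the serialized storage level when caching the converted RDD before the global sort. A minimal sketch of the relevant line (assuming the surrounding method builds 'convertRDD' as in the class cited above; this is illustrative, not the full method):

```scala
import org.apache.spark.storage.StorageLevel

// Sketch of the caching step inside loadDataUsingGlobalSort.
//
// MEMORY_AND_DISK caches deserialized Java objects: cheaper to read back,
// but each cached row occupies more memory, so more partitions spill to disk.
// MEMORY_AND_DISK_SER caches serialized bytes: extra CPU for (de)serialization,
// but the smaller footprint keeps more rows in memory, which is why the
// serialized level loaded the ~144M rows in ~7.2 min vs ~9.5 min here.
convertRDD.persist(StorageLevel.MEMORY_AND_DISK_SER)
```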