CARBONDATA-3240: Performance Report CD vs parquet


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.5.1
    • Fix Version/s: None
    • Component/s: sql
    • Labels: None
    • Environment: 3-node cluster, 32 GB RAM and 8 cores per machine; Spark 2.3.2, Hadoop, and Hive with MySQL installed.
    • Severity: Important

    Description

      Hi, 

      Based on the performance report published on the site, we are excited to use CarbonData in our projects.

      We ran the TPC-DS benchmark on 100 GB of data for both Parquet and CarbonData, but the results are not up to the mark: on average CarbonData is slower than Parquet when we use getOrCreateCarbonSession. We used:

      SparkSession spark = SparkSession.builder().config(sparkConf)
          .appName("WritetocarbonData").enableHiveSupport().getOrCreate();
      SparkSession.Builder builder = SparkSession.builder().config(sparkConf)
          .master("local").appName("WritetocarbonData");
      SparkSession carbon = new CarbonSession.CarbonBuilder(builder)
          .getOrCreateCarbonSession("/home/ec2-user/efs/mysql");
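
      To make the comparison concrete, the sketch below shows roughly how the same TPC-DS source data can be materialized in both formats from that session. The path and table names are placeholders, the TPC-DS schema is omitted, and the exact CarbonData DDL may vary slightly between versions; this is a sketch of the loading step, not our exact code.

      // Uses org.apache.spark.sql.Dataset and org.apache.spark.sql.Row.
      // Read one generated TPC-DS fact table (placeholder path; the real run applies the proper schema).
      Dataset<Row> storeSales = carbon.read()
          .option("delimiter", "|")
          .csv("hdfs:///tpcds/100gb/store_sales");

      // Materialize the same rows once as a Parquet table and once as a CarbonData table.
      storeSales.write().format("parquet").mode("overwrite").saveAsTable("store_sales_parquet");
      carbon.sql("CREATE TABLE IF NOT EXISTS store_sales_carbon STORED BY 'carbondata' "
          + "AS SELECT * FROM store_sales_parquet");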

      We do not see CarbonData performing better than Parquet at the query level, nor any significant difference between the two.
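
      For reference, the per-query numbers behind that observation come from simple wall-clock timing of identical SQL against the two tables, roughly as in the sketch below (the query shown is a placeholder, not one of the actual TPC-DS queries):

      // Time one query end-to-end against each table; collect() forces full execution on the cluster.
      String query = "SELECT ss_item_sk, SUM(ss_net_paid) FROM %s GROUP BY ss_item_sk";

      long t0 = System.nanoTime();
      carbon.sql(String.format(query, "store_sales_parquet")).collect();
      long parquetMs = (System.nanoTime() - t0) / 1_000_000;

      long t1 = System.nanoTime();
      carbon.sql(String.format(query, "store_sales_carbon")).collect();
      long carbonMs = (System.nanoTime() - t1) / 1_000_000;

      System.out.println("parquet: " + parquetMs + " ms, carbondata: " + carbonMs + " ms");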

      I would like to know how you performed the benchmarking that produced results better than Parquet.

      The latest slides presented by Huawei at a conference in China showcased CarbonData as 10x to 20x faster.

      Can anyone share the detailed benchmarking steps and code?



People

    • Assignee: Unassigned
    • Reporter: Vinay (vnkesarwani@gmail.com)
    • Votes: 0
    • Watchers: 1
