CarbonData / CARBONDATA-3240

Performance Report: CarbonData vs Parquet


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.5.1
    • Fix Version/s: None
    • Component/s: sql
    • Labels:
      None
    • Environment:
      3-node cluster, 32 GB and 8 cores per machine. Spark 2.3.2, Hadoop, and Hive (with MySQL) installed.
    • Flags:
      Important

      Description

      Hi, 

      Given the performance report published on the site, we are excited to use CarbonData in our projects.

      We ran the TPC-DS benchmark on 100 GB of data for both Parquet and CarbonData, but the results are not up to the mark: on average, CarbonData is slower than Parquet when we use getOrCreateCarbonSession. We used:

      SparkSession spark = SparkSession.builder().config(sparkConf).appName("WritetocarbonData").enableHiveSupport().getOrCreate();
      SparkSession.Builder builder = SparkSession.builder().config(sparkConf).master("local").appName("WritetocarbonData")
      .config(sparkConf);
      SparkSession carbon = new CarbonSession.CarbonBuilder(builder).getOrCreateCarbonSession("/home/ec2-user/efs/mysql");

      At the query level, we don't see CarbonData performing better than Parquet, or any significant difference between the two.

      I would like to know how you performed the benchmarking that produced results better than Parquet.

      The latest slides presented by Huawei at a conference in China showed CarbonData as 10x to 20x faster.

      Can anyone share the detailed benchmarking steps and code?

            People

            • Assignee:
              Unassigned
              Reporter:
              vnkesarwani@gmail.com Vinay