CarbonData / CARBONDATA-4106

Compaction is not working properly

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Bug
    • Affects Version/s: 2.0.1
    • Fix Version/s: None
    • Component/s: core
    • Labels:
      None
    • Environment:
      Apache Spark 2.4.5, CarbonData 2.0.1

      Description

      Hi Team,

      We are using Apache CarbonData 2.0.1 for one of our POCs, and we observed that we are not getting the expected benefit from compaction (both major and minor).

      Please find below details for the issue we are facing:

      Name of the table used:  fact_365_1_probe_1

      Number of rows:

      select count(1) from fact_365_1_probe_1

      +----------+
      | count(1) |
      +----------+
      | 76963753 |
      +----------+

      Sample data from the table:

      +---------------------+----------------------------+--------------------------------------+--------------------+---------------+---------------------+
      | ts                  | metric                     | tags_id                              | value              | epoch         | ts2                 |
      +---------------------+----------------------------+--------------------------------------+--------------------+---------------+---------------------+
      | 2021-01-07 21:05:00 | Probe.Duplicate.Poll.Count | c8dead9b-87ae-46ae-8703-bc2b7bfba5d4 | 39.611356797970274 | 1610033757768 | 2021-01-07 00:00:00 |
      | 2021-01-07 23:50:00 | Probe.Duplicate.Poll.Count | 62351ef2-f2ce-49d1-a2fd-a0d1e5f6a1b9 | 72.70658115131307  | 1610043742516 | 2021-01-07 00:00:00 |
      +---------------------+----------------------------+--------------------------------------+--------------------+---------------+---------------------+


      I have attached the describe output (describe_fact_probe_1), which shows the other details of the table.

      The size of the table is 3.24 GB, and even after running minor or major compaction the size remains almost the same.

      So we are not getting any benefit from running compaction. Could you please review the shared details and help us identify whether we are missing something here or whether there is a bug?
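
As a point of reference for the compaction steps discussed above, here is a minimal sketch of the statements involved, assuming the standard CarbonData SQL syntax (`ALTER TABLE ... COMPACT`, `SHOW SEGMENTS FOR TABLE`, `CLEAN FILES FOR TABLE`). The helper names `compaction_statements` and `run_compaction` are hypothetical, and `spark` is assumed to be an existing Carbon-enabled SparkSession. One possibility worth checking, offered as an assumption rather than a diagnosis: the source segments merged by compaction may remain on disk until stale segments are cleaned, in which case the table size would not drop right after compaction.

```python
def compaction_statements(table, kind="MINOR"):
    """Build the SQL statements for one compaction pass (hypothetical helper)."""
    kind = kind.upper()
    if kind not in ("MINOR", "MAJOR"):
        raise ValueError("kind must be 'MINOR' or 'MAJOR'")
    return [
        # Merge existing segments into fewer, larger ones.
        f"ALTER TABLE {table} COMPACT '{kind}'",
        # Inspect segment status to see which segments were merged.
        f"SHOW SEGMENTS FOR TABLE {table}",
        # Ask CarbonData to remove stale segment files (assumption: this is
        # when merged source segments are physically deleted).
        f"CLEAN FILES FOR TABLE {table}",
    ]


def run_compaction(spark, table, kind="MINOR"):
    """Run each statement on a Carbon-enabled SparkSession (assumed to exist)."""
    for statement in compaction_statements(table, kind):
        spark.sql(statement).show(truncate=False)
```

For the table in this report, the call would be `run_compaction(spark, "fact_365_1_probe_1", "MAJOR")`, with the on-disk size compared before and after the last statement.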

      We also need answers to the following questions about CarbonData storage:

      1. For decimal values, how does storage behave? For example, if one row has 20 digits after the decimal point and a second row has only 5, how is each stored and what is the difference in the storage used?

      2. If I have two tables, one with the same value in all 100 rows and the other with a different value in each of its 100 rows, how will CarbonData behave as far as storage is concerned? Will one table take less storage, or will both take the same?

      3. For the string data type, could you please describe how storage is defined?
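
To make question 2 concrete, the following sketch compares the compressed size of 100 identical values with that of 100 distinct values. It uses Python's general-purpose `zlib` codec purely as a stand-in; it does not reproduce CarbonData's own encodings or compressor, so it only illustrates the general effect that repeated values tend to compress better. The function name `compressed_size` is hypothetical.

```python
import uuid
import zlib


def compressed_size(values):
    """Return the zlib-compressed size in bytes of newline-joined values."""
    payload = "\n".join(values).encode("utf-8")
    return len(zlib.compress(payload))


# 100 rows that all hold the same tags_id value.
same = ["c8dead9b-87ae-46ae-8703-bc2b7bfba5d4"] * 100
# 100 rows that each hold a different, randomly generated id.
different = [str(uuid.uuid4()) for _ in range(100)]

print("identical values:", compressed_size(same), "bytes")
print("distinct values: ", compressed_size(different), "bytes")
```

Both inputs have the same raw length, so any gap between the two printed sizes comes from how well the codec can exploit repetition.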

        Attachments

        1. describe_fact_probe_1
          6 kB
          suyash yadav

            People

            • Assignee: Unassigned
            • Reporter: suyash yadav (imsuyash)