Kylin / KYLIN-3462

"dfs.replication=2" and compression do not work in Spark cube engine


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: v2.3.0, v2.3.1, v2.4.0
    • Fix Version/s: v2.5.0
    • Component/s: Spark Engine
    • Labels:
      None

      Description

      In a comparison between Spark and MR cubing, I noticed the cuboid files that the Spark engine generated were about 3x larger than MR's, and occupied about 4x more disk space on HDFS.
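
      For context, a gap like this can be measured with `hdfs dfs -du`, which reports both the raw file size and the disk space consumed after replication. The paths below are illustrative placeholders, not the actual job directories:

      ```shell
      # Compare raw size vs. disk consumed (size x replication factor)
      # for each engine's cuboid output; substitute real working directories.
      hdfs dfs -du -h /kylin/kylin_metadata/kylin-<job_id>/<cube_name>/cuboid/
      ```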


      The reason is that the "dfs.replication=2" setting did not take effect when Spark saved to HDFS, and Spark applied no output compression by default.
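
      As a hedged sketch of a workaround (not necessarily the committed fix): Kylin forwards properties carrying the `kylin.engine.spark-conf.` prefix to spark-submit, and Spark copies `spark.hadoop.*` entries into the job's Hadoop Configuration, so overrides along these lines in `kylin.properties` should reach the write path on affected versions:

      ```properties
      # kylin.properties -- assumed workaround, not the committed fix.
      # Spark propagates spark.hadoop.* keys into the Hadoop Configuration
      # used when writing output, so dfs.replication applies to cuboid files.
      kylin.engine.spark-conf.spark.hadoop.dfs.replication=2
      # Enable output compression for the files the Spark engine writes.
      kylin.engine.spark-conf.spark.hadoop.mapreduce.output.fileoutputformat.compress=true
      kylin.engine.spark-conf.spark.hadoop.mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.DefaultCodec
      ```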


      The converted HFiles are the same size and the query results are identical, so this difference can easily be overlooked.

        Attachments

        1. cuboid_generated_by_mr.png
          202 kB
          Shao Feng Shi
        2. cuboid_generated_by_spark.png
          216 kB
          Shao Feng Shi


            People

            • Assignee: shaofengshi (Shao Feng Shi)
            • Reporter: shaofengshi (Shao Feng Shi)
            • Votes: 0
            • Watchers: 2
