[KYLIN-3462] "dfs.replication=2" and compression not work in Spark cube engine - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: v2.3.0, v2.3.1, v2.4.0
Fix Version/s: v2.5.0
Component/s: Spark Engine
Labels: None

Description

In a comparison between Spark and MR cubing, I noticed that the cuboid files generated by the Spark engine are 3x larger than those generated by MR, and take 4x more disk space on HDFS.

The reason is that the "dfs.replication=2" setting did not take effect when Spark saved to HDFS, and Spark output is not compressed by default.
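As a rough illustration of how such settings can be handed to a Spark job: Spark documents that properties prefixed with "spark.hadoop." are copied into the Hadoop Configuration used by the job. The sketch below builds such overrides for the replication factor and for output compression. The helper names (to_spark_conf, to_submit_args) are hypothetical, the codec choice is an assumption, and this is not necessarily how the fix in v2.5.0 was implemented.

```python
# Minimal sketch, assuming Spark's documented "spark.hadoop.*" passthrough.
# The helpers below are hypothetical and are not part of Kylin or Spark.

SPARK_HADOOP_PREFIX = "spark.hadoop."


def to_spark_conf(hadoop_props):
    """Prefix Hadoop properties so Spark forwards them to its Hadoop Configuration."""
    return {SPARK_HADOOP_PREFIX + key: str(value) for key, value in hadoop_props.items()}


def to_submit_args(conf):
    """Render a conf dict as spark-submit style '--conf key=value' arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


# Hadoop-side settings the job should honor when writing cuboid files.
# The codec is an assumed example; any codec installed on the cluster could be used.
hadoop_overrides = {
    "dfs.replication": 2,
    "mapreduce.output.fileoutputformat.compress": "true",
    "mapreduce.output.fileoutputformat.compress.codec":
        "org.apache.hadoop.io.compress.DefaultCodec",
}

spark_conf = to_spark_conf(hadoop_overrides)

if __name__ == "__main__":
    print(" ".join(to_submit_args(spark_conf)))
```

Whether a given output format honors the compression properties depends on how the job writes its files, so the result is worth verifying by comparing file sizes and replication on HDFS after a build.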

The converted HFiles are the same size and the query results are the same, so this difference may easily be overlooked.

Attachments

cuboid_generated_by_mr.png (23/Jul/18 05:49, 202 kB, Shao Feng Shi)
cuboid_generated_by_spark.png (23/Jul/18 05:49, 216 kB, Shao Feng Shi)

People

Assignee: Shao Feng Shi

Reporter: Shao Feng Shi

Votes: 0

Watchers: 2

Dates

Created: 23/Jul/18 05:47

Updated: 17/Sep/18 00:59

Resolved: 02/Aug/18 06:12