Details
- Type: Improvement
- Status: Resolved
- Priority: Minor
- Resolution: Incomplete
Description
This is a usability issue. It's extremely common for people to load a gzip-compressed text file, process it, and then wonder why only one core in their cluster is doing any work. The cause is that gzip is not a splittable compression format, so Spark reads the entire file as a single partition, processed by a single task on a single core.
I'm not sure how this problem can be generalized, but at the very least it would be helpful if Spark displayed some kind of warning in the common case where someone opens a gzipped file with sc.textFile.
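As a rough sketch of the warning being requested, the check could be as simple as inspecting the path suffix before the read. Everything below is a hypothetical illustration, not Spark code: the helper name, the suffix list, and the message are assumptions (raw .gz files are unsplittable in Hadoop input formats; other codecs vary, so only .gz is asserted here).

```python
# Hypothetical sketch of the suffix check this issue proposes.
# Not part of Spark; names and suffix list are assumptions.
import warnings

# Assumption: suffixes whose codecs Hadoop input formats cannot split.
UNSPLITTABLE_SUFFIXES = (".gz",)

def warn_if_unsplittable(path: str) -> bool:
    """Emit a warning and return True if `path` looks like a file
    that will be read into a single partition."""
    if path.endswith(UNSPLITTABLE_SUFFIXES):
        warnings.warn(
            f"{path} uses an unsplittable compression codec and will be "
            "loaded as a single partition; only one task will process it. "
            "Consider repartitioning after the read."
        )
        return True
    return False
```

A driver could call this from textFile before handing the path to the Hadoop input format; the user would then see the warning instead of silently getting a one-partition RDD.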
Issue Links
- is duplicated by:
  - SPARK-28366 Logging in driver when loading single large unsplittable file (Resolved)
- relates to:
  - SPARK-29102 Read gzipped file into multiple partitions without full gzip expansion on a single-node (Resolved)