SPARK-18150

Spark 2.* fails to create partitions for Avro files

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Invalid
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: DStreams, SQL
    • Labels: None
    • Target Version/s:

      Description

      I am using Apache Spark 2.0.1 to process an Avro file on grid HDFS, but I don't see Spark distributing the job into different tasks. Instead it uses a single task, and all the operations (read, load, filter, show) run sequentially in that one task.

      This means I am not able to leverage distributed parallel processing.
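
      A minimal sketch of the kind of pipeline described above (the app name, path, column and value are hypothetical, not from the actual job):

          import org.apache.spark.sql.SparkSession

          val spark = SparkSession.builder().appName("AvroPartitionTest").getOrCreate()

          // spark-avro 3.0.1 registers under the "com.databricks.spark.avro" format
          val df = spark.read
            .format("com.databricks.spark.avro")
            .load("hdfs:///data/events.avro")   // hypothetical path

          // Read, load, filter and show all end up in a single task on the Avro input
          df.filter(df("status") === "active").show()   // hypothetical column/value

          // Quick check of how many partitions the scan produced
          println(s"Input partitions: ${df.rdd.getNumPartitions}")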

      I tried the same operations on a JSON file on HDFS and they work fine: the job gets distributed into multiple tasks and partitions, and I see parallelism.

      I then tested the same thing on Spark 1.6, and there the partitioning happens. It looks like there is a bug in the Spark 2.* versions. If not, can someone help me understand how to achieve the same for an Avro file? Do I need to do something special for Avro files?
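
      For what it is worth, an explicit repartition (a workaround sketch, assuming the DataFrame from the example above; the partition count is arbitrary) does spread the downstream work across tasks, though the initial scan still runs as a single task:

          // Force a shuffle so that stages after the scan run in parallel
          val repartitioned = df.repartition(32)   // 32 is an illustrative value
          repartitioned.filter(repartitioned("status") === "active").show()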

      Note:
      I explored the Spark settings "spark.default.parallelism", "spark.sql.files.maxPartitionBytes", "--num-executors" and "spark.sql.shuffle.partitions". They were not of much help: "spark.default.parallelism" ensured there were multiple tasks, but a single task still ended up performing all the operations.
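
      For reference, the session-level settings above were set roughly as follows (the values shown are illustrative, not the exact ones used); "--num-executors" was passed on the spark-submit command line:

          // Sketch of the configuration tried, with illustrative values
          val spark = SparkSession.builder()
            .appName("AvroPartitionTest")
            .config("spark.default.parallelism", "100")
            .config("spark.sql.files.maxPartitionBytes", "134217728")   // 128 MB, the default
            .config("spark.sql.shuffle.partitions", "100")
            .getOrCreate()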

      I am using com.databricks.spark.avro (3.0.1) with Spark 2.0.1.

      Thanks,
      Sunil

            People

            • Assignee: Unassigned
            • Reporter: sunilsbjoshi (Sunil Kumar)
            • Votes: 0
            • Watchers: 1
