[HUDI-110] Better defaults for Partition extractor for Spark DataSource and DeltaStreamer - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.13.0
Component/s: deltastreamer, spark, Usability
Labels:
- user-support-issues

Description

Currently, SlashEncodedDayPartitionValueExtractor is the default partition value extractor. It expects partition paths in a slash-encoded day format, which is not common outside Uber.

Also, the Spark DataSource API provides a partitionBy clause, which has not been integrated with the Hudi data source. We need to investigate how to leverage the partitionBy clause for partitioning.
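
For context, the slash-encoded day layout assumes that every partition path has exactly three segments, `yyyy/MM/dd`, and maps the path to a single date value. The following is a minimal Python sketch of that idea, not Hudi's implementation; the function name `extract_slash_encoded_day` is hypothetical and exists only for illustration.

```python
from datetime import date
from typing import List


def extract_slash_encoded_day(partition_path: str) -> List[str]:
    """Hypothetical helper: map a 'yyyy/MM/dd' path to one 'yyyy-MM-dd' value."""
    parts = partition_path.split("/")
    if len(parts) != 3:
        # Any layout other than year/month/day is rejected outright.
        raise ValueError(
            f"expected a yyyy/MM/dd partition path, got {partition_path!r}"
        )
    year, month, day = (int(p) for p in parts)
    # date() also validates the calendar values (e.g. month 13 raises ValueError).
    return [date(year, month, day).isoformat()]


if __name__ == "__main__":
    print(extract_slash_encoded_day("2019/05/03"))
```

A table partitioned by a single column such as `region` produces paths like `us`, which a scheme of this kind rejects. That is the usability problem with using it as a default.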

Issue Links

is depended upon by

HUDI-901 Bug Bash 0.6.0 Tracking Ticket

Resolved

is fixed by

HUDI-4474 infer metasync configs from original configs

Closed

links to

GitHub Pull Request #1643

People

Assignee: Unassigned

Reporter: Balaji Varadarajan

Votes: 0

Watchers: 5

Dates

Created: 03/May/19 21:14

Updated: 20/Aug/22 23:25

Resolved: 20/Aug/22 23:21