[SPARK-4911] Report the inputs and outputs of Spark jobs so that external systems can track data lineage - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 1.2.0
Fix Version/s: None
Component/s: Spark Core
Labels:
- bulk-closed

Description

When Spark runs a job, it would be useful to log the job's filesystem inputs and outputs somewhere. External tools could then track which persisted datasets are derived from other persisted datasets.
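
To illustrate the kind of lineage tracking an external tool might do with such a log, here is a minimal sketch in plain Python. It assumes each job emits one record listing its input and output paths. The record layout, the field names (`job`, `inputs`, `outputs`), the example paths, and the helper functions are all hypothetical and are not part of Spark.

```python
from collections import defaultdict

# Hypothetical per-job records; the field names and paths are illustrative
# assumptions, not a format that Spark produces.
job_records = [
    {
        "job": "clean",
        "inputs": ["hdfs:///raw/events"],
        "outputs": ["hdfs:///clean/events"],
    },
    {
        "job": "aggregate",
        "inputs": ["hdfs:///clean/events", "hdfs:///dim/users"],
        "outputs": ["hdfs:///agg/daily"],
    },
]


def build_parents(records):
    """Map each output path to the set of input paths it was directly derived from."""
    parents = defaultdict(set)
    for rec in records:
        for out in rec["outputs"]:
            parents[out].update(rec["inputs"])
    return parents


def upstream(path, parents):
    """Return every dataset that `path` depends on, directly or transitively."""
    seen = set()
    stack = [path]
    while stack:
        current = stack.pop()
        # .get avoids inserting new keys into the defaultdict during traversal.
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


parents = build_parents(job_records)
lineage = upstream("hdfs:///agg/daily", parents)
```

Under these assumptions, `lineage` contains the two direct inputs of the aggregate job plus `hdfs:///raw/events`, which is reached through the intermediate cleaned dataset.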

People

Assignee: Unassigned

Reporter: Sandy Ryza

Votes: 1

Watchers: 2

Dates

Created: 20/Dec/14 17:55

Updated: 21/May/19 05:37

Resolved: 21/May/19 05:37