Spark / SPARK-4911

Report the inputs and outputs of Spark jobs so that external systems can track data lineage

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 1.2.0
    • Fix Version/s: None
    • Component/s: Spark Core

    Description

      When Spark runs a job, it would be useful for it to log the job's filesystem inputs and outputs somewhere. External tools could then track which persisted datasets are derived from which other persisted datasets.
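
      To make the request concrete, here is a minimal sketch of what such a log and an external consumer might look like. It is not Spark code and does not reflect any implementation in Spark: the `LineageRecord` class, the `derived_from` helper, the JSON-lines layout, and the example paths are all hypothetical, chosen only to illustrate the idea of recording one entry per job and deriving dataset-to-dataset dependencies from it.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class LineageRecord:
    """Hypothetical record of one job's filesystem inputs and outputs."""
    job_id: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)

    def to_json(self) -> str:
        # One JSON object per line is easy for an external tool to read.
        return json.dumps(asdict(self), sort_keys=True)


def derived_from(log_lines):
    """Map each output path to the set of input paths it was derived from."""
    lineage = {}
    for line in log_lines:
        record = json.loads(line)
        for out in record["outputs"]:
            lineage.setdefault(out, set()).update(record["inputs"])
    return lineage


# Illustrative paths only; they do not come from the issue.
log = [
    LineageRecord("job-1", ["hdfs:///raw/events"], ["hdfs:///clean/events"]).to_json(),
    LineageRecord(
        "job-2",
        ["hdfs:///clean/events", "hdfs:///dim/users"],
        ["hdfs:///reports/daily"],
    ).to_json(),
]
lineage = derived_from(log)
```

      With a log like this, an external tool would only need the output-to-inputs mapping to answer which persisted datasets a given dataset was built from. How Spark itself would discover the paths is left open here, as it is in the issue.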

          People

            Assignee: Unassigned
            Reporter: Sandy Ryza (sandyr)
            Votes: 1
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: