  1. Spark
  2. SPARK-43226

Define extractors for file-constant metadata columns


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.5.0
    • Component/s: Spark Core
    • Labels: None

    Description

      File-source constant metadata columns are often derived indirectly from file-level metadata values rather than exposing those values directly. For example, _metadata.file_name is currently hard-coded in FileFormat.updateMetadataInternalRow as:

       

      UTF8String.fromString(filePath.getName)

       

      We should add support for metadata extractors, functions that map from PartitionedFile to Literal, so that we can express such columns in a generic way instead of hard-coding them.

      We can't simply add these values to the metadata map, because they would then have to be pre-computed for every file even when the query does not select that field.
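
      A minimal sketch of the idea, not the actual Spark implementation: FileStub, Extractor, ExtractorSketch and metadataRow are hypothetical stand-ins (the real extractor would map PartitionedFile to Literal), and the call counter exists only to show that an extractor does no work unless its column is selected.

```scala
// Hypothetical stand-in for PartitionedFile; NOT a Spark class.
final case class FileStub(path: String, length: Long)

object ExtractorSketch {
  // Assumption: an extractor is a plain function from a file to a constant value.
  // The issue proposes PartitionedFile => Literal; Any is used here for brevity.
  type Extractor = FileStub => Any

  // Counts how often the file name is derived, to make laziness observable.
  var fileNameCalls: Int = 0

  // Derives the file name from the path instead of storing it up front.
  val fileName: Extractor = f => {
    fileNameCalls += 1
    f.path.substring(f.path.lastIndexOf('/') + 1)
  }

  // Exposes a file-level value directly.
  val fileSize: Extractor = f => f.length

  // Columns are registered generically rather than hard-coded one by one.
  val extractors: Map[String, Extractor] = Map(
    "file_name" -> fileName,
    "file_size" -> fileSize
  )

  // Evaluates only the extractors for the columns the query selects.
  def metadataRow(file: FileStub, selected: Seq[String]): Seq[Any] =
    selected.map(name => extractors(name)(file))
}
```

      With this shape, a query that selects only file_size never runs the file_name extractor, whereas a pre-populated metadata map would have computed both values for every file.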

          People

            Assignee: Ryan Johnson
            Reporter: Ryan Johnson