  1. Apache NiFi
  2. NIFI-12130

PutIceberg: Ability to configure snapshot properties via dynamic attributes


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0-M1, 1.24.0
    • Component/s: Extensions

    Description

      Motivation

      Iceberg's Spark integration lets users attach snapshot properties when writing data to an Iceberg table by setting write options prefixed with "snapshot-property.", like so:

      df.write
        .option("write-format", "avro")
        .option("snapshot-property.key", "value")
        .insertInto("catalog.db.table") 

      https://iceberg.apache.org/docs/latest/spark-configuration/#write-options

      These properties can be used to add context to Iceberg snapshots and help users locate snapshots in recovery scenarios.

      In fact, Spark writes automatically record the Spark application ID in the snapshot summary as spark.app.id.

      Examples of when these properties might be useful include:

      • Recording the data source used to produce the new records
      • Recording the UUID of the FlowFile used to update the table, so the snapshot can be matched to NiFi provenance

      They can be queried from Iceberg's snapshots metadata table, where they appear in the summary column.

      Feature request

      It would be great if PutIceberg could be configured to add these properties in a similar fashion (e.g. using dynamic properties of the form snapshot-property.*). Continuing the comparison with Spark, it may also be worth automatically adding the FlowFile UUID under a key such as nifi.flowfile.id.
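
      To make the request concrete, here is a minimal sketch of how prefixed dynamic properties could be turned into snapshot properties. It is only an illustration under stated assumptions: the class SnapshotPropertySketch and its extract method are hypothetical and not part of NiFi or Iceberg, the dynamic properties are modelled as a plain Map, and the nifi.flowfile.id key is the name suggested above rather than an existing convention.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of NiFi or Iceberg.
final class SnapshotPropertySketch {

    // Prefix proposed in this issue, mirroring Spark's write option prefix.
    static final String PREFIX = "snapshot-property.";

    // Key suggested in this issue for the FlowFile UUID (an assumption, not an existing convention).
    static final String FLOWFILE_ID_KEY = "nifi.flowfile.id";

    /**
     * Collects every dynamic property whose name starts with PREFIX, strips the
     * prefix, and adds the FlowFile UUID under FLOWFILE_ID_KEY when one is given.
     * Properties without the prefix, or with nothing after it, are ignored.
     */
    static Map<String, String> extract(Map<String, String> dynamicProperties, String flowFileUuid) {
        Map<String, String> snapshotProperties = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : dynamicProperties.entrySet()) {
            String name = entry.getKey();
            if (name.startsWith(PREFIX) && name.length() > PREFIX.length()) {
                snapshotProperties.put(name.substring(PREFIX.length()), entry.getValue());
            }
        }
        if (flowFileUuid != null) {
            snapshotProperties.put(FLOWFILE_ID_KEY, flowFileUuid);
        }
        return snapshotProperties;
    }
}
```

      With this sketch, a dynamic property named snapshot-property.source would end up in the result under the key source, alongside the nifi.flowfile.id entry.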

      Further details

      I'm not entirely clued up on the Iceberg API, but it looks like these are set via set(property, value) on SnapshotUpdate (AppendFiles extends this interface):

      https://iceberg.apache.org/javadoc/master/org/apache/iceberg/SnapshotUpdate.html
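
      As a rough sketch of that last step, the snippet below applies a map of snapshot properties through a setter with the same (String, String) shape as SnapshotUpdate.set. The class SnapshotPropertyApplier is hypothetical, and a BiConsumer stands in for the Iceberg object so the example compiles without Iceberg on the classpath. The commented usage is an assumption about how it could be wired to AppendFiles, not a description of how PutIceberg does it.

```java
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical helper for illustration only; not part of NiFi or Iceberg.
final class SnapshotPropertyApplier {

    /**
     * Passes each non-null property to the given setter and returns how many
     * were applied. The setter stands in for SnapshotUpdate.set(String, String).
     */
    static int apply(Map<String, String> snapshotProperties, BiConsumer<String, String> setter) {
        int applied = 0;
        for (Map.Entry<String, String> entry : snapshotProperties.entrySet()) {
            if (entry.getKey() != null && entry.getValue() != null) {
                setter.accept(entry.getKey(), entry.getValue());
                applied++;
            }
        }
        return applied;
    }

    // Assumed usage with Iceberg on the classpath (not compiled here):
    //
    //   AppendFiles append = table.newAppend();
    //   SnapshotPropertyApplier.apply(snapshotProperties, append::set);
    //   append.appendFile(dataFile).commit();
}
```

      Taking the setter as a parameter keeps the sketch independent of the concrete update type, so the same code would apply to any operation that exposes a set(String, String) method.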


            People

              Assignee: mbathori Mark Bathori
              Reporter: wdyson William Dyson

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 40m