[HUDI-1267] Additional Metadata Details for Hudi Transactions - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.9.0
Fix Version/s: None
Component/s: Usability, writer-core
Labels: None

Description

Whenever any of the following scenarios occurs:

Custom data source (Kafka, for instance) -> Hudi table
Hudi -> Hudi table
S3 -> Hudi table

The following metadata needs to be captured:

1. Table-level metadata

- Operation name (record level), such as Upsert or Insert, for the last operation performed on the row

2. Transaction-level metadata (logged at the Hudi level, not at the table level)
- Source (Kafka topic name, S3 URL of the source data, etc.)
- Target Hudi table name
- Last transaction time (last commit time)
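A minimal sketch of how the last transaction time could be derived from transaction-level entries. The tuple layout (source, target, commit time) and the function name `last_commit_times` are hypothetical, and commit times are assumed to be fixed-width timestamp strings so that string comparison matches chronological order.

```python
from typing import Dict, Iterable, Tuple

# Assumed layout of one transaction-level entry: (source, target Hudi table, commit time).
# Commit times are assumed to be fixed-width "yyyyMMddHHmmss" strings, so
# lexicographic order matches chronological order.
Entry = Tuple[str, str, str]


def last_commit_times(entries: Iterable[Entry]) -> Dict[Tuple[str, str], str]:
    """Return the latest commit time seen for each (source, target) pair."""
    latest: Dict[Tuple[str, str], str] = {}
    for source, target, commit_time in entries:
        pair = (source, target)
        # Keep only the most recent commit for this source/target combination.
        if pair not in latest or commit_time > latest[pair]:
            latest[pair] = commit_time
    return latest
```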

In short, point (1) captures details at the table level, and point (2) records every transaction that happened at the Hudi level.

Point (1) would simply add a column for the operation type.
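The column addition in point (1) can be sketched with an in-memory stand-in for a Hudi table keyed by record key. The column name `operation_type`, the `id` key field, and `apply_batch` are hypothetical; the sketch only shows that each write overwrites the column with the operation that last touched the row, while untouched rows keep their earlier value.

```python
from typing import Dict, List

# Hypothetical name for the per-record operation column; the issue does not fix one.
OPERATION_COLUMN = "operation_type"

Row = Dict[str, object]


def apply_batch(table: Dict[str, Row], batch: List[Row], operation: str,
                key_field: str = "id") -> Dict[str, Row]:
    """Write a batch into an in-memory stand-in for a Hudi table.

    Every row written by this batch gets OPERATION_COLUMN set to `operation`;
    rows not touched by the batch keep the operation that last wrote them.
    """
    result = dict(table)  # shallow copy so the input table is left unchanged
    for row in batch:
        new_row = dict(row)  # copy so the caller's record is not mutated
        new_row[OPERATION_COLUMN] = operation
        result[str(row[key_field])] = new_row
    return result
```

For example, inserting records 1 and 2 and then upserting record 2 leaves record 1 tagged `INSERT` and record 2 tagged `UPSERT`.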

Example for point (2): suppose there is an ingestion from Kafka topic 'A' into Hudi table 'ingest_kafka', and another ingestion from RDBMS table 'tableA' through Sqoop into Hudi table 'RDBMSingest'. The captured metadata would then be:

Source	Timestamp	Transaction Type	Target
Kafka - 'A'	XXXXXX	UPSERT	ingest_kafka
RDBMS - 'tableA'	XXXXXX	INSERT	RDBMSingest
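A sketch of the common transaction details table as an append-only log with the same columns as the example above. `Transaction`, `TransactionLog`, and the `record` hook are illustrative names, not part of Hudi; a real implementation would persist the rows as a Hudi table or Parquet files rather than keeping them in memory.

```python
from typing import List, NamedTuple


class Transaction(NamedTuple):
    """One row of the proposed Hudi-level transaction details table."""
    source: str
    timestamp: str
    transaction_type: str
    target: str


class TransactionLog:
    """In-memory, append-only stand-in for the common transaction details table."""

    COLUMNS = ("Source", "Timestamp", "Transaction Type", "Target")

    def __init__(self) -> None:
        self._rows: List[Transaction] = []

    def record(self, source: str, timestamp: str, transaction_type: str, target: str) -> None:
        # Hypothetical hook: an ingestion job would call this once per successful commit.
        self._rows.append(Transaction(source, timestamp, transaction_type, target))

    def for_target(self, target: str) -> List[Transaction]:
        # Every recorded transaction that wrote to the given Hudi table, in the order recorded.
        return [row for row in self._rows if row.target == target]

    def to_tsv(self) -> str:
        # Tab-separated rendering: a header line followed by one line per transaction.
        lines = ["\t".join(self.COLUMNS)]
        lines.extend("\t".join(row) for row in self._rows)
        return "\n".join(lines)
```

For the example above, the Kafka job would call `log.record("Kafka - 'A'", "XXXXXX", "UPSERT", "ingest_kafka")` after its commit, and `log.for_target("ingest_kafka")` would then return that single row.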

The transaction details table in point (2) should be available as a separate, common table that can either be queried as a Hudi table or be stored as Parquet and queried from Spark.

People

Assignee: Unassigned

Reporter: Ashish M G

Votes: 0

Watchers: 3

Dates

Created: 03/Sep/20 11:41

Updated: 10/Mar/23 01:50