Details
Type: Task
Status: Closed
Priority: Minor
Resolution: Duplicate
Description
Ideas:
- [ ] Metadata table/table metadata
- [ ] File sizing
- [ ] Compaction, configs, strategies (Siva)
- [ ] Keys
- [ ] virtual keys
- [ ] meta fields (_hoodie_record_key and such)
- [ ] Document every write operation:
  - [ ] incremental: upsert, insert, delete
  - [ ] batch: insert_overwrite, ...
- [ ] DeltaStreamer
- [ ] deployment
- [ ] sources
- [ ] de-duplication
- [ ] checkpoint provider
- [ ] Marker mechanism
- [ ] Timeline server and metaserver
- [ ] Indexing needs a page
- [ ] Table services - 1 page each
- [ ] bootstrap
- [ ] Document all the different utilities (try them out locally)
- [ ] multi/deltastreamer
- [ ] snapshotter
- [ ] exporter
- [ ] SchemaProvider
- [ ] Precommit validators (SparkPreCommitValidator)
- [ ] Add a table of sources and schema providers to the Hudi documentation, with an example of the input format for each source
- [ ] Hive sync
- [ ] Document Prometheus reporter
- [ ] Document all public APIs (key generators, payloads, ...)
- [ ] Commit Notifications
- [ ] Address the items raised in https://issues.apache.org/jira/browse/HUDI-1958