  Apache Hudi / HUDI-1292

[Umbrella] RFC-15: File Listing and Query Planning Optimizations

Details

    Description

      This umbrella ticket tracks the overall implementation of RFC-15.

        Issue Links

          1. Move static invocations of HoodieMetadata.xxx to HoodieTable (Sub-task, Closed, Prashant Wason)
          2. Support properties for metadata table via a properties.file (Sub-task, Closed, Prashant Wason)
          3. Fix initialization when async jobs are scheduled: these jobs have an older timestamp than the INIT timestamp on the metadata table (Sub-task, Closed, Prashant Wason)
          4. FSUtils: rename:isMetadataTable (Sub-task, Closed, Prashant Wason)
          5. Fence metadata reads using latest data timeline commit times (Sub-task, Resolved, Vinoth Chandar)
          6. Refactor into Reader & Writer side for Metadata (Sub-task, Closed, Prashant Wason)
          7. Fix clean and async clean when metadata table is enabled (Sub-task, Closed, Prashant Wason)
          8. Check if MergedBlockReader will neglect log blocks based on uncommitted commits (Sub-task, Resolved, Vinoth Chandar)
          9. Make Async Operations (Clean, Compaction, Replace) work with metadata table (Sub-task, Resolved, Vinoth Chandar)
          10. Implement in-memory merging of metadata table with the non-synced part of data timeline (Sub-task, Resolved, Ryan Pifer)
          11. Faster initialization for larger datasets (Sub-task, Resolved, Prashant Wason; original estimate 4h, remaining estimate 4h)
          12. Hive use of Metadata Table (Sub-task, Resolved, Ryan Pifer)
          13. Support for handling of REPLACE instants (Sub-task, Resolved, Vinoth Chandar)
          14. Support metadata-based file listing with HoodieROTablePathFilter (Sub-task, Resolved, Udit Mehrotra)
          15. Ensure all instances of FSUtils.getAllPartitionPaths() are replaced with calls to metadata table (Sub-task, Resolved, Udit Mehrotra)
          16. Allow log files added as a part of restore to be synced to metadata table (Sub-task, Resolved, Vinoth Chandar)
          17. Replace FSUtils.getAllPartitionPaths() with HoodieTableMetadata#getAllPartitionPaths() (Sub-task, Resolved, Udit Mehrotra)
          18. Restore on MOR table leaves metadata table out of sync with data table (Sub-task, Resolved, Vinoth Chandar)
          19. Move metadata syncing to a preWrite() method, away from WriteClient constructor (Sub-task, Resolved, Vinoth Chandar)
          20. Support file listing using metadata for Spark DataSource and Spark SQL queries (Sub-task, Resolved, Udit Mehrotra)
          21. Presto use of Metadata Table for file listings (Sub-task, In Progress, Sagar Sumit)
          22. RFC-15: Track range metadata as a part of metadata table (Sub-task, Resolved, Unassigned)
          23. Implement inlining of HFile Data Blocks in metadata table log (Sub-task, Resolved, sivabalan narayanan; original estimate 4h, remaining estimate 4h)
          24. Follow-on code improvements to HFile tables (Sub-task, Open, sivabalan narayanan)
          25. Listing Metadata unreadable in S3 as the log block is deemed corrupted (Sub-task, Resolved, Nishith Agarwal)
          26. Scoping work needed to support bootstrap and RFC-15 together (Sub-task, Open, Vinoth Chandar)
          27. Enhance DeltaWriteStat with block level metadata correctly for storage schemes that support appends (Sub-task, In Progress, Manoj Govindassamy)
          28. Spark-SQL driver runs out of memory when metadata table is enabled (Sub-task, Resolved, Udit Mehrotra)
          29. Move validation of file listings to something that happens before each write (Sub-task, Resolved, Prashant Wason)
          30. Fix flaky test: TestHoodieMetadata#testSync (Sub-task, Resolved, sivabalan narayanan)
          31. Improve performance of key lookups from base file (HFile) in Metadata table (Sub-task, Closed, Prashant Wason)
          32. Allow directories to be filtered during the bootstrap of the metadata table (Sub-task, Closed, Prashant Wason)
          33. Handle the case where the metadata table cannot be synced due to instants being archived (Sub-task, Closed, Prashant Wason)
          34. Bugs with Metadata Table in 0.7 release (Sub-task, Closed, Prashant Wason)
          35. Audit and remove references to fs.listStatus(), fs.getFileStatus() or fs.exists() (Sub-task, Reopened, Manoj Govindassamy)
          36. Supporting Clustering and Metadata Table together (Sub-task, Resolved, Prashant Wason)
          37. Metadata Table Synchronous Design (Sub-task, Resolved, Prashant Wason)
          38. Make metadata tests lean and consistent (Sub-task, Resolved, Sagar Sumit)
          39. Add rollback plan and rollback requested instant (Sub-task, Resolved, sivabalan narayanan; original estimate 48h, remaining estimate 48h)
          40. Fix restore by adding a requested instant and restore plan (Sub-task, Open, Vinoth Chandar)
          41. Rollback in cloud stores without append, with respect to collecting failed log files to be deleted/logged (Sub-task, Closed, sivabalan narayanan)
          42. Fix missing files as part of clean metadata or rollback metadata if retried after a failure (Sub-task, Resolved, sivabalan narayanan)
          43. Relax compaction in metadata being fenced based on inflight requests in data table (Sub-task, Open, sivabalan narayanan)
          44. Support async compaction for metadata table (Sub-task, Open, sivabalan narayanan)
          45. Async cleaning with metadata table (Sub-task, Open, sivabalan narayanan)
          46. Support lock-free multi-writer for metadata table (Sub-task, Open, sivabalan narayanan)
          47. Fix rollback of first commit after being synced to metadata table (Sub-task, Resolved, Manoj Govindassamy)
          48. Test failure follow-up when metadata is enabled by default (Sub-task, Resolved, Manoj Govindassamy)
          49. Fix refreshing timeline for every operation (Sub-task, Resolved, sivabalan narayanan)
          50. Fix retried compaction commit in data table failing when applied to metadata with sync updates (Sub-task, Resolved, sivabalan narayanan)
          51. Restore fails after adding rollback plan and rollback.requested instant with metadata enabled (Sub-task, Open, Vinoth Chandar)
          52. Handle failure mid-way during init buckets (Sub-task, Resolved, Vinoth Chandar)
          53. Fix Restore and RollbackMetadata in HoodieTestTable (Sub-task, Open, Sagar Sumit)
          54. Add synchronous metadata support to Flink (Sub-task, Open, Unassigned)
          55. Rolling upgrade/downgrade story for 0.10 and enabling metadata (Sub-task, Resolved, Manoj Govindassamy)
          56. Support bootstrapping one or more partitions in metadata table while regular writers and table services are in progress (Sub-task, In Progress, Vinoth Chandar)
          57. Fix usage of different key generators with metadata enabled (Sub-task, Resolved, sivabalan narayanan)
          58. Enable Metadata Table by default for both writers and readers (Sub-task, Resolved, sivabalan narayanan)
          59. Verify synchronous metadata patch with multiple writers end to end (Sub-task, Resolved, sivabalan narayanan)
          60. Deadlock with multiple writers due to double locking (Sub-task, Resolved, sivabalan narayanan; original estimate 24h, remaining estimate 24h)
          61. Rewrite RFC for file listing with synchronous metadata patch (Sub-task, Open, sivabalan narayanan)
          62. Double bootstrap of metadata table when upgrade is involved (Sub-task, Resolved, Prashant Wason)
          63. Virtual keys support for metadata table (Sub-task, Resolved, Manoj Govindassamy)
          64. Enable metadata by default for readers (Sub-task, Patch Available, sivabalan narayanan)
          65. Fix usages of RealtimeSplit to use the new getDeltaLogFileStatus (Sub-task, Open, sivabalan narayanan)
          66. Improve bootstrap performance for very large tables (Sub-task, Resolved, Prashant Wason)
          67. Test and certify parquet bootstrap with metadata table (Sub-task, Open, Manoj Govindassamy)
          68. Guard all writes to metadata table for a single-writer data table and async operations (Sub-task, Resolved, sivabalan narayanan)
          69. Test bootstrap with metadata enabled (Sub-task, Resolved, Manoj Govindassamy)
          70. Non-partitioned dataset with metadata fails (Sub-task, Closed, sivabalan narayanan)
          71. Async compaction fails with timeline mismatches between server and client when metadata is enabled (Sub-task, Closed, sivabalan narayanan)
          72. Avoid fs.exists() and fs.mkdirs() calls to partitions in AbstractTableFileSystemView (Sub-task, Closed, Sagar Sumit)
          73. Upgrade HBase to 2.x (Sub-task, Patch Available, Sagar Sumit)
          74. TestMereIntoLogOnlyTable with metadata enabled surfaces a likely bug (Sub-task, Resolved, Prashant Wason)
          75. Spark on Hudi: metadata key length < 0 and file not found error (Sub-task, Closed, Manoj Govindassamy)
          76. Support bootstrapping of metadata table even when async table service is in progress (Sub-task, Open, Vinoth Chandar)
          77. Rollback of a partially failed commit which has new partitions fails with metadata table (Sub-task, Resolved, sivabalan narayanan)
          78. Validate metadata config for all readers (Sub-task, Resolved, Sagar Sumit)
          79. Test and certify inline file system in S3 and HDFS (Sub-task, Resolved, Manoj Govindassamy)
          80. Remove fs.exists() in AbstractTableFileSystemView (Sub-task, Closed, Unassigned)
          81. Parsing of metadata compaction timestamp fails when metadata is enabled (Sub-task, Resolved, sivabalan narayanan)
          82. Fix Hudi CLI metadata commands (Sub-task, Open, Unassigned)
          83. Metadata table enters an inconsistent state (Sub-task, Resolved, sivabalan narayanan)
          84. Get Metadata table bootstrapping in Flink on par with Spark (Sub-task, Open, Danny Chen)

            People

              pwason Prashant Wason
              vinoth Vinoth Chandar
              Votes: 0
              Watchers: 11

              Dates

                Created:
                Updated:

                Time Tracking

                  Estimated: 80h (original estimate)
                  Remaining: 80h
                  Logged: Not Specified