Apache Hudi / HUDI-8719

File group reader enhancement - Phase 0

Details

    • Type: Improvement
    • Status: Open
    • Priority: Blocker
    • Resolution: Unresolved
    • None
    • 1.0.1
    • None

    Sub-Tasks

      1. Simplify precombine and ordering field value handling in the file group reader
         Sub-task | Patch Available | Ethan Guo | Progress: 0% | Original Estimate: 4h | Remaining Estimate: 4h
      2. Ensure precombine/ordering fields can only be scalar
         Sub-task | Open | Lin Liu
      3. Revisit stats generated in HoodieSparkFileGroupReaderBasedMergeHandle
         Sub-task | Open | Unassigned
      4. Prevent user from setting precombine to field that is not in the table schema
         Sub-task | Open | Ethan Guo
      5. Allow no precombine field in MOR table
         Sub-task | In Progress | Ethan Guo | Progress: 0% | Original Estimate: 12h | Remaining Estimate: 12h
      6. Followup to fix all callers to HoodieLogRecordReader to set the right value for max instant time
         Sub-task | Open | Unassigned
      7. Default to overwrite merge mode if they don't specify an ordering field
         Sub-task | Open | Unassigned
      8. HoodieSparkRecordMerger does not handle deletes based on the preCombine/ordering field
         Sub-task | In Progress | Lin Liu
      9. Hoodie FilegroupReader cannot read Enums from MOR avro log blocks
         Sub-task | Open | Unassigned
      10. Delete records are inconsistent depending on MOR/COW, Avro/Spark record merger, new filegroup reader enabled/disabled
          Sub-task | Closed | Lin Liu
      11. Revalidate auto-keygen flow with file group reader
          Sub-task | Open | Unassigned | Progress: 0% | Original Estimate: 4h | Remaining Estimate: 4h
      12. Bridge gaps on delete handling behavior in the file group reader
          Sub-task | Open | Unassigned | Progress: 0% | Original Estimate: 20h | Remaining Estimate: 20h
      13. Revisit decimal handling for all parquet readers and writers to be consistent
          Sub-task | Open | Unassigned | Progress: 0% | Original Estimate: 2h | Remaining Estimate: 2h
      14. Revisit confs in Spark broadcast manager set for reader context
          Sub-task | Open | Unassigned | Progress: 0% | Original Estimate: 1h | Remaining Estimate: 1h
      15. hoodie.datasource.insert.dup.policy interplay with file group reader
          Sub-task | Open | Unassigned | Progress: 0% | Original Estimate: 2h | Remaining Estimate: 2h
      16. Add bootstrap read testing to TestHoodieFileGroupReaderBase
          Sub-task | Open | Ethan Guo

        People

          Assignee: Unassigned
          Reporter: Ethan Guo (yihua)
          Votes: 0
          Watchers: 1

          Dates

            Created:
            Updated:

            Time Tracking

              Estimated: 45h
              Remaining: 45h
              Logged: Not Specified