Apache Hudi / HUDI-538

[UMBRELLA] Restructuring hudi-client module for multi-engine support


    Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Code Cleanup
    • Labels:

      Description

      Hudi is currently tightly coupled with the Spark framework, which makes integrating it with other computing engines more difficult. We plan to decouple Hudi from Spark. This umbrella issue tracks that work.
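      The issue does not prescribe a design. As a minimal sketch of the general idea only, assuming a hypothetical `EngineContext` interface (not Hudi's actual API), engine-agnostic client code can depend on a small abstraction while each engine supplies its own implementation. `LocalEngineContext`, `partitionPathLengths`, and `EngineContextSketch` are likewise illustrative names invented for this example.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical engine abstraction (not Hudi's real API): shared client code
// depends on this interface instead of on JavaSparkContext/RDD directly.
interface EngineContext {
    // Apply func to every element; parallelism is only a hint to the engine.
    <I, O> List<O> map(List<I> data, Function<I, O> func, int parallelism);
}

// Hypothetical engine-free implementation backed by Java parallel streams.
// It ignores the parallelism hint and uses the common fork-join pool.
// A Spark-backed implementation (not shown) would live in a Spark-specific
// module and translate these calls into RDD operations.
class LocalEngineContext implements EngineContext {
    @Override
    public <I, O> List<O> map(List<I> data, Function<I, O> func, int parallelism) {
        // List streams are ordered, so collect(toList()) keeps input order.
        return data.parallelStream().map(func).collect(Collectors.toList());
    }
}

public class EngineContextSketch {
    // Engine-agnostic "client" logic: it only sees the EngineContext interface.
    static List<Integer> partitionPathLengths(EngineContext ctx, List<String> partitionPaths) {
        return ctx.map(partitionPaths, String::length, 2);
    }

    public static void main(String[] args) {
        EngineContext ctx = new LocalEngineContext();
        List<Integer> lengths =
            partitionPathLengths(ctx, List.of("2020/01/01", "2020/01/02", "a"));
        System.out.println(lengths); // prints [10, 10, 1]
    }
}
```

      With this split, the shared module would not need Spark on its classpath, and Spark-specific pieces could move into a Spark module, in the spirit of sub-task 12 (moving HoodieReadClient into hudi-spark).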


          Sub-Tasks

          1. Introduce a new pom module named hudi-writer-common (Sub-task; Resolved; vinoyang; 100%, time spent 20m)
          2. Restructure code/packages to move more code back into hudi-writer-common (Sub-task; Closed; Vinoth Chandar; 100%, time spent 20m)
          3. Decouple HoodieReadClient and AbstractHoodieClient to break the inheritance chain (Sub-task; Resolved; vinoyang; 100%, time spent 20m)
          4. Make ClientUtils Spark-free (Sub-task; New; vinoyang)
          5. Make HoodieCommitArchiveLog Spark-free (Sub-task; New; vinoyang)
          6. Make HoodieWriteConfig Spark-free (Sub-task; Closed; vinoyang)
          7. Make EmbeddedTimelineService Spark-free (Sub-task; New; vinoyang)
          8. Remove unnecessary use of Spark in savepoint timeline (Sub-task; Closed; hong dongdong; 100%, time spent 20m)
          9. Make config package Spark-free (Sub-task; Closed; leesf; 100%, time spent 20m)
          10. Make io package Spark-free (Sub-task; Closed; leesf; 100%, time spent 20m)
          11. Remove HoodieReadClient's dependency on EmbeddedTimelineService (Sub-task; Closed; hong dongdong; 100%, time spent 20m)
          12. Move HoodieReadClient into hudi-spark module (Sub-task; New; vinoyang)
          13. Prototype classes/abstractions to encapsulate SparkContext and RDD (Sub-task; New; vinoyang)
          14. Explore support for plugging in Spark data source V2 (Sub-task; New; Vinoth Chandar)
          15. Replace JavaSparkContext/SQLContext with SparkSession (Sub-task; Open; lamber-ken)
          16. Make AbstractHoodieClient Spark-free (Sub-task; New; hong dongdong)
          17. Make CompactionAdminClient Spark-free (Sub-task; New; hong dongdong; 100%, time spent 10m)
          18. Ability to do small file handling without need for caching (Sub-task; In Progress; sivabalan narayanan)
          19. Make classes that rely on JavaSparkContext to obtain a Configuration Spark-free (Sub-task; Resolved; shenh062326)
          20. Remove Spark context in ClientUtils and HoodieIndex (Sub-task; Closed; shenh062326)
          21. Replace part of Spark context by Hadoop configuration in HoodieTable (Sub-task; Closed; shenh062326)
          22. Replace part of Spark context by Hadoop configuration in AbstractHoodieClient (Sub-task; Closed; shenh062326)
          23. Replace jsc.hadoopConfiguration by Hadoop configuration in hudi-client test cases (Sub-task; Closed; shenh062326)


              People

              • Assignee:
                yanghua vinoyang
                Reporter:
                yanghua vinoyang
              • Votes: 0
                Watchers: 11

                Dates

                • Created:
                  Updated:

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2.5h