Apache Hudi / HUDI-431

Support Parquet in MOR log files


Details


    Description

      We have a basic implementation of an inline filesystem that can read a file in a format such as Parquet when it is embedded "inline" within another file.

      See https://github.com/apache/hudi/blob/master/hudi-common/src/test/java/org/apache/hudi/common/fs/inline/TestInLineFileSystem.java for sample usage.
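      To make the "inline" idea concrete, here is a minimal sketch, assuming only that an embedded file is identified by a start offset and a length inside its outer file. It uses plain java.nio rather than Hudi's InLineFileSystem (which is a Hadoop FileSystem, not shown here); the class InlineSliceSketch and method readInline are hypothetical names for illustration only.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration, not Hudi code: read a file embedded inside another file.
final class InlineSliceSketch {

    /** Reads exactly {@code length} bytes starting at {@code offset} of the outer file. */
    static byte[] readInline(Path outer, long offset, int length) throws IOException {
        try (FileChannel ch = FileChannel.open(outer, StandardOpenOption.READ)) {
            if (offset < 0 || length < 0 || offset + length > ch.size()) {
                throw new IOException("inline range lies outside the outer file");
            }
            ByteBuffer buf = ByteBuffer.allocate(length);
            long pos = offset;
            // Positional reads leave the channel's own position untouched; loop until full.
            while (buf.hasRemaining()) {
                int n = ch.read(buf, pos);
                if (n < 0) throw new IOException("unexpected end of file");
                pos += n;
            }
            return buf.array();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] prefix = "HEADER".getBytes(StandardCharsets.UTF_8);
        byte[] embedded = "embedded-file-bytes".getBytes(StandardCharsets.UTF_8);
        byte[] suffix = "FOOTER".getBytes(StandardCharsets.UTF_8);

        Path outer = Files.createTempFile("outer", ".log");
        try {
            // Outer file = prefix + embedded file + suffix.
            ByteBuffer all = ByteBuffer.allocate(prefix.length + embedded.length + suffix.length);
            all.put(prefix).put(embedded).put(suffix);
            Files.write(outer, all.array());

            // Only the embedded bytes are returned; prefix and suffix are never copied out.
            byte[] inner = readInline(outer, prefix.length, embedded.length);
            System.out.println(new String(inner, StandardCharsets.UTF_8));
        } finally {
            Files.deleteIfExists(outer);
        }
    }
}
```

      A real inline filesystem would hand such a bounded byte range to an unmodified format reader (for example a Parquet reader) as if it were a standalone file; this sketch only shows the offset/length addressing underneath that.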

       The idea here is to see whether we can embed Parquet/HFile formats into the Hudi log files, so that reads on the delta log files can be columnar as well. Since the log is row-based today, this should help speed up query performance. Once the inline FS is available, enable Parquet logging support in HoodieLogFile: the log file can expose a writer (essentially a ParquetWriter), and users can write records as though they were writing Parquet files. Similarly, on the read path, a reader (a ParquetReader) will be exposed that users can use to read the data back out. 
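      One way to picture "Parquet inside a log file" is a log made of framed blocks, where each block's payload is a complete embedded file that a reader can locate by offset and length. The sketch below assumes a made-up framing (a 4-byte marker, a 4-byte length, then the payload); it is not Hudi's actual log block format, and LogBlockSketch, appendBlock, scanBlocks and Slice are hypothetical names. The payload strings stand in for real Parquet bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical framing for illustration only; Hudi's real log format differs.
final class LogBlockSketch {

    /** Hypothetical marker at the start of every block ("HUDI" in ASCII). */
    static final int MAGIC = 0x48554449;

    /** Location of one embedded payload inside the log bytes. */
    record Slice(long offset, int length) {}

    /** Writer side: marker, payload length, then the opaque payload (e.g. a whole Parquet file). */
    static void appendBlock(DataOutputStream out, byte[] payload) throws IOException {
        out.writeInt(MAGIC);
        out.writeInt(payload.length);
        out.write(payload);
    }

    /** Reader side: find where each embedded payload lives without copying it. */
    static List<Slice> scanBlocks(byte[] log) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(log); // big-endian, matching DataOutputStream
        List<Slice> slices = new ArrayList<>();
        while (buf.remaining() >= 8) {
            if (buf.getInt() != MAGIC) throw new IOException("corrupt block at " + (buf.position() - 4));
            int len = buf.getInt();
            if (len < 0 || len > buf.remaining()) throw new IOException("truncated block");
            slices.add(new Slice(buf.position(), len));
            buf.position(buf.position() + len); // skip over the payload
        }
        if (buf.hasRemaining()) throw new IOException("trailing partial block header");
        return slices;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            appendBlock(out, "parquet-file-1".getBytes(StandardCharsets.UTF_8));
            appendBlock(out, "parquet-file-2".getBytes(StandardCharsets.UTF_8));
        }
        byte[] log = bytes.toByteArray();
        for (Slice s : scanBlocks(log)) {
            String payload = new String(log, (int) s.offset(), s.length(), StandardCharsets.UTF_8);
            System.out.println(s + " -> " + payload);
        }
    }
}
```

      In the design described in this issue, each Slice would be opened through the inline FS and read with a columnar reader, so the log keeps its append-only framing while every block's contents stay in Parquet (or HFile) layout.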

      This Jira tracks the work to implement Parquet inlining in the log format and to have the writer and reader use it. 

       


          People

            Alexey Kudinkin
            sivabalan narayanan
            Vinoth Chandar
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated: 4h
                Remaining: 4h
                Logged: Not Specified

                Agile

                  Completed Sprints:
                  Hudi-Sprint-Jan-3 ended 29/Dec/21
                  Hudi-Sprint-Jan-3 ended 11/Jan/22
                  Hudi-Sprint-Jan-10 ended 19/Jan/22
                  Hudi-Sprint-Jan-18 ended 25/Jan/22
                  Hudi-Sprint-Jan-24 ended 01/Feb/22
                  Hudi-Sprint-Jan-31 ended 08/Feb/22
