Apache Arrow / ARROW-6720

[JAVA][C++] Support Parquet Read and Write in Java

Details

    • Type: New Feature
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.15.0
    • Fix Version/s: None
    • Component/s: C++, Java
    • Labels: None

    Description

      We added a new Java interface to support reading and writing Parquet files on HDFS or the local file system.

      The motivation is that when loading and dumping Parquet data in Java, we can currently only use row-based put and get methods. Since Arrow already has a C++ implementation for loading and dumping Parquet, we wrapped that code as Java APIs.

      In our tests, performance on our workload improved by more than 2x compared with row-based load and dump, so we would like to contribute this code to Arrow.

      Since this is a fully independent change, it does not modify any existing Arrow code. We added two folders: java/adapter/parquet and cpp/src/jni/parquet.

            People

              Assignee: Unassigned
              Reporter: Chendi.Xue (xuechendi)
              Votes: 1
              Watchers: 12

              Dates

                Created:
                Updated:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 38.5h