  1. Apache Arrow
  2. ARROW-5224

[Java] Add APIs to directly serialize/deserialize ValueVector


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Do
    • None
    • None
    • Component: Java

    Description

      There is no API to directly serialize/deserialize a ValueVector. The only way to do this today is to put a single FieldVector in a VectorSchemaRoot and convert it to an ArrowRecordBatch; deserialization requires the same steps in reverse.

      Providing a utility class for this may be better. I know all serialization should follow the IPC format so that data can be shared between different Arrow implementations. But for users who only use the Java API and want to do further optimization, this seems to be no problem, and we could give them one more option.

      This may benefit Java users who only use ValueVector rather than the IPC classes such as ArrowRecordBatch:

      • We could apply shuffle optimizations, such as compression and encoding algorithms for numerical types, which could greatly improve performance.
      • Serialize/deserialize using the actual buffer size in use within the vector, since the allocated buffer size is a power of 2 and is larger than what is really needed.
      • Reduce data conversions (VectorSchemaRoot, ArrowRecordBatch, etc.) to make the API more user-friendly.
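
The second point above can be illustrated with a minimal sketch. It does not use the Arrow API: `TrimmedBufferCodec`, its method names, and the length-prefixed layout are all hypothetical, and a plain `java.nio.ByteBuffer` stands in for the data buffer of a fixed-width int vector. It only shows the idea of writing the bytes actually in use instead of the full power-of-2 allocation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, NOT part of the Arrow API. A ByteBuffer stands in
// for the data buffer of a fixed-width int vector.
final class TrimmedBufferCodec {

    static final int INT_WIDTH = Integer.BYTES;

    // Mimics power-of-2 allocation: smallest power of two >= requestedBytes.
    public static int allocatedSize(int requestedBytes) {
        if (requestedBytes <= 1) {
            return 1;
        }
        return Integer.highestOneBit(requestedBytes - 1) << 1;
    }

    // Writes a 4-byte value count followed by only the bytes in use,
    // ignoring the unused tail of the allocated buffer.
    public static byte[] serialize(ByteBuffer data, int valueCount) {
        int usedBytes = valueCount * INT_WIDTH;
        ByteBuffer out = ByteBuffer.allocate(Integer.BYTES + usedBytes)
                .order(ByteOrder.LITTLE_ENDIAN);
        out.putInt(valueCount);
        ByteBuffer src = data.duplicate(); // independent position/limit
        src.clear();
        src.limit(usedBytes);
        out.put(src);                      // raw byte copy of the used region
        return out.array();
    }

    // Reads the value count, then that many little-endian ints.
    public static int[] deserialize(byte[] bytes) {
        ByteBuffer in = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        int valueCount = in.getInt();
        int[] values = new int[valueCount];
        for (int i = 0; i < valueCount; i++) {
            values[i] = in.getInt();
        }
        return values;
    }
}
```

With 5 int values, the used region is 20 bytes while a power-of-2 allocation would be 32 bytes, so this sketch writes 24 bytes (4-byte count plus 20 bytes of data). The output is not in the IPC format, so it could only be read back by the same Java code.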

       


            People

              Assignee: Unassigned
              Reporter: Ji Liu (tianchen92)
              Votes: 0
              Watchers: 5

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 3h