[ARROW-2249] [Java/Python] in-process vector sharing from Java to Python - ASF JIRA

Details

Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Java, Python
Labels:
- beginner

External issue URL:
https://github.com/apache/arrow/issues/18209

Description

Currently, all applications of Arrow appear to rely on its IPC capabilities to move data between a Java process and a Python process. While this requires no serialization, it is not zero-copy. By taking the address and offset, we can already create Python buffers from Java buffers: https://github.com/apache/arrow/pull/1693. This is still a very low-level interface, and we should provide the user with:

- A guide on how to load the Apache Arrow Java libraries in Python (either through a fat JAR shipped with Arrow, or by explaining how users should integrate the libraries into their own Java packaging).
- pyarrow.Array.from_jvm, pyarrow.RecordBatch.from_jvm, … functions that take the respective Java objects and emit Python objects. These Python objects should also ensure that the underlying memory regions are kept alive for as long as the Python objects exist.
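The lifetime requirement in the second item can be illustrated without a JVM. The following is a minimal sketch using only the Python standard library: ForeignBuffer is a hypothetical helper (not a pyarrow class), and a ctypes allocation stands in for a Java-owned buffer. It shows the "address, size and base" pattern from ARROW-2252: the view reads the foreign memory in place, and holding a reference to the owner ("base") keeps that memory alive. In pyarrow, pyarrow.foreign_buffer(address, size, base) plays the analogous role.

```python
import ctypes


class ForeignBuffer:
    """Hypothetical zero-copy view over memory owned by another object."""

    def __init__(self, address, size, base):
        # Strong reference to the owner: the memory cannot be freed
        # while this view exists.
        self._base = base
        # from_address reinterprets existing memory; nothing is copied.
        self._array = (ctypes.c_ubyte * size).from_address(address)

    def __len__(self):
        return len(self._array)

    def __getitem__(self, index):
        return self._array[index]

    def to_bytes(self):
        # Explicit copy, only for inspection.
        return bytes(self._array)


# Stand-in for a buffer allocated on the Java side (assumption for this sketch).
owner = ctypes.create_string_buffer(b"arrow", 5)
buf = ForeignBuffer(ctypes.addressof(owner), len(owner), base=owner)

# A write through the owner is visible through the view: same memory, no copy.
owner[0] = b"A"
first_byte = buf[0]

# Dropping our own reference is safe because the view holds the base.
del owner
contents = buf.to_bytes()
```

The design choice to note is that the view, not the caller, is responsible for the owner's lifetime; a from_jvm-style function would need to hold the corresponding Java object in the same way.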

This issue can also be used as a tracker for the various sub-tasks that will need to be done to complete this rather large milestone.

Issue Links

is a parent of

- ARROW-2605 [Java/Python] Add unit test for pyarrow.timeX types in Array.from_jvm (Open)
- ARROW-2606 [Java/Python] Add unit test for pyarrow.decimal128 in Array.from_jvm (Open)
- ARROW-2607 [Java/Python] Support VarCharVector / StringArray in pyarrow.Array.from_jvm (Open)
- ARROW-2609 [Java/Python] Complex type conversion in pyarrow.Field.from_jvm (Open)
- ARROW-2610 [Java/Python] Add support for dictionary type to pyarrow.Field.from_jvm (Open)
- ARROW-2608 [Java/Python] Add pyarrow.{Array,Field}.from_jvm / jvm_buffer (Resolved)

is blocked by

- ARROW-2252 [Python] Create buffer from address, size and base (Resolved)

is related to

- ARROW-2604 [Java] Add method overload for VarCharVector.set(int,String) (Resolved)
- ARROW-12965 [Java] Java implementation of Arrow C data interface (Resolved)

People

Assignee: Unassigned

Reporter: Uwe Korn

Votes: 0

Watchers: 6

Dates

Created: 03/Mar/18 17:08

Updated: 11/Jan/23 07:19