Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
While working on ARROW-3308, I noticed that write_feather() has a chunk_size argument, which by default writes batches of 64k rows into the file. In principle, a chunking strategy like this would avoid the need to bump up to large_utf8 when ingesting a large character vector, because you'd end up with many chunks that each fit into a regular utf8 type. However, as the function works today, the data.frame is first converted to a Table whose ChunkedArrays each contain a single chunk, and that single-chunk conversion is where the large_utf8 type gets set. If Table$create() could be instructed to make multiple chunks, this would be resolved.
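A minimal sketch of the behavior described above, using the arrow R package. The toy vector here stands in for a character vector whose total string data would actually exceed the 32-bit utf8 offset limit; the last step shows the manual multi-chunk workaround, not the proposed automatic option:

```r
library(arrow)

df <- data.frame(x = c("a", "b", "c"), stringsAsFactors = FALSE)

# Table$create() currently produces one chunk per column, so the
# utf8-vs-large_utf8 decision is made over the whole vector at once.
tab <- Table$create(df)
tab$column(0)$num_chunks   # 1

# write_feather() only chunks at write time, after the Table (and its
# column types) already exist:
tmp <- tempfile(fileext = ".feather")
write_feather(df, tmp, chunk_size = 65536L)

# Building the column as a multi-chunk ChunkedArray up front keeps each
# chunk small enough for regular utf8; this is what the issue proposes
# Table$create() be able to do on its own.
ca <- chunked_array(df$x[1:2], df$x[3])
ca$num_chunks              # 2
tab2 <- Table$create(x = ca)
```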
Issue Links
- is related to:
  - ARROW-8470 [Python][R] Expose incremental write API for Feather files (Open)
  - ARROW-10570 [R] Use Converter API to convert SEXP to Array/ChunkedArray (Resolved)