[ARROW-2119] [C++][Java] Handle Arrow stream with zero record batch - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.14.0
Component/s: C++, Java
Labels:
- pull-request-available

External issue URL:
https://github.com/apache/arrow/issues/18090

Description

Many places in the code currently appear to assume that a stream in the streaming format contains at least one record batch. Is a stream with zero record batches unsupported by design?

For example, see https://github.com/apache/arrow/blob/master/java/tools/src/main/java/org/apache/arrow/tools/StreamToFile.java#L45:

  public static void convert(InputStream in, OutputStream out) throws IOException {
    BufferAllocator allocator = new RootAllocator(Integer.MAX_VALUE);
    try (ArrowStreamReader reader = new ArrowStreamReader(in, allocator)) {
      VectorSchemaRoot root = reader.getVectorSchemaRoot();
      // load the first batch before instantiating the writer so that we have any dictionaries
      if (!reader.loadNextBatch()) {
        throw new IOException("Unable to read first record batch");
      }
      ...
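The behavior being asked for can be sketched as follows. This is a minimal, standard-library-only model and not the Arrow API: `ToyStreamReader`, `ToyFileWriter`, and `ZeroBatchConvert` are hypothetical stand-ins, schemas and batches are plain strings, and dictionaries (the reason the original code loads the first batch early) are ignored. The point it illustrates is that treating a `false` result from the first `loadNextBatch()` call as a normal end of stream yields a schema-only output rather than an `IOException`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a stream reader: a schema plus zero or more batches. */
class ToyStreamReader {
  private final String schema;
  private final Iterator<String> batches;
  private String current;

  ToyStreamReader(String schema, List<String> batches) {
    this.schema = schema;
    this.batches = batches.iterator();
  }

  String getSchema() {
    return schema;
  }

  /** Returns false at end of stream, which may happen on the very first call. */
  boolean loadNextBatch() {
    if (!batches.hasNext()) {
      return false;
    }
    current = batches.next();
    return true;
  }

  String currentBatch() {
    return current;
  }
}

/** Hypothetical stand-in for a file writer; it only records what it was asked to write. */
class ToyFileWriter {
  final List<String> written = new ArrayList<>();

  void start(String schema) {
    written.add("schema:" + schema);
  }

  void writeBatch(String batch) {
    written.add("batch:" + batch);
  }

  void end() {
    written.add("footer");
  }
}

class ZeroBatchConvert {
  /** Copies every batch; an empty stream produces schema and footer only. */
  static List<String> convert(ToyStreamReader reader) {
    ToyFileWriter writer = new ToyFileWriter();
    writer.start(reader.getSchema());
    // No special case for the first batch: zero iterations is a valid outcome.
    while (reader.loadNextBatch()) {
      writer.writeBatch(reader.currentBatch());
    }
    writer.end();
    return writer.written;
  }
}
```

Calling `ZeroBatchConvert.convert` on a reader constructed with an empty batch list returns only the schema and footer entries. Whether a real converter can safely start the writer before reading any batch depends on how dictionaries are delivered, which this sketch does not model.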

pyarrow 0.8.0 cannot load a stream with zero record batches either. It raises an exception that originates from https://github.com/apache/arrow/blob/a95465b8ce7a32feeaae3e13d0a64102ffa590d9/cpp/src/arrow/table.cc#L309:

Status Table::FromRecordBatches(const std::vector<std::shared_ptr<RecordBatch>>& batches,
                                std::shared_ptr<Table>* table) {
  if (batches.size() == 0) {
    return Status::Invalid("Must pass at least one record batch");
  }
  ...
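The check above is needed because, with only a vector of batches as input, there is no first batch to take the schema from. One way to make the empty case well-defined, shown here as a toy model and not as the fix adopted for this issue, is to pass the schema separately. The sketch is written in Java, like the StreamToFile snippet quoted earlier; `ToyTable` and `fromBatches` are hypothetical names, the schema is a plain string, and each batch is a single `int[]` column.

```java
import java.util.List;

/** Toy model: a "table" is a schema string plus a total row count. */
class ToyTable {
  final String schema;
  final int numRows;

  private ToyTable(String schema, int numRows) {
    this.schema = schema;
    this.numRows = numRows;
  }

  /**
   * Builds a table from zero or more batches. The schema is supplied by the
   * caller, so an empty batch list yields a valid zero-row table.
   */
  static ToyTable fromBatches(String schema, List<int[]> batches) {
    if (schema == null) {
      throw new IllegalArgumentException("schema is required");
    }
    int rows = 0;
    for (int[] batch : batches) {
      rows += batch.length;
    }
    return new ToyTable(schema, rows);
  }
}
```

With this shape, a stream reader that has already parsed the schema message can build an empty table even when no record batch follows.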

Issue Links

links to GitHub Pull Request #3871

People

Assignee: Wes McKinney

Reporter: Jingyuan Wang

Votes: 0

Watchers: 3

Dates

Created: 08/Feb/18 23:18

Updated: 11/Jan/23 07:19

Resolved: 23/May/19 15:46

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 1h 50m