  1. Apache Arrow
  2. ARROW-2119

[C++][Java] Handle Arrow stream with zero record batch

    Details

      Description

      Many places in the code currently appear to assume that a stream in the streaming format contains at least one record batch. Is a stream with zero record batches unsupported by design?

      For example, see https://github.com/apache/arrow/blob/master/java/tools/src/main/java/org/apache/arrow/tools/StreamToFile.java#L45

        public static void convert(InputStream in, OutputStream out) throws IOException {
          BufferAllocator allocator = new RootAllocator(Integer.MAX_VALUE);
          try (ArrowStreamReader reader = new ArrowStreamReader(in, allocator)) {
            VectorSchemaRoot root = reader.getVectorSchemaRoot();
            // load the first batch before instantiating the writer so that we have any dictionaries
            if (!reader.loadNextBatch()) {
              throw new IOException("Unable to read first record batch");
            }
            ...
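
One way to picture the requested behavior is a conversion loop that writes the schema unconditionally and then copies zero or more batches. The sketch below is only an illustration: `BatchReader`, `RecordingWriter`, and `readerOf` are hypothetical stand-ins, not Arrow Java classes, and it deliberately ignores the dictionary concern mentioned in the comment of the excerpt, which a real change to StreamToFile would still have to address.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins for illustration only; none of these types are part of Arrow.
final class ZeroBatchStreamSketch {

    /** Hypothetical reader: the schema is always available, batches may be absent. */
    interface BatchReader {
        String schema();
        boolean loadNextBatch(); // same boolean contract as in the excerpt above
        String currentBatch();
    }

    /** Hypothetical writer that only records what it was asked to write. */
    static final class RecordingWriter {
        final List<String> events = new ArrayList<>();
        void start(String schema) { events.add("schema:" + schema); }
        void writeBatch(String batch) { events.add("batch:" + batch); }
        void end() { events.add("end"); }
    }

    /** Builds an in-memory reader over a fixed list of batches (possibly empty). */
    static BatchReader readerOf(final String schema, final List<String> batches) {
        final Iterator<String> it = batches.iterator();
        return new BatchReader() {
            private String current;
            @Override public String schema() { return schema; }
            @Override public boolean loadNextBatch() {
                if (!it.hasNext()) {
                    return false;
                }
                current = it.next();
                return true;
            }
            @Override public String currentBatch() { return current; }
        };
    }

    /** Copies every batch; an empty stream yields a schema-only output instead of an error. */
    static int convert(BatchReader reader, RecordingWriter writer) {
        writer.start(reader.schema()); // the schema does not depend on any batch
        int count = 0;
        while (reader.loadNextBatch()) { // zero iterations for an empty stream
            writer.writeBatch(reader.currentBatch());
            count++;
        }
        writer.end();
        return count;
    }

    public static void main(String[] args) {
        RecordingWriter writer = new RecordingWriter();
        int n = convert(readerOf("id: int32", new ArrayList<String>()), writer);
        System.out.println(n + " batches, events = " + writer.events);
    }
}
```

The only difference from the excerpt is that a `false` result from the first `loadNextBatch()` call ends the loop rather than raising an `IOException`.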
      

      pyarrow 0.8.0 cannot load a stream with zero record batches either. It throws an exception that originates from https://github.com/apache/arrow/blob/a95465b8ce7a32feeaae3e13d0a64102ffa590d9/cpp/src/arrow/table.cc#L309:

      Status Table::FromRecordBatches(const std::vector<std::shared_ptr<RecordBatch>>& batches,
                                      std::shared_ptr<Table>* table) {
        if (batches.size() == 0) {
          return Status::Invalid("Must pass at least one record batch");
        }
        ...
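
A possible way around this precondition is to pass the schema explicitly rather than relying on the batches to supply it, so that an empty batch list still produces a valid, zero-row table. The sketch below illustrates that idea in Java with stand-in types. `Batch`, `Table`, and `fromRecordBatches` here are hypothetical and are not the Arrow C++ or Java API, and whether the elided part of the function actually derives the schema from the first batch is an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for illustration only; not Arrow's RecordBatch or Table.
final class EmptyTableSketch {

    static final class Batch {
        final List<String> schema;
        final int numRows;
        Batch(List<String> schema, int numRows) {
            this.schema = schema;
            this.numRows = numRows;
        }
    }

    static final class Table {
        final List<String> schema;
        final int numBatches;
        final long numRows;
        Table(List<String> schema, int numBatches, long numRows) {
            this.schema = schema;
            this.numBatches = numBatches;
            this.numRows = numRows;
        }
    }

    /**
     * Builds a table from an explicit schema and zero or more batches.
     * Because the schema is a parameter, an empty list is not an error.
     */
    static Table fromRecordBatches(List<String> schema, List<Batch> batches) {
        long rows = 0;
        for (int i = 0; i < batches.size(); i++) {
            Batch b = batches.get(i);
            if (!b.schema.equals(schema)) {
                throw new IllegalArgumentException(
                    "Schema at index " + i + " differs from the table schema");
            }
            rows += b.numRows;
        }
        return new Table(schema, batches.size(), rows);
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("id: int32", "name: utf8");
        Table empty = fromRecordBatches(schema, Collections.<Batch>emptyList());
        System.out.println(empty.numBatches + " batches, " + empty.numRows
            + " rows, schema = " + empty.schema);
    }
}
```

The schema check against each batch is kept, so the explicit-schema variant stays as strict as before for non-empty input.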

              People

              • Assignee: Wes McKinney (wesmckinn)
              • Reporter: Jingyuan Wang (alphalfalfa)
              • Votes: 0
              • Watchers: 2

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 50m