Apache Arrow / ARROW-2119

[C++][Java] Handle Arrow stream with zero record batch

Details

    Description

      Many places in the code currently appear to assume that a stream in the streaming format contains at least one record batch. Is a stream with zero record batches unsupported by design?

      e.g. https://github.com/apache/arrow/blob/master/java/tools/src/main/java/org/apache/arrow/tools/StreamToFile.java#L45

        public static void convert(InputStream in, OutputStream out) throws IOException {
          BufferAllocator allocator = new RootAllocator(Integer.MAX_VALUE);
          try (ArrowStreamReader reader = new ArrowStreamReader(in, allocator)) {
            VectorSchemaRoot root = reader.getVectorSchemaRoot();
            // load the first batch before instantiating the writer so that we have any dictionaries
            if (!reader.loadNextBatch()) {
              throw new IOException("Unable to read first record batch");
            }
            ...
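
One way the loop above could tolerate an empty stream is to treat a `false` result from the first `loadNextBatch()` call as a normal end of stream rather than as an error. The sketch below is a self-contained model of that control flow, not the Arrow API: `StubReader`, `StubWriter`, and the string-based schema and batches are hypothetical stand-ins introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Stand-in model (not the Arrow API): a stream is a schema followed by zero or more batches. */
final class ZeroBatchConvertSketch {

    /** Hypothetical reader that mimics the loadNextBatch() contract used in the excerpt. */
    static final class StubReader {
        private final String schema;
        private final Iterator<String> batches;
        private String current;

        StubReader(String schema, List<String> batches) {
            this.schema = schema;
            this.batches = batches.iterator();
        }

        String schema() {
            return schema;
        }

        /** Returns false at end of stream, including when the stream holds no batches at all. */
        boolean loadNextBatch() {
            if (!batches.hasNext()) {
                return false;
            }
            current = batches.next();
            return true;
        }

        String currentBatch() {
            return current;
        }
    }

    /** Hypothetical writer that records what would be written to the output file. */
    static final class StubWriter {
        final List<String> output = new ArrayList<>();

        void start(String schema) {
            output.add("schema:" + schema);
        }

        void writeBatch(String batch) {
            output.add("batch:" + batch);
        }

        void end() {
            output.add("footer");
        }
    }

    static List<String> convert(StubReader reader) {
        StubWriter writer = new StubWriter();
        // Still load the first batch before starting the writer, as the excerpt does,
        // but remember the result instead of throwing when no batch is available.
        boolean hasBatch = reader.loadNextBatch();
        writer.start(reader.schema());
        while (hasBatch) {
            writer.writeBatch(reader.currentBatch());
            hasBatch = reader.loadNextBatch();
        }
        writer.end();
        return writer.output;
    }
}
```

With this shape, an empty stream still produces a schema and a footer, so the output stays well formed even though no batch is written.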
      

      pyarrow 0.8.0 cannot load a stream with zero record batches either. It raises an exception that originates from https://github.com/apache/arrow/blob/a95465b8ce7a32feeaae3e13d0a64102ffa590d9/cpp/src/arrow/table.cc#L309:

      Status Table::FromRecordBatches(const std::vector<std::shared_ptr<RecordBatch>>& batches,
                                      std::shared_ptr<Table>* table) {
        if (batches.size() == 0) {
          return Status::Invalid("Must pass at least one record batch");
        }
        ...
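
A stream always carries its schema before any record batch, so an empty table could in principle be built if the schema were passed explicitly instead of being taken from the batches. The sketch below models that idea in Java with hypothetical types (`Batch`, `Table`, `fromBatches`); it is not the C++ `Table::FromRecordBatches` signature, and the assumption that the check exists because the schema is read from the first batch is not confirmed by the excerpt.

```java
import java.util.List;

/** Stand-in model (not the Arrow API) of building a table from a schema and zero or more batches. */
final class EmptyTableSketch {

    /** Hypothetical record batch: field names plus a row count. */
    static final class Batch {
        final List<String> schema;
        final int numRows;

        Batch(List<String> schema, int numRows) {
            this.schema = schema;
            this.numRows = numRows;
        }
    }

    /** Hypothetical table: field names, total rows, and the number of chunks per column. */
    static final class Table {
        final List<String> schema;
        final long numRows;
        final int numChunks;

        Table(List<String> schema, long numRows, int numChunks) {
            this.schema = schema;
            this.numRows = numRows;
            this.numChunks = numChunks;
        }
    }

    /**
     * Takes the schema as an explicit argument, so an empty batch list is valid
     * and yields a table with the given schema and zero rows.
     */
    static Table fromBatches(List<String> schema, List<Batch> batches) {
        long rows = 0;
        for (int i = 0; i < batches.size(); i++) {
            Batch batch = batches.get(i);
            if (!batch.schema.equals(schema)) {
                throw new IllegalArgumentException("Schema mismatch at batch " + i);
            }
            rows += batch.numRows;
        }
        return new Table(schema, rows, batches.size());
    }
}
```

The design choice here is only that the schema no longer depends on the first batch; schema validation against each batch is kept so that non-empty inputs behave as before.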

            People

              wesm Wes McKinney
              alphalfalfa Jingyuan Wang
              Votes: 0
              Watchers: 3

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1h 50m