  1. Apache Arrow
  2. ARROW-5916

[C++] Allow RecordBatch.length to be less than array lengths

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 1.0.0
    • Component/s: C++

      Description

      Arrow 0.13 ignored RecordBatch.length, while 0.14 requires RecordBatch.length and the array lengths to be equal. As discussed in https://lists.apache.org/thread.html/2692dd8fe09c92aa313bded2f4c2d4240b9ef75a8604ec214eb02571@%3Cdev.arrow.apache.org%3E , this should change so that RecordBatch.length can be any value in [0, array length].

      If RecordBatch.length is less than the array length, the reader should ignore the portion of each array beyond RecordBatch.length. This would allow partially populated batches to be read in the scenarios identified in the discussion linked above.

        Status GetFieldMetadata(int field_index, ArrayData* out) {
          auto nodes = metadata_->nodes();
          // pop off a field
          if (field_index >= static_cast<int>(nodes->size())) {
            return Status::Invalid("Ran out of field metadata, likely malformed");
          }
          const flatbuf::FieldNode* node = nodes->Get(field_index);
      
          // Proposed change: use the RecordBatch length instead of the field node length.
          // out->length = node->length();
          out->length = metadata_->length();
          out->null_count = node->null_count();
          out->offset = 0;
          return Status::OK();
        }
      

      Attached is a test IPC file containing a batch with length 1 and array length 3.
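
      For illustration, the proposed rule can be sketched outside the Arrow codebase. The helper names below (`ResolveBatchLength`, `VisibleValues`) are hypothetical and not part of the Arrow API. The sketch assumes each column is a flat top-level array and only models the check that RecordBatch.length lies in [0, array length], followed by dropping the rows beyond it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an Arrow function): accept a batch length only if
// it lies in [0, array length] for every top-level array. On success, *out is
// the number of rows the reader should expose.
bool ResolveBatchLength(int64_t batch_length,
                        const std::vector<int64_t>& array_lengths,
                        int64_t* out) {
  if (batch_length < 0) return false;
  for (int64_t array_length : array_lengths) {
    if (batch_length > array_length) return false;  // batch claims too many rows
  }
  *out = batch_length;
  return true;
}

// Hypothetical helper: return only the first `length` values of a column,
// ignoring the portion beyond the batch length. Assumes 0 <= length <= size.
std::vector<int32_t> VisibleValues(const std::vector<int32_t>& values,
                                   int64_t length) {
  return std::vector<int32_t>(values.begin(), values.begin() + length);
}
```

      With the attached file's shape (batch length 1, array length 3), `ResolveBatchLength(1, {3}, &n)` would succeed with `n == 1`, and a reader following this sketch would expose only the first row of each array.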

        Attachments

        1. test.arrow_ipc
          0.5 kB
          John Muehlhausen

              People

              • Assignee: Unassigned
              • Reporter: John Muehlhausen (jgm-ktg)
              • Votes: 0
              • Watchers: 2

                  Time Tracking

                  • Original Estimate: Not Specified
                  • Remaining Estimate: 0h
                  • Time Spent: 40m