Apache Arrow / ARROW-12676

RecordBatchBuilder with uint dictionary creates signed int Batch


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: C++
    • Labels: None

    Description

      When a RecordBatchBuilder whose schema contains a dictionary type with a uint32 index is flushed to a batch, the resulting batch has an int32 index instead:

      BatchBuilder schema after flush: 
      Symbol: dictionary<values=string, indices=int16, ordered=0>
      Status: dictionary<values=string, indices=uint32, ordered=0>
      Batch schema after flush:
      Symbol: dictionary<values=string, indices=int16, ordered=0>
      Status: dictionary<values=string, indices=int32, ordered=0>
      

      The output above comes from:

      std::shared_ptr<arrow::RecordBatch> batch;
      auto status = batchBuilder_->Flush(&batch);
      // Check the status first: batch is only valid if Flush succeeded.
      if (!status.ok()) {
        throw Exception("Arrow batch flush failed: {}", status);
      }
      std::cout << "BatchBuilder schema after flush: " << batchBuilder_->schema()->ToString() << std::endl;
      std::cout << "Batch schema after flush: " << batch->schema()->ToString() << std::endl;

      This results in a failure to write: "Invalid: Tried to write record batch with different schema"
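
When a write is rejected with this error, it can help to pinpoint which field diverged before digging into the builder. The following is a minimal sketch under stated assumptions: DiffSchemaLines is a hypothetical helper (not an Arrow API) that takes two schema dumps, such as the ToString() outputs printed above, and returns the lines that differ. It assumes each field is printed on its own line, as in the output shown in this report.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not part of Arrow: compares two schema dumps line by
// line and returns every (expected, actual) pair of lines that differ.
// A line missing on one side is reported as an empty string.
std::vector<std::pair<std::string, std::string>> DiffSchemaLines(
    const std::string& expected, const std::string& actual) {
  std::istringstream lhs(expected);
  std::istringstream rhs(actual);
  std::vector<std::pair<std::string, std::string>> diffs;
  std::string a;
  std::string b;
  while (true) {
    const bool has_a = static_cast<bool>(std::getline(lhs, a));
    const bool has_b = static_cast<bool>(std::getline(rhs, b));
    if (!has_a && !has_b) break;
    // getline leaves the string untouched at end of input, so clear it.
    if (!has_a) a.clear();
    if (!has_b) b.clear();
    if (a != b) diffs.emplace_back(a, b);
  }
  return diffs;
}
```

Applied to the two dumps in this report, the helper returns a single pair: the Status line with indices=uint32 on the builder side and indices=int32 on the batch side. A string comparison is only a diagnostic aid; it does not replace comparing the schema objects themselves.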

      I believe this is related to https://issues.apache.org/jira/browse/ARROW-9969 and in particular, this bit: https://github.com/apache/arrow/blob/master/cpp/src/arrow/table_builder.cc#L72

      Is the dictionary->Equals comparison checking the signedness of the indices?

       

       

          People

            Assignee: Unassigned
            Reporter: Kyle Kavanagh (kdkavanagh)
