  Apache Arrow / ARROW-13301

BaseListBuilder constructor should check the provided type is a list

Details

    • Type: Improvement
    • Status: Open
    • Priority: Trivial
    • Resolution: Unresolved
    • Affects Version/s: 4.0.1
    • Fix Version/s: None
    • Component/s: C++
    • Labels: None

    Description

      I've noticed that I can create a ListBuilder with a type that is not a ListType (in particular a StructType).

      I'm talking about the following constructor:

      BaseListBuilder(MemoryPool* pool, std::shared_ptr<ArrayBuilder> const& value_builder,
                        const std::shared_ptr<DataType>& type)
       

      I think this constructor should enforce that the given type is a ListType.
      It could also enforce that the value type of the given ListType matches the type of value_builder.
      Alternatively, the constructor could be made private, since `BaseListBuilder(MemoryPool* pool, std::shared_ptr<ArrayBuilder> const& value_builder)` should be enough for most use cases.
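
The check proposed above could look roughly like the following. This is a minimal, self-contained sketch built on toy stand-ins so that it compiles without Arrow: `ToyType`, `ToyField`, `SameType` and `ValidateListBuilderType` are all hypothetical names, not Arrow classes or functions. It throws an exception for brevity; Arrow generally reports errors through `arrow::Status`, so an actual fix would more likely be a debug assertion or a status-returning factory.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for arrow::DataType / arrow::Field (NOT the real Arrow classes).
enum class TypeId { kInt32, kString, kStruct, kList };

struct ToyType;

struct ToyField {
  std::string name;
  std::shared_ptr<ToyType> type;
};

struct ToyType {
  TypeId id;
  std::vector<ToyField> children;  // struct members, or the single list value field
};

std::shared_ptr<ToyType> Primitive(TypeId id) {
  return std::make_shared<ToyType>(ToyType{id, {}});
}

std::shared_ptr<ToyType> Struct(std::vector<ToyField> fields) {
  return std::make_shared<ToyType>(ToyType{TypeId::kStruct, std::move(fields)});
}

// Like arrow::list(), the value field gets the default name "item".
std::shared_ptr<ToyType> List(std::shared_ptr<ToyType> value_type) {
  return std::make_shared<ToyType>(
      ToyType{TypeId::kList, {ToyField{"item", std::move(value_type)}}});
}

// Structural equality: same id, same child names, same child types.
bool SameType(const ToyType& a, const ToyType& b) {
  if (a.id != b.id || a.children.size() != b.children.size()) return false;
  for (std::size_t i = 0; i < a.children.size(); ++i) {
    if (a.children[i].name != b.children[i].name) return false;
    if (!SameType(*a.children[i].type, *b.children[i].type)) return false;
  }
  return true;
}

// Hypothetical check that a list builder constructor (or factory) could run.
void ValidateListBuilderType(const ToyType& type, const ToyType& value_builder_type) {
  // 1. The provided type must be a list type.
  if (type.id != TypeId::kList) {
    throw std::invalid_argument("expected a list type");
  }
  // 2. Its value type must match what the value builder produces.
  if (!SameType(*type.children[0].type, value_builder_type)) {
    throw std::invalid_argument("list value type does not match value builder type");
  }
}
```

With such a check, passing the struct type where a list type is expected fails immediately instead of silently producing a list with an unexpected field name.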

      Here's an example where I'm trying to create a "ListType(list<item: struct<return_code: int32, message: string>>)".

      When I create the ListBuilder, I've noticed that it works with type set to either of:

      1. ListType(list<item: struct<return_code: int32, message: string>>)
      2. StructType(struct<return_code: int32, message: string>)

      In the first case the underlying type is: ListType(list<item: struct<return_code: int32, message: string>>)

      But in the second case the underlying type is ListType(list<return_code: struct<return_code: int32, message: string>>). The subtle difference is that the ListType field name has changed from item to the name of the first field of the struct (return_code).

      I think this is because BaseListBuilder uses `type->field(0)` to get the name of the list field, but uses `value_builder_->type()` to get its type.

      See:

        BaseListBuilder(MemoryPool* pool, std::shared_ptr<ArrayBuilder> const& value_builder,
                        const std::shared_ptr<DataType>& type)
            : ArrayBuilder(pool),
              offsets_builder_(pool),
              value_builder_(value_builder),
              value_field_(type->field(0)->WithType(NULLPTR)) {}
        // ...
        std::shared_ptr<DataType> type() const override {
          return std::make_shared<TYPE>(value_field_->WithType(value_builder_->type()));
        }
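
To make the mechanism concrete, here is a small toy model of the two code paths above. It is only a sketch: `MiniType`, `MiniField`, `MiniListBuilder` and `ToString` are hypothetical stand-ins written for this illustration, not Arrow classes. The builder keeps only the name of `type->field(0)` and later combines it with the value builder's type, which is exactly what turns a struct argument into a list whose field is named after the struct's first member.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy model of Field / DataType (NOT the real Arrow classes).
struct MiniType;

struct MiniField {
  std::string name;
  std::shared_ptr<MiniType> type;  // may be null, mirroring WithType(NULLPTR)
};

struct MiniType {
  std::string kind;               // "int32", "string", "struct" or "list"
  std::vector<MiniField> fields;  // children; empty for primitive types
  const MiniField& field(int i) const { return fields[i]; }
};

// Renders e.g. "list<item: struct<return_code: int32, message: string>>".
std::string ToString(const MiniType& t) {
  if (t.fields.empty()) return t.kind;
  std::string out = t.kind + "<";
  for (std::size_t i = 0; i < t.fields.size(); ++i) {
    if (i > 0) out += ", ";
    out += t.fields[i].name + ": " + ToString(*t.fields[i].type);
  }
  return out + ">";
}

class MiniListBuilder {
 public:
  // Mirrors the three-argument constructor: only the NAME of field(0) is kept,
  // whatever kind of type was passed in.
  MiniListBuilder(std::shared_ptr<MiniType> value_builder_type,
                  const std::shared_ptr<MiniType>& type)
      : value_builder_type_(std::move(value_builder_type)),
        value_field_{type->field(0).name, nullptr} {}

  // Mirrors type(): stored field name + the value builder's type.
  MiniType type() const {
    return MiniType{"list", {MiniField{value_field_.name, value_builder_type_}}};
  }

 private:
  std::shared_ptr<MiniType> value_builder_type_;
  MiniField value_field_;
};
```

In this model, constructing the builder with the list type yields a list whose field is named `item`, while constructing it with the struct type yields a list whose field is named `return_code`, matching the behaviour described in this issue.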
      
      

      Here's an example that reproduces the issue:

      
      
      #include <memory>
      #include <vector>

      #include <arrow/api.h>
      #include <boost/test/unit_test.hpp>

      BOOST_AUTO_TEST_CASE(IsThereABugWithArrays) {
        // struct<return_code: int32, message: string>
        const arrow::FieldVector fields = {
            arrow::field("return_code", arrow::int32()),
            arrow::field("message", arrow::utf8())};

        const std::shared_ptr<arrow::DataType> struct_data_type =
            arrow::struct_(fields);
        // list<item: struct<return_code: int32, message: string>>
        const std::shared_ptr<arrow::DataType> list_of_struct_data_type =
            arrow::list(struct_data_type);

        const std::shared_ptr<arrow::Schema> schema =
            arrow::schema({arrow::field("search_results", list_of_struct_data_type)});

        arrow::MemoryPool* pool = arrow::default_memory_pool();

        // Builders for the two struct members.
        std::shared_ptr<arrow::Int32Builder> return_code_builder =
            std::make_shared<arrow::Int32Builder>(pool);
        std::shared_ptr<arrow::StringBuilder> message_builder =
            std::make_shared<arrow::StringBuilder>(pool);
        std::vector<std::shared_ptr<arrow::ArrayBuilder>> struct_fields_builders = {
            return_code_builder, message_builder};

        std::shared_ptr<arrow::StructBuilder> struct_builder =
            std::make_shared<arrow::StructBuilder>(
                struct_data_type, pool, struct_fields_builders);

        // Expected usage: the list type is passed to the ListBuilder.
        std::shared_ptr<arrow::ListBuilder> list_builder(
            std::make_shared<arrow::ListBuilder>(
                pool, struct_builder, list_of_struct_data_type));

        BOOST_REQUIRE(list_builder->type()->Equals(list_of_struct_data_type));

        // This should not be allowed: a struct type is passed instead of a list type.
        std::shared_ptr<arrow::ListBuilder> list_builder_using_struct_dtype(
            std::make_shared<arrow::ListBuilder>(
                pool, struct_builder, struct_data_type));

        // The resulting list field is named "return_code" instead of "item".
        std::shared_ptr<arrow::DataType> wrong_data_type =
            std::make_shared<arrow::ListType>(
                arrow::field("return_code", struct_data_type));

        BOOST_REQUIRE(!list_builder_using_struct_dtype->type()->Equals(list_of_struct_data_type));
        BOOST_REQUIRE(list_builder_using_struct_dtype->type()->Equals(wrong_data_type));
      }
      

          People

            Assignee: Unassigned
            Reporter: 0x26dres &res