  1. Apache Arrow
  2. ARROW-7559

[Rust] Possibly incorrect index check assertion in StringArray and BinaryArray


Details

    Description

      The following test tries to build a list array on top of an underlying string array. It panics on master (commit acfcdee75acb4b1814f2e727c150a7403d618e8f):

      #[test]
      fn nested_string_array() {
          let strarray = StringArray::from(vec!["foo", "bar", "foobar"]);

          // List offsets [0, 2, 3]: entry 0 covers child values 0..2, entry 1 covers 2..3.
          let nestedData = ArrayData::builder(DataType::List(Box::new(DataType::Utf8)))
              .len(2)
              .add_buffer(Buffer::from(&[0, 2, 3].to_byte_slice()))
              // The child array reuses the offsets and data buffers of `strarray`.
              .add_child_data(ArrayData::builder(DataType::Utf8)
                  .len(strarray.len())
                  .add_buffer(strarray.value_offsets())
                  .add_buffer(strarray.value_data())
                  .build())
              .build();
          let nestedArray = ListArray::from(nestedData);

          dbg!(nestedArray);
      }
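
      To make the intended layout concrete, here is a minimal sketch in plain Rust (no arrow crate) of how the list offsets [0, 2, 3] are meant to partition the three child strings. The helper name split_by_offsets is hypothetical and uses usize offsets for simplicity; it only illustrates the offset convention and is not how ListArray is implemented.

```rust
/// Splits `values` into sublists using list-style offsets:
/// sublist `k` covers `values[offsets[k]..offsets[k + 1]]`.
/// (`split_by_offsets` is a hypothetical helper, not part of the arrow crate.)
fn split_by_offsets<'a>(values: &[&'a str], offsets: &[usize]) -> Vec<Vec<&'a str>> {
    offsets
        .windows(2) // each adjacent pair of offsets is one [start, end) range
        .map(|w| values[w[0]..w[1]].to_vec())
        .collect()
}
```

      Called with the values "foo", "bar", "foobar" and the offsets [0, 2, 3], this yields two sublists, which is the shape the test expects from the list array.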

      My guess is that the index check in StringArray::value is incorrect. Instead of

          pub fn value(&self, i: usize) -> &str {
              assert!(
                  i + self.offset() < self.data.len(),
                  "StringArray out of bounds access"
              );

      it should probably compare i against the length without adding the offset. The same check is also done in BinaryArray. With that change, the test produces the expected output:

      [arrow/src/array/array.rs:2460] nestedArray = ListArray
      [
        StringArray
      [
        "foo",
        "bar",
      ],
        StringArray
      [
        "foobar",
      ],
      ]
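
      The suspected failure mode can be sketched without the arrow crate. The sketch below assumes (this is not confirmed in the issue) that the list array hands out each entry as a view into the child array with its own offset and length, and that data.len() is the length of that view. SliceView and its methods are hypothetical names used only for illustration.

```rust
/// Hypothetical stand-in for a sliced array: a view of `len` elements
/// starting `offset` elements into a shared child array.
struct SliceView {
    offset: usize,
    len: usize,
}

impl SliceView {
    /// The check as quoted in the issue: the offset is added to `i`
    /// before comparing against the length of the view.
    fn check_with_offset(&self, i: usize) -> bool {
        i + self.offset < self.len
    }

    /// The proposed check: `i` is treated as an index relative to the view.
    fn check_without_offset(&self, i: usize) -> bool {
        i < self.len
    }
}

/// The view assumed for the second list entry, given list offsets [0, 2, 3]:
/// one element, starting at child index 2.
fn second_list_entry() -> SliceView {
    SliceView { offset: 2, len: 1 }
}
```

      Under these assumptions, index 0 of the second entry fails the first check (0 + 2 is not less than 1) even though the entry holds one element, while the second check accepts index 0 and still rejects index 1.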
       


            People

              paddyhoran Paddy Horan
              jhorstmann Jörn Horstmann
              Votes: 0
              Watchers: 4

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 0.5h