Apache Arrow / ARROW-8909

[Java] Out of order writes using setSafe


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.0
    • Component/s: Java

    Description

      I noticed that calling setSafe on a VarCharVector with indices that are not in increasing order sets the vector's last-set index (lastSet) to the index used in the most recent setSafe call, even when higher indices have already been written.

      Is this documented, expected behavior?

      Sample code:

      import java.util.Collections;
      import lombok.extern.slf4j.Slf4j;
      import org.apache.arrow.memory.BufferAllocator;
      import org.apache.arrow.memory.RootAllocator;
      import org.apache.arrow.vector.VarCharVector;
      import org.apache.arrow.vector.VectorSchemaRoot;
      import org.apache.arrow.vector.types.pojo.ArrowType;
      import org.apache.arrow.vector.types.pojo.Field;
      import org.apache.arrow.vector.types.pojo.Schema;
      import org.apache.arrow.vector.util.Text;
      
      @Slf4j
      public class ATest {
      
        public static void main(String[] args) {
          Schema schema = new Schema(Collections.singletonList(Field.nullable("Data", new ArrowType.Utf8())));
          // Close the allocator as well as the root so that no buffers are leaked.
          try (BufferAllocator allocator = new RootAllocator();
               VectorSchemaRoot vroot = VectorSchemaRoot.create(schema, allocator)) {
            VarCharVector vec = (VarCharVector) vroot.getVector("Data");
      
            // In-order writes: "0_mtest" ... "9_mtest".
            for (int i = 0; i < 10; i++) {
              vec.setSafe(i, new Text(Integer.toString(i) + "_mtest"));
            }
      
            // Out-of-order write: index 7 is rewritten after index 9 has been set.
            vec.setSafe(7, new Text(Integer.toString(7) + "_new"));
      
            log.info("Data at index 8 Before {}", vec.getObject(8));
            // setRowCount also sets the value count of every vector in the root.
            vroot.setRowCount(10);
            log.info("Data at index 8 After {}", vec.getObject(8));
            log.info(vroot.contentToTSVString());
          }
        }
      }
      

       

      If I don't rewrite index 7 after the loop, I get all the expected entries: 0_mtest, 1_mtest, ..., 9_mtest.

      If I rewrite index 7 after the loop, I see 0_mtest, ..., 5_mtest, 6_mtest, 7_new, and then:
          Before setRowCount, index 8 is "st8_mtest" and index 9 is "9_mtest".
          After setRowCount, index 8 is "" and index 9 is "".
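      The "st8_mtest" value can be traced through the offsets. This sketch assumes the standard Arrow variable-width layout, in which entry i occupies data bytes [o_i, o_{i+1}) of an offsets buffer (o_i is notation introduced here), and assumes that setSafe(i, ...) writes starting at o_i and updates only o_{i+1}. Every "k_mtest" value is 7 bytes:

```latex
\begin{aligned}
o_i &= 7i \quad (0 \le i \le 10) \text{ after the in-order loop} \\
o_8 &= o_7 + |\texttt{7\_new}| = 49 + 5 = 54, \qquad o_9 = 63 \text{ (unchanged)} \\
\text{index } 8 &= \text{bytes } [o_8, o_9) = [54, 63) = \texttt{st} \,\Vert\, \texttt{8\_mtest}
\end{aligned}
```

      Bytes 54 and 55 are the leftover "st" of the old "7_mtest" (which occupied [49, 56)), and bytes [56, 63) still hold "8_mtest".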

      If I use a suffix longer than the four characters of _new, the write eats further into the data at the following indices.
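      Both observations can be reproduced without Arrow using a small, hypothetical model. VarWidthModel and its set, setValueCount, get, and filled methods are illustrative names, not Arrow API. The model assumes that each write starts at offsets[index] and moves only offsets[index + 1], that the vector records the most recent write index as lastSet, and that setting the value count fills every index after lastSet with an empty value. It is a sketch of the reported behavior, not Arrow's implementation.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical model of a variable-width (VarChar-style) vector, NOT Arrow's code:
 * entry i occupies data[offsets[i] .. offsets[i + 1]).
 */
public class VarWidthModel {
  private final byte[] data = new byte[1024]; // fixed capacity, enough for this demo
  private final int[] offsets;
  private int lastSet = -1;                   // assumed: index of the most recent write

  VarWidthModel(int capacity) {
    offsets = new int[capacity + 1];
  }

  /** Assumed setSafe-like write: copy bytes at offsets[index], then move only offsets[index + 1]. */
  void set(int index, String value) {
    fillHoles(index);
    byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
    int start = offsets[index];
    System.arraycopy(bytes, 0, data, start, bytes.length);
    offsets[index + 1] = start + bytes.length;
    lastSet = index; // overwritten even when index is below the previous lastSet
  }

  /** Assumed value-count step: every index after lastSet becomes an empty value. */
  void setValueCount(int count) {
    fillHoles(count);
  }

  /** Makes indices lastSet + 1 .. upTo - 1 empty; a no-op when upTo <= lastSet + 1. */
  private void fillHoles(int upTo) {
    for (int i = lastSet + 1; i < upTo; i++) {
      offsets[i + 1] = offsets[i];
    }
  }

  String get(int index) {
    int start = offsets[index];
    return new String(data, start, offsets[index + 1] - start, StandardCharsets.UTF_8);
  }

  /** Writes "0_mtest" .. "9_mtest" in index order, so offsets[i] = 7 * i. */
  static VarWidthModel filled() {
    VarWidthModel v = new VarWidthModel(10);
    for (int i = 0; i < 10; i++) {
      v.set(i, i + "_mtest");
    }
    return v;
  }

  public static void main(String[] args) {
    VarWidthModel shortRewrite = filled();
    shortRewrite.set(7, "7_new");            // 5 bytes: offsets[8] moves from 56 to 54
    System.out.println(shortRewrite.get(8)); // st8_mtest
    shortRewrite.setValueCount(10);          // lastSet is 7, so indices 8 and 9 become empty
    System.out.println("[" + shortRewrite.get(8) + "][" + shortRewrite.get(9) + "]"); // [][]

    VarWidthModel longRewrite = filled();
    longRewrite.set(7, "7_much_longer");     // 13 bytes: overwrites bytes 49..61, offsets[8] = 62
    System.out.println(longRewrite.get(8));  // t (only the last byte of "8_mtest" survives)
  }
}
```

      Under these assumptions the model matches the report: a shorter replacement pulls index 8's start offset backwards, a longer one overwrites index 8's bytes, and the value-count step then empties every index after 7.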

       


            People

              Assignee: Liya Fan (fan_li_ya)
              Reporter: Saurabh (saurabhm)
              Votes: 0
              Watchers: 4

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h