Apache Arrow / ARROW-6158

[Python] possible to create StructArray with type that conflicts with child array's types

Details

    Description

      Using the Python interface as an example, the following creates a StructArray whose field types don't match the child array types:

      import pyarrow as pa

      a = pa.array([1, 2, 3], type=pa.int64())
      b = pa.array(['a', 'b', 'c'], type=pa.string())
      inconsistent_fields = [pa.field('a', pa.int32()), pa.field('b', pa.float64())]

      # Field types (int32, float64) disagree with the child arrays (int64, string)
      a = pa.StructArray.from_arrays([a, b], fields=inconsistent_fields)
      

      The above runs without error. I didn't find any operation that fails (e.g. conversion to pandas, slicing), and validation also passes, yet the array's type carries field types that are inconsistent with the child arrays:

      In [2]: a
      Out[2]: 
      <pyarrow.lib.StructArray object at 0x7f450af52eb8>
      -- is_valid: all not null
      -- child 0 type: int64
        [
          1,
          2,
          3
        ]
      -- child 1 type: string
        [
          "a",
          "b",
          "c"
        ]
      
      In [3]: a.type
      Out[3]: StructType(struct<a: int32, b: double>)
      
      In [4]: a.to_pandas()
      Out[4]: 
      array([{'a': 1, 'b': 'a'}, {'a': 2, 'b': 'b'}, {'a': 3, 'b': 'c'}],
            dtype=object)
      
      In [5]: a.validate() 
      

      Shouldn't this be disallowed somehow? It could be checked in the Python from_arrays method, but perhaps also in StructArray::Make, which already checks the number of fields against the number of arrays and that the array lengths are consistent.

      Similar to the discussion in ARROW-6132, I would also expect ValidateArray to catch this.

            People

              jorisvandenbossche Joris Van den Bossche