Apache Arrow: ARROW-2814

[Python] Unify PyObject* sequence conversion paths for built-in sequences, NumPy arrays


Details

    Description

      Original issue title: "Struct type inference and conversion works for lists but not NumPy arrays with dtype object"

      Example setup:

      import pandas as pd
      import pyarrow as pa
      
      s = pd.Series([{'data': {'document_id': None,
        'document_type': None,
        'master_customer_id': None,
        'message': 'User Login Request',
        'policy_id': None,
        'sequence_no': 14,
        'user_name': None},
       'header': {'actor_id': None,
        'actor_type': None,
        'brand_code': 'ES',
        'event_origin': None,
        'event_timestamp': '2018-01-01T18:25:43.511Z',
        'event_type': 'LOGIN',
        'master_customer_id': '14',
        'source': 'CUSTOMER_AUTH_SERVICE',
        'source_id': None,
        'source_version': None},
       'payload_version': '1',
       'status': {'status_code': 100, 'status_message': 'Success'}}])
      

      This works (converting the Series to a Python list first):

      In [24]: pa.array(list(s))
      Out[24]: 
      <pyarrow.lib.StructArray object at 0x7f8435b09c28>
      [
        {'data': {'document_id': None, 'document_type': None, 'master_customer_id': None, 'message': 'User Login Request', 'policy_id': None, 'sequence_no': 14, 'user_name': None}, 'header': {'actor_id': None, 'actor_type': None, 'brand_code': 'ES', 'event_origin': None, 'event_timestamp': '2018-01-01T18:25:43.511Z', 'event_type': 'LOGIN', 'master_customer_id': '14', 'source': 'CUSTOMER_AUTH_SERVICE', 'source_id': None, 'source_version': None}, 'payload_version': '1', 'status': {'status_code': 100, 'status_message': 'Success'}}
      ]
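The list input above comes back as a StructArray, so that path inferred a struct type from the dicts. As a rough illustration of what struct type inference over a sequence of dicts involves, here is a minimal pure-Python sketch. It is not pyarrow's implementation: the function names and the string/dict type representation are hypothetical. Each value is mapped to a type, and the types from successive rows are unified, with None (null) yielding to any concrete type and struct fields merged by name.

```python
from typing import Any, Dict, Iterable, Union

# Hypothetical type representation: a primitive name ("null", "bool", "int64",
# "double", "string") or a dict mapping field names to types (a struct).
TypeRepr = Union[str, Dict[str, Any]]


def infer_value(value: Any) -> TypeRepr:
    """Map one Python value to a type (simplified, illustrative rules)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # bool subclasses int, so test it first
        return "bool"
    if isinstance(value, int):
        return "int64"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        # A dict becomes a struct whose field types are inferred recursively.
        return {str(k): infer_value(v) for k, v in value.items()}
    raise TypeError(f"cannot infer a type for {type(value).__name__}")


def unify(a: TypeRepr, b: TypeRepr) -> TypeRepr:
    """Combine the types seen in two rows; null yields to any concrete type."""
    if a == "null":
        return b
    if b == "null":
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        merged = dict(a)
        for name, field_type in b.items():
            # Fields missing from earlier rows start out as null.
            merged[name] = unify(merged.get(name, "null"), field_type)
        return merged
    if a == b:
        return a
    raise TypeError(f"incompatible types: {a!r} and {b!r}")


def infer_sequence(values: Iterable[Any]) -> TypeRepr:
    """Infer one type for a whole sequence by unifying the per-row types."""
    result: TypeRepr = "null"
    for value in values:
        result = unify(result, infer_value(value))
    return result


rows = [{"status_code": 100, "message": None}, {"status_code": 200, "message": "ok"}]
print(infer_sequence(rows))  # {'status_code': 'int64', 'message': 'string'}
```

Applied to list(s) from the setup, this sketch leaves fields that are None in every row (such as document_id) as "null" and turns status into a nested struct of int64 and string fields.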
      

      This does not (passing the Series directly, which goes through the NumPy object-array path):

      In [23]: pa.array(s)
      ---------------------------------------------------------------------------
      ArrowInvalid                              Traceback (most recent call last)
      <ipython-input-23-eba23a1638b7> in <module>()
      ----> 1 pa.array(s)
      
      ~/code/arrow/python/pyarrow/array.pxi in pyarrow.lib.array()
          175             values, type = pdcompat.get_datetimetz_type(values, obj.dtype,
          176                                                         type)
      --> 177             return _ndarray_to_array(values, mask, type, from_pandas, pool)
          178     else:
          179         if mask is not None:
      
      ~/code/arrow/python/pyarrow/array.pxi in pyarrow.lib._ndarray_to_array()
           75 
           76     with nogil:
      ---> 77         check_status(NdarrayToArrow(pool, values, mask,
           78                                     use_pandas_null_sentinels,
           79                                     c_type, &chunked_out))
      
      ~/code/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()
           79         message = frombytes(status.message())
           80         if status.IsInvalid():
      ---> 81             raise ArrowInvalid(message)
           82         elif status.IsIOError():
           83             raise ArrowIOError(message)
      
      ArrowInvalid: ../src/arrow/python/numpy_to_arrow.cc:1742 code: converter.Convert()
      Error inferring Arrow type for Python object array. Got Python object of type dict but can only handle these types: string, bool, float, int, date, time, decimal, bytearray, list, array
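The issue title describes the direction of the fix: rather than giving NumPy object arrays their own inference routine (which, per the error above, accepts only a fixed list of Python types and rejects dict), built-in sequences and object arrays should share one conversion path. The sketch below illustrates that idea in pure Python under stated assumptions. FakeObjectArray is a hypothetical stand-in for an ndarray with dtype=object; to_sequence relies only on a tolist() method, which NumPy arrays and pandas Series both provide; and infer_kind is a deliberately simplified, hypothetical inferrer that inspects the first non-None value. It does not reproduce the actual C++ changes made for this issue.

```python
from typing import Any, List, Sequence


class FakeObjectArray:
    """Hypothetical stand-in for a NumPy array with dtype=object."""

    def __init__(self, items: Sequence[Any]) -> None:
        self._items = list(items)

    def tolist(self) -> List[Any]:
        return list(self._items)


def infer_kind(values: Sequence[Any]) -> str:
    """Shared, simplified inference: classify the first non-None value."""
    for v in values:
        if v is None:
            continue
        if isinstance(v, dict):
            return "struct"
        if isinstance(v, bool):  # bool subclasses int, so test it first
            return "bool"
        if isinstance(v, int):
            return "int64"
        if isinstance(v, float):
            return "double"
        if isinstance(v, str):
            return "string"
        raise TypeError(f"unsupported Python type: {type(v).__name__}")
    return "null"


def to_sequence(obj: Any) -> List[Any]:
    """Normalize a built-in sequence or an ndarray-like object to a list."""
    tolist = getattr(obj, "tolist", None)
    if callable(tolist):
        return tolist()
    return list(obj)


def convert(obj: Any) -> str:
    """Single entry point: every input goes through the same inference."""
    return infer_kind(to_sequence(obj))


data = [{"status_code": 100}, None]
print(convert(data), convert(FakeObjectArray(data)))  # struct struct
```

Until both inputs share one converter, the report itself shows the workaround: materialize the Series as a Python list (pa.array(list(s))) before converting.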
      


            People

              Wes McKinney (wesm)
              rob (rob-dempsey-esure)
              Votes: 0
              Watchers: 4


                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 9h 20m