Apache Arrow / ARROW-3890

[Python] Creating Array with explicit string type fails on Python 2.7


Details

    Description

      Since pyarrow 0.8.0 (and still in pyarrow==0.11.1), a pyarrow string array can no longer be created from a NumPy array of strings on Python 2.7.

      Please find below a quick repro:

      import numpy as np
      import pyarrow as pa
      vec = np.array(["toto", "tata"])
      pa.array(vec, pa.string())
      

      Running this, I get the following:

      ---------------------------------------------------------------------------
      ArrowInvalid                              Traceback (most recent call last)
      <ipython-input-4-e753fb3a8193> in <module>()
      ----> 1 pa.array(vec, pa.string())
      
      /usr/local/lib/python2.7/dist-packages/pyarrow/lib.so in pyarrow.lib.array()
      
      /usr/local/lib/python2.7/dist-packages/pyarrow/lib.so in pyarrow.lib._ndarray_to_array()
      
      /usr/local/lib/python2.7/dist-packages/pyarrow/lib.so in pyarrow.lib.check_status()
      
      ArrowInvalid: 'utf32' codec can't decode bytes in position 0-3: code point not in range(0x110000)
      

      However, this snippet worked fine with pyarrow==0.7.1.

      Has string handling in pyarrow changed since 0.7.1?
      Is there a workaround for this?

      Jacques
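
      Note on the error message: "bytes in position 0-3" and "code point not in range(0x110000)" would be consistent with the four bytes of "toto" being decoded as a single UTF-32 code unit instead of being read as a 4-character byte string. This is an assumption about the cause, not something the traceback confirms. The sketch below uses only the standard library (no pyarrow or NumPy), and the names raw, code_point, reason and span are illustrative.

```python
import struct

# The four ASCII bytes of the first element in the repro.
raw = b"toto"

# Read the same four bytes as one little-endian 32-bit code unit.
code_point = struct.unpack("<I", raw)[0]
print(hex(code_point))  # 0x6f746f74

# Unicode code points stop at 0x10FFFF, so this value cannot be decoded.
print(code_point > 0x10FFFF)  # True

reason = None
span = None
try:
    raw.decode("utf-32-le")
except UnicodeDecodeError as exc:
    # Keep the details; `exc` itself is unbound after the except block.
    reason = exc.reason
    span = (exc.start, exc.end)

print(reason, span)
```

      If this reading is right, the failure depends on how the input buffer is interpreted rather than on the string contents themselves.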


            People

              wesm Wes McKinney
              jafournier jacques
              Votes: 0
              Watchers: 4


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 40m