Apache Arrow / ARROW-7214

[Python] unpickling a pyarrow table with dictionary fields crashes

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.14.0, 0.14.1, 0.15.0, 0.15.1
    • Fix Version/s: 0.16.0
    • Component/s: Python

      Description

      The following code crashes on this check:

      F1120 07:51:37.523720 12432 array.cc:773]  Check failed: (data->dictionary) != (nullptr) 
      

       

      import cPickle as pickle  # Python 2; on Python 3 use "import pickle"
      import pandas as pd
      import pyarrow as pa
      
      # Build a table with a dictionary-encoded (categorical) column.
      df = pd.DataFrame([{"cat": "a", "val": 1}, {"cat": "b", "val": 2}])
      df["cat"] = df["cat"].astype('category')
      index_table = pa.Table.from_pandas(df, preserve_index=False)
      
      with open('/tmp/zz.pickle', 'wb') as f:
          pickle.dump(index_table, f, protocol=2)
      
      # Unpickling the table is the step that crashes.
      with open('/tmp/zz.pickle', 'rb') as f:
          index_table = pickle.load(f)
      

       

      Reproduced on Python 2 with the following environment:

      Package         Version
      --------------- -------
      enum34          1.1.6  
      futures         3.3.0  
      numpy           1.16.5 
      pandas          0.24.2 
      pip             19.3.1 
      pyarrow         0.14.1 (0.14.0 and up suffer from this issue)
      python-dateutil 2.8.1  
      pytz            2019.3 
      setuptools      41.6.0 
      six             1.13.0 
      

              People

              • Assignee: Joris Van den Bossche (jorisvandenbossche)
              • Reporter: Yevgeni Litvin (selitvin)

                  Time Tracking

                  • Original Estimate: Not Specified
                  • Remaining Estimate: 0h
                  • Time Spent: 0.5h