  1. Apache Arrow
  2. ARROW-7214

[Python] unpickling a pyarrow table with dictionary fields crashes

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.14.0, 0.14.1, 0.15.0, 0.15.1
    • Fix Version/s: 0.16.0
    • Component/s: Python

    Description

      The code below crashes with the following failed check:

      F1120 07:51:37.523720 12432 array.cc:773]  Check failed: (data->dictionary) != (nullptr) 
      

       

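      import cPickle as pickle  # Python 2; on Python 3 use `import pickle`
      import pandas as pd
      import pyarrow as pa
      
      # Build a DataFrame whose "cat" column is categorical, so the Arrow
      # table gets a dictionary-encoded field.
      df = pd.DataFrame([{"cat": "a", "val": 1}, {"cat": "b", "val": 2}])
      df["cat"] = df["cat"].astype('category')
      index_table = pa.Table.from_pandas(df, preserve_index=False)
      
      with open('/tmp/zz.pickle', 'wb') as f:
          pickle.dump(index_table, f, protocol=2)
      
      # Loading the pickled table triggers the failed check above.
      with open('/tmp/zz.pickle', 'rb') as f:
          index_table = pickle.load(f)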
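
The same round trip can be exercised in memory, which may be handier as a regression check than writing to /tmp. The sketch below is only an illustration: `pickle_roundtrip` is a hypothetical helper name, it uses only the standard library, and it is written for Python 3 rather than the Python 2 setup used in this report.

```python
import io
import pickle


def pickle_roundtrip(obj, protocol=2):
    """Pickle obj into an in-memory buffer and load it back."""
    buf = io.BytesIO()
    pickle.dump(obj, buf, protocol=protocol)
    buf.seek(0)  # rewind so that load() reads from the start
    return pickle.load(buf)


# Plain Python data survives the round trip unchanged.
restored = pickle_roundtrip({"cat": ["a", "b"], "val": [1, 2]})
```

With a pyarrow release that contains the fix, calling `pickle_roundtrip(index_table)` on the table built above and comparing the result to the original with `Table.equals` would be the natural check. On the affected releases the load step would be expected to abort the process, as in the failed check shown earlier.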
      

       

      Reproduced under Python 2 with the following environment:

      Package         Version
      --------------- -------
      enum34          1.1.6  
      futures         3.3.0  
      numpy           1.16.5 
      pandas          0.24.2 
      pip             19.3.1 
      pyarrow         0.14.1 (0.14.0 and up suffer from this issue)
      python-dateutil 2.8.1  
      pytz            2019.3 
      setuptools      41.6.0 
      six             1.13.0 
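
Going by the versions listed in this issue (affected from 0.14.0, fixed in 0.16.0), code that must pickle tables with dictionary fields could guard on the installed pyarrow version. This is a minimal sketch under that assumption; `parse_version` and `is_affected` are hypothetical helper names, not part of pyarrow.

```python
import re

AFFECTED_FROM = (0, 14, 0)  # earliest affected release listed in this issue
FIXED_IN = (0, 16, 0)       # fix version listed in this issue


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("unrecognised version string: %r" % (text,))
    # A missing patch component (e.g. "0.16") is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def is_affected(version_text):
    """True if the version lies in the range this issue reports as affected."""
    return AFFECTED_FROM <= parse_version(version_text) < FIXED_IN
```

In practice the argument would be `pa.__version__`. The range check assumes that every release between the first affected version and the fix version shows the crash, which matches the affected versions recorded here.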
      

            People

              Assignee: Joris Van den Bossche (jorisvandenbossche)
              Reporter: Yevgeni Litvin (selitvin)
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 0.5h