Apache Arrow / ARROW-7214

[Python] Unpickling a pyarrow table with dictionary fields crashes


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.14.0, 0.14.1, 0.15.0, 0.15.1
    • Fix Version/s: 0.16.0
    • Component/s: Python

    Description

      The following code crashes when the pickle is loaded back, failing this check:

      F1120 07:51:37.523720 12432 array.cc:773]  Check failed: (data->dictionary) != (nullptr)

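      # cPickle exists only on Python 2; fall back to pickle on Python 3
      try:
          import cPickle as pickle
      except ImportError:
          import pickle
      import pandas as pd
      import pyarrow as pa
      
      df = pd.DataFrame([{"cat": "a", "val": 1}, {"cat": "b", "val": 2}])
      # A pandas categorical becomes a dictionary-encoded column in Arrow
      df["cat"] = df["cat"].astype('category')
      index_table = pa.Table.from_pandas(df, preserve_index=False)
      
      # Write the table with pickle protocol 2
      with open('/tmp/zz.pickle', 'wb') as f:
          pickle.dump(index_table, f, protocol=2)
      
      # Loading it back hits the failed check on affected versions
      with open('/tmp/zz.pickle', 'rb') as f:
          index_table = pickle.load(f)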
      

       

      Reproduced with Python 2 and the following environment:

      Package         Version
      --------------- -------
      enum34          1.1.6  
      futures         3.3.0  
      numpy           1.16.5 
      pandas          0.24.2 
      pip             19.3.1 
      pyarrow         0.14.1 (0.14.0 and later are affected)
      python-dateutil 2.8.1  
      pytz            2019.3 
      setuptools      41.6.0 
      six             1.13.0 
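
Per the Details above, 0.14.0 through 0.15.1 are affected and the fix shipped in 0.16.0. A small, hypothetical helper to check a version string such as `pyarrow.__version__` against that range; treating every 0.14.x and 0.15.x release as affected is an assumption extrapolated from the listed versions. It uses only the standard library.

```python
def parse_version(version):
    """Return the leading numeric parts of a version string, padded to 3.

    '0.14.1' -> (0, 14, 1); '0.15.1.dev42' -> (0, 15, 1); '0.16' -> (0, 16, 0).
    """
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as 'dev42' or '0rc1'
        parts.append(int(piece))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def is_affected(version):
    """True if `version` is at least 0.14.0 and below the 0.16.0 fix."""
    return (0, 14, 0) <= parse_version(version) < (0, 16, 0)


for v in ["0.13.0", "0.14.1", "0.15.1", "0.16.0"]:
    print(v, is_affected(v))
```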
      


          People

            Assignee: Joris Van den Bossche (jorisvandenbossche)
            Reporter: Yevgeni Litvin (selitvin)
            Votes: 0
            Watchers: 3


              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 0.5h
