Apache Arrow / ARROW-7112

Wrong contents when initializing a pyarrow.Table from a boolean DataFrame


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.14.1
    • Fix Version/s: 0.15.0
    • Component/s: Python
    • Labels:
      None
    • Environment:
      Tested with 0.14.1 and 0.14.0.RAY from pip3 on Ubuntu

      Description

      When initializing a Table from a boolean pandas.DataFrame whose underlying array is not in Fortran order, the contents of the resulting Table differ from the contents of the DataFrame.

      Sample:

      import pandas as pd
      import pyarrow as pa
      import numpy as np
      mask = np.full((3,3), False)
      mask[:,1] = True
      df = pd.DataFrame(mask)
      print(df)
      print(pa.table(df).to_pandas()) 

      The output:

             0     1      2
      0  False  True  False
      1  False  True  False
      2  False  True  False
             0      1      2
      0  False   True  False
      1  False  False  False
      2  False  False  False
      

      I.e., column 1 differs before and after round-tripping through pa.Table: only its first row remains True.

      If I add order='F' to the np.full invocation, the result is as expected. Also, the problem seems to disappear if I use dtype=int.
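
The difference between the two layouts mentioned above can be inspected with NumPy alone. The following is a minimal sketch, not part of the original report: the names `mask_c`, `mask_f`, and `fixed` are illustrative, and the idea that copying to Fortran order with `np.asfortranarray` before building the DataFrame sidesteps the problem on affected versions is an assumption drawn from the observation about `order='F'`, not a verified fix.

```python
import numpy as np

# Same logical contents as the sample, once in C order (the default)
# and once in Fortran order.
mask_c = np.full((3, 3), False)
mask_c[:, 1] = True
mask_f = np.full((3, 3), False, order='F')
mask_f[:, 1] = True

# The two arrays compare equal element by element...
print(np.array_equal(mask_c, mask_f))

# ...but differ in memory layout: a column of the C-order array is
# strided, while a column of the Fortran-order array is contiguous.
print(mask_c.flags['C_CONTIGUOUS'], mask_c.flags['F_CONTIGUOUS'])
print(mask_f.flags['C_CONTIGUOUS'], mask_f.flags['F_CONTIGUOUS'])
print(mask_c[:, 1].strides, mask_f[:, 1].strides)

# Assumed workaround (hypothetical, untested against 0.14.1):
# make a Fortran-order copy before handing the data to pandas.
fixed = np.asfortranarray(mask_c)
print(fixed.flags['F_CONTIGUOUS'])
```

Since boolean elements occupy one byte each, the column strides make the layout difference directly visible.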

       

       

            People

            • Assignee: Unassigned
            • Reporter: Joachim Haga (jobh)
