  1. Apache Arrow
  2. ARROW-8888

[Python] Heuristic in dataframe_to_arrays that decides whether to multithread the conversion causes slow conversions


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.16.0
    • Fix Version/s: 1.0.0
    • Component/s: Python
    • Environment:
      MacOS: 10.15.4 (also happening on Windows 10)
      Python: 3.7.3
      Pyarrow: 0.16.0
      Pandas: 0.25.3

    Description

      When pa.Table.from_pandas() takes the code path that uses the ThreadPoolExecutor in dataframe_to_arrays (called by Table.from_pandas), the conversion is much slower than the single-threaded path.

       
      I have a simple example, but the time difference is much worse with a real table.

       

      Python 3.7.3 | packaged by conda-forge | (default, Dec 6 2019, 08:54:18)
       Type 'copyright', 'credits' or 'license' for more information
       IPython 7.13.0 – An enhanced Interactive Python. Type '?' for help.
      In [1]: import pyarrow as pa
      In [2]: import pandas as pd
      In [3]: df = pd.DataFrame({"A": [0] * 10000000})
      In [4]: %timeit table = pa.Table.from_pandas(df)
       577 µs ± 15.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
      In [5]: %timeit table = pa.Table.from_pandas(df, nthreads=1)
       106 µs ± 1.65 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
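
      One plausible reading of the timings above is that the fixed cost of creating a thread pool and dispatching tasks outweighs the work when each column converts very cheaply. The report does not confirm this as the cause, so the following is only a standard-library sketch of that effect, not pyarrow's actual code. The names convert_column, convert_serial, convert_threaded, and best_time are hypothetical.

      ```python
      import time
      from concurrent.futures import ThreadPoolExecutor


      def convert_column(values):
          # Hypothetical stand-in for a very cheap per-column conversion.
          return len(values)


      def convert_serial(columns):
          # Convert every column in the calling thread.
          return [convert_column(col) for col in columns]


      def convert_threaded(columns, nthreads=4):
          # Creating the pool and dispatching tasks has a fixed cost that
          # can exceed the work itself when each task is trivial.
          with ThreadPoolExecutor(max_workers=nthreads) as executor:
              return list(executor.map(convert_column, columns))


      def best_time(func, columns, repeat=20):
          # Return the fastest of several runs, in seconds.
          best = float("inf")
          for _ in range(repeat):
              start = time.perf_counter()
              func(columns)
              best = min(best, time.perf_counter() - start)
          return best


      # A single column, loosely mirroring the one-column frame in the report.
      columns = [[0] * 1000]

      serial_s = best_time(convert_serial, columns)
      threaded_s = best_time(convert_threaded, columns)
      print(f"serial:   {serial_s * 1e6:.1f} µs")
      print(f"threaded: {threaded_s * 1e6:.1f} µs")
      ```

      Both paths return the same result; only the dispatch overhead differs. In the report itself, the workaround shown is passing nthreads=1 to pa.Table.from_pandas.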
      

       


            People

              Assignee: Wes McKinney (wesm)
              Reporter: Kevin Glasson (kevinglasson)


                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 40m