Apache Arrow / ARROW-4629

[Python] Pandas to arrow conversion slowed down by local imports

    Details

      Description

      The pandas-to-Arrow conversion (pa.Table.from_pandas) is currently slowed down significantly by various local (function-level) import statements. The following script profiles the conversion:

      import cProfile

      import pandas as pd
      import pyarrow as pa

      # Build a DataFrame with 50 columns of 10000 rows each
      ser = pd.Series(range(10000))
      df = pd.DataFrame({col: ser.copy(deep=True) for col in range(50)})
      # Simulate a real dataset, i.e. force copy of data
      df = df.astype({col: str for col in range(25)})

      prof = cProfile.Profile()
      prof.enable()
      # Run the conversion a few times to collect statistics
      for _ in range(100):
          pa.Table.from_pandas(df, nthreads=1)
      prof.disable()
      prof.dump_stats("array_conversion.prof")
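
A dump written by `prof.dump_stats(...)` can be inspected with the standard-library `pstats` module. The sketch below is illustrative only: `summarize_profile` and `local_import_workload` are hypothetical helper names, and the workload is a stand-in that profiles a function-level import of `json` rather than the actual pyarrow conversion. Pointing `summarize_profile` at `array_conversion.prof` should give the equivalent report for the script above.

```python
import cProfile
import io
import os
import pstats
import tempfile


def summarize_profile(path, limit=15):
    """Return the top `limit` entries of a cProfile dump, sorted by cumulative time."""
    buf = io.StringIO()
    stats = pstats.Stats(path, stream=buf)
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return buf.getvalue()


def local_import_workload():
    # Stand-in workload (an assumption, not the pyarrow code path):
    # the import statement is executed on every loop iteration.
    for _ in range(1000):
        import json
        json.dumps({})


# Profile the stand-in workload and dump it to a temporary file.
prof = cProfile.Profile()
prof.enable()
local_import_workload()
prof.disable()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.prof")
    prof.dump_stats(path)
    report = summarize_profile(path)

print(report)
```

When reading such a report for the conversion, entries that belong to the import machinery rather than to the conversion itself are the ones of interest for this issue.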
      
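
To illustrate why local imports can matter in a hot path, here is a minimal sketch that compares a function using a module-level import with one that repeats the import inside the function body. It uses `json` as a stand-in module and hypothetical function names; it is not taken from the pyarrow code base. An import statement inside a function runs on every call, even though the module is already loaded, so it adds a small fixed cost per call. The measured difference will vary by machine and Python version.

```python
import json  # module-level import, executed once when this module loads
import timeit


def with_module_import(value):
    # Uses the name bound at module load time.
    return json.dumps(value)


def with_local_import(value):
    # Function-level import: the import statement runs on every call.
    import json as _json
    return _json.dumps(value)


def measure(func, number=20000):
    """Return the total time in seconds for `number` calls of func(1)."""
    return timeit.timeit(lambda: func(1), number=number)


module_time = measure(with_module_import)
local_time = measure(with_local_import)

print(f"module-level import: {module_time:.4f} s")
print(f"local import:        {local_time:.4f} s")
```

Both functions return the same result; only the placement of the import differs, which isolates its per-call overhead.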

        Attachments

        1. image-2019-02-19-19-10-46-330.png
          73 kB
          Florian Jetter

              People

              • Assignee: Florian Jetter (fjetter)
              • Reporter: Florian Jetter (fjetter)
              • Votes: 0
              • Watchers: 2

                  Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 3h 10m