Apache Arrow / ARROW-4629

[Python] Pandas to arrow conversion slowed down by local imports

Details

    Description

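      The pandas to Arrow conversion (pa.Table.from_pandas) is currently slowed down significantly by various local import statements, i.e. imports placed inside function bodies that are executed again on every call. The following script profiles the conversion: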

      import pandas as pd
      import pyarrow as pa
      import cProfile
      ser = pd.Series(range(10000))
      df = pd.DataFrame({col: ser.copy(deep=True) for col in range(50)})
      # Simulate a real dataset: cast half of the columns to str so the conversion has to copy data
      df = df.astype({col: str for col in range(25)})
      prof = cProfile.Profile()
      
      prof.enable()
      # Repeat the conversion a few times to collect statistics
      for _ in range(100):
          pa.Table.from_pandas(df, nthreads=1)
      prof.disable()
      prof.dump_stats("array_conversion.prof")
      
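The script above only writes array_conversion.prof; the following is a minimal sketch of how such a dump could be inspected with the standard-library pstats module. The helper names (profile_to_file, summarize_profile, convert_with_local_import) are hypothetical and not part of pyarrow, and the stand-in function uses json instead of pyarrow so the sketch runs without pandas or pyarrow installed. Which import-machinery entries show up in the report, if any, depends on the Python version.

```python
import cProfile
import io
import pstats


def profile_to_file(func, path, repeats=100):
    """Profile `repeats` calls of `func` and dump the statistics to `path`."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(repeats):
        func()
    profiler.disable()
    profiler.dump_stats(path)


def summarize_profile(path, limit=15):
    """Return a text report of up to `limit` entries, sorted by cumulative time."""
    buffer = io.StringIO()
    stats = pstats.Stats(path, stream=buffer)
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


def convert_with_local_import(values):
    """Hypothetical stand-in for a conversion helper that imports locally."""
    # The import statement runs on every call, even though the module
    # is already cached in sys.modules after the first call.
    import json
    return json.dumps(values)
```

For the profile produced by the script in this issue, print(summarize_profile("array_conversion.prof")) should list the most expensive calls, so entries that belong to the import machinery rather than to the conversion itself can be spotted.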

      Attachments

        1. image-2019-02-19-19-10-46-330.png
          73 kB
          Florian Jetter

            People

              fjetter Florian Jetter
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 3h 10m