Apache Arrow / ARROW-2514

[Python] Inferring / converting nested Numpy array is very slow


    Details

      Description

      Converting a nested NumPy array walks over the NumPy data as Python objects, even when the dtype is not "object". This makes it needlessly slow compared to the non-nested case, and even compared to the nested Python list case:

      >>> %%timeit data = list(range(10000))
      ...:pa.array(data)
      ...:
      746 µs ± 8.36 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
      >>> %%timeit data = np.arange(10000)
      ...:pa.array(data)
      ...:
      81.1 µs ± 57.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
      >>> %%timeit data = [np.arange(10000)]
      ...:pa.array(data)
      ...:
      3.39 ms ± 6.27 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
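The gap between the two code paths can be sketched in plain NumPy. Arrow stores a list column as one flat child array of values plus an int32 offsets array (list i spans values[offsets[i]:offsets[i+1]]), so a nested NumPy input can in principle be converted by concatenating whole sub-arrays instead of visiting each element as a Python object. The functions `to_list_layout_slow` and `to_list_layout_fast` are hypothetical illustrations, not pyarrow internals, and assume 1-D integer sub-arrays.

```python
import numpy as np


def to_list_layout_slow(arrays):
    """Element-wise path (hypothetical): boxes every value as a Python object."""
    offsets = [0]
    values = []
    for arr in arrays:
        for item in arr:  # each item is a freshly allocated NumPy scalar
            values.append(item.item())  # ...then converted to a Python int
        offsets.append(len(values))
    return np.array(offsets, dtype=np.int32), np.array(values, dtype=np.int64)


def to_list_layout_fast(arrays):
    """Vectorized path (hypothetical): handles each sub-array as a whole."""
    if len(arrays) == 0:
        return np.zeros(1, dtype=np.int32), np.empty(0, dtype=np.int64)
    # Offsets are the running total of sub-array lengths, starting at 0.
    lengths = np.array([len(arr) for arr in arrays], dtype=np.int64)
    offsets = np.concatenate(([0], np.cumsum(lengths))).astype(np.int32)
    # One contiguous copy of the raw data; no per-element Python objects.
    values = np.concatenate(arrays)
    return offsets, values
```

With pyarrow installed, the `(offsets, values)` pair could be passed to `pa.ListArray.from_arrays(offsets, values)` to build the list array directly. This is a workaround sketch under the stated assumptions, not necessarily how the issue was resolved.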
      

              People

              • Assignee:
                apitrou Antoine Pitrou
                Reporter:
                apitrou Antoine Pitrou
              • Votes:
                0
                Watchers:
                3

                  Time Tracking

                  Estimated:
                  Not Specified
                  Remaining:
                  0h
                  Logged:
                  1h 50m