Apache Arrow / ARROW-2400

[C++] Status destructor is expensive


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0
    • Fix Version/s: 0.10.0
    • Component/s: C++

    Description

      Let's take the following micro-benchmark (in Python):

      $ python -m timeit -s "import pyarrow as pa; data = [b'xx' for i in range(10000)]" "pa.array(data, type=pa.binary())"
      1000 loops, best of 3: 784 usec per loop
      

      If I replace the Status destructor with a no-op:

        ~Status() { }
      

      then the benchmark result becomes:

      $ python -m timeit -s "import pyarrow as pa; data = [b'xx' for i in range(10000)]" "pa.array(data, type=pa.binary())"
      1000 loops, best of 3: 561 usec per loop
      

      This is almost a 30% win (784 usec down to 561 usec per loop). I get similar results on the conversion benchmarks in the benchmark suite.

      I'm unsure about the explanation. In the common case, delete _state should be extremely fast, since the state is NULL. Yet, it seems it adds significant overhead. Perhaps because of exception handling?
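
One way to probe the question above is to separate the common "state is NULL" case from the rare cleanup case. The following is only a minimal sketch, assuming a status class that holds a single heap pointer. `MiniStatus`, `DeleteState` and `g_slow_path_calls` are hypothetical names made up for illustration. They are not Arrow's actual code and not necessarily how this issue was resolved. The destructor is a small inline null check, and the deallocation lives in a separate function that only runs for error statuses.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical counter, only here so the slow path can be observed.
int g_slow_path_calls = 0;

// Hypothetical stand-in for a Status-like class (not Arrow's implementation).
class MiniStatus {
 public:
  // Success: no allocation, state_ stays null.
  MiniStatus() noexcept : state_(nullptr) {}

  // Error: the message is stored on the heap.
  explicit MiniStatus(std::string msg) : state_(new State{std::move(msg)}) {}

  MiniStatus(const MiniStatus&) = delete;
  MiniStatus& operator=(const MiniStatus&) = delete;

  // Moving steals the pointer and leaves the source in the "ok" state.
  MiniStatus(MiniStatus&& other) noexcept : state_(other.state_) {
    other.state_ = nullptr;
  }

  // Fast path: a single inline pointer comparison when the status is ok.
  ~MiniStatus() noexcept {
    if (state_ != nullptr) {
      DeleteState();
    }
  }

  bool ok() const noexcept { return state_ == nullptr; }

  const std::string& message() const {
    static const std::string empty;
    return ok() ? empty : state_->msg;
  }

 private:
  struct State {
    std::string msg;
  };

  // Slow path, defined outside the class body below.
  void DeleteState() noexcept;

  State* state_;
};

// Only reached for error statuses; frees the heap state.
void MiniStatus::DeleteState() noexcept {
  ++g_slow_path_calls;
  delete state_;
  state_ = nullptr;
}
```

With this layout, destroying a successful `MiniStatus` never enters `DeleteState`, so any remaining destructor cost in a benchmark would have to come from somewhere other than the deallocation itself. Whether this changes the timings reported above would need to be measured.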


            People

              Assignee: Antoine Pitrou (apitrou)
              Reporter: Antoine Pitrou (apitrou)
              Votes: 0
              Watchers: 4

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1h 20m