Apache Arrow / ARROW-6417

[C++][Parquet] Non-dictionary BinaryArray reads from Parquet format have slowed down since 0.11.x

Details

    Description

      While benchmarking, I found that non-dictionary binary reads from Parquet appear to have become slower between Arrow 0.11.1 and the current master branch. We should do some basic profiling to see where we might improve our memory allocation strategy (or whatever the bottleneck turns out to be).
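
A comparison like the one described could be sketched roughly as follows. This is not the attached benchmark script; it is a minimal illustration that assumes a single column of random 16-byte values, written with dictionary encoding disabled. The helper names `time_call` and `run_benchmark`, the column name, and the row count are all hypothetical choices made for the example.

```python
import os
import statistics
import tempfile
import time


def time_call(fn, repeats=5):
    """Call fn() `repeats` times and return the wall-clock seconds of each call."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


def run_benchmark(num_rows=1_000_000, value_size=16, repeats=5):
    """Write one binary column to Parquet and time reading it back."""
    # Imported here so the timing helper above works without pyarrow installed.
    import pyarrow as pa
    import pyarrow.parquet as pq

    # Assumed data shape: random fixed-length byte strings, almost all unique.
    values = [os.urandom(value_size) for _ in range(num_rows)]
    table = pa.Table.from_arrays(
        [pa.array(values, type=pa.binary())], names=["binary_col"]
    )

    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "binary.parquet")
        # Disable dictionary encoding on write so the file holds plain values.
        pq.write_table(table, path, use_dictionary=False)
        timings = time_call(lambda: pq.read_table(path), repeats)

    # Report the library version alongside the median read time.
    return pa.__version__, statistics.median(timings)
```

Running `run_benchmark()` once in an environment with 0.11.1 installed and once in an environment built from master would give two (version, median seconds) pairs to compare. A profiler would still be needed to see where the time goes.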

      Attachments

        1. 20190903_parquet_benchmark.py
          4 kB
          Wes McKinney
        2. 20190903_parquet_read_perf.png
          12 kB
          Wes McKinney

          People

            Assignee: Unassigned
            Reporter: Wes McKinney (wesm)
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 1.5h