Apache Arrow / ARROW-6417

[C++][Parquet] Non-dictionary BinaryArray reads from Parquet format have slowed down since 0.11.x

    Details

      Description

      In doing some benchmarking, I have found that non-dictionary binary reads from Parquet have become slower between Arrow 0.11.1 and the master branch. It would be a good idea to do some basic profiling to see where we might improve our memory allocation strategy (or whatever the bottleneck turns out to be).

        Attachments

        1. 20190903_parquet_benchmark.py (4 kB, Wes McKinney)
        2. 20190903_parquet_read_perf.png (12 kB, Wes McKinney)

            People

            • Assignee: Unassigned
            • Reporter: Wes McKinney (wesm)
            • Votes: 0
            • Watchers: 3

                Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 1.5h