Apache Arrow / ARROW-1377

[Python] Add function to assist with benchmarking Parquet scan performance

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: Python
    • Labels: None

    Description

      It would be simpler to assess the raw performance of parquet-cpp Parquet scans if something akin to https://github.com/apache/parquet-cpp/blob/master/tools/parquet-scan.cc were available as a callable Python function. That way we can isolate the cost of scanning the file from the cost of converting it to Arrow (and to pandas).
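
      A minimal sketch of the kind of comparison this would enable. It assumes a pyarrow version in which ParquetFile exposes a scan_contents() method that reads the column data without building Arrow arrays and returns the row count; the issue text itself does not name the final API. The helper names best_time and compare_scan_vs_read are hypothetical.

```python
import time


def best_time(fn, repeats=3):
    """Return (best wall-clock seconds, last result) over `repeats` calls of fn()."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        best = min(best, time.perf_counter() - start)
    return best, result


def compare_scan_vs_read(path, repeats=3):
    """Time a raw Parquet scan against full reads into Arrow and pandas.

    Assumption: pyarrow.parquet.ParquetFile has scan_contents() in the
    installed pyarrow version. `path` is any local Parquet file.
    """
    # Imported lazily so the timing helper above works without pyarrow.
    import pyarrow.parquet as pq

    # Raw scan only: decode the column data, no Arrow conversion.
    scan_s, num_rows = best_time(
        lambda: pq.ParquetFile(path).scan_contents(), repeats
    )
    # Scan plus conversion to an Arrow table.
    arrow_s, _ = best_time(lambda: pq.read_table(path), repeats)
    # Scan plus conversion to Arrow plus conversion to pandas.
    pandas_s, _ = best_time(lambda: pq.read_table(path).to_pandas(), repeats)

    return {
        "rows": num_rows,
        "scan_s": scan_s,
        "to_arrow_s": arrow_s,
        "to_pandas_s": pandas_s,
    }
```

      Calling compare_scan_vs_read("example.parquet") on a local file (the path is a placeholder) gives a rough split between scan time and conversion overhead. Taking the best of several runs reduces noise from a cold file cache.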

            People

              wesm Wes McKinney
              wesm Wes McKinney
