Apache Arrow / ARROW-6786

[C++] arrow-dataset-file-parquet-test is slow

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.16.0
    • Component/s: C++
    • Labels: None

    Description

      It takes about 15 seconds in debug mode (probably more with ASAN, UBSAN, etc.) to run 2 tests that simply iterate through a generated in-memory dataset:

      $ ./build-test/debug/arrow-dataset-file-parquet-test 
      Running main() from /home/conda/feedstock_root/build_artifacts/gtest_1551008230529/work/googletest/src/gtest_main.cc
      [==========] Running 2 tests from 1 test case.
      [----------] Global test environment set-up.
      [----------] 2 tests from TestParquetFileFormat
      [ RUN      ] TestParquetFileFormat.ScanRecordBatchReader
      [       OK ] TestParquetFileFormat.ScanRecordBatchReader (7338 ms)
      [ RUN      ] TestParquetFileFormat.Inspect
      [       OK ] TestParquetFileFormat.Inspect (6222 ms)
      [----------] 2 tests from TestParquetFileFormat (13560 ms total)
      
      [----------] Global test environment tear-down
      [==========] 2 tests from 1 test case ran. (13560 ms total)
      [  PASSED  ] 2 tests.
      

      Unless the test is stressing something in particular, the number of repetitions or the batch size can probably be reduced dramatically.
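
      As a rough illustration of why shrinking either parameter helps: a test that iterates through a generated dataset visits repetitions × batch size rows, so its cost scales with that product. The sketch below does not use the Arrow API. `Batch`, `MakeBatch`, and `CountRows` are hypothetical stand-ins, and any sizes passed to them are illustrative rather than taken from the actual test.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one generated in-memory batch:
// a single column of int64 values. Not an Arrow type.
using Batch = std::vector<int64_t>;

// Build one batch holding `batch_size` rows with values 0..batch_size-1.
Batch MakeBatch(int64_t batch_size) {
  Batch batch(static_cast<std::size_t>(batch_size));
  for (std::size_t i = 0; i < batch.size(); ++i) {
    batch[i] = static_cast<int64_t>(i);
  }
  return batch;
}

// Iterate over the same batch `repetitions` times, the way a scan test
// might walk a repeated dataset, and return the number of rows visited.
int64_t CountRows(const Batch& batch, int64_t repetitions) {
  int64_t rows = 0;
  for (int64_t r = 0; r < repetitions; ++r) {
    for (int64_t value : batch) {
      (void)value;  // a real test would validate the value here
      ++rows;
    }
  }
  return rows;
}
```

      Under these assumptions, halving either the repetition count or the batch size halves the number of rows visited, while the iteration code path being exercised stays the same.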

            People

              Assignee: Unassigned
              Reporter: Antoine Pitrou (apitrou)