  1. Apache Arrow
  2. ARROW-6786

[C++] arrow-dataset-file-parquet-test is slow

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 1.0.0
    • Component/s: C++
    • Labels:
      None

      Description

      It takes roughly 15 seconds in debug mode (probably more with ASAN, UBSAN, etc.) to run 2 tests that simply iterate over a generated in-memory dataset:

      $ ./build-test/debug/arrow-dataset-file-parquet-test 
      Running main() from /home/conda/feedstock_root/build_artifacts/gtest_1551008230529/work/googletest/src/gtest_main.cc
      [==========] Running 2 tests from 1 test case.
      [----------] Global test environment set-up.
      [----------] 2 tests from TestParquetFileFormat
      [ RUN      ] TestParquetFileFormat.ScanRecordBatchReader
      [       OK ] TestParquetFileFormat.ScanRecordBatchReader (7338 ms)
      [ RUN      ] TestParquetFileFormat.Inspect
      [       OK ] TestParquetFileFormat.Inspect (6222 ms)
      [----------] 2 tests from TestParquetFileFormat (13560 ms total)
      
      [----------] Global test environment tear-down
      [==========] 2 tests from 1 test case ran. (13560 ms total)
      [  PASSED  ] 2 tests.
      

      Unless the test is stressing something in particular, the number of repetitions or the batch size can probably be reduced dramatically.
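
      To illustrate why those two parameters matter, here is a minimal, self-contained sketch. It is not the actual Arrow test code: `Batch`, `RepeatedBatchSource`, and `CountRows` are hypothetical stand-ins for a dataset that yields the same in-memory batch a fixed number of times. It only shows that the rows visited by a simple iteration grow as batch size times repetitions.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>
      #include <vector>

      // Hypothetical stand-in for a record batch: a single int64 column.
      struct Batch {
        std::vector<int64_t> values;
      };

      // Hypothetical stand-in for a generated in-memory dataset:
      // the same batch is yielded `repetitions` times.
      class RepeatedBatchSource {
       public:
        RepeatedBatchSource(int64_t batch_size, int64_t repetitions)
            : batch_{std::vector<int64_t>(static_cast<std::size_t>(batch_size), 1)},
              remaining_(repetitions) {
          assert(batch_size >= 0 && repetitions >= 0);
        }

        // Returns nullptr once all repetitions have been consumed.
        const Batch* Next() {
          if (remaining_ <= 0) return nullptr;
          --remaining_;
          return &batch_;
        }

       private:
        Batch batch_;
        int64_t remaining_;
      };

      // Iterates over every batch and returns the number of rows visited.
      int64_t CountRows(int64_t batch_size, int64_t repetitions) {
        RepeatedBatchSource source(batch_size, repetitions);
        int64_t rows = 0;
        while (const Batch* batch = source.Next()) {
          rows += static_cast<int64_t>(batch->values.size());
        }
        return rows;
      }
      ```

      With illustrative values (assumed here, not taken from the real test), `CountRows(1024, 100)` visits 102400 rows while `CountRows(64, 4)` visits 256. Shrinking either parameter reduces the work proportionally while still exercising the same iteration path.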

            People

            • Assignee: Unassigned
            • Reporter: Antoine Pitrou (apitrou)
            • Votes: 0
            • Watchers: 2
