Apache Arrow / ARROW-6375

[C++] Extend ConversionTraits to allow efficiently appending list values in STL API


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.15.0
    • Component/s: C++

    Description

      I was benchmarking array builders against the STL API for converting row data to Arrow tables. I found that converting std::vector values with the STL API is around 1.5-1.8 times slower than doing so with the builder API. This appears to be primarily because rows are appended one value at a time via the ...::Append method, by iterating over ConversionTraits<std::vector<...>>::AppendRow for each value.

      Calling ...::AppendValues would be more efficient. However, ConversionTraits doesn't offer a way to append more than one cell at a time (AppendRow takes a builder and a single cell as its parameters).
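
      To make the difference concrete, here is a minimal sketch using a toy builder. ToyInt64Builder, AppendOneByOne and AppendInBulk are hypothetical names made up for illustration, not Arrow classes or functions; the toy only mirrors the shape of an Append / AppendValues pair and counts how many builder calls each path makes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for a primitive array builder (hypothetical, not an Arrow class).
class ToyInt64Builder {
 public:
  // Appends a single value; every call goes through the builder separately.
  void Append(int64_t value) {
    ++append_calls_;
    values_.push_back(value);
  }

  // Appends a contiguous range of values in one call.
  void AppendValues(const int64_t* values, std::size_t length) {
    assert(values != nullptr || length == 0);
    ++append_calls_;
    values_.insert(values_.end(), values, values + length);
  }

  std::size_t append_calls() const { return append_calls_; }
  const std::vector<int64_t>& values() const { return values_; }

 private:
  std::vector<int64_t> values_;
  std::size_t append_calls_ = 0;
};

// Per-element path: one builder call per value, as in a loop over AppendRow.
inline void AppendOneByOne(ToyInt64Builder& builder,
                           const std::vector<int64_t>& cell) {
  for (int64_t v : cell) builder.Append(v);
}

// Bulk path: a single builder call for the whole vector.
inline void AppendInBulk(ToyInt64Builder& builder,
                         const std::vector<int64_t>& cell) {
  builder.AppendValues(cell.data(), cell.size());
}
```

      Both paths leave the same values in the builder; the bulk path just gets there with one call instead of one call per element.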

      Would it be possible to extend ConversionTraits with an optional method AppendRows(Builder, Cell*, size_t), which would allow a template specialization to append multiple cells at once efficiently? In the example above, this function would be called with std::vector::data() and std::vector::size() if provided. If the specialization doesn't provide such a method, the current behavior (i.e. iterating over AppendRow) can be used as the default.

      This is the particular part of the code that would be replaced in practice. Instead of calling AppendRow directly in a for loop, a public helper function (e.g. stl::AppendRows) can be provided that implements the logic above.
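
      A rough sketch of how such a helper could dispatch, assuming a C++11 detection idiom. Everything here is illustrative: the sketch namespace, ToyBuilder, HasAppendRows and AppendRowsImpl are hypothetical names, the traits are simplified stand-ins for the real ConversionTraits, and the functions return void rather than a status.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

namespace sketch {

// Toy builder (hypothetical); records which append path was taken.
struct ToyBuilder {
  std::vector<int> values;
  int single_appends = 0;
  int bulk_appends = 0;
};

// Simplified stand-in for ConversionTraits, specialized per cell type.
template <typename Cell>
struct ConversionTraits;

// Specialization that only provides the per-cell AppendRow.
template <>
struct ConversionTraits<short> {
  static void AppendRow(ToyBuilder& builder, const short& cell) {
    builder.values.push_back(cell);
    ++builder.single_appends;
  }
};

// Specialization that also provides the optional bulk AppendRows.
template <>
struct ConversionTraits<int> {
  static void AppendRow(ToyBuilder& builder, const int& cell) {
    builder.values.push_back(cell);
    ++builder.single_appends;
  }
  static void AppendRows(ToyBuilder& builder, const int* cells, std::size_t n) {
    builder.values.insert(builder.values.end(), cells, cells + n);
    ++builder.bulk_appends;
  }
};

// True only if Traits::AppendRows(Builder&, const Cell*, size_t) is callable.
template <typename Traits, typename Builder, typename Cell, typename = void>
struct HasAppendRows : std::false_type {};

template <typename Traits, typename Builder, typename Cell>
struct HasAppendRows<
    Traits, Builder, Cell,
    decltype(void(Traits::AppendRows(std::declval<Builder&>(),
                                     std::declval<const Cell*>(),
                                     std::size_t{})))> : std::true_type {};

// Bulk path: the specialization supplies AppendRows.
template <typename Cell, typename Builder>
void AppendRowsImpl(Builder& builder, const Cell* cells, std::size_t n,
                    std::true_type) {
  ConversionTraits<Cell>::AppendRows(builder, cells, n);
}

// Default path: fall back to iterating over AppendRow.
template <typename Cell, typename Builder>
void AppendRowsImpl(Builder& builder, const Cell* cells, std::size_t n,
                    std::false_type) {
  for (std::size_t i = 0; i < n; ++i) {
    ConversionTraits<Cell>::AppendRow(builder, cells[i]);
  }
}

// Public helper: picks the bulk path when available, else the loop.
template <typename Cell, typename Builder>
void AppendRows(Builder& builder, const Cell* cells, std::size_t n) {
  AppendRowsImpl(builder, cells, n,
                 HasAppendRows<ConversionTraits<Cell>, Builder, Cell>{});
}

}  // namespace sketch
```

      With this shape, a call such as sketch::AppendRows(builder, vec.data(), vec.size()) takes the bulk path for int cells and the per-cell loop for short cells, so existing specializations keep working without changes.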

            People

              Assignee: Omer Ozarslan (ozars)
              Reporter: Omer Ozarslan (ozars)
              Votes: 0
              Watchers: 4

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2h