[ARROW-9441] [C++] Optimize IPC stream reading - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: C++
Labels: None
External issue URL: https://github.com/apache/arrow/issues/25514

Description

Based on perf reports, reading an IPC stream spends more time manipulating C++ data structures than reconstructing record batches from IPC messages, which strikes me as not what we want.

Here is an excerpt from a perf report for the following Python code:

import pyarrow as pa

for i in range(100):
    pa.ipc.open_stream('nyctaxi.arrow').read_all()

-   50.40%     0.06%  python           libarrow.so.100.0.0                  [.] arrow::RecordBatchReader::ReadAll
   - 50.34% arrow::RecordBatchReader::ReadAll     
      - 25.86% arrow::Table::FromRecordBatches    
         - 18.41% arrow::SimpleRecordBatch::column
            - 16.00% arrow::MakeArray
               - 10.49% arrow::VisitTypeInline<arrow::internal::ArrayDataWrapper>  
                    7.71% arrow::PrimitiveArray::SetData           
                    1.87% arrow::StringArray::StringArray          
           1.54% __pthread_mutex_lock                              
           0.88% __pthread_mutex_unlock                            
           0.67% std::_Hash_bytes                                  
           0.60% arrow::ChunkedArray::ChunkedArray                 
      - 22.30% arrow::RecordBatchReader::ReadAll                   
         - 22.12% arrow::ipc::RecordBatchStreamReaderImpl::ReadNext
            - 15.91% arrow::ipc::ReadRecordBatchInternal
               - 15.15% arrow::ipc::LoadRecordBatch
                  - 14.45% arrow::ipc::ArrayLoader::Load
                     + 13.15% arrow::VisitTypeInline<arrow::ipc::ArrayLoader>
            + 5.53% arrow::ipc::InputStreamMessageReader::ReadNextMessage 
        1.84% arrow::SimpleRecordBatch::~SimpleRecordBatch

Perhaps ChunkedArray internally should be changed to contain a vector of ArrayData instead of boxed Arrays.
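
To make the suggestion concrete, here is a toy Python model of the idea, not Arrow code. The class names (ArrayData, BoxedArray, LazyChunkedArray) and the boxes_created counter are hypothetical stand-ins chosen for illustration; the real C++ change would look different. The sketch only shows the intended effect: building the chunked column from ArrayData does no boxing, and a typed wrapper is created once, on first access to a chunk.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ArrayData:
    """Toy stand-in for arrow::ArrayData: a type name, a length, raw values."""
    type_name: str
    length: int
    values: tuple


class BoxedArray:
    """Toy stand-in for the typed Array wrapper that MakeArray would build."""
    boxes_created = 0  # hypothetical counter: how many wrappers were built

    def __init__(self, data: ArrayData):
        BoxedArray.boxes_created += 1
        self.data = data

    def __getitem__(self, i):
        return self.data.values[i]


class LazyChunkedArray:
    """Stores ArrayData chunks and boxes a chunk only when it is requested."""

    def __init__(self, chunks: List[ArrayData]):
        self._chunks = list(chunks)
        self._boxed: Dict[int, BoxedArray] = {}  # filled on demand
        self.length = sum(c.length for c in self._chunks)

    @property
    def num_chunks(self) -> int:
        return len(self._chunks)

    def chunk(self, i: int) -> BoxedArray:
        # Box on first access and cache the wrapper for later calls.
        if i not in self._boxed:
            self._boxed[i] = BoxedArray(self._chunks[i])
        return self._boxed[i]


chunks = [ArrayData("int64", 3, (1, 2, 3)), ArrayData("int64", 2, (4, 5))]
col = LazyChunkedArray(chunks)
boxes_after_build = BoxedArray.boxes_created   # nothing boxed yet

first = col.chunk(0)
first_again = col.chunk(0)                     # served from the cache
boxes_after_access = BoxedArray.boxes_created
```

In this model, a read_all()-style caller that never touches individual chunks would pay nothing for boxing. Whether that saving carries over to the MakeArray cost seen under Table::FromRecordBatches in the profile above is an assumption that would need measuring.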

People

Assignee: Unassigned
Reporter: Wes McKinney
Votes: 0
Watchers: 6

Dates

Created: 13/Jul/20 18:21
Updated: 11/Jan/23 08:06