Apache Arrow / ARROW-15820

[C++][Doc] Add table_source to streaming_execution.rst & clarify parameter name


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 8.0.0
    • Component/s: C++

    Description

      Currently the table_source node does not appear in our documentation.

      Also, in TableSourceNodeOptions we have:

        // Size of batches to emit from this node
        // If the table is larger the node will emit multiple batches from the
        // table to be processed in parallel.
        int64_t batch_size;
      

      However, while investigating a performance issue today, I realized this description is incomplete. The parameter is really an upper bound on the size of emitted batches, so it should probably be called max_batch_size.

      Furthermore, the documentation should make clear that a table whose batches are already smaller than batch_size emits those smaller batches directly (which is what I want in my case). The node does not concatenate small batches into a larger batch.

            People

              Assignee: Vibhatha Lakmal Abeykoon
              Reporter: Weston Pace
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1.5h