Apache Arrow / ARROW-9983

[C++][Dataset][Python] Use a larger default batch size than 32K for the Datasets API


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: C++

    Description

      Dremio uses a 64K batch size. We could probably get away with even larger batches (e.g. 256K or 1M rows) and let memory-constrained users opt into a smaller batch size.

      See ARROW-9924 for an example of performance issues related to this.
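      To make the trade-off concrete, here is a minimal Python sketch. It is not Arrow's implementation: the helpers `split_into_batches` and `estimate_batch_bytes`, the 10M-row dataset and the 80-byte row width are all hypothetical. The sketch counts how many batches a scan would produce at each candidate size and estimates the buffer memory one batch would hold, assuming fixed-width columns with no nulls.

```python
import math
from typing import List, Tuple

# Candidate batch sizes in rows, reading "32K", "1M", etc. as powers of two.
BATCH_SIZES = {
    "32K": 32 * 1024,
    "64K": 64 * 1024,
    "256K": 256 * 1024,
    "1M": 1024 * 1024,
}


def split_into_batches(num_rows: int, batch_size: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: (offset, length) slices covering num_rows, each at most batch_size rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        (offset, min(batch_size, num_rows - offset))
        for offset in range(0, num_rows, batch_size)
    ]


def estimate_batch_bytes(batch_size: int, bytes_per_row: int) -> int:
    """Hypothetical helper: rough buffer memory of one full batch, assuming fixed-width columns."""
    return batch_size * bytes_per_row


if __name__ == "__main__":
    num_rows = 10_000_000   # hypothetical dataset size
    bytes_per_row = 10 * 8  # hypothetical: ten int64 columns, no nulls
    for label, size in BATCH_SIZES.items():
        n = len(split_into_batches(num_rows, size))
        assert n == math.ceil(num_rows / size)  # sanity check on the slicing
        mib = estimate_batch_bytes(size, bytes_per_row) / 2**20
        print(f"{label:>5}: {n:4d} batches, ~{mib:.1f} MiB per batch")
```

      Running it prints 306, 153, 39 and 10 batches for 32K, 64K, 256K and 1M rows, at roughly 2.5, 5.0, 20.0 and 80.0 MiB per batch. Larger batches spread any fixed per-batch cost over more rows but hold more memory at once, which is why a smaller, user-selectable size still matters. Recent pyarrow releases expose the scan batch size through the `batch_size` argument of `Dataset.to_batches`.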


            People

              Assignee: Ben Kietzman (bkietz)
              Reporter: Wes McKinney (wesm)
              Votes: 0
              Watchers: 4
