Apache Arrow / ARROW-17115

[C++] HashJoin fails if it encounters a batch with more than 32Ki rows

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 9.0.0
    • Component/s: C++

    Description

      The new swiss join assumes that its inputs are split according to the morsel/batch model, and that each batch has at most 32Ki (32,768) rows, because signed 16-bit indices are used in various places.

      However, we do not currently slice all of our inputs into batches this small. This is causing the Conbench benchmarks to fail, and it would likely be a problem for any input that contains larger batches.

      We should fix this by slicing batches in the engine to the appropriate maximum size.
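
The proposed fix can be illustrated with a minimal, standalone sketch. It is not the code that went into Arrow: the names `kMaxRowsPerBatch`, `RowRange`, and `SliceRanges` are hypothetical, and the sketch only computes the row ranges that a slicing step would produce, assuming the cap is 2^15 rows so that every in-slice row index fits in a signed 16-bit integer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical cap, chosen so that every row index within a slice
// (0 .. kMaxRowsPerBatch - 1) fits in a signed 16-bit integer.
constexpr int64_t kMaxRowsPerBatch = int64_t{1} << 15;  // 32Ki = 32768
static_assert(kMaxRowsPerBatch - 1 == std::numeric_limits<int16_t>::max(),
              "largest in-slice row index must fit in int16_t");

// Half-open row range [offset, offset + length) within the original batch.
struct RowRange {
  int64_t offset;
  int64_t length;
};

// Split a batch of `num_rows` rows into consecutive ranges of at most
// `max_rows` rows each. An empty batch yields no ranges.
std::vector<RowRange> SliceRanges(int64_t num_rows,
                                  int64_t max_rows = kMaxRowsPerBatch) {
  assert(num_rows >= 0 && max_rows > 0);
  std::vector<RowRange> ranges;
  for (int64_t offset = 0; offset < num_rows; offset += max_rows) {
    // The last range may be shorter than max_rows.
    ranges.push_back({offset, std::min(max_rows, num_rows - offset)});
  }
  return ranges;
}
```

Each resulting range could then be passed to a zero-copy slice call such as Arrow's `RecordBatch::Slice(offset, length)` before the rows reach the join; where exactly the engine should do this is not specified in this issue.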

            People

              Assignee: Weston Pace (westonpace)
              Reporter: Weston Pace (westonpace)

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 3h 40m