  1. Apache Drill
  2. DRILL-6087

Aggregates that use ObjectHolder will fail when Hash Agg spills

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.12.0
    • None
    • None
    • None

    Description

      Drill has this thing called an “ObjectVector”, which is a vector that holds onto Java objects. We use it for things like the system tables.

      The ObjectVector has something called an ObjectHolder. For various reasons (see this Wiki writeup), some Drill aggregates use this holder to create aggregates that need more than a few numbers as working values.

      As it turns out, all the Decimal AVG functions use the ObjectHolder to hold the intermediate values. (Also true of Decimal Max, Min and Sum. Also true of Max and Min for VarBytes. Just do a code search for uses of ObjectHolder.)

      In the old pre-spill days, things worked fine. But, with Hash Agg spilling, we need to write intermediate values out to disk, then read them back.

      But, the object vector never implemented the methods needed for spilling! Instead, it will throw an UnsupportedOperationException.
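      To make the failure mode concrete, here is a minimal, self-contained sketch. It is not Drill code: ToyVector, ToyLongVector, ToyObjectVector and spill() are hypothetical names invented for illustration. It only models the behavior described above, where a spill has to serialize every vector in a batch and the object-holding vector throws instead.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these types exist in Drill.
class SpillSketch {

    /** Hypothetical stand-in for a vector that a spilling operator must write out. */
    interface ToyVector {
        void writeTo(DataOutputStream out) throws IOException;
    }

    /** Fixed-width values: serialization is straightforward. */
    static final class ToyLongVector implements ToyVector {
        private final long[] values;

        ToyLongVector(long... values) {
            this.values = values.clone();
        }

        @Override
        public void writeTo(DataOutputStream out) throws IOException {
            out.writeInt(values.length);   // 4-byte value count
            for (long v : values) {
                out.writeLong(v);          // 8 bytes per value
            }
        }
    }

    /** Holds arbitrary Java objects and, like the vector in this report, cannot be written out. */
    static final class ToyObjectVector implements ToyVector {
        private final List<Object> values = new ArrayList<>();

        void add(Object value) {
            values.add(value);
        }

        @Override
        public void writeTo(DataOutputStream out) {
            throw new UnsupportedOperationException("ToyObjectVector does not support spilling");
        }
    }

    /** Simulates a spill: serializes every vector in the batch and returns the byte count. */
    static int spill(List<ToyVector> batch) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (ToyVector vector : batch) {
                vector.writeTo(out);       // fails on the first vector that cannot serialize
            }
        }
        return bytes.size();
    }
}
```

      In this toy model, a batch that contains only ToyLongVector instances spills fine, while a batch that contains a ToyObjectVector fails as soon as spill() reaches it. As long as nothing calls spill(), the object vector causes no trouble, which matches the pre-spill behavior.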

      What does this mean?

      If you run a query that uses any of the aggregate functions above, the query is planned with the Hash Agg, and there is enough data to cause spilling, your query will fail. Run the same query with the Streaming Agg and it will work. Reduce the data enough to avoid spilling and the query will also work.

          People

            Assignee: Unassigned
            Reporter: Paul Rogers
            Votes: 0
            Watchers: 1