Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-12664

Bug in reduce deduplication optimization causing ArrayOutOfBoundException

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 1.1.1, 1.2.1
    • 2.0.0
    • Hive
    • None

    Description

      The optimisation check for reduce deduplication only checks the first child node for join and the check itself also contains a major bug causing ArrayOutOfBoundException no matter what.

      Sample data table form:

      time user host path referer code agent size method
      int string string string string bigint string bigint string

      Sample query

      SELECT 
        t1.host,
        COUNT(DISTINCT t1.`date`) AS login_count,
        MAX(t2.code) AS code,
        unix_timestamp() AS time
      FROM (
          SELECT 
            HOST,
            MIN(time) AS DATE
          FROM
            www_access
          WHERE
            HOST IS NOT NULL
          GROUP BY
            HOST
        ) t1
      JOIN (
          SELECT 
            HOST,
            MIN(time) AS code
          FROM
            www_access
          WHERE
            HOST IS NOT NULL
          GROUP BY
            HOST
        ) t2
        ON t1.host = t2.host
      GROUP BY
        t1.host
      

      Attachments

        1. HIVE-12664-2.patch
          3 kB
          Johan Gustavsson
        2. HIVE-12664-1.patch
          3 kB
          Johan Gustavsson
        3. HIVE-12664.patch
          3 kB
          Johan Gustavsson
        4. HIVE-12664.2.patch
          3 kB
          Johan Gustavsson
        5. HIVE-12664.1.patch
          3 kB
          Johan Gustavsson

        Activity

          People

            johang Johan Gustavsson
            johang Johan Gustavsson
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: