Apache Drill / DRILL-326

Multi-table merge join hit IOBE; merge join may over-allocate memory

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4.0
    • Component/s: None
    • Labels: None

    Description

      The query below, which joins four tables, can fail with an IndexOutOfBoundsException (IOBE).

      message: "Failure while running fragment. < IndexOutOfBoundsException:[ index: 2205764, length: 4 (expected: range(0, 2205764))

      SELECT S.S_ACCTBAL, S.S_NAME, N.N_NAME
      FROM
        ( SELECT _MAP['P_PARTKEY'] AS P_PARTKEY,
                 _MAP['P_MFGR'] AS P_MFGR
          FROM "/Users/jni//work/tpc-h-parquet/part" ) P,
        ( SELECT _MAP['S_SUPPKEY'] AS S_SUPPKEY,
                 _MAP['S_NATIONKEY'] AS S_NATIONKEY,
                 _MAP['S_ACCTBAL'] AS S_ACCTBAL,
                 _MAP['S_NAME'] AS S_NAME,
                 _MAP['S_ADDRESS'] AS S_ADDRESS,
                 _MAP['S_PHONE'] AS S_PHONE,
                 _MAP['S_COMMENT'] AS S_COMMENT
          FROM "/Users/jni//work/tpc-h-parquet/supplier" ) S,
        ( SELECT _MAP['PS_PARTKEY'] AS PS_PARTKEY,
                 _MAP['PS_SUPPKEY'] AS PS_SUPPKEY
          FROM "/Users/jni//work/tpc-h-parquet/partsupp" ) PS,
        ( SELECT CAST(_MAP['N_NAME'] AS VARCHAR) AS N_NAME,
                 _MAP['N_NATIONKEY'] AS N_NATIONKEY
          FROM "/Users/jni//work/tpc-h-parquet/nation" ) N
      WHERE P.P_PARTKEY = PS.PS_PARTKEY AND
            S.S_SUPPKEY = PS.PS_SUPPKEY AND
            S.S_NATIONKEY = N.N_NATIONKEY
      LIMIT 100;

      The root cause of this IOBE is that merge join continues to advance the output position even if the copy from the left or right input fails. As a result, the merge join batch size can exceed the buffer capacity, and the IOBE is hit later, in downstream batch processing.
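
      A minimal sketch of the failure mode described above, not the code from the attached patch. FixedIntVector, copyFromSafe, emitBuggy and emitFixed are hypothetical names used only for illustration, and the inputs are assumed to be already-matched row pairs of equal length. The point is only that the output position should advance when both copies succeed, and not otherwise.

```java
// Hypothetical stand-in for a fixed-capacity output vector; this is NOT Drill's ValueVector API.
final class FixedIntVector {
    private final int[] data;

    FixedIntVector(int capacity) {
        this.data = new int[capacity];
    }

    // Returns false instead of throwing when the slot is out of range,
    // mimicking a copy that fails because the buffer is full.
    boolean copyFromSafe(int value, int outIndex) {
        if (outIndex < 0 || outIndex >= data.length) {
            return false;
        }
        data[outIndex] = value;
        return true;
    }
}

final class MergeJoinOutputSketch {
    // Buggy pattern: the output position advances even when a copy failed,
    // so the reported batch size can exceed the buffer capacity.
    static int emitBuggy(int[] left, int[] right, FixedIntVector outLeft, FixedIntVector outRight) {
        int outputPosition = 0;
        for (int i = 0; i < left.length; i++) {
            outLeft.copyFromSafe(left[i], outputPosition);   // result ignored
            outRight.copyFromSafe(right[i], outputPosition); // result ignored
            outputPosition++;
        }
        return outputPosition;
    }

    // Guarded pattern: stop at the first failed copy and report only complete rows.
    // A half-written row at outputPosition is harmless because it lies past the
    // returned record count; the remaining input would go into the next batch.
    static int emitFixed(int[] left, int[] right, FixedIntVector outLeft, FixedIntVector outRight) {
        int outputPosition = 0;
        for (int i = 0; i < left.length; i++) {
            boolean copied = outLeft.copyFromSafe(left[i], outputPosition)
                    && outRight.copyFromSafe(right[i], outputPosition);
            if (!copied) {
                break;
            }
            outputPosition++;
        }
        return outputPosition;
    }
}
```

      With five matched rows and output vectors of capacity 3, emitBuggy reports 5 records while emitFixed reports 3, which is the difference between an oversized batch and a valid one.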

      This bug also exposes two other issues.

      1) We need a way to verify that each batch size is within the 65535 limit. This would make similar problems easier to debug in the future: if a code bug pushes the batch size beyond the limit, we would catch it right away instead of continuing execution and hitting an error in downstream batch processing.

      2) The merge join batch may allocate buffers using different row counts for the value vectors copied from the left and right inputs. In a join operation, these row counts should be equal; using different row counts can lead to unnecessary memory overhead. Also, the merge join batch size should be bounded by the limit.
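
      The two points above could be sketched roughly as follows. This is an illustration under stated assumptions, not the attached patch: BatchSizingSketch, checkBatchSize and sharedRowCount are hypothetical names, the 65535 limit is taken from the description, and using the larger of the two per-side estimates as the shared row count is an assumption made only for this example.

```java
final class BatchSizingSketch {
    // Batch size limit as quoted in the description.
    static final int MAX_BATCH_SIZE = 65535;

    // Point 1: fail fast when an operator reports an out-of-range record count,
    // so the error surfaces at the offending operator rather than downstream.
    static int checkBatchSize(String operatorName, int recordCount) {
        if (recordCount < 0 || recordCount > MAX_BATCH_SIZE) {
            throw new IllegalStateException(operatorName + " produced a batch of "
                    + recordCount + " records; limit is " + MAX_BATCH_SIZE);
        }
        return recordCount;
    }

    // Point 2: derive ONE row count for the output vectors of both join sides,
    // bounded by the limit. Taking the larger estimate is an assumption here.
    static int sharedRowCount(int leftEstimate, int rightEstimate) {
        int requested = Math.max(leftEstimate, rightEstimate);
        return Math.min(Math.max(requested, 0), MAX_BATCH_SIZE);
    }
}
```

      A caller would allocate the left-side and right-side output vectors with the same sharedRowCount value and run checkBatchSize on every batch it emits.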

      Attachments

        1. DRILL-326.4.patch.txt
          11 kB
          Jinfeng Ni

          People

            Assignee: Jinfeng Ni (jni)
            Reporter: Jinfeng Ni (jni)
            Votes: 0
            Watchers: 3
