Apache Drill
DRILL-6688

Data batches for Project operator exceed the maximum specified



      Description

      I ran this query with these session options:
      alter session set `drill.exec.memory.operator.project.output_batch_size` = 131072;
      alter session set `planner.width.max_per_node` = 1;
      alter session set `planner.width.max_per_query` = 1;
      select
      chr(101) CharacterValuea,
      chr(102) CharacterValueb,
      chr(103) CharacterValuec,
      chr(104) CharacterValued,
      chr(105) CharacterValuee
      from dfs.`/drill/testdata/batch_memory/character5_1MB.parquet`;

      The output has 1024 identical lines:
      e f g h i
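      For anyone who wants to script this, a minimal JDBC version of the same repro is below (a sketch only: the connection URL and class name are placeholders, and the run above was done from sqlline):

      import java.sql.Connection;
      import java.sql.DriverManager;
      import java.sql.ResultSet;
      import java.sql.Statement;

      public class ProjectBatchRepro {
          public static void main(String[] args) throws Exception {
              // Placeholder drillbit host/port; adjust for the test cluster.
              try (Connection conn = DriverManager.getConnection("jdbc:drill:drillbit=localhost:31010");
                   Statement stmt = conn.createStatement()) {
                  // Shrink the Project operator's output batch limit and force a single fragment.
                  stmt.execute("alter session set `drill.exec.memory.operator.project.output_batch_size` = 131072");
                  stmt.execute("alter session set `planner.width.max_per_node` = 1");
                  stmt.execute("alter session set `planner.width.max_per_query` = 1");
                  try (ResultSet rs = stmt.executeQuery(
                          "select chr(101) CharacterValuea, chr(102) CharacterValueb, chr(103) CharacterValuec, "
                        + "chr(104) CharacterValued, chr(105) CharacterValuee "
                        + "from dfs.`/drill/testdata/batch_memory/character5_1MB.parquet`")) {
                      int rows = 0;
                      while (rs.next()) {
                          rows++;
                      }
                      System.out.println("rows returned: " + rows);
                  }
              }
          }
      }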

      There is one incoming batch:
      2018-08-09 15:50:14,794 [24933ad8-a5e2-73f1-90dd-947fc2938e54:frag:0:0] DEBUG o.a.d.e.p.i.p.ProjectMemoryManager - BATCH_STATS, incoming: Batch size:
      { Records: 60000, Total size: 0, Data size: 300000, Gross row width: 0, Net row width: 5, Density: 0% }
      Batch schema & sizes:
      { `_DEFAULT_COL_TO_READ_`(type: OPTIONAL INT, count: 60000, Per entry: std data size: 4, std net size: 5, actual data size: 4, actual net size: 5 Totals: data size: 240000, net size: 300000) }
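      As a sanity check, the per-entry sizes reported for `_DEFAULT_COL_TO_READ_` multiply out to the totals in that log line (this is just arithmetic on the logged values, not Drill code):

      public class IncomingBatchCheck {
          public static void main(String[] args) {
              int records         = 60000; // rows in the incoming batch
              int actualDataBytes = 4;     // actual data size per entry (OPTIONAL INT value)
              int actualNetBytes  = 5;     // actual net size per entry (value plus overhead, per the log)

              System.out.println("data size: " + records * actualDataBytes); // 240000
              System.out.println("net size:  " + records * actualNetBytes);  // 300000
          }
      }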

      There are four outgoing batches. All are too large. The first three look like this:
      2018-08-09 15:50:14,799 [24933ad8-a5e2-73f1-90dd-947fc2938e54:frag:0:0] DEBUG o.a.d.e.p.i.p.ProjectRecordBatch - BATCH_STATS, outgoing: Batch size:
      { Records: 16383, Total size: 0, Data size: 409575, Gross row width: 0, Net row width: 25, Density: 0% }
      Batch schema & sizes:
      { CharacterValuea(type: REQUIRED VARCHAR, count: 16383, Per entry: std data size: 50, std net size: 54, actual data size: 1, actual net size: 5 Totals: data size: 16383, net size: 81915)
        CharacterValueb(type: REQUIRED VARCHAR, count: 16383, Per entry: std data size: 50, std net size: 54, actual data size: 1, actual net size: 5 Totals: data size: 16383, net size: 81915)
        CharacterValuec(type: REQUIRED VARCHAR, count: 16383, Per entry: std data size: 50, std net size: 54, actual data size: 1, actual net size: 5 Totals: data size: 16383, net size: 81915)
        CharacterValued(type: REQUIRED VARCHAR, count: 16383, Per entry: std data size: 50, std net size: 54, actual data size: 1, actual net size: 5 Totals: data size: 16383, net size: 81915)
        CharacterValuee(type: REQUIRED VARCHAR, count: 16383, Per entry: std data size: 50, std net size: 54, actual data size: 1, actual net size: 5 Totals: data size: 16383, net size: 81915) }

      The last batch is smaller because it holds the remaining 10851 records (60000 - 3 * 16383).

      The data size (409575 bytes) exceeds the configured maximum batch size (131072 bytes) by roughly a factor of three.
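      For reference, the arithmetic behind that statement, as a small standalone check (values are taken from the BATCH_STATS lines above; the class and variable names are made up for illustration):

      public class ProjectBatchSizeCheck {
          public static void main(String[] args) {
              long batchSizeLimit = 131072L; // drill.exec.memory.operator.project.output_batch_size
              int  netRowWidth    = 25;      // outgoing net row width from the log (5 columns x 5 bytes)
              int  rowsPerBatch   = 16383;   // records per outgoing batch from the log
              int  incomingRows   = 60000;   // records in the single incoming batch

              long actualBatchBytes = (long) rowsPerBatch * netRowWidth; // 409575
              long rowsUnderLimit   = batchSizeLimit / netRowWidth;      // 5242
              int  lastBatchRows    = incomingRows - 3 * rowsPerBatch;   // 10851

              System.out.printf("outgoing data size: %d bytes, limit: %d bytes (~%.1fx over)%n",
                      actualBatchBytes, batchSizeLimit, (double) actualBatchBytes / batchSizeLimit);
              System.out.printf("rows that would fit under the limit at 25 bytes/row: %d%n", rowsUnderLimit);
              System.out.printf("rows left for the fourth batch: %d%n", lastBatchRows);
          }
      }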

        Attachments

          character415.q


              People

                Assignee: Karthikeyan Manivannan (karthikm)
                Reporter: Robert Hou (rhou)
                Reviewer: Boaz Ben-Zvi
                Votes: 0
                Watchers: 4

                Dates

                Created:
                Updated:
                Resolved: