Apache Drill / DRILL-6688

Data batches for Project operator exceed the maximum specified


Details

    Description

      I set these session options and ran this query:

      alter session set `drill.exec.memory.operator.project.output_batch_size` = 131072;
      alter session set `planner.width.max_per_node` = 1;
      alter session set `planner.width.max_per_query` = 1;

      select
      chr(101) CharacterValuea,
      chr(102) CharacterValueb,
      chr(103) CharacterValuec,
      chr(104) CharacterValued,
      chr(105) CharacterValuee
      from dfs.`/drill/testdata/batch_memory/character5_1MB.parquet`;

      The output has 1024 identical lines:
      e f g h i

      There is one incoming batch:
      2018-08-09 15:50:14,794 [24933ad8-a5e2-73f1-90dd-947fc2938e54:frag:0:0] DEBUG o.a.d.e.p.i.p.ProjectMemoryManager - BATCH_STATS, incoming: Batch size:

      { Records: 60000, Total size: 0, Data size: 300000, Gross row width: 0, Net row width: 5, Density: 0% }

      Batch schema & sizes: {
        `_DEFAULT_COL_TO_READ_`(type: OPTIONAL INT, count: 60000, Per entry: std data size: 4, std net size: 5, actual data size: 4, actual net size: 5 Totals: data size: 240000, net size: 300000)
      }
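
      As a cross-check, the incoming numbers are consistent if the sizer counts 4 bytes of data plus 1 byte of null indicator per value for the OPTIONAL INT column. A minimal sketch, assuming that accounting (the constants are copied from the log above, not from Drill's code):

      // Hypothetical cross-check of the incoming batch accounting logged above.
      public class IncomingBatchCheck {
          public static void main(String[] args) {
              int records = 60000;        // Records: 60000
              int dataBytesPerValue = 4;  // actual data size: 4 (INT)
              int netBytesPerValue = 5;   // actual net size: 5 (assumed: 4 data bytes + 1 null-indicator byte)

              System.out.println("data size     = " + records * dataBytesPerValue); // 240000, matches the log
              System.out.println("net size      = " + records * netBytesPerValue);  // 300000, matches the log
              System.out.println("net row width = " + netBytesPerValue);            // 5, matches the log
          }
      }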

      There are four outgoing batches. All are too large. The first three look like this:
      2018-08-09 15:50:14,799 [24933ad8-a5e2-73f1-90dd-947fc2938e54:frag:0:0] DEBUG o.a.d.e.p.i.p.ProjectRecordBatch - BATCH_STATS, outgoing: Batch size:

      { Records: 16383, Total size: 0, Data size: 409575, Gross row width: 0, Net row width: 25, Density: 0% }

      Batch schema & sizes: {
        CharacterValuea(type: REQUIRED VARCHAR, count: 16383, Per entry: std data size: 50, std net size: 54, actual data size: 1, actual net size: 5 Totals: data size: 16383, net size: 81915)
        CharacterValueb(type: REQUIRED VARCHAR, count: 16383, Per entry: std data size: 50, std net size: 54, actual data size: 1, actual net size: 5 Totals: data size: 16383, net size: 81915)
        CharacterValuec(type: REQUIRED VARCHAR, count: 16383, Per entry: std data size: 50, std net size: 54, actual data size: 1, actual net size: 5 Totals: data size: 16383, net size: 81915)
        CharacterValued(type: REQUIRED VARCHAR, count: 16383, Per entry: std data size: 50, std net size: 54, actual data size: 1, actual net size: 5 Totals: data size: 16383, net size: 81915)
        CharacterValuee(type: REQUIRED VARCHAR, count: 16383, Per entry: std data size: 50, std net size: 54, actual data size: 1, actual net size: 5 Totals: data size: 16383, net size: 81915)
      }

      The last batch is smaller because it has the remaining records.

      The data size of each of the first three outgoing batches (409575 bytes) exceeds the configured maximum output batch size (131072 bytes).
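
      To make the mismatch concrete, here is a small arithmetic sketch. It is not Drill code; it only re-derives the expected row limit from the figures in the log, assuming the limit should be based on the 25-byte net row width shown above:

      // Hypothetical check: how many 25-byte rows fit in a 131072-byte output batch?
      public class OutgoingBatchCheck {
          public static void main(String[] args) {
              int outputBatchSize = 131072; // drill.exec.memory.operator.project.output_batch_size
              int netRowWidth = 25;         // Net row width: 25 (5 VARCHAR columns x 5 net bytes each)
              int observedRows = 16383;     // Records: 16383 in the log

              int expectedRowLimit = outputBatchSize / netRowWidth; // 5242 rows would stay within the limit
              int observedBytes = observedRows * netRowWidth;       // 409575 bytes, matching the logged data size

              System.out.println("expected row limit ~ " + expectedRowLimit);
              System.out.println("observed batch data size = " + observedBytes);
              System.out.println("over budget by ~ "
                      + String.format("%.1fx", observedBytes / (double) outputBatchSize)); // ~3.1x
          }
      }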

    Attachments

      character415.q

    People

      Assignee: Karthikeyan Manivannan (karthikm)
      Reporter: Robert Hou (rhou)
      Reviewer: Boaz Ben-Zvi
