Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-5326

Issue with auto parallelism and scalar inputs in Tez

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: tez
    • Labels:
      None

      Description

      I'm getting a "Scalar has more than one row in the output" error with the following script:

      a = LOAD 't' as (x:chararray);
      b = GROUP a BY x PARALLEL 2;
      c = GROUP b by group;
      d = FOREACH (GROUP a ALL) GENERATE COUNT(a) as count;
      e = FOREACH c GENERATE group, d.count;
      DUMP e;
      

      If I add a PARALLEL clause to c, the error goes away, so the issue seems to be related to auto parallelism.

      I'm not very familiar with Tez, so I'm not sure how things are supposed to work, the issue seems to be related to the following (I know almost nothing about Tez so take this with a grain of salt):

      1. PigGraceShuffleVertexManager calls VertexImpl.reconfigureVertex(), which configures the parallelism of the vertex (VertexImpl.numTasks)
      2. The InputSpec for the scalar input is created (via Edge.getDestinationSpec()) with physicalInputCount equal to the parallelism set above
      3. The input is created (in LogicalIOProcessorRuntimeTask.createInput()) based on this InputSpec.
      4. The resulting UnorderedKVInput creates a ShuffleManager with numInputs = numPhysicalInputs.

      This creates a reader that reads the scalar input numPhysicalInputs times, which results in the "Scalar has more than one row in the output" error in ReadScalarsTez.

      When parallelism is specified explicitly, VertexImpl.reconfigureVertex() is never called, and numPhysicalInputs remains as 1 for the scalar input.

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              tmwoodruff Travis Woodruff
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated: