PIG-4960: Split followed by order by/skewed join is skewed in Tez

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.17.0, 0.16.1
    • Component/s: None
    • Labels: None

    Description

      Sampling is not done right. Split is a special case because EOP is returned after each record is processed. We fixed this before (PIG-4480, etc.), but it is still not done right.

      In the case of skewed join, skipInterval is applied to each record individually instead of across all the records, so every record except the first is mostly skipped. Sampling is slightly better if there is a FLATTEN of a bag on the input record to Split, because there are then multiple records to process.
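The effect of the skip counter's scope can be illustrated with a small simulation. This is only a sketch and not Pig's actual sampling code: the function `sample_with_skip`, its `reset_per_batch` flag, and the modelling of EOP boundaries as separate batches are all assumptions made for illustration.

```python
def sample_with_skip(batches, skip_interval, reset_per_batch):
    """Sample one record, then skip `skip_interval` records, and repeat.

    `batches` models input that arrives in chunks separated by EOP.
    If `reset_per_batch` is True, the skip counter restarts after every
    chunk (the problem described above); otherwise it spans all records.
    """
    samples = []
    skipped = skip_interval  # start "due" so the very first record is sampled
    for index, batch in enumerate(batches):
        if reset_per_batch and index > 0:
            skipped = 0  # counter is forgotten at each EOP boundary
        for record in batch:
            if skipped >= skip_interval:
                samples.append(record)
                skipped = 0
            else:
                skipped += 1
    return samples


records = list(range(100))
# Split: EOP after each record, so every batch holds one record.
one_per_batch = [[r] for r in records]
# FLATTEN of a 5-element bag: five records arrive before each EOP.
five_per_batch = [records[i:i + 5] for i in range(0, 100, 5)]

per_record = sample_with_skip(one_per_batch, 9, reset_per_batch=True)
across_all = sample_with_skip(one_per_batch, 9, reset_per_batch=False)
flattened = sample_with_skip(five_per_batch, 3, reset_per_batch=True)
```

In this model, `per_record` contains only the first record, `across_all` contains every tenth record, and `flattened` recovers some samples because each batch is long enough for the counter to reach the interval.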

      In the case of order by, samples were being returned even as they were still being updated with new data, so the samples mostly contained records from the first few hundred rows.
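A rough sketch of why returning samples too early biases them toward the first rows, assuming the samples are kept in a reservoir of fixed size. The function `reservoir_sample`, the `emit_early` flag, and the sample size of 100 are hypothetical and do not come from the Pig source or the patch.

```python
import random


def reservoir_sample(batches, k, rng, emit_early):
    """Keep a uniform sample of `k` records using reservoir sampling.

    `batches` models input that arrives in chunks separated by EOP.
    If `emit_early` is True, the reservoir is returned at the first
    batch boundary where it is full, before later records can replace
    earlier ones. Otherwise it is returned after all input is consumed.
    """
    reservoir = []
    seen = 0
    for batch in batches:
        for record in batch:
            seen += 1
            if len(reservoir) < k:
                reservoir.append(record)
            else:
                # Replace a random slot with probability k / seen.
                slot = rng.randrange(seen)
                if slot < k:
                    reservoir[slot] = record
        if emit_early and len(reservoir) == k:
            return list(reservoir)
    return list(reservoir)


# One record per batch, as with Split returning EOP after each record.
rows = [[r] for r in range(10_000)]
early = reservoir_sample(rows, 100, random.Random(0), emit_early=True)
full = reservoir_sample(rows, 100, random.Random(0), emit_early=False)
```

Here `early` holds exactly the first 100 rows, while `full` is drawn from the whole input because the reservoir is only read once no more updates can occur.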

      Attachments

        1. PIG-4960-1.patch (3 kB, Rohini Palaniswamy)

          People

            Assignee: Rohini Palaniswamy
            Reporter: Rohini Palaniswamy