  Hive / HIVE-25831

Report Progress on Every Record Read for CompactorMR


Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved

    Description

      Progress should be updated on every read of an input record.

       

      Hadoop terminates a task that, for longer than mapreduce.task.timeout, neither reads an input, writes an output, nor updates its status string.

      https://github.com/apache/hive/blob/fffb31f2346df2b8011a9949895de21f506c0117/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java#L813-L828

      I think every iteration of this loop should simply call progress(). If a major compaction encounters many deleted values, long periods can pass without a progress update, and YARN may time out the job.

      I'm not 100% sure this is happening, but it seemed worth pointing out.
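      A minimal sketch of the suggested change, assuming a simplified reader that yields rows flagged as either live or deleted. ProgressReporter, Row and compact are hypothetical names, not Hive's actual CompactorMR code; ProgressReporter stands in for Hadoop's Reporter, whose progress() method comes from Progressable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not Hive's CompactorMR, only the shape of the proposed loop.
public class CompactionLoopSketch {

    /** Hypothetical stand-in for Hadoop's Reporter (Progressable.progress()). */
    interface ProgressReporter {
        void progress();
    }

    /** Hypothetical input row: a value plus a flag marking it as deleted. */
    static final class Row {
        final String value;
        final boolean deleted;

        Row(String value, boolean deleted) {
            this.value = value;
            this.deleted = deleted;
        }
    }

    /** Copies live rows to the output, reporting progress for every row read. */
    static List<String> compact(Iterable<Row> input, ProgressReporter reporter) {
        List<String> output = new ArrayList<>();
        for (Row row : input) {
            // Report progress before filtering, so a long run of deleted rows
            // (which writes no output) still signals that the task is alive.
            reporter.progress();
            if (row.deleted) {
                continue;
            }
            output.add(row.value);
        }
        return output;
    }

    public static void main(String[] args) {
        List<Row> input = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            // Mark 9 out of every 10 rows as deleted to mimic a delete-heavy compaction.
            input.add(new Row("v" + i, i % 10 != 0));
        }
        int[] calls = {0};
        List<String> out = compact(input, () -> calls[0]++);
        System.out.println("read=" + input.size() + " written=" + out.size()
                + " progressCalls=" + calls[0]);
    }
}
```

      Placing the progress() call before the deleted-row check is the key design choice: deleted rows produce no output, so without it nothing would signal liveness while they are being skipped.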


          People

            Assignee: Unassigned
            Reporter: David Mollitor (belugabehr)
            Votes: 0
            Watchers: 1
