Hive / HIVE-410

Heartbeating for streaming jobs should not depend on stdout

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Jobs that require iterative processing may take longer than 10 minutes to produce rows. That alone should not be cause to kill the job. Emitting dummy keepalive rows on stdout is not an option when the output has to go into a Hive table or feed other Hive steps.

      If we adopt the solution of using stderr to signal heartbeats, can that be combined with streaming counters (http://hadoop.apache.org/core/docs/current/streaming.html#How+do+I+update+counters+in+streaming+applications%3F)? Also, will limits on the size of stderr break this?
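      A minimal sketch of how a streaming script might send heartbeats on stderr while keeping stdout clean for data rows. It assumes the runtime honors the Hadoop Streaming reporter convention, where stderr lines of the form `reporter:status:<message>` and `reporter:counter:<group>,<counter>,<amount>` update task status and counters. Whether Hive's script handling interprets these lines is not stated in this issue. The names `Heartbeat`, `format_status`, and `format_counter` are hypothetical helpers, not part of Hive or Hadoop.

```python
import sys
import time


def format_status(message):
    # Hadoop Streaming status line; it is written to stderr, never stdout.
    return "reporter:status:%s\n" % message


def format_counter(group, counter, amount):
    # Hadoop Streaming counter line: reporter:counter:<group>,<counter>,<amount>
    return "reporter:counter:%s,%s,%d\n" % (group, counter, amount)


class Heartbeat:
    """Writes a status line at most once per interval (hypothetical helper)."""

    def __init__(self, interval_seconds=60.0, stream=None, clock=time.monotonic):
        self.interval = interval_seconds
        self.stream = sys.stderr if stream is None else stream
        self.clock = clock
        self.last = clock()

    def tick(self, message="still working"):
        # Call this inside the long-running loop. Returns True when a
        # heartbeat line was actually written.
        now = self.clock()
        if now - self.last >= self.interval:
            self.stream.write(format_status(message))
            self.stream.flush()
            self.last = now
            return True
        return False
```

      A script would call `tick()` on each iteration of its slow loop and write only real rows to stdout. Because at most one short line is written per interval, the volume on stderr stays small, which may ease the size concern raised above, though that depends on how the runtime buffers or truncates stderr.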

        Attachments

        1. patch-410-2.txt
          3 kB
          Ashish Thusoo
        2. patch-410.txt
          3 kB
          Ashish Thusoo

              People

              • Assignee:
                athusoo Ashish Thusoo
                Reporter:
                indigoviolet Venky Iyer
              • Votes:
                0
                Watchers:
                3