Hadoop HDFS / HDFS-232

Cross-system causal tracing within Hadoop

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      Much of Hadoop's behavior is client-driven: clients are responsible for contacting individual datanodes to read and write data, and for dividing up work into map and reduce tasks. In a large deployment with many concurrent users, identifying the effect of an individual client on the infrastructure is a challenge. The use of data pipelining in HDFS and Map/Reduce makes it hard to follow a given client request through the system.

      This proposal is to instrument the HDFS, IPC, and Map/Reduce layers of Hadoop with X-Trace. X-Trace is an open-source framework for capturing causality of events in a distributed system. It can correlate operations making up a single user request, even if those operations span multiple machines. As an example, you could use X-Trace to follow an HDFS write operation as it is pipelined through intermediate nodes. Additionally, you could trace a single Map/Reduce job and see how it is decomposed into lower-layer HDFS operations.
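
To make the idea of following a pipelined write concrete, here is a minimal sketch of causal propagation. It is not code from X-Trace or from the attached patches: the class names, the integer operation ids, the host names "dn1" to "dn3", and the single in-memory event list are all assumptions made for illustration. Each hop records an event that names the previous hop's event as its parent, and forwards updated metadata downstream.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these classes are not part of X-Trace or Hadoop.
final class PipelineTraceSketch {

    /** One reported event: the request it belongs to and the event that caused it. */
    static final class Event {
        final long taskId;
        final int opId;
        final int parentOpId;
        final String host;
        final String label;

        Event(long taskId, int opId, int parentOpId, String host, String label) {
            this.taskId = taskId;
            this.opId = opId;
            this.parentOpId = parentOpId;
            this.host = host;
            this.label = label;
        }
    }

    /** Metadata that travels with the request from one hop to the next. */
    static final class TraceContext {
        final long taskId;
        final int lastOpId;

        TraceContext(long taskId, int lastOpId) {
            this.taskId = taskId;
            this.lastOpId = lastOpId;
        }
    }

    // Assumption: one shared list stands in for a real trace collector.
    private final List<Event> events = new ArrayList<>();
    private int nextOpId = 1;

    /** Records an event caused by ctx and returns the context to forward downstream. */
    TraceContext report(TraceContext ctx, String host, String label) {
        int opId = nextOpId++;
        events.add(new Event(ctx.taskId, opId, ctx.lastOpId, host, label));
        return new TraceContext(ctx.taskId, opId);
    }

    List<Event> events() {
        return events;
    }

    /** Simulates a block write pipelined from a client through three datanodes. */
    static PipelineTraceSketch simulateWrite(long taskId) {
        PipelineTraceSketch trace = new PipelineTraceSketch();
        TraceContext ctx = new TraceContext(taskId, 0); // 0 marks the root: no parent
        ctx = trace.report(ctx, "client", "write block");
        for (String datanode : new String[] {"dn1", "dn2", "dn3"}) {
            ctx = trace.report(ctx, datanode, "receive and forward block");
        }
        return trace;
    }

    public static void main(String[] args) {
        for (Event e : simulateWrite(42L).events()) {
            System.out.println("task " + e.taskId + " op " + e.opId
                    + " caused by " + e.parentOpId + " at " + e.host + ": " + e.label);
        }
    }
}
```

Because every event shares the task id and names its parent, the events can be joined after the fact into one chain per request, regardless of which machine reported them.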

      Matei Zaharia and Andy Konwinski initially integrated X-Trace with a local copy of the 0.14 release, and I've brought that code up to release 0.17. The integration involves modifying the IPC protocol, the inter-datanode protocol, and some data structures in the Map/Reduce layer to carry 20 bytes of tracing metadata. With release 0.18, the generated traces could be collected with Chukwa.
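
As a rough sketch of what carrying fixed-size tracing metadata in a protocol can look like, the example below writes a 20-byte block ahead of a length-prefixed payload and reads it back. Only the 20-byte size comes from the description above. The field layout (8-byte task id, 8-byte operation id, 4 bytes of flags), the framing, and every class and method name are assumptions for illustration, not X-Trace's actual wire format or the changes made in the attached patches.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical sketch: not the X-Trace format and not Hadoop's IPC code.
final class TraceHeaderSketch {
    static final int METADATA_LENGTH = 20;

    /** Packs the assumed layout: 8-byte task id, 8-byte op id, 4 bytes of flags. */
    static byte[] encode(long taskId, long opId, int flags) {
        return ByteBuffer.allocate(METADATA_LENGTH)
                .putLong(taskId)
                .putLong(opId)
                .putInt(flags)
                .array();
    }

    static long taskIdOf(byte[] metadata) {
        return ByteBuffer.wrap(metadata).getLong(0);
    }

    static long opIdOf(byte[] metadata) {
        return ByteBuffer.wrap(metadata).getLong(8);
    }

    /** Writes the metadata first, then a length-prefixed payload. */
    static void writeRequest(DataOutputStream out, byte[] metadata, byte[] payload)
            throws IOException {
        if (metadata.length != METADATA_LENGTH) {
            throw new IllegalArgumentException("metadata must be " + METADATA_LENGTH + " bytes");
        }
        out.write(metadata);
        out.writeInt(payload.length);
        out.write(payload);
    }

    /** Reads exactly the fixed-size metadata from the front of a request. */
    static byte[] readMetadata(DataInputStream in) throws IOException {
        byte[] metadata = new byte[METADATA_LENGTH];
        in.readFully(metadata);
        return metadata;
    }

    /** Sends a request through an in-memory stream and returns the metadata the receiver sees. */
    static byte[] roundTrip(byte[] metadata, byte[] payload) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            writeRequest(new DataOutputStream(buffer), metadata, payload);
            return readMetadata(new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A fixed size keeps the per-message overhead constant and lets a receiver read the metadata with a single `readFully` call before it parses the rest of the request.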

      I've attached some example traces of HDFS and IPC layers from the 0.17 patch to this JIRA issue.

      More information about X-Trace is available from http://www.x-trace.net/ as well as in a paper that appeared at NSDI 2007, available online at http://www.usenix.org/events/nsdi07/tech/fonseca.html

        Attachments

        1. multiblockwrite.png
          83 kB
          George Porter
        2. multiblockread.png
          29 kB
          George Porter
        3. HADOOP-4049.patch
          29 kB
          George Porter
        4. HADOOP-4049.7-rpc.patch
          26 kB
          George Porter
        5. HADOOP-4049.6-rpc.patch
          25 kB
          George Porter
        6. HADOOP-4049.4-rpc.patch
          29 kB
          George Porter
        7. HADOOP-4049.3-ipc.patch
          33 kB
          George Porter
        8. HADOOP-4049.2-ipc.patch
          33 kB
          George Porter

            People

            • Assignee: Unassigned
            • Reporter: George Porter (gmporter)
            • Votes: 2
            • Watchers: 30
