Atlas / ATLAS-181 Integrate storm topology metadata into Atlas / ATLAS-183

Add a Hook in Storm to post the topology metadata


    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.6-incubating
    • Fix Version/s: 0.7-incubating
    • Component/s: None
    • Labels: None

      Description

      Apache Storm Integration with Apache Atlas (incubating)
      Introduction
      Apache Storm is a distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. A Storm job is essentially a DAG of nodes, called a topology.

      Apache Atlas is a metadata repository that enables end-to-end data lineage, search, and associated business classification.
      Overview
      The goal of this integration is to push, at a minimum, the operational topology metadata along with the underlying data source(s), target(s), derivation processes, and any available business context, so that Atlas can capture the lineage for the topology.

      It would also help to support custom user annotations per node in the topology.

      There are two parts to this process, detailed below:
      • A data model to represent the concepts in Storm
      • A Storm bridge to update metadata in Atlas
      Data Model
      A data model is represented as a Type in Atlas. It contains descriptions of the various nodes in the DAG, such as spouts and bolts, and the corresponding source and target types. These need to be expressed as Types in the Atlas type system. At a minimum, we need to create types for:
      • The Storm topology, containing spouts, bolts, etc., with associations between them
      • Sources (typically Kafka, etc.)
      • Targets (typically Hive, HBase, HDFS, etc.)

      You can take a look at the data model code for Hive; Storm should be simpler than Hive from a data modeling perspective.
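      As an illustration only (the type and attribute names below are hypothetical, not a proposed final model), a topology type could be registered against Atlas's types REST endpoint roughly as follows, assuming a default local Atlas install on port 21000 and the v1 typesdef JSON format that the Hive model uses:

{code:java}
// Illustrative sketch: registers a hypothetical storm_topology type through
// the Atlas types REST endpoint. Assumes a local Atlas server on port 21000;
// the exact JSON schema should follow what the Hive data model code emits.
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class StormTypesRegistration {
    public static void main(String[] args) throws Exception {
        String typesDef = "{"
            + "\"enumTypes\": [], \"structTypes\": [], \"traitTypes\": [],"
            + "\"classTypes\": [{"
            + "\"hierarchicalMetaTypeName\": \"org.apache.atlas.typesystem.types.ClassType\","
            + "\"typeName\": \"storm_topology\","   // hypothetical type name
            + "\"superTypes\": [\"Process\"],"      // lineage types derive from Process
            + "\"attributeDefinitions\": [{"
            + "\"name\": \"id\", \"dataTypeName\": \"string\","
            + "\"multiplicity\": \"required\", \"isComposite\": false,"
            + "\"isUnique\": true, \"isIndexable\": true,"
            + "\"reverseAttributeName\": null"
            + "}]}]}";

        HttpURLConnection conn = (HttpURLConnection)
                new URL("http://localhost:21000/api/atlas/types").openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(typesDef.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("Atlas responded: " + conn.getResponseCode());
    }
}
{code}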
      Pushing Metadata into Atlas
      There are two parts to the bridge:
      Storm Bridge
      This is a one-time import in which the bridge lists all the active topologies in Storm and pushes their metadata into Atlas, to cover cases where Storm deployments exist before Atlas does.

      You can refer to the bridge code for Hive.
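      A minimal sketch of what that one-time import could look like, assuming Storm's pre-1.0 Thrift client API (backtype.storm.*); registerTopology here is a hypothetical placeholder for building the Atlas entities and posting them:

{code:java}
// Sketch of the one-time bridge import, assuming Storm's pre-1.0 Thrift
// client (backtype.storm.*). registerTopology is a hypothetical helper.
import java.util.Map;

import backtype.storm.generated.ClusterSummary;
import backtype.storm.generated.TopologySummary;
import backtype.storm.utils.NimbusClient;
import backtype.storm.utils.Utils;

public class StormAtlasBridge {
    public static void main(String[] args) throws Exception {
        Map stormConf = Utils.readStormConfig();  // reads storm.yaml plus defaults
        NimbusClient nimbus = NimbusClient.getConfiguredClient(stormConf);

        // Enumerate every topology currently known to Nimbus and push each one.
        ClusterSummary cluster = nimbus.getClient().getClusterInfo();
        for (TopologySummary summary : cluster.get_topologies()) {
            registerTopology(summary.get_id(), summary.get_name());
        }
    }

    // Hypothetical: translate one topology into Atlas entities and submit them.
    private static void registerTopology(String id, String name) {
        System.out.println("Would register topology " + name + " (" + id + ")");
    }
}
{code}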

      Post-execution Hook
      Atlas needs to be notified when a new topology is registered successfully in Storm or when someone changes the definition of an existing topology.

      You can refer to the hook code for Hive.
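      One possible shape for this hook, assuming the submitter-hook extension point Storm added under STORM-813 (backtype.storm.ISubmitterHook), whose notify() runs after a topology is submitted successfully:

{code:java}
// Sketch of the post-execution hook, assuming Storm's submitter-hook
// extension point (backtype.storm.ISubmitterHook, added by STORM-813).
import java.util.Map;

import backtype.storm.ISubmitterHook;
import backtype.storm.generated.StormTopology;
import backtype.storm.generated.TopologyInfo;

public class StormAtlasHook implements ISubmitterHook {
    @Override
    public void notify(TopologyInfo topologyInfo, Map stormConf, StormTopology topology) {
        // Walk the spouts and bolts in the submitted topology, build the
        // corresponding Atlas entities, and post them (as sketched above).
        System.out.println("Topology submitted: " + topologyInfo.get_name());
    }
}
{code}

      Storm would then be pointed at the hook class, presumably via the storm.topology.submission.notifier.plugin.class property in storm.yaml.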

      Example use case:
      Custom annotations associated with each node in the topology.
      For example: data quality rules, error handling, etc. A set of annotations might enumerate null-handling rules (e.g., all nulls for a column get filtered).
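      As one illustration, annotations could ride along in each component's configuration. The atlas.annotation.* keys below are purely hypothetical; Storm's addConfiguration() simply stores them in the component's configuration map, where the submission hook could read them per node:

{code:java}
// Hypothetical atlas.annotation.* keys attached through Storm's per-component
// configuration (pre-1.0 API); a submission hook could read these entries and
// forward them to Atlas as annotations on the matching topology node.
import backtype.storm.testing.TestWordSpout;
import backtype.storm.topology.TopologyBuilder;

public class AnnotatedTopology {
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("words", new TestWordSpout())
               .addConfiguration("atlas.annotation.dataQuality",
                                 "filter-all-null-columns");
        builder.createTopology();  // annotations travel with the topology config
    }
}
{code}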

        Attachments

        1. ATLAS-183-4.patch
          55 kB
          Shwetha GS
        2. ATLAS-183-1.patch
          55 kB
          Hemanth Yamijala
        3. ATLAS-183.patch
          49 kB
          Venkatesh Seetharam


              People

              • Assignee: Hemanth Yamijala (yhemanth)
              • Reporter: Venkatesh Seetharam (svenkat)
              • Votes: 0
              • Watchers: 6
