[GOBBLIN-1308] Gobblin's Kerberos token management for remote clusters


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.15.0
    • Fix Version/s: 0.17.0
    • Component/s: None
    • Labels: None

    Description

      Gobblin's Hadoop token / key management:

      Problem: When key management is enabled, Gobblin only maintains tokens for the local cluster; it has no capability to manage tokens for a remote Hadoop cluster. (Based on conversations with several people here, the token files can be made available externally, but that would require an external system running on cron or something similar.)

      Solution: Add remote-cluster token management to Gobblin, so that remote clusters' keys are managed the same way the local cluster's keys are managed today.
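
      As a rough sketch of the idea, assuming a per-cluster Hadoop Configuration can be built from the remote cluster's config files (the class and method names below are illustrative, not existing Gobblin code):

       import java.io.IOException;
       import java.net.URI;

       import org.apache.hadoop.conf.Configuration;
       import org.apache.hadoop.fs.FileSystem;
       import org.apache.hadoop.security.Credentials;
       import org.apache.hadoop.security.UserGroupInformation;

       public class RemoteClusterTokenFetcher {

         /**
          * Fetches HDFS delegation tokens from one remote cluster and adds them to the
          * same Credentials object Gobblin already writes out for the local cluster.
          */
         public static void fetchTokensForCluster(URI namenodeUri, Configuration remoteConf,
             Credentials credentials) throws IOException {
           String renewer = UserGroupInformation.getCurrentUser().getShortUserName();
           // newInstance avoids the shared FileSystem cache, so closing it does not affect other users.
           try (FileSystem remoteFs = FileSystem.newInstance(namenodeUri, remoteConf)) {
             // Asks the remote NameNode for delegation tokens and stores them in 'credentials'.
             remoteFs.addDelegationTokens(renewer, credentials);
           }
         }
       }

      The same Credentials object could then be written to the token file Gobblin already distributes, so downstream tasks see local and remote tokens in one place.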

       

      The proposed configuration looks like the following:

      (This also renames the existing enable.key.management config to key.management.enabled.)

       

      gobblin.hadoop.key.management {
        enabled = true
        remote.clusters = [ ${gobblin_sync_systems.hadoop_cluster1}, ${gobblin_sync_systems.hadoop_cluster2} ]
      }

      // These Gobblin platform configurations can be moved to a database for other use cases,
      // but this layout helps keep the platform modular for each connector.
      gobblin_sync_systems {
        hadoop_cluster1 {
          // If a Hadoop config path is specified, the FileSystem will be created from the XML configs
          // found there, which carry all the required info.
          hadoop_config_path = "file:///etc/hadoop_cluster1/hadoop/config"
          // If no Hadoop config path is specified, the specific nodes for each type of token can still be listed explicitly.
          namenode_uri = ["hdfs://nn1.hadoop_cluster1.example.com:8020", "hdfs://nn2.hadoop_cluster1.example.com:8020"]
          kms_nodes = [ "kms1.hadoop_cluster1.example.com:9292", "kms2.hadoop_cluster1.example.com:9292" ]
        }
        hadoop_cluster2 {
          hadoop_config_path = "file:///etc/hadoop_cluster2/hadoop/config"
          namenode_uri = ["hdfs://nn1.hadoop_cluster2.example.com:8020", "hdfs://nn2.hadoop_cluster2.example.com:8020"]
          kms_nodes = [ "kms1.hadoop_cluster2.example.com:9292", "kms2.hadoop_cluster2.example.com:9292" ]
        }
      }
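
      One possible way to consume this layout, assuming Typesafe Config (HOCON) resolution of the ${gobblin_sync_systems...} references; the key names mirror the example above and the helper class below is purely illustrative:

       import java.util.List;

       import org.apache.hadoop.conf.Configuration;
       import org.apache.hadoop.fs.Path;

       import com.typesafe.config.Config;

       public class RemoteClusterConfigLoader {

         /** Builds one Hadoop Configuration per entry in gobblin.hadoop.key.management.remote.clusters. */
         public static void loadRemoteClusters(Config rootConfig) {
           Config resolved = rootConfig.resolve(); // resolves the ${gobblin_sync_systems...} references
           if (!resolved.getBoolean("gobblin.hadoop.key.management.enabled")) {
             return;
           }
           List<? extends Config> clusters =
               resolved.getConfigList("gobblin.hadoop.key.management.remote.clusters");
           for (Config cluster : clusters) {
             Configuration hadoopConf = new Configuration();
             if (cluster.hasPath("hadoop_config_path")) {
               // The *-site.xml files under this directory describe the remote cluster.
               String configDir = cluster.getString("hadoop_config_path");
               hadoopConf.addResource(new Path(configDir, "core-site.xml"));
               hadoopConf.addResource(new Path(configDir, "hdfs-site.xml"));
             } else {
               // Fall back to the explicitly listed NameNode / KMS endpoints.
               hadoopConf.set("fs.defaultFS", cluster.getStringList("namenode_uri").get(0));
               hadoopConf.set("hadoop.security.key.provider.path",
                   "kms://http@" + cluster.getStringList("kms_nodes").get(0) + "/kms");
             }
             // hadoopConf can now be handed to the token-fetching sketch above.
           }
         }
       }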

              People

                Assignee: Unassigned
                Reporter: Jay Sen (jaysen)

              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 4h 10m