AMBARI-22721

Centralize the Management of Tarball Uploading


Details

    • Type: Task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Fix Version/s: 2.6.2, 3.0.0

    Description

      Ambari is required to upload tarballs into HDFS for many services to function correctly after they are installed. This tarball management is not centralized in any way; instead, it is spread across several different Python files for various services:

      Hive uploads the Tez, MapReduce2, Sqoop, etc. tarballs
      YARN uploads the Tez, Slider, and MapReduce2 tarballs

      This causes a problem when patching a specific service, such as Sqoop. Sqoop requires that sqoop.tar.gz and mapreduce.tar.gz be available in the same versioned folder in HDFS. However, no Sqoop component performs this upload; Hive does. So, if Hive is not being upgraded, these tarballs are never uploaded.

      The proposal here is to remove this coupling of tarball uploads and to manage these relationships in the stack definition:

      {
        "tarball": {
          "MAPREDUCE2": {
            "JOB_HISTORY_SERVER": [
              {
                "tarball": "mapreduce.tar.gz",
                "source_dir": "{0}/{1}/hadoop/mapreduce.tar.gz",
                "target_dir": "/{0}/apps/{1}/mapreduce/mapreduce.tar.gz"
              }
            ]
          },
          "HIVE": {
            "HIVE_SERVER2": [
              {
                "tarball": "mapreduce.tar.gz",
                "source_dir": "{0}/{1}/hadoop/mapreduce.tar.gz",
                "target_dir": "/{0}/apps/{1}/mapreduce/mapreduce.tar.gz"
              },
              {
                "tarball": "sqoop.tar.gz",
                "source_dir": "{0}/{1}/sqoop/sqoop.tar.gz",
                "target_dir": "/{0}/apps/{1}/sqoop/sqoop.tar.gz"
              }
            ]
          },
          "SQOOP": {
            "SQOOP": [
              {
                "tarball": "mapreduce.tar.gz",
                "source_dir": "{0}/{1}/hadoop/mapreduce.tar.gz",
                "target_dir": "/{0}/apps/{1}/mapreduce/mapreduce.tar.gz"
              },
              {
                "tarball": "sqoop.tar.gz",
                "source_dir": "{0}/{1}/sqoop/sqoop.tar.gz",
                "target_dir": "/{0}/apps/{1}/sqoop/sqoop.tar.gz"
              }
            ]
          }
        }
      }
      
      • after-INSTALL hooks will perform the uploads when the component category is CLIENT
      • after-START hooks will perform the uploads when the component category is NOT CLIENT (see the sketch below)
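
      For illustration, below is a minimal Python sketch of how a hook could consume these stack-level definitions. The definitions path, the parameter names, and the meaning of the {0}/{1} placeholders (assumed here to be stack-specific values such as the stack root/name and the stack version) are assumptions for the sketch, not the actual Ambari hook API.

      import json

      def tarballs_for_component(definitions_path, service_name, component_name,
                                 component_category, is_install_hook,
                                 stack_name, stack_version):
          """Return (tarball, source, target) tuples this component is responsible for uploading."""
          # after-INSTALL hooks only act on CLIENT components;
          # after-START hooks only act on components that are NOT CLIENT.
          if is_install_hook and component_category != "CLIENT":
              return []
          if not is_install_hook and component_category == "CLIENT":
              return []

          # Load the per-service, per-component tarball definitions from the stack.
          with open(definitions_path) as f:
              definitions = json.load(f)["tarball"]

          uploads = []
          for entry in definitions.get(service_name, {}).get(component_name, []):
              # Assumption: the placeholders expand to stack-specific values
              # (e.g. stack root/name and stack version).
              source = entry["source_dir"].format(stack_name, stack_version)
              target = entry["target_dir"].format(stack_name, stack_version)
              uploads.append((entry["tarball"], source, target))
          return uploads

      With definitions like the JSON above, SQOOP/SQOOP would yield both mapreduce.tar.gz and sqoop.tar.gz, so patching Sqoop would no longer depend on Hive performing the upload.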

      Additionally, using the file length as a checksum may no longer be sufficient. We should also add a checksum file to HDFS for each tarball so we can easily tell whether work needs to be done (during an install, restart, upgrade, etc.) to upload a new tarball (one that may also have been modified with native libraries):

      ambari-tarball-checksum.json (0644)
      {
        "mapreduce.tar.gz": {
          "native_libraries": true,
          "file_count": 509
        },
        "hadoop-streaming.tar.gz": {
          "native_libraries": false,
          "file_count": 10  
        }
      }
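
      As a rough illustration (not existing Ambari code), the record for a local tarball could be computed and compared against the contents of ambari-tarball-checksum.json fetched from HDFS; the function names below are hypothetical:

      import json
      import tarfile

      def tarball_record(local_path):
          """Build the {"native_libraries": ..., "file_count": ...} record for a local tarball."""
          with tarfile.open(local_path, "r:gz") as tar:
              names = tar.getnames()
          return {
              # Heuristic: a tarball repacked with native libraries contains shared objects.
              "native_libraries": any(n.endswith(".so") or ".so." in n for n in names),
              "file_count": len(names),
          }

      def needs_upload(local_path, tarball_name, checksum_json_text):
          """True when the tarball is missing from, or differs from, the checksum file in HDFS."""
          existing = json.loads(checksum_json_text) if checksum_json_text else {}
          return existing.get(tarball_name) != tarball_record(local_path)

      If needs_upload() returns True, the hook would re-upload the tarball and rewrite the checksum entry alongside it.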
      

          People

            Assignee: jonathanhurley Jonathan Hurley
            Reporter: jonathanhurley Jonathan Hurley