Apache Gobblin / GOBBLIN-707

Combine and standardize all Gobblin scripts into one master script and restructure configs accordingly



    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:


      Gobblin supports multiple execution modes (CLI, standalone, cluster-master, cluster-worker, AWS, YARN, MR) and various command-line utilities to run CLI and admin commands. The problem is that each CLI utility and execution mode has its own script to manage the service.

      Having a separate script for each mode introduces a number of issues:

      1. All scripts handle Gobblin variables and user parameters differently, so behavior is highly inconsistent across the Gobblin scripts, not to mention the different features each script supports.
      2. Functionality around start, stop, status checks, and PID handling, among many other things, varies widely depending on how each script is implemented.
      3. Features such as GC and JVM parameters, log4j file selection, and classpath calculation exist in some Gobblin scripts but not all, which adds to an inconsistent user experience.
      4. Code duplication: all the Gobblin scripts share a lot of common code to handle parameters, start and stop services, check status, handle PIDs, etc. Combining all the scripts into one not only makes maintenance easier but also brings clarity and consistency.
      5. The current 13 different scripts confuse new users about how to use Gobblin.
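      To illustrate points 2 and 4, the start, stop, status, and PID handling could be written once and reused by every execution mode. The following is a minimal shell sketch, assuming a hypothetical GOBBLIN_PID_DIR variable and hypothetical helper names (pid_file, is_running, start_service, status_service, stop_service); it is not existing Gobblin code.

```shell
#!/usr/bin/env bash
# Hypothetical shared helpers: one start/stop/status + PID-file implementation
# that every execution mode would reuse. GOBBLIN_PID_DIR and all function names
# are assumptions for illustration, not existing Gobblin code.

GOBBLIN_PID_DIR="${GOBBLIN_PID_DIR:-/tmp}"

# Print the PID-file path for a service name such as "standalone".
pid_file() {
  echo "${GOBBLIN_PID_DIR}/gobblin-$1.pid"
}

# Exit 0 if the PID file exists and names a live process.
is_running() {
  local pf
  pf="$(pid_file "$1")"
  [ -f "$pf" ] && kill -0 "$(cat "$pf")" 2>/dev/null
}

# start_service <name> <command...>: run the command in the background, record its PID.
start_service() {
  local name="$1"
  shift
  if is_running "$name"; then
    echo "gobblin $name is already running (pid $(cat "$(pid_file "$name")"))"
    return 1
  fi
  nohup "$@" >/dev/null 2>&1 &
  echo "$!" > "$(pid_file "$name")"
  echo "started gobblin $name (pid $!)"
}

# status_service <name>: report state; exit status mirrors is_running.
status_service() {
  if is_running "$1"; then
    echo "gobblin $1 is running"
  else
    echo "gobblin $1 is stopped"
    return 1
  fi
}

# stop_service <name>: send SIGTERM to the recorded PID and remove the PID file.
stop_service() {
  local name="$1" pf
  pf="$(pid_file "$name")"
  if is_running "$name"; then
    kill "$(cat "$pf")"
    echo "stopped gobblin $name"
  else
    echo "gobblin $name is not running"
  fi
  rm -f "$pf"
}
```

      Each execution mode would then only need to supply its own launch command, so PID handling and status reporting behave the same everywhere.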


      1. There can be one gobblin.sh script to handle all Gobblin commands and deployment options, with the following signature. NOTE: This

      gobblin.sh  <command> <params>
      gobblin.sh  <execution-mode> <start|stop|status>

      command values: admin, cli, statestore-check, statestore-clean, historystore-manager, classpath
      service values: standalone, cluster-master, cluster-worker, aws, yarn, mr, service

      With the above change, the following commands become valid (old script -> new command):

      # all under GobblinCli class
      gobblin run listQuickApps          -> gobblin cli run listQuickApps <params>
      gobblin run <quick-app-name>       -> gobblin cli run <quick-app-name> <params>
      # class: JobStateToJsonConverter
      statestore-checker.sh <args>       -> gobblin cli job-state-to-json <params>
      # class: StateStoreCleaner
      statestore-clean.sh <args>         -> the class is deprecated, so it does not need to be migrated
      # class: DatabaseJobHistoryStoreSchemaManager
      historystore-manager.sh <args>     -> gobblin cli job-store-schema-manager <params>
      # class: Cli
      gobblin-admin.sh <args>            -> gobblin cli admin <args>
      # all Gobblin deployment modes
      gobblin-cluster-master.sh          -> gobblin service cluster-master start|stop|status
      gobblin-cluster-worker.sh          -> gobblin service cluster-worker start|stop|status
      gobblin-compaction.sh              -> gobblin-compaction.sh (kept as is for now; can be migrated to the new script framework)
      gobblin-mapreduce.sh               -> gobblin service mapreduce start|stop|status
      gobblin-service.sh                 -> gobblin service service-manager start|stop|status
      gobblin-standalone.sh              -> gobblin service standalone start|stop|status
      gobblin-yarn.sh                    -> gobblin service yarn start|stop|status


      2. All configurations for each mode also need to be structured and de-duplicated so that it is clear which config is picked up by which execution mode. This will be documented in the command help.
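      One possible way to make it obvious which config each execution mode picks up is a layered lookup: shared defaults live in a common directory, and a mode-specific directory holds only the files that differ. The sketch below assumes a hypothetical layout (<conf-root>/common and <conf-root>/<mode>) and a hypothetical resolve_conf function; the ticket does not prescribe either.

```shell
#!/usr/bin/env bash
# Hypothetical config lookup for a restructured conf/ tree:
#   <conf-root>/common/<file>  shared defaults for every execution mode
#   <conf-root>/<mode>/<file>  mode-specific file that takes precedence
# The directory layout and function name are assumptions for illustration.

# resolve_conf <conf-root> <mode> <file>: print the path that would be loaded.
resolve_conf() {
  local root="$1" mode="$2" file="$3"
  if [ -f "$root/$mode/$file" ]; then
    # Mode-specific override wins.
    echo "$root/$mode/$file"
  elif [ -f "$root/common/$file" ]; then
    # Fall back to the shared default.
    echo "$root/common/$file"
  else
    echo "no $file for mode '$mode' under $root" >&2
    return 1
  fi
}
```

      Because each mode directory contains only its overrides, duplicated files disappear, and the same function could back the help output by printing the exact file each mode will load.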

      NOTE: This refactoring adds all CLI and service commands to gobblin.sh and therefore changes the syntax for all commands and services.
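      The new syntax amounts to a two-level dispatch: a command group (cli or service), then a sub-command or execution mode. A minimal shell sketch follows, using the class names and modes from the mapping above. Package names are omitted, and the function name and the "class=... args=..." dry-run output are assumptions standing in for the actual java invocation.

```shell
#!/usr/bin/env bash
# Hypothetical dispatcher for a single gobblin.sh entry point.
# Class names are the ones listed in the ticket (packages omitted); the function
# name and the "class=... args=..." dry-run output are illustrative assumptions.
# A real script would build the classpath and exec java instead of echoing.

gobblin_dispatch() {
  case "$1" in
    cli)
      shift
      # "run" stays in the argument list because GobblinCli handles it itself.
      case "$1" in
        run)                      echo "class=GobblinCli args=$*" ;;
        job-state-to-json)        shift; echo "class=JobStateToJsonConverter args=$*" ;;
        job-store-schema-manager) shift; echo "class=DatabaseJobHistoryStoreSchemaManager args=$*" ;;
        admin)                    shift; echo "class=Cli args=$*" ;;
        *) echo "unknown cli command: $1" >&2; return 2 ;;
      esac
      ;;
    service)
      local mode="$2" action="$3"
      case "$mode" in
        standalone|cluster-master|cluster-worker|mapreduce|service-manager|yarn) ;;
        *) echo "unknown execution mode: $mode" >&2; return 2 ;;
      esac
      # A real script would call shared start/stop/status helpers here.
      case "$action" in
        start|stop|status) echo "$action $mode" ;;
        *) echo "usage: gobblin service $mode start|stop|status" >&2; return 2 ;;
      esac
      ;;
    *)
      echo "usage: gobblin cli <command> <params>" >&2
      echo "       gobblin service <mode> start|stop|status" >&2
      return 2
      ;;
  esac
}
```

      For example, gobblin_dispatch cli run listQuickApps prints "class=GobblinCli args=run listQuickApps". A complete gobblin.sh would end with gobblin_dispatch "$@" and launch the JVM instead of echoing.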





    • Assignee: Jay Sen (jaysen)
    • Votes: 0
    • Watchers: 2
    • Created:
    • Time Tracking:
        Original Estimate: Not Specified
        Remaining Estimate: 0h
        Time Spent: 10h 40m