  1. Apache Gobblin
  2. GOBBLIN-707

Combine and standardize all Gobblin scripts into one master script and restructure configs accordingly

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Gobblin supports multiple execution modes (CLI, standalone, cluster-master, cluster-worker, AWS, YARN, MR) and various command-line utilities for running CLI and admin commands. There is an individual script for each of them.

      Having an individual script for each introduces several issues:

      1. Each script handles Gobblin variables and user parameters differently, so behavior is highly inconsistent across scripts.
      2. Start, stop, status checking, and PID handling, among other things, vary widely depending on each script's implementation.
      3. Features such as GC and JVM parameters, log4j file selection, and classpath calculation exist in some Gobblin scripts but not all, which makes the user experience inconsistent.
      4. Maintaining 13 separate scripts takes too much effort.

      All the Gobblin scripts also share a lot of common code for parameter handling, starting and stopping services, status checks, PID handling, and so on. Combining them into one script not only makes maintenance easier but also brings clarity and consistency.
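
To illustrate the kind of shared code meant here, the following is a minimal sketch of start/stop/status helpers built around a PID file. It is not taken from any existing Gobblin script: the function names, the `GOBBLIN_PID_DIR` variable, and the PID file naming are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical shared helpers; every name below is an assumption, not Gobblin's API.

PID_DIR="${GOBBLIN_PID_DIR:-/tmp/gobblin-pids}"   # assumed location for PID files

pid_file() {            # pid_file <mode> -> prints the PID file path for that mode
  printf '%s/gobblin-%s.pid\n' "$PID_DIR" "$1"
}

is_running() {          # succeeds only if the PID recorded for <mode> is alive
  local file pid
  file="$(pid_file "$1")"
  [ -f "$file" ] || return 1
  pid="$(cat "$file")"
  [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

start_service() {       # start_service <mode> <command> [args...]
  local mode="$1"; shift
  if is_running "$mode"; then
    echo "$mode is already running" >&2
    return 1
  fi
  mkdir -p "$PID_DIR"
  "$@" >/dev/null 2>&1 &              # launch in the background
  echo "$!" > "$(pid_file "$mode")"   # record its PID
}

stop_service() {        # stop_service <mode>
  local mode="$1" file
  file="$(pid_file "$mode")"
  if is_running "$mode"; then
    kill "$(cat "$file")"
  fi
  rm -f "$file"
}

status_service() {      # status_service <mode>
  if is_running "$1"; then
    echo "$1 is running (pid $(cat "$(pid_file "$1")"))"
  else
    echo "$1 is not running"
    return 1
  fi
}
```

With helpers like these in one place, every execution mode would get the same start, stop, and status behavior, instead of each script reimplementing it.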

       

      Solution:

      1. There can be one gobblin.sh script that handles all Gobblin commands and deployment options, using the following signature:

      gobblin.sh  <command> <params>
      gobblin.sh  <execution-mode> <start|stop|status>

      command values: admin, cli, statestore-check, statestore-clean, historystore-manager, classpath
      execution-mode values: standalone, cluster-master, cluster-worker, aws, yarn, mr, service

      With the above change, the following become valid commands:

      # all under GobblinCli class
      gobblin run listQuickApps    -> gobblin cli run listQuickApps
      gobblin run <quick-app-name> -> gobblin cli run <quick-app-name>
      
      # class: JobStateToJsonConverter
      statestore-checker.sh <args> -> gobblin statestore-check <args>
      
      # class: StateStoreCleaner
      statestore-clean.sh <args> -> gobblin statestore-clean <args>
      
      # class: DatabaseJobHistoryStoreSchemaManager
      historystore-manager.sh <args> -> gobblin historystore-manager <args>
      
      # class: Cli
      gobblin-admin.sh <args>   -> gobblin admin <args>
      
      # all gobblin deployment modes
      gobblin-cluster-master.sh   -> gobblin cluster-master start|stop|status
      gobblin-cluster-worker.sh   -> gobblin cluster-worker start|stop|status
      gobblin-mapreduce.sh        -> gobblin mr start|stop|status
      gobblin-service.sh          -> gobblin service start|stop|status
      gobblin-standalone.sh       -> gobblin standalone start|stop|status
      gobblin-yarn.sh             -> gobblin yarn start|stop|status

      # no matching value is listed above for these two scripts
      gobblin-compaction.sh       -> gobblin <execution-mode> start|stop|status
      gobblin-env.sh              -> gobblin <execution-mode> start|stop|status
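
The signature and mappings above amount to a small dispatch step at the top of gobblin.sh. The following is a rough sketch of that step only, under the assumption that the first argument alone decides between a command and an execution mode. The value lists are copied from this description, while the function names `contains` and `classify` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of argument classification; helper names are hypothetical.

COMMANDS="admin cli statestore-check statestore-clean historystore-manager classpath"
MODES="standalone cluster-master cluster-worker aws yarn mr service"
ACTIONS="start stop status"

contains() {            # contains <word> <space-separated list>
  case "$1" in
    ""|*" "*) return 1 ;;             # reject empty or multi-word input
  esac
  case " $2 " in
    *" $1 "*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Prints "command <name>" or "service <mode> <action>"; returns 2 on bad usage.
classify() {
  local first="${1:-}" action="${2:-}"
  if contains "$first" "$COMMANDS"; then
    echo "command $first"
  elif contains "$first" "$MODES"; then
    if contains "$action" "$ACTIONS"; then
      echo "service $first $action"
    else
      echo "usage: gobblin.sh $first <start|stop|status>" >&2
      return 2
    fi
  else
    echo "usage: gobblin.sh <command> <params> | <execution-mode> <start|stop|status>" >&2
    return 2
  fi
}
```

For example, `classify cli run listQuickApps` would select the command path, while `classify cluster-master start` would select the service path. A real script would then hand the remaining arguments to the matching Java class or service helper.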
      

       

      2. Configs also need to be structured and deduplicated accordingly, so that it is clear which config is picked up for which execution mode.
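
One possible way to make the config choice explicit is to keep one directory per execution mode and resolve it from the mode name. The sketch below assumes a hypothetical layout of `<gobblin-home>/conf/<execution-mode>/`; neither that layout nor the `resolve_conf_dir` name comes from this issue.

```shell
#!/usr/bin/env bash
# Hypothetical layout: <gobblin-home>/conf/<execution-mode>/ holds that mode's files.

resolve_conf_dir() {    # resolve_conf_dir <gobblin-home> <mode>
  local home="$1" mode="$2" dir
  dir="$home/conf/$mode"
  if [ -d "$dir" ]; then
    printf '%s\n' "$dir"
  else
    echo "no config directory for execution mode '$mode' under $home/conf" >&2
    return 1
  fi
}
```

Failing loudly when the directory is missing, rather than silently falling back to another mode's files, keeps it unambiguous which config a given mode is using.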

       
      NOTE: This refactoring adds all CLI and service commands to gobblin.sh and therefore changes the syntax of all commands and services.


              People

              • Assignee: Unassigned
              • Reporter: Jay Sen (jaysen)
              • Votes: 0
              • Watchers: 2

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 9h