  Sqoop / SQOOP-2025

Input/State history per job run / submission


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: None
    • Labels:
      None

      Description

      As per SQOOP-1804, we will be treating both the config inputs and the intermediate state generated during a job run as part of the config object, and storing both there.

      Currently the config object is stored in the repository model in the SQ_CONFIG table, with one set of configs per SQ_CONFIGURABLE.

      The inputs within the Config class and their attributes are stored in the SQ_INPUT table, i.e. the columns in SQ_INPUT map to the attributes of the config's @Input annotation:

        @Input(size = 50)
        public String schemaName;

        @Input(size = 50)
        public String tableName;
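To illustrate how each @Input-annotated field could become one SQ_INPUT-style row (input name, type, size), here is a minimal reflection-based sketch. It declares its own stand-in @Input annotation so it compiles without Sqoop on the classpath; the annotation, DemoConfig, InputMetadata and InputRow are hypothetical and do not reproduce Sqoop's actual repository code. The two fields mirror the snippet above.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for Sqoop's @Input annotation; only "size" is modeled here.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Input {
  int size() default -1;
}

// Hypothetical config class with the two inputs from the snippet above.
class DemoConfig {
  @Input(size = 50)
  public String schemaName;

  @Input(size = 50)
  public String tableName;
}

final class InputMetadata {
  // One SQ_INPUT-like row: input name, Java type and declared size.
  static final class InputRow {
    final String name;
    final String type;
    final int size;

    InputRow(String name, String type, int size) {
      this.name = name;
      this.type = type;
      this.size = size;
    }

    @Override
    public String toString() {
      return name + " " + type + " " + size;
    }
  }

  // Turns every @Input-annotated field of a config class into one row.
  static List<InputRow> describe(Class<?> configClass) {
    List<InputRow> rows = new ArrayList<>();
    for (Field field : configClass.getDeclaredFields()) {
      Input input = field.getAnnotation(Input.class);
      if (input != null) {
        rows.add(new InputRow(field.getName(), field.getType().getSimpleName(), input.size()));
      }
    }
    // getDeclaredFields() does not guarantee an order, so sort for stable output.
    rows.sort(Comparator.comparing((InputRow row) -> row.name));
    return rows;
  }
}
```

Here the annotation attribute (size) becomes a column value, while the annotated field's name identifies the input.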

      The actual values for the SQ_INPUT keys are stored per job in the SQ_JOB_INPUT table and per link in the SQ_LINK_INPUT table.
      

      This means the same config input values are overwritten across job runs. Let's take an example.

      Suppose a job is started with the config value for key "test" set to "foo"; for the first job run, SQ_JOB_INPUT will hold "foo". If the value is modified to "bar" before the second run, SQ_JOB_INPUT will then hold "bar". A user who queries the config values by job ID will only see the last modified value, "bar". Nothing tells the user which value was in effect when an earlier job run started, or what the value was when that run/submission ended.
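The overwrite problem in this example can be reproduced with a tiny in-memory model of today's layout, where there is exactly one stored value per (job, input) pair. This is a sketch under that assumption only; CurrentJobInputStore and its methods are hypothetical names, not Sqoop's repository API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of today's SQ_JOB_INPUT: one value per (job, input).
final class CurrentJobInputStore {
  private final Map<String, String> values = new HashMap<>();

  private static String key(long jobId, String inputName) {
    return jobId + "/" + inputName;
  }

  // Editing an input replaces the single stored value in place.
  void setValue(long jobId, String inputName, String value) {
    values.put(key(jobId, inputName), value);
  }

  // A lookup by job ID can only return the latest value; earlier ones are lost.
  String getValue(long jobId, String inputName) {
    return values.get(key(jobId, inputName));
  }

  // Number of stored (job, input) rows.
  int rowCount() {
    return values.size();
  }
}
```

Calling setValue(1, "test", "foo") for the first run and then setValue(1, "test", "bar") before the second leaves a single row holding "bar"; nothing records that the first run used "foo".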

      The proposal is to provide this history so that the user can track the config input values for each job run.

      A simple approach is to add a submission_id foreign key (FK) column to the SQ_JOB_INPUT and SQ_LINK_INPUT tables.
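A minimal in-memory sketch of the submission_id idea follows. It models SQ_JOB_INPUT rows as Java objects rather than SQL, and only the job side is shown (SQ_LINK_INPUT would get the same column). JobInputHistoryStore, record, valueFor and history are hypothetical names, not part of Sqoop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical model of SQ_JOB_INPUT with the proposed submission_id FK column.
// Rows are appended per submission instead of being overwritten.
final class JobInputHistoryStore {
  static final class Row {
    final long jobId;
    final long submissionId; // would reference the submission that used this value
    final String inputName;
    final String value;

    Row(long jobId, long submissionId, String inputName, String value) {
      this.jobId = jobId;
      this.submissionId = submissionId;
      this.inputName = inputName;
      this.value = value;
    }
  }

  private final List<Row> rows = new ArrayList<>();

  // Store the value an input had for one job run (submission).
  void record(long jobId, long submissionId, String inputName, String value) {
    rows.add(new Row(jobId, submissionId, inputName, value));
  }

  // The value a specific submission used, if one was recorded.
  Optional<String> valueFor(long submissionId, String inputName) {
    for (Row row : rows) {
      if (row.submissionId == submissionId && row.inputName.equals(inputName)) {
        return Optional.of(row.value);
      }
    }
    return Optional.empty();
  }

  // All values an input has had for a job, in the order they were recorded.
  List<String> history(long jobId, String inputName) {
    List<String> values = new ArrayList<>();
    for (Row row : rows) {
      if (row.jobId == jobId && row.inputName.equals(inputName)) {
        values.add(row.value);
      }
    }
    return values;
  }
}
```

With the "foo"/"bar" example, recording "foo" under one submission and "bar" under the next keeps both, so a query by submission returns the value that run actually used.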

      Anand Iyer also suggested that we store the before/after config state, if possible.

      To support BEFORE/AFTER config history:

      1. We will create a new set of values for the config inputs for every job run, based on the previous state. If the user edits the configs while the previous job run is still in progress, we create new values with a null submissionId and associate them with the submission ID once the next job run starts. Once the job run finishes, we write the config values again to store the AFTER information.

      2. We will need to store the BEFORE/AFTER indicator in another column.

      3. Only the config input values for the latest run will be editable, and only while that run has not yet started.
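The three steps can be sketched as an in-memory lifecycle for a single job's inputs. This is a non-authoritative sketch that assumes pending values are BEFORE rows with a null submission ID, and that the AFTER values are supplied by the caller when the run ends (for example, config state updated during the run, per SQOOP-1804). JobConfigHistory, Phase, edit, start, finish and valuesFor are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch of the BEFORE/AFTER proposal for one job's inputs.
final class JobConfigHistory {
  enum Phase { BEFORE, AFTER } // the proposed BEFORE/AFTER indicator column

  static final class Row {
    final String inputName;
    final Phase phase;
    String value;        // only changed while the row is still pending
    Long submissionId;   // null until the job run starts

    Row(String inputName, String value, Long submissionId, Phase phase) {
      this.inputName = inputName;
      this.value = value;
      this.submissionId = submissionId;
      this.phase = phase;
    }
  }

  private final List<Row> rows = new ArrayList<>();
  // Latest known config state; new pending sets are copied from it.
  private final Map<String, String> latest = new LinkedHashMap<>();

  // Rows for the next run that has not started yet (the only editable rows).
  private List<Row> pendingRows() {
    List<Row> pending = new ArrayList<>();
    for (Row row : rows) {
      if (row.submissionId == null) {
        pending.add(row);
      }
    }
    if (pending.isEmpty()) {
      // Step 1: create a new set of values for the next run from the previous state.
      for (Map.Entry<String, String> e : latest.entrySet()) {
        Row row = new Row(e.getKey(), e.getValue(), null, Phase.BEFORE);
        rows.add(row);
        pending.add(row);
      }
    }
    return pending;
  }

  // Step 3: edits touch only pending rows; rows tied to a submission never change.
  void edit(String inputName, String value) {
    boolean found = false;
    for (Row row : pendingRows()) {
      if (row.inputName.equals(inputName)) {
        row.value = value;
        found = true;
      }
    }
    if (!found) {
      rows.add(new Row(inputName, value, null, Phase.BEFORE));
    }
  }

  // A run starts: its pending rows get the submission ID and become its BEFORE state.
  void start(long submissionId) {
    for (Row row : pendingRows()) {
      row.submissionId = submissionId;
      latest.put(row.inputName, row.value);
    }
  }

  // A run finishes: write the values again as its AFTER state (step 2's indicator).
  void finish(long submissionId, Map<String, String> finalValues) {
    for (Map.Entry<String, String> e : finalValues.entrySet()) {
      rows.add(new Row(e.getKey(), e.getValue(), submissionId, Phase.AFTER));
      latest.put(e.getKey(), e.getValue());
    }
  }

  // Query the history: what one run saw before it started or after it ended.
  Map<String, String> valuesFor(long submissionId, Phase phase) {
    Map<String, String> result = new LinkedHashMap<>();
    for (Row row : rows) {
      if (row.submissionId != null && row.submissionId == submissionId && row.phase == phase) {
        result.put(row.inputName, row.value);
      }
    }
    return result;
  }
}
```

Editing "test" to "bar" while run 1 is still in progress creates a new pending row, so run 1's BEFORE row keeps "foo" and run 2 starts with "bar"; this is what removes the race on concurrent edits.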

      Pros:
      We have a history per job run that we can query
      We do not have race conditions on config input value edits, since every job run has its own state

      Cons:
      We will have many more entries in the SQ_JOB_INPUT and SQ_LINK_INPUT tables than we have now, but I see this as unavoidable if we want to give users easy debuggability into which inputs and values were used for every job run, which values were edited, etc.


            People

            • Assignee: Veena Basavaraj (vybs)
            • Reporter: Veena Basavaraj (vybs)
            • Votes: 0
            • Watchers: 2
