Uploaded image for project: 'Oozie'
  1. Oozie
  2. OOZIE-2245

Service to periodically check database schema

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.0.0b1
    • Component/s: core
    • Labels:
      None

      Description

      We've seen a number of issues related to the database schema being incorrect (more than you would think). It seems some users go and muck around in the Oozie database, adding/removing columns and indexes, changing the default value of columns, etc. The issues caused by this can be very difficult to track down because their cause is not obvious and we generally assume the database schema is correct. For example, we saw an issue where Oozie was taking a long time to create Coordinator actions, and it turned out that the cause was that some indexes were missing, which made the Purge queries slow, which slowed down the whole database whenever the PurgeService ran. Another example was that the pause time was automatically being set whenever a Coordinator job was submitted, because the default value for the column was incorrect.

      We should create a Service which periodically runs and checks that the schema is correct. It can output details about what's wrong to the log.

      1. OOZIE-2245.003.patch
        31 kB
        Robert Kanter
      2. OOZIE-2245.002.patch
        29 kB
        Robert Kanter
      3. OOZIE-2245.patch
        29 kB
        Robert Kanter

        Issue Links

          Activity

          Hide
          rkanter Robert Kanter added a comment -

          The patch adds a new SchemaCheckService and SchemaCheckXCommand. It uses the java Connection to get metadata on the database that Oozie is configured to use. Because it's not using OpenJPA stuff, we have to make a new connection to the database, which means it isn't going to work for Derby. It checks the following:

          • The expected tables
          • The expected columns, their types, and default values
          • The expected indexes and primary keys

          To make things easier to maintain in the future when we next change the database schema, a lot of the expected results are generated dynamically using reflection to inspect the Bean classes.

          There's a few places where different databases behave differently or want different things (e.g. MySQL and Oracle want uppercase table names but Postgres uses lowercase), so there's some checks for those. Otherwise I tried to keep it as generic as possible.

          There's configs for:

          • The interval the checker runs at. I set it to run once a week. It's also hardcoded to run 1 hour after Oozie startup
          • To ignore extra columns, tables, indexes. I imagine some users, who hopefully know what they're doing, may want to add extra indexes or other things, so turning this on will make the checker not count those as "bad"

          The checker also exposes an instrumentation metrics to report the status of the database schema and the last time it was run. The specific problems it finds are logged as ERRORs. There's also a bunch of DEBUG logging for things that are correct.

          I've tested this against MySQL, PostgreSQL, and Oracle and it works correctly. Though I'm not a database expert, and my Oracle install was kind of hacky, so any input or additional testing from Rohini Palaniswamy, Purshotam Shah, Mona Chitnis, Ryota Egashira would be helpful there. Oracle was also a bit different in that instead of a catalog (i.e. the database name which the Connection determined from the URL) it used a schema that I think is hardcoded to "OOZIE", but I'm a little iffy on that.
          I also don't have access to an MS SQLServer, so I disabled that for now. Bowen Zhang, could you try this against a SQLServer and make any tweaks that it needs? The core code should be correct, but you may have to make minor if statements like I did for the others. We can either do that as a followup JIRA or here.

          Show
          rkanter Robert Kanter added a comment - The patch adds a new SchemaCheckService and SchemaCheckXCommand . It uses the java Connection to get metadata on the database that Oozie is configured to use. Because it's not using OpenJPA stuff, we have to make a new connection to the database, which means it isn't going to work for Derby. It checks the following: The expected tables The expected columns, their types, and default values The expected indexes and primary keys To make things easier to maintain in the future when we next change the database schema, a lot of the expected results are generated dynamically using reflection to inspect the Bean classes. There's a few places where different databases behave differently or want different things (e.g. MySQL and Oracle want uppercase table names but Postgres uses lowercase), so there's some checks for those. Otherwise I tried to keep it as generic as possible. There's configs for: The interval the checker runs at. I set it to run once a week. It's also hardcoded to run 1 hour after Oozie startup To ignore extra columns, tables, indexes. I imagine some users, who hopefully know what they're doing, may want to add extra indexes or other things, so turning this on will make the checker not count those as "bad" The checker also exposes an instrumentation metrics to report the status of the database schema and the last time it was run. The specific problems it finds are logged as ERRORs. There's also a bunch of DEBUG logging for things that are correct. I've tested this against MySQL, PostgreSQL, and Oracle and it works correctly. Though I'm not a database expert, and my Oracle install was kind of hacky, so any input or additional testing from Rohini Palaniswamy , Purshotam Shah , Mona Chitnis , Ryota Egashira would be helpful there. Oracle was also a bit different in that instead of a catalog (i.e. the database name which the Connection determined from the URL) it used a schema that I think is hardcoded to "OOZIE", but I'm a little iffy on that. I also don't have access to an MS SQLServer, so I disabled that for now. Bowen Zhang , could you try this against a SQLServer and make any tweaks that it needs? The core code should be correct, but you may have to make minor if statements like I did for the others. We can either do that as a followup JIRA or here.
          Hide
          bowenzhangusa Bowen Zhang added a comment - - edited

          Robert Kanter, do you plan to make it into 4.2? I would want to check it in for future release. If that's the case, do you need to come back to modify this patch if we have a new Db schema for oozie-5.0? Looks like you have already disabled the check if the DB is sqlserver. BTW, can you post it in Review Board?

          Show
          bowenzhangusa Bowen Zhang added a comment - - edited Robert Kanter , do you plan to make it into 4.2? I would want to check it in for future release. If that's the case, do you need to come back to modify this patch if we have a new Db schema for oozie-5.0? Looks like you have already disabled the check if the DB is sqlserver. BTW, can you post it in Review Board?
          Hide
          rkanter Robert Kanter added a comment -

          This doesn't need to go into 4.2. It would be good to get more feedback on it anyway. I designed it so that (hopefully) we shouldn't have to update it for a new db schema; it figures out the tables, columns, etc to check for dynamically using reflection. It may require a minor tweak depending on what changes we make, but we shouldn't need to do anything major.

          I disabled the check for sqlserver for now because I hadn't tested that one. We can remove that once we make sure that works.

          I'll try to add it to review board, but I was having trouble last time I tried to use it.

          Show
          rkanter Robert Kanter added a comment - This doesn't need to go into 4.2. It would be good to get more feedback on it anyway. I designed it so that (hopefully) we shouldn't have to update it for a new db schema; it figures out the tables, columns, etc to check for dynamically using reflection. It may require a minor tweak depending on what changes we make, but we shouldn't need to do anything major. I disabled the check for sqlserver for now because I hadn't tested that one. We can remove that once we make sure that works. I'll try to add it to review board, but I was having trouble last time I tried to use it.
          Hide
          rkanter Robert Kanter added a comment -

          Here's the ReviewBoard link:
          https://reviews.apache.org/r/35472/

          Show
          rkanter Robert Kanter added a comment - Here's the ReviewBoard link: https://reviews.apache.org/r/35472/
          Hide
          rkanter Robert Kanter added a comment -

          I spoke to Bowen Zhang, and he suggested holding off on SQLServer support for now. I've created OOZIE-2279 to follow up on that.

          Show
          rkanter Robert Kanter added a comment - I spoke to Bowen Zhang , and he suggested holding off on SQLServer support for now. I've created OOZIE-2279 to follow up on that.
          Hide
          rkanter Robert Kanter added a comment -

          The 002 patch addresses Jaydeep's comments from RB.

          Show
          rkanter Robert Kanter added a comment - The 002 patch addresses Jaydeep's comments from RB.
          Hide
          hadoopqa Hadoop QA added a comment -

          Testing JIRA OOZIE-2245

          Cleaning local git workspace

          ----------------------------

          +1 PATCH_APPLIES
          +1 CLEAN
          -1 RAW_PATCH_ANALYSIS
          . +1 the patch does not introduce any @author tags
          . +1 the patch does not introduce any tabs
          . +1 the patch does not introduce any trailing spaces
          . +1 the patch does not introduce any line longer than 132
          . -1 the patch does not add/modify any testcase
          +1 RAT
          . +1 the patch does not seem to introduce new RAT warnings
          +1 JAVADOC
          . +1 the patch does not seem to introduce new Javadoc warnings
          -1 COMPILE
          . +1 HEAD compiles
          . +1 patch compiles
          . -1 the patch seems to introduce 6 new javac warning(s)
          -1 BACKWARDS_COMPATIBILITY
          . -1 the patch seems to change 1 line(s) with JPA Entity/Colum/Basic/Lob/Transient annotations
          . +1 the patch does not modify JPA files
          -1 TESTS - patch does not compile, cannot run testcases
          +1 DISTRO
          . +1 distro tarball builds with the patch

          ----------------------------
          -1 Overall result, please check the reported -1(s)

          The full output of the test-patch run is available at

          . https://builds.apache.org/job/oozie-trunk-precommit-build/2446/

          Show
          hadoopqa Hadoop QA added a comment - Testing JIRA OOZIE-2245 Cleaning local git workspace ---------------------------- +1 PATCH_APPLIES +1 CLEAN -1 RAW_PATCH_ANALYSIS . +1 the patch does not introduce any @author tags . +1 the patch does not introduce any tabs . +1 the patch does not introduce any trailing spaces . +1 the patch does not introduce any line longer than 132 . -1 the patch does not add/modify any testcase +1 RAT . +1 the patch does not seem to introduce new RAT warnings +1 JAVADOC . +1 the patch does not seem to introduce new Javadoc warnings -1 COMPILE . +1 HEAD compiles . +1 patch compiles . -1 the patch seems to introduce 6 new javac warning(s) -1 BACKWARDS_COMPATIBILITY . -1 the patch seems to change 1 line(s) with JPA Entity/Colum/Basic/Lob/Transient annotations . +1 the patch does not modify JPA files -1 TESTS - patch does not compile, cannot run testcases +1 DISTRO . +1 distro tarball builds with the patch ---------------------------- -1 Overall result, please check the reported -1(s) The full output of the test-patch run is available at . https://builds.apache.org/job/oozie-trunk-precommit-build/2446/
          Hide
          rkanter Robert Kanter added a comment -

          The ". -1 the patch seems to change 1 line(s) with JPA Entity/Colum/Basic/Lob/Transient annotations" is due to a comment that has the word "@Column" in it and is not a real problem:

          +                    // Some Id fields don't have an @Column annotation
          

          It also did compile and ran all of the unit tests with no failures; not sure why it said it couldn't.
          https://builds.apache.org/job/oozie-trunk-precommit-build/2446/testReport/

          Show
          rkanter Robert Kanter added a comment - The ". -1 the patch seems to change 1 line(s) with JPA Entity/Colum/Basic/Lob/Transient annotations" is due to a comment that has the word "@Column" in it and is not a real problem: + // Some Id fields don't have an @Column annotation It also did compile and ran all of the unit tests with no failures; not sure why it said it couldn't. https://builds.apache.org/job/oozie-trunk-precommit-build/2446/testReport/
          Hide
          rkanter Robert Kanter added a comment -

          The 003 patch adds documentation and is rebased on the latest trunk.

          Show
          rkanter Robert Kanter added a comment - The 003 patch adds documentation and is rebased on the latest trunk.
          Hide
          jaydeepvishwakarma Jaydeep Vishwakarma added a comment -

          +1

          Show
          jaydeepvishwakarma Jaydeep Vishwakarma added a comment - +1
          Hide
          puru Purshotam Shah added a comment -

          Few minor comments.

          1.We can add this to oozie health page too. If there is a schema error, oozie health should indicate error.
          Oozie health will be useful, where we can check status of each oozie component like hdfs, sharelib, ZK, DB.
          After upgrade/meiantiace admin/ops can use Oozie health to validate each component.
          Beside reporting status of each component, it also report time taken to validate each component. That can be useful for finding slowness ( DB,ZK, HDFS slowness).
          This can be done once we checkin https://issues.apache.org/jira/browse/OOZIE-2306.

          2. One suggestion, can you add support of sending mail if there is schema error. Otherwise admin/ops has to keep on checking instrumentation and then log to find out issue.

          Show
          puru Purshotam Shah added a comment - Few minor comments. 1.We can add this to oozie health page too. If there is a schema error, oozie health should indicate error. Oozie health will be useful, where we can check status of each oozie component like hdfs, sharelib, ZK, DB. After upgrade/meiantiace admin/ops can use Oozie health to validate each component. Beside reporting status of each component, it also report time taken to validate each component. That can be useful for finding slowness ( DB,ZK, HDFS slowness). This can be done once we checkin https://issues.apache.org/jira/browse/OOZIE-2306 . 2. One suggestion, can you add support of sending mail if there is schema error. Otherwise admin/ops has to keep on checking instrumentation and then log to find out issue.
          Hide
          rkanter Robert Kanter added a comment -

          Would it make sense to have the sending an email be part of OOZIE-2306? We can then make it so that an email is sent whenever there's any kind of health check failure.

          Show
          rkanter Robert Kanter added a comment - Would it make sense to have the sending an email be part of OOZIE-2306 ? We can then make it so that an email is sent whenever there's any kind of health check failure.
          Hide
          puru Purshotam Shah added a comment -

          Currently OOZIE-2306 doesn't run as service, it's a web-service call.
          We can decide the mail part later on.

          +1.

          Show
          puru Purshotam Shah added a comment - Currently OOZIE-2306 doesn't run as service, it's a web-service call. We can decide the mail part later on. +1.
          Hide
          rkanter Robert Kanter added a comment -

          Thanks for the review Puru. Committed to trunk!

          Show
          rkanter Robert Kanter added a comment - Thanks for the review Puru. Committed to trunk!

            People

            • Assignee:
              rkanter Robert Kanter
              Reporter:
              rkanter Robert Kanter
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development