Oozie / OOZIE-3336

[persistence] Refactor entity classes to feature PK, FK, and UQ constraints

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 5.0.0
    • Fix Version/s: 5.3.0
    • Component/s: core
    • Labels: None

    Description

      When an Oozie database grows substantially, say beyond a few hundred thousand WorkflowActionBean and CoordinatorActionBean instances, we face several performance issues. Below is an analysis of the causes.

      The current Oozie JPA @Entity usage, and the resulting database DDL, suffer from several drawbacks from a performance point of view:

      • @Id fields are of type String:
        • this leaves little room for database primary key indices to work effectively
        • for WorkflowActionBean, CoordinatorActionBean, and BundleActionBean instances, those values are calculated
      • no foreign key constraint is set from WorkflowActionBean to WorkflowJobBean, from CoordinatorActionBean to CoordinatorJobBean, or from BundleActionBean to BundleJobBean:
        • JPA queries that discover parent-child relationships have to be assessed by hand
        • no database indices are created for these relationships, so queries that contain a JOIN are slower
      • unique constraints are not used at all
      • JPA queries are created by hand instead of relying on OpenJPA
      • JPA entities are filled by hand instead of relying on OpenJPA

      The following enhancements are necessary:

      1. keep the existing String compositeId fields, and break their contents down into the following new fields:
        1. @Id long id - an auto-increment value that is unique across the Oozie database
        2. long currentSequence - the sequence number of the current run since the last Oozie server restart; the first part of compositeId
        3. Timestamp serverStartupTimestamp - the timestamp when the Oozie server was last started; the second part of compositeId
        4. String serverName - the third part of compositeId
        5. String name - the fourth and last part of compositeId
        6. compositeId might be calculated when an entity is loaded / persisted, and then stored
      2. FK constraints:
        1. @OneToMany fields where a parent holds a list of child references
        2. @ManyToOne fields where a child holds a reference to its parent
        3. pay attention to FetchType; most of the time LAZY will be needed
        4. the containment fields should no longer be @Transient
      3. UQ constraints:
        1. on currentSequence and serverStartupTimestamp
        2. on currentSequence and name
      4. new JPQL queries:
        1. to cover the changed parent-child relationships
        2. to make use of each disassembled part of the original compositeId, e.g. when filtering
      5. let JPA fill entities instead of doing this by hand
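
      The breakdown of compositeId described in item 1 could be sketched in plain Java roughly as follows. This is only an illustration under stated assumptions, not the proposed implementation: the class name CompositeIdParts is hypothetical, and the hyphen-separated layout, the 7-digit zero padding, the yyMMddHHmmssSSS timestamp pattern, and the UTC time zone are all assumptions made for the example. JPA annotations are left out so that the sketch needs nothing beyond the JDK.

```java
import java.sql.Timestamp;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical holder for the four parts of a composite ID. */
final class CompositeIdParts {
    // Assumed layout: <sequence>-<startupTimestamp>-<serverName>-<name>
    private static final String TIMESTAMP_PATTERN = "yyMMddHHmmssSSS";

    final long currentSequence;
    final Timestamp serverStartupTimestamp;
    final String serverName;
    final String name;

    CompositeIdParts(long currentSequence, Timestamp serverStartupTimestamp,
                     String serverName, String name) {
        this.currentSequence = currentSequence;
        this.serverStartupTimestamp = serverStartupTimestamp;
        this.serverName = serverName;
        this.name = name;
    }

    // SimpleDateFormat is not thread-safe, so a fresh instance is created per call.
    private static SimpleDateFormat timestampFormat() {
        SimpleDateFormat format = new SimpleDateFormat(TIMESTAMP_PATTERN, Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: UTC
        format.setLenient(false);
        return format;
    }

    /** Splits a composite ID; the server name may itself contain hyphens. */
    static CompositeIdParts parse(String compositeId) {
        int first = compositeId.indexOf('-');
        int second = compositeId.indexOf('-', first + 1);
        int last = compositeId.lastIndexOf('-');
        if (first < 0 || second < 0 || last <= second) {
            throw new IllegalArgumentException("Unexpected composite ID: " + compositeId);
        }
        try {
            long sequence = Long.parseLong(compositeId.substring(0, first));
            Timestamp startup = new Timestamp(
                    timestampFormat().parse(compositeId.substring(first + 1, second)).getTime());
            return new CompositeIdParts(sequence, startup,
                    compositeId.substring(second + 1, last),
                    compositeId.substring(last + 1));
        } catch (ParseException | NumberFormatException e) {
            throw new IllegalArgumentException("Unexpected composite ID: " + compositeId, e);
        }
    }

    /** Recomputes the String form, e.g. when an entity is loaded or persisted. */
    String toCompositeId() {
        return String.format(Locale.ROOT, "%07d-%s-%s-%s", currentSequence,
                timestampFormat().format(serverStartupTimestamp), serverName, name);
    }
}
```

      With these assumptions, parsing the made-up ID 0000001-181015123456789-oozie-oozi-W yields sequence 1, server name oozie-oozi, and name W, and toCompositeId() reproduces the input string. Storing the parts in separate columns is what would let the database index and filter on them individually.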

      The following enhancements can be considered nice-to-have:

      • upgrade to an OpenJPA version that features JPA 2.1's composite indexing capability
      • check whether an optimistic locking field using @Version, instead of the ZooKeeper-based pessimistic locking, would improve High Availability characteristics
      • also refactor the SLA-related entity classes
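
      To make the optimistic locking idea concrete, here is a minimal in-memory sketch of a version-checked update, which is the mechanism a JPA @Version field relies on. It does not use JPA or ZooKeeper; VersionedRow and OptimisticStore are hypothetical names introduced only for this illustration. In JPA, a stale write of this kind is reported as an OptimisticLockException, whereas the sketch simply returns false.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a table row that carries a version column. */
final class VersionedRow {
    final String status;
    final long version;

    VersionedRow(String status, long version) {
        this.status = status;
        this.version = version;
    }
}

/** Hypothetical store that mimics a version-checked UPDATE statement. */
final class OptimisticStore {
    private final Map<Long, VersionedRow> rows = new ConcurrentHashMap<>();

    void insert(long id, String status) {
        rows.put(id, new VersionedRow(status, 0L));
    }

    VersionedRow read(long id) {
        return rows.get(id);
    }

    /**
     * Applies the update only if the row still has the version the caller read,
     * similar in spirit to:
     * UPDATE t SET status = ?, version = version + 1 WHERE id = ? AND version = ?
     */
    boolean update(long id, long expectedVersion, String newStatus) {
        boolean[] updated = {false};
        // computeIfPresent runs atomically per key on ConcurrentHashMap.
        rows.computeIfPresent(id, (key, current) -> {
            if (current.version != expectedVersion) {
                return current; // stale read: leave the row untouched
            }
            updated[0] = true;
            return new VersionedRow(newStatus, expectedVersion + 1);
        });
        return updated[0];
    }
}
```

      Two writers that both read version 0 cannot both succeed: the first update bumps the version to 1, so the second is rejected and has to re-read and retry. No lock is held between the read and the write, which is the property that would have to be weighed against the current pessimistic locking.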

      Performance benchmarks are needed on database types such as MySQL/MariaDB and PostgreSQL, before and after the changes, for the following use cases:

      • CoordinatorJobBean and WorkflowJobBean instances up to millions
      • CoordinatorActionBean and WorkflowActionBean instances up to tens of millions
      • performance of JPQL queries that return a list of entities
      • performance of persisting a new entity
      • performance of querying lists of entities based on popular / possible filters like the ones used by VxJobsServlet
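
      For the before / after comparison, a very small timing helper along the following lines might serve as a starting point. It is a naive sketch, not a rigorous benchmark: NaiveBenchmark is a hypothetical name, the operation passed in stands for whatever persist or query call is being measured, and a dedicated harness such as JMH would give more trustworthy numbers.

```java
import java.util.function.IntConsumer;

/** Hypothetical helper that times an operation after a warm-up phase. */
final class NaiveBenchmark {

    /**
     * Runs the operation warmupRuns times without measuring, then measuredRuns
     * times with measuring, and returns the average nanoseconds per measured call.
     * The run index is passed to the operation so it can vary its input.
     */
    static double averageNanos(int warmupRuns, int measuredRuns, IntConsumer operation) {
        if (measuredRuns <= 0) {
            throw new IllegalArgumentException("measuredRuns must be positive");
        }
        for (int i = 0; i < warmupRuns; i++) {
            operation.accept(i); // warm-up: lets the JIT compile the hot path
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredRuns; i++) {
            operation.accept(i);
        }
        long elapsed = System.nanoTime() - start;
        return (double) elapsed / measuredRuns;
    }
}
```

      Running the same helper against the same data set before and after the entity changes, once per database type, would give comparable averages for each use case listed above.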

People

    • Assignee: Unassigned
    • Reporter: Andras Piros (andras.piros)