Oozie / OOZIE-561

Integrate Oozie with HCatalog

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: trunk
    • Fix Version/s: 4.0.0
    • Component/s: None
    • Labels: None

    Description

      With the incubation of HCatalog, we have a mechanism to abstract data and storage on HDFS. A natural progression for Oozie is to interact with HCatalog to facilitate the interplay between MapReduce, Pig, and Hive. In addition, HCatalog's support for notifications will reduce (though not eliminate) the need to poll HDFS for data sets represented as tables and partitions.
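
The combination of notifications and polling described above can be sketched roughly as follows. This is a minimal illustration in Python, not Oozie's implementation: the class `DependencyTracker`, its method names, the partition identifier strings, and the `exists` callback are all hypothetical names introduced here. Waiting actions are registered against the partitions they need; a notification resolves them immediately, while a periodic poll remains as a fallback for partitions whose notification never arrives.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set


class DependencyTracker:
    """Hypothetical sketch: tracks which actions wait on which partitions."""

    def __init__(self) -> None:
        # partition identifier -> ids of the actions waiting on it
        self._waiting: Dict[str, Set[str]] = defaultdict(set)

    def register(self, action_id: str, partition: str) -> None:
        """Record that an action cannot run until the partition exists."""
        self._waiting[partition].add(action_id)

    def on_notification(self, partition: str) -> List[str]:
        """Push path: a 'partition added' message resolves its waiters at once."""
        return sorted(self._waiting.pop(partition, set()))

    def poll(self, exists: Callable[[str], bool]) -> List[str]:
        """Pull path (fallback): check every still-missing partition directly."""
        resolved: List[str] = []
        for partition in list(self._waiting):
            if exists(partition):
                resolved.extend(self.on_notification(partition))
        return sorted(resolved)

    def pending(self) -> int:
        """Number of (action, partition) dependencies still unresolved."""
        return sum(len(ids) for ids in self._waiting.values())
```

In this sketch the push path does no scanning at all, which is the saving the description points to; the `poll` method stays because a missed or delayed message would otherwise leave an action waiting until its timeout.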

      Attachments

        1. OOZIE-561_merge_trunk.patch
          621 kB
          Virag Kothari
        2. Oozie-HCatalog.pptx
          68 kB
          Mona Chitnis

      Sub-Tasks

        1. Create JMSService used for any JMS-compliant product (Sub-task, Closed, Mohammad Islam)
        2. Generic utility class to register/unregister a JMS message handler (Sub-task, Closed, Mohammad Islam)
        3. Utility class to parse HCat URI (Sub-task, Closed, Ryota Egashira)
        4. Implement the Missing Dependency structure for HCat partitions (Sub-task, Closed, Mona Chitnis; original estimate 336h, remaining 336h)
        5. Coordinator action table schema change (Sub-task, Closed, Mohammad Islam)
        6. Add logic to register to Missing Dependency Structure in coord action materialization (Sub-task, Closed, Ryota Egashira)
        7. Parameterize <uris> tag currently hardcoded (Sub-task, Closed, Ryota Egashira)
        8. Implement logic to update dependencies via push JMS message (Sub-task, Closed, Mona Chitnis; original estimate 72h, remaining 72h)
        9. Add configurable 'max_size' for total number of Partition dependency map entries (Sub-task, Resolved, Unassigned; original estimate 24h, remaining 24h)
        10. Command to update push-based dependency (Sub-task, Closed, Mohammad Islam)
        11. Add static method to create URI String in HCatURI (Sub-task, Closed, Ryota Egashira)
        12. Add new EL function to retrieve HCatalog server, DB and table name (Sub-task, Closed, Mohammad Islam)
        13. Add EL functions to get HCat dataIn and dataOut (Sub-task, Resolved, Ryota Egashira)
        14. Metadata Accessor service for HCatalog (Sub-task, Closed, Mohammad Islam)
        15. Update dataIn and dataOut EL functions to support partitions (Sub-task, Closed, Mohammad Islam)
        16. Create general scheme handler (Sub-task, Closed, Rohini Palaniswamy)
        17. Command to check the missing partitions directly against HCatalog server (Sub-task, Closed, Mohammad Islam)
        18. Add HCatalog jar as resource for building (Sub-task, Closed, Mona Chitnis)
        19. Revert OOZIE-1095 once dependent HCat jar mavenized (Sub-task, Closed, Mona Chitnis)
        20. Resolve issues found in integration (Sub-task, Closed, Mohammad Islam)
        21. Change default done-flag from _SUCCESS to empty for HCat (Sub-task, Closed, Mohammad Islam)
        22. Fix JMS message consumer to maintain single session per topic registration (Sub-task, Closed, Mona Chitnis)
        23. The size of the map cache in PartitionDependencyManagerService should be configurable (Sub-task, Closed, Mona Chitnis)
        24. Change HCatURI to specify partitions in path instead of query parameter (Sub-task, Closed, Rohini Palaniswamy)
        25. EL Functions for hcatalog (Sub-task, Closed, Mona Chitnis)
        26. Prepare actions for hcat (Sub-task, Closed, Rohini Palaniswamy)
        27. Display missing partition dependencies via job -info command on CLI (Sub-task, Closed, Mona Chitnis; original estimate 24h, remaining 24h)
        28. Provide rule-based mechanism to allow multiple hcatalog servers to connect to JMS server (Sub-task, Closed, Virag Kothari)
        29. Escaping Ampersand in the HCat URI to bypass XML validation (Sub-task, Closed, Rohini Palaniswamy)
        30. Modify Recovery Service to handle push missing dependencies (Sub-task, Closed, Virag Kothari)
        31. DB upgrade scripts for hcat changes (Sub-task, Closed, Ryota Egashira)
        32. Make all the latest/future instances as pull dependencies (Sub-task, Closed, Virag Kothari)
        33. EL function hcat:exists for decision making (Sub-task, Closed, Rohini Palaniswamy)
        34. Add hcataloglib sub-module (Sub-task, Closed, Mona Chitnis)
        35. Fix and rework PartitionDependency Management (Sub-task, Closed, Rohini Palaniswamy)
        36. Reliability and retry for JMS connections (Sub-task, Closed, Rohini Palaniswamy)
        37. Dependency cache with configurations for eviction, ttl and max elements in memory (Sub-task, Closed, Rohini Palaniswamy)
        38. Retry jms connections on failure (Sub-task, Closed, Rohini Palaniswamy)
        39. HCat EL functions for database and table should be modified (Sub-task, Closed, Mona Chitnis)
        40. Create a hcat sharelib which can be included in pig, hive and java actions (Sub-task, Closed, Rohini Palaniswamy)
        41. Review of HCAT integration branch (Sub-task, Resolved, Unassigned)
        42. Rework uri handling for Prepare actions and jms server mapping (Sub-task, Closed, Rohini Palaniswamy)
        43. Address review comments in OOZIE-1210 (Sub-task, Closed, Rohini Palaniswamy)
        44. Create a HCatalog Integration Guide (Sub-task, Closed, Rohini Palaniswamy)
        45. CoordPushCheck doesn't evaluate the configuration section which is propagated to workflow (Sub-task, Closed, Virag Kothari)
        46. CoordActionInputCheck shouldn't queue CoordPushInputCheck (Sub-task, Closed, Rohini Palaniswamy)
        47. Coord action timeout not happening when there is an exception (Sub-task, Closed, Rohini Palaniswamy)
        48. Log messages for DependencyChecker class show wrong jobid and actionid (Sub-task, Closed, Rohini Palaniswamy)
        49. latest() gets resolved before all push dependencies are resolved (Sub-task, Closed, Rohini Palaniswamy)
        50. latest/future check for hcat can cause shutdown to hang (Sub-task, Closed, Rohini Palaniswamy)
        51. Registered push dependencies are not removed on Coord Kill command (Sub-task, Closed, Virag Kothari)
        52. Fix few HCat dependency check issues (Sub-task, Closed, Rohini Palaniswamy)
        53. Dryrun option for push missing deps (Sub-task, Closed, Virag Kothari)
        54. Exception in push dependency check when there is also a pull dependency leaves it in waiting till timeout (Sub-task, Closed, Rohini Palaniswamy)
        55. Drop partition while rerunning a job doesn't work as HCatalog doesn't have DoAs support (Sub-task, Open, Unassigned)
        56. CoordActionInputCheck requeues itself even if only push missing dependencies exist (Sub-task, Closed, Virag Kothari)
        57. CoordPushDependencyCheck queued by Recovery Services doesn't remove dependencies from cache (Sub-task, Closed, Rohini Palaniswamy)
        58. URIHandlerService not allowing relative path for URIs (Sub-task, Closed, Virag Kothari)
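
Several of the sub-tasks above concern the HCat URI: parsing it, building it from its parts, and carrying partitions in the path rather than as query parameters. The issue itself does not spell out the URI layout, so the sketch below assumes the form `hcat://server:port/database/table/key1=value1;key2=value2`. The names `HCatTarget`, `parse_hcat_uri`, and `build_hcat_uri` are hypothetical Python stand-ins, not the `HCatURI` class mentioned in the sub-tasks, and the host name in the usage example is made up.

```python
from typing import Dict, NamedTuple
from urllib.parse import urlsplit


class HCatTarget(NamedTuple):
    """Hypothetical container for the parts of an hcat:// URI."""
    server: str
    database: str
    table: str
    partitions: Dict[str, str]


def parse_hcat_uri(uri: str) -> HCatTarget:
    """Split an assumed-format HCat URI into server, database, table, partitions."""
    parts = urlsplit(uri)
    if parts.scheme != "hcat":
        raise ValueError(f"not an hcat URI: {uri!r}")
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) not in (2, 3):
        raise ValueError(f"expected /database/table[/partitions]: {uri!r}")
    partitions: Dict[str, str] = {}
    if len(segments) == 3:
        # Partition key=value pairs live in the last path segment, ';'-separated.
        for pair in segments[2].split(";"):
            key, sep, value = pair.partition("=")
            if not key or not sep:
                raise ValueError(f"malformed partition spec: {pair!r}")
            partitions[key] = value
    return HCatTarget(parts.netloc, segments[0], segments[1], partitions)


def build_hcat_uri(target: HCatTarget) -> str:
    """Inverse of parse_hcat_uri: assemble the URI string from its parts."""
    spec = ";".join(f"{k}={v}" for k, v in target.partitions.items())
    path = f"/{target.database}/{target.table}" + (f"/{spec}" if spec else "")
    return f"hcat://{target.server}{path}"


# Usage with an illustrative URI.
target = parse_hcat_uri("hcat://metastore.example.com:9083/mydb/clicks/dt=20130101;region=us")
print(target.database, target.table, target.partitions)
```

Keeping the partition pairs in the path means the URI contains no `&`, which would otherwise have to be escaped when the URI is embedded in XML.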

          People

            Assignee: Mona Chitnis (chitnis)
            Reporter: Santhosh Muthur Srinivasan (sms)
            Votes: 0
            Watchers: 18

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated: 456h
                Remaining: 456h
                Logged: Not Specified