Hadoop YARN / YARN-2928

YARN Timeline Service v.2: alpha 1


    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 3.0.0-alpha1
    • Component/s: timelineserver
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      We are introducing an early preview (alpha 1) of a major revision of YARN Timeline Service: v.2. YARN Timeline Service v.2 addresses two major challenges: improving scalability and reliability of Timeline Service, and enhancing usability by introducing flows and aggregation.

      YARN Timeline Service v.2 alpha 1 is provided so that users and developers can test it and give feedback and suggestions toward making it a ready replacement for Timeline Service v.1.x. It should be used only in a test capacity. Most importantly, security is not enabled: if security is a critical requirement, do not set up or use Timeline Service v.2 until security is implemented.

      More details are available in the [YARN Timeline Service v.2](./hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html) documentation.

      Description

      The application timeline server has been implemented in YARN per YARN-1530 and YARN-321. Although it is a great feature, we have identified several critical issues and missing features that need to be addressed.

      This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort.

        Attachments

        1. ATSv2.rev1.pdf
          249 kB
          Sangjin Lee
        2. ATSv2.rev2.pdf
          252 kB
          Sangjin Lee
        3. Data model proposal v1.pdf
          89 kB
          Zhijie Shen
        4. Timeline Service Next Gen - Planning - ppt.pptx
          345 kB
          Vinod Kumar Vavilapalli
        5. TimelineServiceStoragePerformanceTestSummaryYARN-2928.pdf
          179 kB
          Vrushali C
        6. ATSv2BackendHBaseSchemaproposal.pdf
          259 kB
          Sangjin Lee
        7. timeline_service_v2_next_milestones.pdf
          129 kB
          Sangjin Lee
        8. The YARN Timeline Service v.2 Documentation.pdf
          743 kB
          Sangjin Lee
        9. YARN-2928.01.patch
          2.36 MB
          Sangjin Lee
        10. YARN-2928.02.patch
          2.37 MB
          Sangjin Lee
        11. YARN-2928.03.patch
          2.37 MB
          Sangjin Lee

        Issue Links

        All linked issues below are resolved sub-tasks; the assignee is given in parentheses.

        1. [Collector wireup] Set up timeline collector with basic request serving structure and lifecycle (Sangjin Lee)
        2. [Data Model] create overall data objects of TS next gen (Zhijie Shen)
        3. [Collector wireup] Implement RM starting its timeline collector (Naganarasimha G R)
        4. [Storage implementation] Create a test-only backing storage implementation for ATS writes (Sangjin Lee)
        5. [Storage abstraction] Create backing storage write interface for timeline collectors (Vrushali C)
        6. [Storage implementation] Create standalone HBase backing storage implementation for ATS writes (Zhijie Shen)
        7. [Storage implementation] Create HBase cluster backing storage implementation for ATS writes (Vrushali C)
        8. [Collector wireup] Implement timeline app-level collector service discovery (Junping Du)
        9. [Data Model] Make putEntities operation be aware of the app's context (Zhijie Shen)
        10. [Data Model] Create ATS metrics API (Unassigned)
        11. [Data Model] Create ATS configuration, metadata, etc. as part of entities (Unassigned)
        12. [Event producers] Implement RM writing app lifecycle events to ATS (Naganarasimha G R)
        13. [Event producers] Implement NM writing container lifecycle events to ATS (Naganarasimha G R)
        14. [Data Serving] Set up ATS reader with basic request serving structure and lifecycle (Varun Saxena)
        15. [Source organization] Refactor timeline collector according to new code organization (Li Lu)
        16. [Data Serving] Handle how to set up and start/stop ATS reader instances (Varun Saxena)
        17. [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend (Zhijie Shen)
        18. [Storage abstraction] Create backing storage read interface for ATS readers (Varun Saxena)
        19. [Data Serving] Provide a very simple POC html ATS UI (Sangjin Lee)
        20. Bootstrap TimelineServer Next Gen Module (Zhijie Shen)
        21. [Collector implementation] the REST server (web server) for per-node collector does not work if it runs inside node manager (Li Lu)
        22. [Collector wireup] We need an assured way to determine if a container is an AM container on NM (Giovanni Matteo Fumarola)
        23. [Event producers] Change distributed shell to use new timeline service (Junping Du)
        24. [Storage implementation] Exploiting the option of using Phoenix to access HBase backend (Li Lu)
        25. [Documentation] Documenting the timeline service v2 (Sangjin Lee)
        26. [Collector implementation] Implement the core functionality of the timeline collector (Vrushali C)
        27. [Data Model] Implement client API to put generic entities (Zhijie Shen)
        28. [Storage implementation] Create backing storage write interface and a POC only file based storage implementation (Vrushali C)
        29. Refactor and fix null casting in some map cast for TimelineEntity (old and new) and fix findbug warnings (Junping Du)
        30. rename TimelineAggregator etc. to TimelineCollector (Sangjin Lee)
        31. [Event Producers] NM TimelineClient container metrics posting to new timeline service (Junping Du)
        32. Replace starting a separate thread for post entity with event loop in TimelineClient (Naganarasimha G R)
        33. Collector's web server should randomly bind an available port (Zhijie Shen)
        34. TestTimelineServiceClientIntegration fails (Sangjin Lee)
        35. Reuse TimelineCollectorManager for RM (Zhijie Shen)
        36. Clearly define flow ID / flow run / flow version in API and storage (Zhijie Shen)
        37. Security support for new timeline service (Unassigned)
        38. [Storage implementation] explore & create the native HBase schema for writes (Vrushali C)
        39. Sub resources of timeline entity need to be passed to a separate endpoint (Zhijie Shen)
        40. Cache runningApps in RMNode for getting running apps on given NodeId (Junping Du)
        41. Consolidate flow name/version/run defaults (Sangjin Lee)
        42. Add miniHBase cluster and Phoenix support to ATS v2 unit tests (Li Lu)
        43. Consolidate data model change according to the backend implementation (Zhijie Shen)
        44. unit tests failures and issues found from findbug from earlier ATS checkins (Naganarasimha G R)
        45. HttpServer2 Max threads in TimelineCollectorManager should be more than 10 (Varun Saxena)
        46. RM only gets back addresses of Collectors that NM needs to know (Junping Du)
        47. Performance optimization using connection cache of Phoenix timeline writer (Li Lu)
        48. TestMRTimelineEventHandling and TestApplication are broken (Sangjin Lee)
        49. Decide if flow version should be part of row key or column (Unassigned)
        50. Generalize native HBase writer for additional tables (Joep Rottinghuis)
        51. build is broken on YARN-2928 branch due to possible dependency cycle (Li Lu)
        52. Fix TestHBaseTimelineWriterImpl unit test failure by fixing its test data (Vrushali C)
        53. Test case failures in TestDistributedShell and some issue fixes related to ATSV2 (Naganarasimha G R)
        54. [JDK-8][YARN-2928] Exclude jdk.tools from hbase-client and hbase-testing-util (Tsuyoshi Ozawa)
        55. REST API implementation for getting raw entities in TimelineReader (Varun Saxena)
        56. [Aggregation] App-level aggregation and accumulation for YARN system metrics (Li Lu)
        57. add equals and hashCode to TimelineEntity and other classes in the data model (Li Lu)
        58. Support for fetching specific configs and metrics based on prefixes (Varun Saxena)
        59. Support complex filters in TimelineReader (Varun Saxena)
        60. Implement support for querying single app and all apps for a flow run (Varun Saxena)
        61. Add equals and hashCode to TimelineEntity (Li Lu)
        62. Populate flow run data in the flow_run & flow activity tables (Vrushali C)
        63. Refactor timelineservice.storage to add support to online and offline aggregation writers (Li Lu)
        64. split the application table from the entity table (Sangjin Lee)
        65. Bugs in HBaseTimelineWriterImpl (Vrushali C)
        66. ensure timely flush of timeline writes (Sangjin Lee)
        67. Fix new findbugs warnings in resourcemanager in YARN-2928 branch (Varun Saxena)
        68. Rethink event column key issue (Vrushali C)
        69. Change to use the AM flag in ContainerContext to determine AM container (Sunil G)
        70. Some of the NM events are not getting published due to a race condition when AM container finishes in NM (Naganarasimha G R)
        71. Change the way metric values are stored in HBase Storage (Varun Saxena)
        72. Publisher V2 should write the unmanaged AM flag and application priority (Sunil G)
        73. Deal with byte representations of Longs in writer code (Sangjin Lee)
        74. Miscellaneous issues in NodeManager project (Naganarasimha G R)
        75. Add the flush and compaction functionality via coprocessors and scanners for flow run table (Vrushali C)
        76. Populate the flow activity table (Vrushali C)
        77. build is broken at TestHBaseTimelineWriterImpl.java (Sangjin Lee)
        78. Support appUpdated event in TimelineV2 to publish details for movetoqueue, change in priority (Sunil G)
        79. [timeline reader] implement support for querying for flows and flow runs (Sangjin Lee)
        80. [reader REST API] implement support for querying for flows and flow runs (Varun Saxena)
        81. Add a "skip existing table" mode for timeline schema creator (Li Lu)
        82. Refactor the SystemMetricPublisher in RM to better support newer events (Naganarasimha G R)
        83. Fix javadoc warnings floating up from hbase (Sangjin Lee)
        84. [storage implementation] app id as string in row keys can cause incorrect ordering (Varun Saxena)
        85. [reader implementation] support flow activity queries based on time (Varun Saxena)
        86. Refactor reader classes in storage to nest under hbase specific package name (Li Lu)
        87. Add request/response logging & timing for each REST endpoint call (Varun Saxena)
        88. HBase reader throws NPE if Get returns no rows (Varun Saxena)
        89. Store user in app to flow table (Varun Saxena)
        90. Support fetching entities by UID and change the REST interface to conform to current REST APIs in YARN (Varun Saxena)
        91. Support additional queries for ATSv2 Web UI (Varun Saxena)
        92. correctly set createdTime and remove modifiedTime when publishing entities (Varun Saxena)
        93. TestJobHistoryEventHandler and TestRMContainerAllocator failing on YARN-2928 branch (Varun Saxena)
        94. TestDistributedShell fails for V2 scenarios (Naganarasimha G R)
        95. ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off (Sangjin Lee)
        96. Fix javadoc and checkstyle issues in timelineservice code (Varun Saxena)
        97. Unify the term flowId and flowName in timeline v2 codebase (Zhan Zhang)
        98. Refactor reader API for better extensibility (Varun Saxena)
        99. Provide a mechanism to represent complex filters and parse them at the REST layer (Varun Saxena)
        100. TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail (Sangjin Lee)
        101. [Bug fix] RM fails to start when SMP is enabled (Li Lu)
        102. TestDistributedShell fails for v2 test cases after modifications for 1.5 (Naganarasimha G R)
        103. TestRMRestart fails and findbugs issue in YARN-2928 branch (Varun Saxena)
        104. New findbugs warning in resourcemanager in YARN-2928 branch (Varun Saxena)
        105. ATS storage has one extra record each time the RM got restarted (Naganarasimha G R)
        106. NM is going down with NPEs due to single thread processing of events by Timeline client (Naganarasimha G R)
        107. CPU Usage Metric is not captured properly in YARN-2928 (Naganarasimha G R)
        108. Add a check in the coprocessor for table to be operated on (Vrushali C)
        109. Ensure non-metric values are returned as is for flow run table from the coprocessor (Vrushali C)
        110. Online aggregation logic should not run immediately after collectors got started (Li Lu)
        111. hbase unit tests fail due to dependency issues (Sangjin Lee)
        112. Code cleanup for TestDistributedShell (Li Lu)
        113. [Documentation] Update timeline service v2 documentation to capture information about filters (Varun Saxena)
        114. upgrade HBase version for first merge (Vrushali C)
        115. created time shows 0 in most REST output (Varun Saxena)
        116. flow activities and flow runs are populated with wrong timestamp when RM restarts w/ recovery enabled (Varun Saxena)
        117. timelinereader has a lot of logging that's not useful (Sangjin Lee)
        118. NPE in Separator.joinEncoded() (Vrushali C)
        119. timeline service build fails with java 8 (Sangjin Lee)
        120. entire time series is returned for YARN container system metrics (CPU and memory) (Varun Saxena)
        121. timestamps are stored unencoded causing parse errors (Varun Saxena)
        122. YARN container system metrics are not aggregated to application (Naganarasimha G R)
        123. fix "no findbugs output file" error for hadoop-yarn-server-timelineservice-hbase-tests (Vrushali C)
        124. fix findbugs warnings/errors for hadoop-yarn-server-timelineservice-hbase-tests (Vrushali C)
        125. Escaping occurrences of encodedValues (Sangjin Lee)
        126. Eliminate singleton converters and static method access (Joep Rottinghuis)
        127. [documentation] several updates/corrections to timeline service documentation (Sangjin Lee)
        128. Make HBaseTimeline[Reader|Writer]Impl default and move FileSystemTimeline*Impl (Joep Rottinghuis)
        129. NPE in Distributed Shell while publishing DS_CONTAINER_START event and other miscellaneous issues (Varun Saxena)
        130. Avoid re-creation of EventColumnNameConverter in HBaseTimelineWriterImpl#storeEvents (Joep Rottinghuis)
        131. Eliminate unused imports checkstyle warnings (Joep Rottinghuis)
        132. fix several rebase and other miscellaneous issues before merge (Sangjin Lee)
        133. Store node information for finished containers in timeline v2 (Unassigned)
        134. fix hadoop-aws pom not to do the exclusion (Sangjin Lee)
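        Sub-task 84 in the list above notes that storing the application ID as a string in row keys can cause incorrect ordering. HBase sorts row keys by unsigned lexicographic byte order, and the usual string form of an application ID zero-pads the sequence number to only four digits, so `application_<clusterTimestamp>_10000` sorts before `application_<clusterTimestamp>_9999`. The Python sketch below illustrates the problem and one possible fix: packing the cluster timestamp and sequence number as fixed-width big-endian unsigned integers so that byte order matches numeric order. The function names, sample IDs, and the 8-byte + 4-byte layout are assumptions chosen for illustration; this is not the converter that the sub-task actually introduced.

```python
import struct


def string_row_key(app_id: str) -> bytes:
    # Naive row key: the application ID string, UTF-8 encoded as-is.
    return app_id.encode("utf-8")


def fixed_width_row_key(app_id: str) -> bytes:
    # Hypothetical encoding (illustration only): split
    # "application_<clusterTimestamp>_<sequence>" and pack the two numbers as
    # big-endian unsigned integers (8 bytes + 4 bytes), so comparing the
    # resulting bytes gives the same order as comparing the numbers.
    prefix, cluster_ts, seq = app_id.split("_")
    if prefix != "application":
        raise ValueError(f"not an application ID: {app_id!r}")
    return struct.pack(">QI", int(cluster_ts), int(seq))


# Hypothetical sample IDs: two apps from one cluster start, one from a later start.
app_ids = [
    "application_1459000000000_9999",
    "application_1459000000000_10000",
    "application_1459000000001_0001",
]

# Sorting by raw bytes mimics how HBase orders row keys.
by_string_key = sorted(app_ids, key=string_row_key)
by_fixed_width_key = sorted(app_ids, key=fixed_width_row_key)

print(by_string_key)       # the 10000th application sorts before the 9999th
print(by_fixed_width_key)  # numeric order is preserved
```

        Fixed width is what makes the second ordering correct: a string comparison stops at the first differing character regardless of how many digits follow, whereas fixed-width big-endian integers line up byte positions of equal significance.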

            People

            • Assignee: Sangjin Lee (sjlee0)
            • Reporter: Sangjin Lee (sjlee0)
