  1. Hadoop YARN
  2. YARN-2928

YARN Timeline Service v.2: alpha 1


    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 3.0.0-alpha1
    • Component/s: timelineserver
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      We are introducing an early preview (alpha 1) of a major revision of YARN Timeline Service: v.2. YARN Timeline Service v.2 addresses two major challenges: improving scalability and reliability of Timeline Service, and enhancing usability by introducing flows and aggregation.

      YARN Timeline Service v.2 alpha 1 is provided so that users and developers can test it and provide feedback and suggestions for making it a ready replacement for Timeline Service v.1.x. It should be used only in a test capacity. Most importantly, security is not enabled. If security is a critical requirement, do not set up or use Timeline Service v.2 until security is implemented.

      More details are available in the [YARN Timeline Service v.2](./hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html) documentation.

      Description

      The application timeline server is implemented in YARN per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that still need to be addressed.

      This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort.

        Attachments

        1. ATSv2.rev1.pdf (249 kB, Sangjin Lee)
        2. ATSv2.rev2.pdf (252 kB, Sangjin Lee)
        3. ATSv2BackendHBaseSchemaproposal.pdf (259 kB, Sangjin Lee)
        4. Data model proposal v1.pdf (89 kB, Zhijie Shen)
        5. The YARN Timeline Service v.2 Documentation.pdf (743 kB, Sangjin Lee)
        6. timeline_service_v2_next_milestones.pdf (129 kB, Sangjin Lee)
        7. Timeline Service Next Gen - Planning - ppt.pptx (345 kB, Vinod Kumar Vavilapalli)
        8. TimelineServiceStoragePerformanceTestSummaryYARN-2928.pdf (179 kB, Vrushali C)
        9. YARN-2928.01.patch (2.36 MB, Sangjin Lee)
        10. YARN-2928.02.patch (2.37 MB, Sangjin Lee)
        11. YARN-2928.03.patch (2.37 MB, Sangjin Lee)

        Issue Links

        1. [Collector wireup] Set up timeline collector with basic request serving structure and lifecycle (Sub-task, Resolved, Sangjin Lee)
        2. [Data Model] create overall data objects of TS next gen (Sub-task, Resolved, Zhijie Shen)
        3. [Collector wireup] Implement RM starting its timeline collector (Sub-task, Resolved, Naganarasimha G R)
        4. [Storage implementation] Create a test-only backing storage implementation for ATS writes (Sub-task, Resolved, Sangjin Lee)
        5. [Storage abstraction] Create backing storage write interface for timeline collectors (Sub-task, Resolved, Vrushali C)
        6. [Storage implementation] Create standalone HBase backing storage implementation for ATS writes (Sub-task, Resolved, Zhijie Shen)
        7. [Storage implementation] Create HBase cluster backing storage implementation for ATS writes (Sub-task, Resolved, Vrushali C)
        8. [Collector wireup] Implement timeline app-level collector service discovery (Sub-task, Resolved, Junping Du)
        9. [Data Model] Make putEntities operation be aware of the app's context (Sub-task, Resolved, Zhijie Shen)
        10. [Data Model] Create ATS metrics API (Sub-task, Resolved, Unassigned)
        11. [Data Model] Create ATS configuration, metadata, etc. as part of entities (Sub-task, Resolved, Unassigned)
        12. [Event producers] Implement RM writing app lifecycle events to ATS (Sub-task, Resolved, Naganarasimha G R)
        13. [Event producers] Implement NM writing container lifecycle events to ATS (Sub-task, Resolved, Naganarasimha G R)
        14. [Data Serving] Set up ATS reader with basic request serving structure and lifecycle (Sub-task, Resolved, Varun Saxena)
        15. [Source organization] Refactor timeline collector according to new code organization (Sub-task, Resolved, Li Lu)
        16. [Data Serving] Handle how to set up and start/stop ATS reader instances (Sub-task, Resolved, Varun Saxena)
        17. [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend (Sub-task, Resolved, Zhijie Shen)
        18. [Storage abstraction] Create backing storage read interface for ATS readers (Sub-task, Resolved, Varun Saxena)
        19. [Data Serving] Provide a very simple POC html ATS UI (Sub-task, Resolved, Sangjin Lee)
        20. Bootstrap TimelineServer Next Gen Module (Sub-task, Resolved, Zhijie Shen)
        21. [Collector implementation] the REST server (web server) for per-node collector does not work if it runs inside node manager (Sub-task, Resolved, Li Lu)
        22. [Collector wireup] We need an assured way to determine if a container is an AM container on NM (Sub-task, Resolved, Giovanni Matteo Fumarola)
        23. [Event producers] Change distributed shell to use new timeline service (Sub-task, Resolved, Junping Du)
        24. [Storage implementation] Exploiting the option of using Phoenix to access HBase backend (Sub-task, Resolved, Li Lu)
        25. [Documentation] Documenting the timeline service v2 (Sub-task, Resolved, Sangjin Lee)
        26. [Collector implementation] Implement the core functionality of the timeline collector (Sub-task, Resolved, Vrushali C)
        27. [Data Mode] Implement client API to put generic entities (Sub-task, Resolved, Zhijie Shen)
        28. [Storage implementation] Create backing storage write interface and a POC only file based storage implementation (Sub-task, Resolved, Vrushali C)
        29. Refactor and fix null casting in some map cast for TimelineEntity (old and new) and fix findbug warnings (Sub-task, Resolved, Junping Du)
        30. rename TimelineAggregator etc. to TimelineCollector (Sub-task, Resolved, Sangjin Lee)
        31. [Event Producers] NM TimelineClient container metrics posting to new timeline service. (Sub-task, Resolved, Junping Du)
        32. Replace starting a separate thread for post entity with event loop in TimelineClient (Sub-task, Resolved, Naganarasimha G R)
        33. Collector's web server should randomly bind an available port (Sub-task, Resolved, Zhijie Shen)
        34. TestTimelineServiceClientIntegration fails (Sub-task, Resolved, Sangjin Lee)
        35. Reuse TimelineCollectorManager for RM (Sub-task, Resolved, Zhijie Shen)
        36. Clearly define flow ID/ flow run / flow version in API and storage (Sub-task, Resolved, Zhijie Shen)
        37. Security support for new timeline service. (Sub-task, Resolved, Unassigned)
        38. [Storage implementation] explore & create the native HBase schema for writes (Sub-task, Resolved, Vrushali C)
        39. Sub resources of timeline entity needs to be passed to a separate endpoint. (Sub-task, Resolved, Zhijie Shen)
        40. Cache runningApps in RMNode for getting running apps on given NodeId (Sub-task, Resolved, Junping Du)
        41. Consolidate flow name/version/run defaults (Sub-task, Resolved, Sangjin Lee)
        42. Add miniHBase cluster and Phoenix support to ATS v2 unit tests (Sub-task, Resolved, Li Lu)
        43. Consolidate data model change according to the backend implementation (Sub-task, Resolved, Zhijie Shen)
        44. unit tests failures and issues found from findbug from earlier ATS checkins (Sub-task, Resolved, Naganarasimha G R)
        45. HttpServer2 Max threads in TimelineCollectorManager should be more than 10 (Sub-task, Resolved, Varun Saxena)
        46. RM only get back addresses of Collectors that NM needs to know. (Sub-task, Resolved, Junping Du)
        47. Performance optimization using connection cache of Phoenix timeline writer (Sub-task, Resolved, Li Lu)
        48. TestMRTimelineEventHandling and TestApplication are broken (Sub-task, Resolved, Sangjin Lee)
        49. Decide if flow version should be part of row key or column (Sub-task, Resolved, Unassigned)
        50. Generalize native HBase writer for additional tables (Sub-task, Resolved, Joep Rottinghuis)
        51. build is broken on YARN-2928 branch due to possible dependency cycle (Sub-task, Resolved, Li Lu)
        52. Fix TestHBaseTimelineWriterImpl unit test failure by fixing it's test data (Sub-task, Resolved, Vrushali C)
        53. Test case failures in TestDistributedShell and some issue fixes related to ATSV2 (Sub-task, Resolved, Naganarasimha G R)
        54. [JDK-8][YARN-2928] Exclude jdk.tools from hbase-client and hbase-testing-util (Sub-task, Resolved, Tsuyoshi Ozawa)
        55. REST API implementation for getting raw entities in TimelineReader (Sub-task, Resolved, Varun Saxena)
        56. [Aggregation] App-level aggregation and accumulation for YARN system metrics (Sub-task, Resolved, Li Lu)
        57. add equals and hashCode to TimelineEntity and other classes in the data model (Sub-task, Resolved, Li Lu)
        58. Support for fetching specific configs and metrics based on prefixes (Sub-task, Resolved, Varun Saxena)
        59. Support complex filters in TimelineReader (Sub-task, Resolved, Varun Saxena)
        60. Implement support for querying single app and all apps for a flow run (Sub-task, Resolved, Varun Saxena)
        61. Add equals and hashCode to TimelineEntity (Sub-task, Resolved, Li Lu)
        62. Populate flow run data in the flow_run & flow activity tables (Sub-task, Resolved, Vrushali C)
        63. Refactor timelineservice.storage to add support to online and offline aggregation writers (Sub-task, Resolved, Li Lu)
        64. split the application table from the entity table (Sub-task, Resolved, Sangjin Lee)
        65. Bugs in HBaseTimelineWriterImpl (Sub-task, Resolved, Vrushali C)
        66. ensure timely flush of timeline writes (Sub-task, Resolved, Sangjin Lee)
        67. Fix new findbugs warnings in resourcemanager in YARN-2928 branch (Sub-task, Resolved, Varun Saxena)
        68. Rethink event column key issue (Sub-task, Resolved, Vrushali C)
        69. Change to use the AM flag in ContainerContext determine AM container (Sub-task, Resolved, Sunil G)
        70. Some of the NM events are not getting published due race condition when AM container finishes in NM (Sub-task, Resolved, Naganarasimha G R)
        71. Change the way metric values are stored in HBase Storage (Sub-task, Resolved, Varun Saxena)
        72. Publisher V2 should write the unmanaged AM flag and application priority (Sub-task, Resolved, Sunil G)
        73. Deal with byte representations of Longs in writer code (Sub-task, Resolved, Sangjin Lee)
        74. Miscellaneous issues in NodeManager project (Sub-task, Resolved, Naganarasimha G R)
        75. Add the flush and compaction functionality via coprocessors and scanners for flow run table (Sub-task, Resolved, Vrushali C)
        76. Populate the flow activity table (Sub-task, Resolved, Vrushali C)
        77. build is broken at TestHBaseTimelineWriterImpl.java (Sub-task, Resolved, Sangjin Lee)
        78. Support appUpdated event in TimelineV2 to publish details for movetoqueue, change in priority (Sub-task, Resolved, Sunil G)
        79. [timeline reader] implement support for querying for flows and flow runs (Sub-task, Resolved, Sangjin Lee)
        80. [reader REST API] implement support for querying for flows and flow runs (Sub-task, Resolved, Varun Saxena)
        81. Add a "skip existing table" mode for timeline schema creator (Sub-task, Resolved, Li Lu)
        82. Refactor the SystemMetricPublisher in RM to better support newer events (Sub-task, Resolved, Naganarasimha G R)
        83. Fix javadoc warnings floating up from hbase (Sub-task, Resolved, Sangjin Lee)
        84. [storage implementation] app id as string in row keys can cause incorrect ordering (Sub-task, Resolved, Varun Saxena)
        85. [reader implementation] support flow activity queries based on time (Sub-task, Resolved, Varun Saxena)
        86. Refactor reader classes in storage to nest under hbase specific package name (Sub-task, Resolved, Li Lu)
        87. Add request/response logging & timing for each REST endpoint call (Sub-task, Resolved, Varun Saxena)
        88. HBase reader throws NPE if Get returns no rows (Sub-task, Resolved, Varun Saxena)
        89. Store user in app to flow table (Sub-task, Resolved, Varun Saxena)
        90. Support fetching entities by UID and change the REST interface to conform to current REST APIs' in YARN (Sub-task, Resolved, Varun Saxena)
        91. Support additional queries for ATSv2 Web UI (Sub-task, Resolved, Varun Saxena)
        92. correctly set createdTime and remove modifiedTime when publishing entities (Sub-task, Resolved, Varun Saxena)
        93. TestJobHistoryEventHandler and TestRMContainerAllocator failing on YARN-2928 branch (Sub-task, Resolved, Varun Saxena)
        94. TestDistributedShell fails for V2 scenarios (Sub-task, Resolved, Naganarasimha G R)
        95. ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off (Sub-task, Resolved, Sangjin Lee)
        96. Fix javadoc and checkstyle issues in timelineservice code (Sub-task, Resolved, Varun Saxena)
        97. Unify the term flowId and flowName in timeline v2 codebase (Sub-task, Resolved, Zhan Zhang)
        98. Refactor reader API for better extensibility (Sub-task, Resolved, Varun Saxena)
        99. Provide a mechanism to represent complex filters and parse them at the REST layer (Sub-task, Resolved, Varun Saxena)
        100. TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail (Sub-task, Resolved, Sangjin Lee)
        101. [Bug fix] RM fails to start when SMP is enabled (Sub-task, Resolved, Li Lu)
        102. TestDistributedShell fails for v2 test cases after modifications for 1.5 (Sub-task, Resolved, Naganarasimha G R)
        103. TestRMRestart fails and findbugs issue in YARN-2928 branch (Sub-task, Resolved, Varun Saxena)
        104. New findbugs warning in resourcemanager in YARN-2928 branch (Sub-task, Resolved, Varun Saxena)
        105. ATS storage has one extra record each time the RM got restarted (Sub-task, Resolved, Naganarasimha G R)
        106. NM is going down with NPE's due to single thread processing of events by Timeline client (Sub-task, Resolved, Naganarasimha G R)
        107. CPU Usage Metric is not captured properly in YARN-2928 (Sub-task, Resolved, Naganarasimha G R)
        108. Add a check in the coprocessor for table to operated on (Sub-task, Resolved, Vrushali C)
        109. Ensure non-metric values are returned as is for flow run table from the coprocessor (Sub-task, Resolved, Vrushali C)
        110. Online aggregation logic should not run immediately after collectors got started (Sub-task, Resolved, Li Lu)
        111. hbase unit tests fail due to dependency issues (Sub-task, Resolved, Sangjin Lee)
        112. Code cleanup for TestDistributedShell (Sub-task, Resolved, Li Lu)
        113. [Documentation] Update timeline service v2 documentation to capture information about filters (Sub-task, Resolved, Varun Saxena)
        114. upgrade HBase version for first merge (Sub-task, Resolved, Vrushali C)
        115. created time shows 0 in most REST output (Sub-task, Resolved, Varun Saxena)
        116. flow activities and flow runs are populated with wrong timestamp when RM restarts w/ recovery enabled (Sub-task, Resolved, Varun Saxena)
        117. timelinereader has a lot of logging that's not useful (Sub-task, Resolved, Sangjin Lee)
        118. NPE in Separator.joinEncoded() (Sub-task, Resolved, Vrushali C)
        119. timeline service build fails with java 8 (Sub-task, Resolved, Sangjin Lee)
        120. entire time series is returned for YARN container system metrics (CPU and memory) (Sub-task, Resolved, Varun Saxena)
        121. timestamps are stored unencoded causing parse errors (Sub-task, Resolved, Varun Saxena)
        122. YARN container system metrics are not aggregated to application (Sub-task, Resolved, Naganarasimha G R)
        123. fix "no findbugs output file" error for hadoop-yarn-server-timelineservice-hbase-tests (Sub-task, Resolved, Vrushali C)
        124. fix findbugs warnings/errors for hadoop-yarn-server-timelineservice-hbase-tests (Sub-task, Resolved, Vrushali C)
        125. Escaping occurences of encodedValues (Sub-task, Resolved, Sangjin Lee)
        126. Eliminate singleton converters and static method access (Sub-task, Resolved, Joep Rottinghuis)
        127. [documentation] several updates/corrections to timeline service documentation (Sub-task, Resolved, Sangjin Lee)
        128. Make HBaseTimeline[Reader|Writer]Impl default and move FileSystemTimeline*Impl (Sub-task, Resolved, Joep Rottinghuis)
        129. NPE in Distributed Shell while publishing DS_CONTAINER_START event and other miscellaneous issues (Sub-task, Resolved, Varun Saxena)
        130. Avoid re-creation of EventColumnNameConverter in HBaseTimelineWriterImpl#storeEvents (Sub-task, Resolved, Joep Rottinghuis)
        131. Eliminate unused imports checkstyle warnings (Sub-task, Resolved, Joep Rottinghuis)
        132. fix several rebase and other miscellaneous issues before merge (Sub-task, Resolved, Sangjin Lee)
        133. Store node information for finished containers in timeline v2 (Sub-task, Resolved, Unassigned)
        134. fix hadoop-aws pom not to do the exclusion (Sub-task, Resolved, Sangjin Lee)

            People

            • Assignee: Sangjin Lee (sjlee0)
            • Reporter: Sangjin Lee (sjlee0)
