Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 3.0.0-alpha1
    • Component/s: timelineserver
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      We are introducing an early preview (alpha 1) of a major revision of YARN Timeline Service: v.2. YARN Timeline Service v.2 addresses two major challenges: improving scalability and reliability of Timeline Service, and enhancing usability by introducing flows and aggregation.

      YARN Timeline Service v.2 alpha 1 is provided so that users and developers can test it and provide feedback and suggestions for making it a ready replacement for Timeline Service v.1.x. It should be used only in a test capacity. Most importantly, security is not enabled. If security is a critical requirement, do not set up or use Timeline Service v.2 until security is implemented.

      More details are available in the [YARN Timeline Service v.2](./hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html) documentation.

      Description

      We have the application timeline server implemented in YARN per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed.

      This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort.

        Attachments

        1. ATSv2.rev1.pdf
          249 kB
          Sangjin Lee
        2. ATSv2.rev2.pdf
          252 kB
          Sangjin Lee
        3. Data model proposal v1.pdf
          89 kB
          Zhijie Shen
        4. Timeline Service Next Gen - Planning - ppt.pptx
          345 kB
          Vinod Kumar Vavilapalli
        5. TimelineServiceStoragePerformanceTestSummaryYARN-2928.pdf
          179 kB
          Vrushali C
        6. ATSv2BackendHBaseSchemaproposal.pdf
          259 kB
          Sangjin Lee
        7. timeline_service_v2_next_milestones.pdf
          129 kB
          Sangjin Lee
        8. The YARN Timeline Service v.2 Documentation.pdf
          743 kB
          Sangjin Lee
        9. YARN-2928.01.patch
          2.36 MB
          Sangjin Lee
        10. YARN-2928.02.patch
          2.37 MB
          Sangjin Lee
        11. YARN-2928.03.patch
          2.37 MB
          Sangjin Lee

          Issue Links

          1.
          [Collector wireup] Set up timeline collector with basic request serving structure and lifecycle Sub-task Resolved Sangjin Lee
          2.
          [Data Model] create overall data objects of TS next gen Sub-task Resolved Zhijie Shen
          3.
          [Collector wireup] Implement RM starting its timeline collector Sub-task Resolved Naganarasimha G R
          4.
          [Storage abstraction] Create backing storage write interface for timeline collectors Sub-task Resolved Vrushali C
          5.
          [Storage implementation] Create a test-only backing storage implementation for ATS writes Sub-task Resolved Sangjin Lee
          6.
          [Storage implementation] Create standalone HBase backing storage implementation for ATS writes Sub-task Resolved Zhijie Shen
          7.
          [Storage implementation] Create HBase cluster backing storage implementation for ATS writes Sub-task Resolved Vrushali C
          8.
          [Collector wireup] Implement timeline app-level collector service discovery Sub-task Resolved Junping Du
          9.
          [Data Model] Make putEntities operation be aware of the app's context Sub-task Resolved Zhijie Shen
          10.
          [Data Model] Create ATS metrics API Sub-task Resolved Unassigned
          11.
          [Data Model] Create ATS configuration, metadata, etc. as part of entities Sub-task Resolved Unassigned
          12.
          [Event producers] Implement RM writing app lifecycle events to ATS Sub-task Resolved Naganarasimha G R
          13.
          [Event producers] Implement NM writing container lifecycle events to ATS Sub-task Resolved Naganarasimha G R
          14.
          [Data Serving] Set up ATS reader with basic request serving structure and lifecycle Sub-task Resolved Varun Saxena
          15.
          [Data Serving] Handle how to set up and start/stop ATS reader instances Sub-task Resolved Varun Saxena
          16.
          [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend Sub-task Resolved Zhijie Shen
          17.
          [Storage abstraction] Create backing storage read interface for ATS readers Sub-task Resolved Varun Saxena
          18.
          [Data Serving] Provide a very simple POC html ATS UI Sub-task Resolved Sangjin Lee
          19.
          Bootstrap TimelineServer Next Gen Module Sub-task Resolved Zhijie Shen
          20.
          [Collector implementation] the REST server (web server) for per-node collector does not work if it runs inside node manager Sub-task Resolved Li Lu
          21.
          [Collector wireup] We need an assured way to determine if a container is an AM container on NM Sub-task Resolved Giovanni Matteo Fumarola
          22.
          [Event producers] Change distributed shell to use new timeline service Sub-task Resolved Junping Du
          23.
          [Storage implementation] Exploiting the option of using Phoenix to access HBase backend Sub-task Resolved Li Lu
          24.
          [Documentation] Documenting the timeline service v2 Sub-task Resolved Sangjin Lee
          25.
          [Collector implementation] Implement the core functionality of the timeline collector Sub-task Resolved Vrushali C
          26.
          [Source organization] Refactor timeline collector according to new code organization Sub-task Resolved Li Lu
          27.
          [Data Model] Implement client API to put generic entities Sub-task Resolved Zhijie Shen
          28.
          [Storage implementation] Create backing storage write interface and a POC only file based storage implementation Sub-task Resolved Vrushali C
          29.
          Refactor and fix null casting in some map cast for TimelineEntity (old and new) and fix findbug warnings Sub-task Resolved Junping Du
          30.
          rename TimelineAggregator etc. to TimelineCollector Sub-task Resolved Sangjin Lee
          31.
          [Event Producers] NM TimelineClient container metrics posting to new timeline service. Sub-task Resolved Junping Du
          32.
          Replace starting a separate thread for post entity with event loop in TimelineClient Sub-task Resolved Naganarasimha G R
          33.
          Collector's web server should randomly bind an available port Sub-task Resolved Zhijie Shen
          34.
          TestTimelineServiceClientIntegration fails Sub-task Resolved Sangjin Lee
          35.
          Reuse TimelineCollectorManager for RM Sub-task Resolved Zhijie Shen
          36.
          Clearly define flow ID/ flow run / flow version in API and storage Sub-task Resolved Zhijie Shen
          37.
          Security support for new timeline service. Sub-task Resolved Unassigned
          38.
          [Storage implementation] explore & create the native HBase schema for writes Sub-task Resolved Vrushali C
          39.
          Sub resources of timeline entity needs to be passed to a separate endpoint. Sub-task Resolved Zhijie Shen
          40.
          Cache runningApps in RMNode for getting running apps on given NodeId Sub-task Resolved Junping Du
          41.
          Consolidate flow name/version/run defaults Sub-task Resolved Sangjin Lee
          42.
          Add miniHBase cluster and Phoenix support to ATS v2 unit tests Sub-task Resolved Li Lu
          43.
          Consolidate data model change according to the backend implementation Sub-task Resolved Zhijie Shen
          44.
          unit tests failures and issues found from findbug from earlier ATS checkins Sub-task Resolved Naganarasimha G R
          45.
          HttpServer2 Max threads in TimelineCollectorManager should be more than 10 Sub-task Resolved Varun Saxena
          46.
          RM only get back addresses of Collectors that NM needs to know. Sub-task Resolved Junping Du
          47.
          Performance optimization using connection cache of Phoenix timeline writer Sub-task Resolved Li Lu
          48.
          TestMRTimelineEventHandling and TestApplication are broken Sub-task Resolved Sangjin Lee
          49.
          Decide if flow version should be part of row key or column Sub-task Resolved Unassigned
          50.
          Generalize native HBase writer for additional tables Sub-task Resolved Joep Rottinghuis
          51.
          build is broken on YARN-2928 branch due to possible dependency cycle Sub-task Resolved Li Lu
          52.
          Fix TestHBaseTimelineWriterImpl unit test failure by fixing its test data Sub-task Resolved Vrushali C
          53.
          Test case failures in TestDistributedShell and some issue fixes related to ATSV2 Sub-task Resolved Naganarasimha G R
          54.
          [JDK-8][YARN-2928] Exclude jdk.tools from hbase-client and hbase-testing-util Sub-task Resolved Tsuyoshi Ozawa
          55.
          REST API implementation for getting raw entities in TimelineReader Sub-task Resolved Varun Saxena
          56.
          [Aggregation] App-level aggregation and accumulation for YARN system metrics Sub-task Resolved Li Lu
          57.
          add equals and hashCode to TimelineEntity and other classes in the data model Sub-task Resolved Li Lu
          58.
          Support for fetching specific configs and metrics based on prefixes Sub-task Resolved Varun Saxena
          59.
          Support complex filters in TimelineReader Sub-task Resolved Varun Saxena
          60.
          Implement support for querying single app and all apps for a flow run Sub-task Resolved Varun Saxena
          61.
          Add equals and hashCode to TimelineEntity Sub-task Resolved Li Lu
          62.
          Populate flow run data in the flow_run & flow activity tables Sub-task Resolved Vrushali C
          63.
          Refactor timelineservice.storage to add support to online and offline aggregation writers Sub-task Resolved Li Lu
          64.
          split the application table from the entity table Sub-task Resolved Sangjin Lee
          65.
          Bugs in HBaseTimelineWriterImpl Sub-task Resolved Vrushali C
          66.
          ensure timely flush of timeline writes Sub-task Resolved Sangjin Lee
          67.
          Fix new findbugs warnings in resourcemanager in YARN-2928 branch Sub-task Resolved Varun Saxena
          68.
          Rethink event column key issue Sub-task Resolved Vrushali C
          69.
          Change to use the AM flag in ContainerContext to determine AM container Sub-task Resolved Sunil G
          70.
          Some of the NM events are not getting published due to a race condition when AM container finishes in NM Sub-task Resolved Naganarasimha G R
          71.
          Change the way metric values are stored in HBase Storage Sub-task Resolved Varun Saxena
          72.
          Publisher V2 should write the unmanaged AM flag and application priority Sub-task Resolved Sunil G
          73.
          Deal with byte representations of Longs in writer code Sub-task Resolved Sangjin Lee
          74.
          Miscellaneous issues in NodeManager project Sub-task Resolved Naganarasimha G R
          75.
          Add the flush and compaction functionality via coprocessors and scanners for flow run table Sub-task Resolved Vrushali C
          76.
          Populate the flow activity table Sub-task Resolved Vrushali C
          77.
          build is broken at TestHBaseTimelineWriterImpl.java Sub-task Resolved Sangjin Lee
          78.
          Support appUpdated event in TimelineV2 to publish details for movetoqueue, change in priority Sub-task Resolved Sunil G
          79.
          [timeline reader] implement support for querying for flows and flow runs Sub-task Resolved Sangjin Lee
          80.
          [reader REST API] implement support for querying for flows and flow runs Sub-task Resolved Varun Saxena
          81.
          Add a "skip existing table" mode for timeline schema creator Sub-task Resolved Li Lu
          82.
          Refactor the SystemMetricPublisher in RM to better support newer events Sub-task Resolved Naganarasimha G R
          83.
          Fix javadoc warnings floating up from hbase Sub-task Resolved Sangjin Lee
          84.
          [storage implementation] app id as string in row keys can cause incorrect ordering Sub-task Resolved Varun Saxena
          85.
          [reader implementation] support flow activity queries based on time Sub-task Resolved Varun Saxena
          86.
          Refactor reader classes in storage to nest under hbase specific package name Sub-task Resolved Li Lu
          87.
          Add request/response logging & timing for each REST endpoint call Sub-task Resolved Varun Saxena
          88.
          HBase reader throws NPE if Get returns no rows Sub-task Resolved Varun Saxena
          89.
          Store user in app to flow table Sub-task Resolved Varun Saxena
          90.
          Support fetching entities by UID and change the REST interface to conform to current REST APIs in YARN Sub-task Resolved Varun Saxena
          91.
          Support additional queries for ATSv2 Web UI Sub-task Resolved Varun Saxena
          92.
          correctly set createdTime and remove modifiedTime when publishing entities Sub-task Resolved Varun Saxena
          93.
          TestJobHistoryEventHandler and TestRMContainerAllocator failing on YARN-2928 branch Sub-task Resolved Varun Saxena
          94.
          TestDistributedShell fails for V2 scenarios Sub-task Resolved Naganarasimha G R
          95.
          ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off Sub-task Resolved Sangjin Lee
          96.
          Fix javadoc and checkstyle issues in timelineservice code Sub-task Resolved Varun Saxena
          97.
          Unify the term flowId and flowName in timeline v2 codebase Sub-task Resolved Zhan Zhang
          98.
          Refactor reader API for better extensibility Sub-task Resolved Varun Saxena
          99.
          Provide a mechanism to represent complex filters and parse them at the REST layer Sub-task Resolved Varun Saxena
          100.
          TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail Sub-task Resolved Sangjin Lee
          101.
          [Bug fix] RM fails to start when SMP is enabled Sub-task Resolved Li Lu
          102.
          TestDistributedShell fails for v2 test cases after modifications for 1.5 Sub-task Resolved Naganarasimha G R
          103.
          TestRMRestart fails and findbugs issue in YARN-2928 branch Sub-task Resolved Varun Saxena
          104.
          New findbugs warning in resourcemanager in YARN-2928 branch Sub-task Resolved Varun Saxena
          105.
          ATS storage has one extra record each time the RM got restarted Sub-task Resolved Naganarasimha G R
          106.
          NM is going down with NPE's due to single thread processing of events by Timeline client Sub-task Resolved Naganarasimha G R
          107.
          CPU Usage Metric is not captured properly in YARN-2928 Sub-task Resolved Naganarasimha G R
          108.
          Add a check in the coprocessor for table to be operated on Sub-task Resolved Vrushali C
          109.
          Ensure non-metric values are returned as is for flow run table from the coprocessor Sub-task Resolved Vrushali C
          110.
          Online aggregation logic should not run immediately after collectors got started Sub-task Resolved Li Lu
          111.
          hbase unit tests fail due to dependency issues Sub-task Resolved Sangjin Lee
          112.
          Code cleanup for TestDistributedShell Sub-task Resolved Li Lu
          113.
          [Documentation] Update timeline service v2 documentation to capture information about filters Sub-task Resolved Varun Saxena
          114.
          upgrade HBase version for first merge Sub-task Resolved Vrushali C
          115.
          created time shows 0 in most REST output Sub-task Resolved Varun Saxena
          116.
          flow activities and flow runs are populated with wrong timestamp when RM restarts w/ recovery enabled Sub-task Resolved Varun Saxena
          117.
          timelinereader has a lot of logging that's not useful Sub-task Resolved Sangjin Lee
          118.
          NPE in Separator.joinEncoded() Sub-task Resolved Vrushali C
          119.
          timeline service build fails with java 8 Sub-task Resolved Sangjin Lee
          120.
          entire time series is returned for YARN container system metrics (CPU and memory) Sub-task Resolved Varun Saxena
          121.
          timestamps are stored unencoded causing parse errors Sub-task Resolved Varun Saxena
          122.
          YARN container system metrics are not aggregated to application Sub-task Resolved Naganarasimha G R
          123.
          fix "no findbugs output file" error for hadoop-yarn-server-timelineservice-hbase-tests Sub-task Resolved Vrushali C
          124.
          fix findbugs warnings/errors for hadoop-yarn-server-timelineservice-hbase-tests Sub-task Resolved Vrushali C
          125.
          Escaping occurrences of encodedValues Sub-task Resolved Sangjin Lee
          126.
          Eliminate singleton converters and static method access Sub-task Resolved Joep Rottinghuis
          127.
          [documentation] several updates/corrections to timeline service documentation Sub-task Resolved Sangjin Lee
          128.
          Make HBaseTimeline[Reader|Writer]Impl default and move FileSystemTimeline*Impl Sub-task Resolved Joep Rottinghuis
          129.
          NPE in Distributed Shell while publishing DS_CONTAINER_START event and other miscellaneous issues Sub-task Resolved Varun Saxena
          130.
          Avoid re-creation of EventColumnNameConverter in HBaseTimelineWriterImpl#storeEvents Sub-task Resolved Joep Rottinghuis
          131.
          Eliminate unused imports checkstyle warnings Sub-task Resolved Joep Rottinghuis
          132.
          fix several rebase and other miscellaneous issues before merge Sub-task Resolved Sangjin Lee
          133.
          Store node information for finished containers in timeline v2 Sub-task Resolved Unassigned
          134.
          fix hadoop-aws pom not to do the exclusion Sub-task Resolved Sangjin Lee

              People

              • Assignee:
                sjlee0 Sangjin Lee
                Reporter:
                sjlee0 Sangjin Lee
              • Votes:
                1
                Watchers:
                90
