Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 3.0.0-alpha1
    • Component/s: timelineserver
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      We are introducing an early preview (alpha 1) of a major revision of YARN Timeline Service: v.2. YARN Timeline Service v.2 addresses two major challenges: improving scalability and reliability of Timeline Service, and enhancing usability by introducing flows and aggregation.

      YARN Timeline Service v.2 alpha 1 is provided so that users and developers can test it and provide feedback and suggestions for making it a ready replacement for Timeline Service v.1.x. It should be used only in a test capacity. Most importantly, security is not enabled. If security is a critical requirement, do not set up or use Timeline Service v.2 until security is implemented.

      More details are available in the [YARN Timeline Service v.2](./hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html) documentation.

Description

We have the application timeline server implemented in YARN per YARN-1530 and YARN-321. Although it is a great feature, we have identified several critical issues and missing features that need to be addressed.

This JIRA proposes the design and implementation changes to address them. This is phase 1 of this effort.

Attachments

1. YARN-2928.03.patch (2.37 MB, Sangjin Lee)
2. YARN-2928.02.patch (2.37 MB, Sangjin Lee)
3. YARN-2928.01.patch (2.36 MB, Sangjin Lee)
4. TimelineServiceStoragePerformanceTestSummaryYARN-2928.pdf (179 kB, Vrushali C)
5. Timeline Service Next Gen - Planning - ppt.pptx (345 kB, Vinod Kumar Vavilapalli)
6. timeline_service_v2_next_milestones.pdf (129 kB, Sangjin Lee)
7. The YARN Timeline Service v.2 Documentation.pdf (743 kB, Sangjin Lee)
8. Data model proposal v1.pdf (89 kB, Zhijie Shen)
9. ATSv2BackendHBaseSchemaproposal.pdf (259 kB, Sangjin Lee)
10. ATSv2.rev2.pdf (252 kB, Sangjin Lee)
11. ATSv2.rev1.pdf (249 kB, Sangjin Lee)

Issue Links

1. [Collector wireup] Set up timeline collector with basic request serving structure and lifecycle (Sub-task, Resolved, Sangjin Lee)
2. [Data Model] create overall data objects of TS next gen (Sub-task, Resolved, Zhijie Shen)
3. [Collector wireup] Implement RM starting its timeline collector (Sub-task, Resolved, Naganarasimha G R)
4. [Storage implementation] Create a test-only backing storage implementation for ATS writes (Sub-task, Resolved, Sangjin Lee)
5. [Storage abstraction] Create backing storage write interface for timeline collectors (Sub-task, Resolved, Vrushali C)
6. [Storage implementation] Create standalone HBase backing storage implementation for ATS writes (Sub-task, Resolved, Zhijie Shen)
7. [Storage implementation] Create HBase cluster backing storage implementation for ATS writes (Sub-task, Resolved, Vrushali C)
8. [Collector wireup] Implement timeline app-level collector service discovery (Sub-task, Resolved, Junping Du)
9. [Data Model] Make putEntities operation be aware of the app's context (Sub-task, Resolved, Zhijie Shen)
10. [Data Model] Create ATS metrics API (Sub-task, Resolved, Unassigned)
11. [Data Model] Create ATS configuration, metadata, etc. as part of entities (Sub-task, Resolved, Unassigned)
12. [Event producers] Implement RM writing app lifecycle events to ATS (Sub-task, Resolved, Naganarasimha G R)
13. [Event producers] Implement NM writing container lifecycle events to ATS (Sub-task, Resolved, Naganarasimha G R)
14. [Data Serving] Set up ATS reader with basic request serving structure and lifecycle (Sub-task, Resolved, Varun Saxena)
15. [Source organization] Refactor timeline collector according to new code organization (Sub-task, Resolved, Li Lu)
16. [Data Serving] Handle how to set up and start/stop ATS reader instances (Sub-task, Resolved, Varun Saxena)
17. [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend (Sub-task, Resolved, Zhijie Shen)
18. [Storage abstraction] Create backing storage read interface for ATS readers (Sub-task, Resolved, Varun Saxena)
19. [Data Serving] Provide a very simple POC html ATS UI (Sub-task, Resolved, Sangjin Lee)
20. Bootstrap TimelineServer Next Gen Module (Sub-task, Resolved, Zhijie Shen)
21. [Collector implementation] the REST server (web server) for per-node collector does not work if it runs inside node manager (Sub-task, Resolved, Li Lu)
22. [Collector wireup] We need an assured way to determine if a container is an AM container on NM (Sub-task, Resolved, Giovanni Matteo Fumarola)
23. [Event producers] Change distributed shell to use new timeline service (Sub-task, Resolved, Junping Du)
24. [Storage implementation] Exploiting the option of using Phoenix to access HBase backend (Sub-task, Resolved, Li Lu)
25. [Documentation] Documenting the timeline service v2 (Sub-task, Resolved, Sangjin Lee)
26. [Collector implementation] Implement the core functionality of the timeline collector (Sub-task, Resolved, Vrushali C)
27. [Data Mode] Implement client API to put generic entities (Sub-task, Resolved, Zhijie Shen)
28. [Storage implementation] Create backing storage write interface and a POC only file based storage implementation (Sub-task, Resolved, Vrushali C)
29. Refactor and fix null casting in some map cast for TimelineEntity (old and new) and fix findbug warnings (Sub-task, Resolved, Junping Du)
30. rename TimelineAggregator etc. to TimelineCollector (Sub-task, Resolved, Sangjin Lee)
31. [Event Producers] NM TimelineClient container metrics posting to new timeline service. (Sub-task, Resolved, Junping Du)
32. Replace starting a separate thread for post entity with event loop in TimelineClient (Sub-task, Resolved, Naganarasimha G R)
33. Collector's web server should randomly bind an available port (Sub-task, Resolved, Zhijie Shen)
34. TestTimelineServiceClientIntegration fails (Sub-task, Resolved, Sangjin Lee)
35. Reuse TimelineCollectorManager for RM (Sub-task, Resolved, Zhijie Shen)
36. Clearly define flow ID/ flow run / flow version in API and storage (Sub-task, Resolved, Zhijie Shen)
37. Security support for new timeline service. (Sub-task, Resolved, Unassigned)
38. [Storage implementation] explore & create the native HBase schema for writes (Sub-task, Resolved, Vrushali C)
39. Sub resources of timeline entity needs to be passed to a separate endpoint. (Sub-task, Resolved, Zhijie Shen)
40. Cache runningApps in RMNode for getting running apps on given NodeId (Sub-task, Resolved, Junping Du)
41. Consolidate flow name/version/run defaults (Sub-task, Resolved, Sangjin Lee)
42. Add miniHBase cluster and Phoenix support to ATS v2 unit tests (Sub-task, Resolved, Li Lu)
43. Consolidate data model change according to the backend implementation (Sub-task, Resolved, Zhijie Shen)
44. unit tests failures and issues found from findbug from earlier ATS checkins (Sub-task, Resolved, Naganarasimha G R)
45. HttpServer2 Max threads in TimelineCollectorManager should be more than 10 (Sub-task, Resolved, Varun Saxena)
46. RM only get back addresses of Collectors that NM needs to know. (Sub-task, Resolved, Junping Du)
47. Performance optimization using connection cache of Phoenix timeline writer (Sub-task, Resolved, Li Lu)
48. TestMRTimelineEventHandling and TestApplication are broken (Sub-task, Resolved, Sangjin Lee)
49. Decide if flow version should be part of row key or column (Sub-task, Resolved, Unassigned)
50. Generalize native HBase writer for additional tables (Sub-task, Resolved, Joep Rottinghuis)
51. build is broken on YARN-2928 branch due to possible dependency cycle (Sub-task, Resolved, Li Lu)
52. Fix TestHBaseTimelineWriterImpl unit test failure by fixing it's test data (Sub-task, Resolved, Vrushali C)
53. Test case failures in TestDistributedShell and some issue fixes related to ATSV2 (Sub-task, Resolved, Naganarasimha G R)
54. [JDK-8][YARN-2928] Exclude jdk.tools from hbase-client and hbase-testing-util (Sub-task, Resolved, Tsuyoshi Ozawa)
55. REST API implementation for getting raw entities in TimelineReader (Sub-task, Resolved, Varun Saxena)
56. [Aggregation] App-level aggregation and accumulation for YARN system metrics (Sub-task, Resolved, Li Lu)
57. add equals and hashCode to TimelineEntity and other classes in the data model (Sub-task, Resolved, Li Lu)
58. Support for fetching specific configs and metrics based on prefixes (Sub-task, Resolved, Varun Saxena)
59. Support complex filters in TimelineReader (Sub-task, Resolved, Varun Saxena)
60. Implement support for querying single app and all apps for a flow run (Sub-task, Resolved, Varun Saxena)
61. Add equals and hashCode to TimelineEntity (Sub-task, Resolved, Li Lu)
62. Populate flow run data in the flow_run & flow activity tables (Sub-task, Resolved, Vrushali C)
63. Refactor timelineservice.storage to add support to online and offline aggregation writers (Sub-task, Resolved, Li Lu)
64. split the application table from the entity table (Sub-task, Resolved, Sangjin Lee)
65. Bugs in HBaseTimelineWriterImpl (Sub-task, Resolved, Vrushali C)
66. ensure timely flush of timeline writes (Sub-task, Resolved, Sangjin Lee)
67. Fix new findbugs warnings in resourcemanager in YARN-2928 branch (Sub-task, Resolved, Varun Saxena)
68. Rethink event column key issue (Sub-task, Resolved, Vrushali C)
69. Change to use the AM flag in ContainerContext determine AM container (Sub-task, Resolved, Sunil Govindan)
70. Some of the NM events are not getting published due race condition when AM container finishes in NM (Sub-task, Resolved, Naganarasimha G R)
71. Change the way metric values are stored in HBase Storage (Sub-task, Resolved, Varun Saxena)
72. Publisher V2 should write the unmanaged AM flag and application priority (Sub-task, Resolved, Sunil Govindan)
73. Deal with byte representations of Longs in writer code (Sub-task, Resolved, Sangjin Lee)
74. Miscellaneous issues in NodeManager project (Sub-task, Resolved, Naganarasimha G R)
75. Add the flush and compaction functionality via coprocessors and scanners for flow run table (Sub-task, Resolved, Vrushali C)
76. Populate the flow activity table (Sub-task, Resolved, Vrushali C)
77. build is broken at TestHBaseTimelineWriterImpl.java (Sub-task, Resolved, Sangjin Lee)
78. Support appUpdated event in TimelineV2 to publish details for movetoqueue, change in priority (Sub-task, Resolved, Sunil Govindan)
79. [timeline reader] implement support for querying for flows and flow runs (Sub-task, Resolved, Sangjin Lee)
80. [reader REST API] implement support for querying for flows and flow runs (Sub-task, Resolved, Varun Saxena)
81. Add a "skip existing table" mode for timeline schema creator (Sub-task, Resolved, Li Lu)
82. Refactor the SystemMetricPublisher in RM to better support newer events (Sub-task, Resolved, Naganarasimha G R)
83. Fix javadoc warnings floating up from hbase (Sub-task, Resolved, Sangjin Lee)
84. [storage implementation] app id as string in row keys can cause incorrect ordering (Sub-task, Resolved, Varun Saxena)
85. [reader implementation] support flow activity queries based on time (Sub-task, Resolved, Varun Saxena)
86. Refactor reader classes in storage to nest under hbase specific package name (Sub-task, Resolved, Li Lu)
87. Add request/response logging & timing for each REST endpoint call (Sub-task, Resolved, Varun Saxena)
88. HBase reader throws NPE if Get returns no rows (Sub-task, Resolved, Varun Saxena)
89. Store user in app to flow table (Sub-task, Resolved, Varun Saxena)
90. Support fetching entities by UID and change the REST interface to conform to current REST APIs' in YARN (Sub-task, Resolved, Varun Saxena)
91. Support additional queries for ATSv2 Web UI (Sub-task, Resolved, Varun Saxena)
92. correctly set createdTime and remove modifiedTime when publishing entities (Sub-task, Resolved, Varun Saxena)
93. TestJobHistoryEventHandler and TestRMContainerAllocator failing on YARN-2928 branch (Sub-task, Resolved, Varun Saxena)
94. TestDistributedShell fails for V2 scenarios (Sub-task, Resolved, Naganarasimha G R)
95. ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off (Sub-task, Resolved, Sangjin Lee)
96. Fix javadoc and checkstyle issues in timelineservice code (Sub-task, Resolved, Varun Saxena)
97. Unify the term flowId and flowName in timeline v2 codebase (Sub-task, Resolved, Zhan Zhang)
98. Refactor reader API for better extensibility (Sub-task, Resolved, Varun Saxena)
99. Provide a mechanism to represent complex filters and parse them at the REST layer (Sub-task, Resolved, Varun Saxena)
100. TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail (Sub-task, Resolved, Sangjin Lee)
101. [Bug fix] RM fails to start when SMP is enabled (Sub-task, Resolved, Li Lu)
102. TestDistributedShell fails for v2 test cases after modifications for 1.5 (Sub-task, Resolved, Naganarasimha G R)
103. TestRMRestart fails and findbugs issue in YARN-2928 branch (Sub-task, Resolved, Varun Saxena)
104. New findbugs warning in resourcemanager in YARN-2928 branch (Sub-task, Resolved, Varun Saxena)
105. ATS storage has one extra record each time the RM got restarted (Sub-task, Resolved, Naganarasimha G R)
106. NM is going down with NPE's due to single thread processing of events by Timeline client (Sub-task, Resolved, Naganarasimha G R)
107. CPU Usage Metric is not captured properly in YARN-2928 (Sub-task, Resolved, Naganarasimha G R)
108. Add a check in the coprocessor for table to operated on (Sub-task, Resolved, Vrushali C)
109. Ensure non-metric values are returned as is for flow run table from the coprocessor (Sub-task, Resolved, Vrushali C)
110. Online aggregation logic should not run immediately after collectors got started (Sub-task, Resolved, Li Lu)
111. hbase unit tests fail due to dependency issues (Sub-task, Resolved, Sangjin Lee)
112. Code cleanup for TestDistributedShell (Sub-task, Resolved, Li Lu)
113. [Documentation] Update timeline service v2 documentation to capture information about filters (Sub-task, Resolved, Varun Saxena)
114. upgrade HBase version for first merge (Sub-task, Resolved, Vrushali C)
115. created time shows 0 in most REST output (Sub-task, Resolved, Varun Saxena)
116. flow activities and flow runs are populated with wrong timestamp when RM restarts w/ recovery enabled (Sub-task, Resolved, Varun Saxena)
117. timelinereader has a lot of logging that's not useful (Sub-task, Resolved, Sangjin Lee)
118. NPE in Separator.joinEncoded() (Sub-task, Resolved, Vrushali C)
119. timeline service build fails with java 8 (Sub-task, Resolved, Sangjin Lee)
120. entire time series is returned for YARN container system metrics (CPU and memory) (Sub-task, Resolved, Varun Saxena)
121. timestamps are stored unencoded causing parse errors (Sub-task, Resolved, Varun Saxena)
122. YARN container system metrics are not aggregated to application (Sub-task, Resolved, Naganarasimha G R)
123. fix "no findbugs output file" error for hadoop-yarn-server-timelineservice-hbase-tests (Sub-task, Resolved, Vrushali C)
124. fix findbugs warnings/errors for hadoop-yarn-server-timelineservice-hbase-tests (Sub-task, Resolved, Vrushali C)
125. Escaping occurences of encodedValues (Sub-task, Resolved, Sangjin Lee)
126. Eliminate singleton converters and static method access (Sub-task, Resolved, Joep Rottinghuis)
127. [documentation] several updates/corrections to timeline service documentation (Sub-task, Resolved, Sangjin Lee)
128. Make HBaseTimeline[Reader|Writer]Impl default and move FileSystemTimeline*Impl (Sub-task, Resolved, Joep Rottinghuis)
129. NPE in Distributed Shell while publishing DS_CONTAINER_START event and other miscellaneous issues (Sub-task, Resolved, Varun Saxena)
130. Avoid re-creation of EventColumnNameConverter in HBaseTimelineWriterImpl#storeEvents (Sub-task, Resolved, Joep Rottinghuis)
131. Eliminate unused imports checkstyle warnings (Sub-task, Resolved, Joep Rottinghuis)
132. fix several rebase and other miscellaneous issues before merge (Sub-task, Resolved, Sangjin Lee)
133. Store node information for finished containers in timeline v2 (Sub-task, Resolved, Unassigned)
134. fix hadoop-aws pom not to do the exclusion (Sub-task, Resolved, Sangjin Lee)

People

    • Assignee: Sangjin Lee (sjlee0)
    • Reporter: Sangjin Lee (sjlee0)
    • Votes: 1
    • Watchers: 92