  1. Spark
  2. SPARK-1537

Add integration with Yarn's Application Timeline Server

    Details

    • Type: New Feature
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: YARN
    • Labels:
      None

      Description

      It would be nice to have Spark integrate with Yarn's Application Timeline Server (see YARN-321, YARN-1530). This would allow users running Spark on Yarn to have a single place to go for all their history needs, and avoid having to manage a separate service (Spark's built-in server).

      At the moment, there's a working version of the ATS in the Hadoop 2.4 branch, although there is still some ongoing work. But the basics are there, and I wouldn't expect them to change (much) at this point.

      1. SPARK-1537.txt
        4 kB
        Zhan Zhang
      2. spark-1573.patch
        38 kB
        Zhan Zhang

        Issue Links

          Activity

          vanzin Marcelo Vanzin added a comment -

          I'm working on this but this all sort of depends on progress being made on the Yarn side, so at this moment I'm not yet ready to send any PRs.

          zhazhan Zhan Zhang added a comment -

          I am also interested in this and am trying to integrate Spark with the YARN timeline server. Do you have any concrete plan in mind? I can start prototyping it and then we can work together on this topic. What do you think?

          vanzin Marcelo Vanzin added a comment -

          I have a prototype ready. But I'm still investigating some issues with the Yarn side of things (mostly around security and scalability). Given that I have some code pretty much ready, you're welcome to spend time on it, but you'd be duplicating work I already have done.

          zhazhan Zhan Zhang added a comment -

          Do you mind sharing your thoughts, design document or prototype code?

          Thanks.

          vanzin Marcelo Vanzin added a comment -

          Currently busy with other more urgent tasks, but I'll push to my repo and post a link when I get some time.

          vanzin Marcelo Vanzin added a comment -

          Current code is here:
          https://github.com/vanzin/spark/tree/yarn-timeline

          Very much WIP at this point.

          zhazhan Zhan Zhang added a comment -

          Thanks for sharing this. Do you have a concrete plan or timeline for this Jira?

          vanzin Marcelo Vanzin added a comment -

          No concrete timeline at the moment. I'm just starting to look at the 2.5.0 version of ATS so I can incorporate things into my patch.

          zzhan Zhan Zhang added a comment -

          Do you have any update on this, or any schedule in mind yet?

          vanzin Marcelo Vanzin added a comment -

          No set schedule as of now. The current code "works", but it's blocked by at least one bug I filed against Yarn (YARN-2444).

          Also, I'm not comfortable with the current ATS design. There's discussion on YARN-1530 about making it better and I want to wait until that work at least starts, in case it causes changes in the API. While it's possible to submit the code without the Yarn changes in, I'm loth to add support for something that just isn't production-ready yet.

          zzhan Zhan Zhang added a comment -

          Hi Marcelo,

          Do you have an update on this? If you don't mind, I can work on your branch to get this done asap. Please let me know what you think.

          vanzin Marcelo Vanzin added a comment -

          Hi Zhan,

          As I mentioned, I'm waiting for issues being discussed in YARN-1530 to be resolved first. The current plans, as far as I am aware, would result in incompatible API changes in the timeline server API, so I'd rather wait for that before pushing any solution in Spark.

          You're free to come up with your own solution if you want, but I would seriously recommend waiting for the timeline server to actually reach production-level quality before going with integration, especially as far as its API goes.

          zjshen Zhijie Shen added a comment -

          Marcelo Vanzin, thanks for introducing YARN timeline server to Spark. Let me briefly summarize the current status of the timeline server and answer some concerns here. Spark folks who are interested in this monitoring service offered by YARN can go ahead to YARN-1530 to read the design doc and watch the latest progress.

          1. The essential functions of the timeline service have been available since Hadoop 2.4. Basically, the user can organize the app's history or metrics according to the timeline data model and post it to the timeline server. Later on, the user or an admin can come back and query this information to analyze how the app was going. The essential APIs remain unchanged from 2.4 to the coming 2.6. There should NOT be any incompatible API changes that will block this work. Moreover, keeping compatibility is always a consideration for us when coming up with new features in the following Hadoop releases.

          2. It's NOT exactly that the timeline server is not production-ready. In fact, Apache Tez has already integrated the timeline server for logging the history information. In the coming Hadoop 2.6, MapReduce is also enabled to publish its history information to the timeline server. Moreover, within the scope of YARN, a built-in generic history service on top of the timeline service is available to YARN users to watch all kinds of apps. Hence, with several successful pioneers, Spark should be confident enough to take advantage of this new YARN feature.

          3. While the YARN community is progressing quickly to improve the timeline server in terms of security (coming in 2.6), high availability, scalability, better client libs and so on, this should not disturb the initial attempt for Spark to embrace the timeline server, but will offer a better experience if Spark is riding on it.

          If you have other high-priority issues to work on, I think Zhan Zhang will be able to help with this integration. Thanks!
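
          To make the "organize and post" step in point 1 concrete, here is a minimal sketch of how an app's history might be shaped into a timeline entity before being posted. The JSON field names (entitytype, entity, starttime, events, primaryfilters, otherinfo) are assumptions based on the timeline server's REST data model and are not stated in this thread; the entity type, event name and helper names are hypothetical. The sketch only builds the request body and does not talk to a server.

```python
import json
import time


def make_timeline_entity(entity_type, entity_id, events, start_time_ms=None):
    """Build one timeline entity as a JSON-serializable dict.

    The key names are an assumption about the timeline REST data model;
    check them against the Hadoop version you target.
    `events` is an iterable of (timestamp_ms, event_type, info_dict).
    """
    if start_time_ms is None:
        start_time_ms = int(time.time() * 1000)
    return {
        "entitytype": entity_type,
        "entity": entity_id,
        "starttime": start_time_ms,
        "events": [
            {"timestamp": ts, "eventtype": name, "eventinfo": info}
            for ts, name, info in events
        ],
        "primaryfilters": {},
        "otherinfo": {},
    }


def make_put_body(entities):
    """Wrap entities in the envelope a put request would carry (assumed shape)."""
    return json.dumps({"entities": list(entities)})


# Hypothetical Spark application with a single job-start event.
entity = make_timeline_entity(
    "SPARK_APPLICATION", "app_0001",
    [(1000, "JobStart", {"jobId": 0})],
    start_time_ms=1000,
)
body = make_put_body([entity])
```

          A framework integration would then hand a body like this to the timeline client (or POST it to the REST endpoint); querying it back later is the read side discussed further down in this thread.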

          vanzin Marcelo Vanzin added a comment -

          ...security (coming 2.6), high availability, scalability, better client libs and so on...

          That's exactly my point about the ATS not being production-level quality yet. The current plans I'm aware of would require changes in the ATS API. Since Spark does not support the ATS at the moment, I'd rather have it support the new-and-secure-and-scalable-and-available API than the current one. Otherwise you'll get into the mess of having to conditionally compile code for both APIs, or implement part of those features into your own client code (something I've done in my proof-of-concept but I'd really like to avoid, because it's really just trying to work around limitations in the current ATS design).

          So, short version of what I'm trying to say: yes, you can build something that talks to the current ATS. But given that it currently has shortcomings, and the fix for those will, as far as I know, affect the client API, I don't see the point in trying to push that integration at this moment when Spark already has a working solution for job history, just so that you'll ship code that will be immediately deprecated by the new ATS...

          zjshen Zhijie Shen added a comment -

          That's exactly my point about the ATS not being production-level quality yet. The current plans I'm aware of would require changes in the ATS API.

          Not to mention the definition of production ready (which differs from community to community, such as Tez and MapReduce), I'm curious about the required API changes of the timeline server. Please elaborate on the changes in case I've missed some discussion. On the other hand, according to my understanding of the timeline server, the ongoing and future improvements are:

          1) Security is coming with Hadoop 2.6, which doesn't affect the usage of the existing APIs in insecure mode. AFAIK, Spark is working with Hadoop 2.3(4). It should be okay to ride on the timeline server in insecure mode. Whenever upgrading to Hadoop 2.6, you just need to turn on the security switch.

          2) Timeline availability and scalability are going to be server-side improvements, and don't affect the user-facing API. In the scope of YARN, we have already successfully enhanced the RM with the HA feature while making it transparent to the user. I'm not aware of any major blocker that prevents the timeline server from achieving the same goal.

          3) For the client libs, we're trying to help users utilize the timeline service more easily (e.g., YARN-2517, YARN-2673), with changes that are either transparent or additions. As I've mentioned before, we're careful about any proposed changes that would break compatibility.

          I'm commenting on this Jira to share more insights about the timeline server with Spark folks in case they are interested in this YARN offering. It's up to Spark folks to decide whether and when they want to make use of it.

          vanzin Marcelo Vanzin added a comment -

          Please elaborate the changes in case I've missed some discussion

          That's part of why I'm waiting on YARN-1530. There's been no activity in a while; I've been told there have been offline discussions but I don't see any updates on the actual issue itself, so that's the main reason why I've been holding off on this work: I don't feel it's a good investment of time to go forward with something that might change in the near future.

          It would be great if you could update that bug with a concrete plan for the post-2.6 updates related to reliability and other features. If they really don't affect the client API, then great, I can continue my Spark-side work without worries. But again, I've mainly been waiting because of the radio silence from the ATS side w.r.t. the issues that I think are important to Spark.

          vanzin Marcelo Vanzin added a comment -

          BTW, if you want a list of things I think are important for Spark, here are some quick ones:

          • YARN-2521 (I've sort of implemented this in my code, but would really like to not have to care about it)
          • YARN-2423 (note how this is a new API)
          • YARN-2444

          YARN-2521 might be the same as YARN-2673, no? YARN-2513 is sort of interesting but not necessary.

          zzhan Zhan Zhang added a comment -

          YARN-2521 can make the client easier to use, but it is not critical. Some application logic makes it difficult for the client cache to be generic.
          YARN-2444 may already be obsolete.

          vanzin Marcelo Vanzin added a comment -

          I think it's pretty critical when you can't upload your data because the server is down; it means we can't really recommend using the current ATS because it's not reliable. I understand it doesn't affect the client API and we can still have the code in, but it's an important feature that seems to be missing.

          YARN-2423, though, is really something that can't be done today without poking into private Yarn classes or writing a bunch of extra code. I really wouldn't want to have to support either of those two options in Spark.

          zjshen Zhijie Shen added a comment -

          BTW, if you want a list of things I think are important for Spark, here are some quick ones:

          Thanks for sharing the details, which are more helpful in clearing up the confusion than some big but vague statement. Let me go through the aforementioned Jiras:

          • YARN-2521: I'd like to keep it open for some further client improvement, such as local timeline data caching, while YARN-2673 already made the client retry when the server temporarily doesn't respond. Please note that "I think it's pretty critical when you can't upload your data because the server is down" is no longer true after YARN-2673. On the other hand, from the API's point of view, it should remain stable.
          • YARN-2423: This is proposed to improve the Java libs by adding GET APIs. They are used to query data, NOT to put data. We do this to help the use case where developers write Java code to implement a UI to analyze the timeline data. Framework integration mainly deals with PUT APIs, and the Java client libs are already there. Taking one step back, apart from the client libs, the RESTful APIs are always there, which are programming-language neutral and useful to non-Java developers.
          • YARN-2444: It may be a bug or an improper use case. According to the exception, the user doesn't pass the authorization for some reason. It is reported for 2.5, and is probably no longer valid after we fixed a bunch of security issues for 2.6. We need to do more validation for this issue before reaching a conclusion. Anyway, it's obviously an internal issue happening in secure mode only, which should not cause API CHANGES.

          I understand it doesn't affect the client API and we can still have the code in,

          It seems that we have the agreement that the current timeline service offering is not blocking the Spark integration work.
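
          For readers unfamiliar with the retry behaviour mentioned under YARN-2521 and YARN-2673, the general idea of a client that retries while the server is temporarily unreachable can be sketched as below. This is only an illustration of the technique under stated assumptions, not the actual YARN-2673 implementation: put_with_retry, put_once, max_retries and retry_interval_s are hypothetical names, and the real client's configuration keys are not shown.

```python
import time


def put_with_retry(put_once, payload, max_retries=3, retry_interval_s=1.0,
                   sleep=time.sleep):
    """Call put_once(payload), retrying on ConnectionError.

    put_once is any callable that uploads the payload and raises
    ConnectionError while the server is unreachable (an assumption made
    for this sketch). After max_retries failed retries the last error
    is re-raised to the caller.
    """
    attempt = 0
    while True:
        try:
            return put_once(payload)
        except ConnectionError:
            if attempt >= max_retries:
                raise
            attempt += 1
            # Fixed interval between attempts; a real client might back off.
            sleep(retry_interval_s)
```

          Retrying of this kind only covers short outages. Data is still lost if the application exits before the server comes back, which is where local caching of timeline data, mentioned above as a further client improvement, would come in.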

          vanzin Marcelo Vanzin added a comment -

          This is proposed to improve the Java libs by adding GET APIs. They are used to query data, NOT to put data.

          Spark needs both to put and read data, otherwise the ATS is useless for Spark. The current goal of Spark is to use the ATS as a store for its history data, since the data itself is not considered public and stable.

          So there is no point in integration if you can only write data. (I know you can read data through other means, but I don't want to write a custom REST client just to get ATS support in.)

          It is reported for 2.5, and is probably no longer valid after we fixed a bunch of security issues for 2.6.

          I'm not sure why you say it's security-related since there is nothing security-related in the example code I posted. And if something doesn't work in 2.5 but works in 2.6, it means we (and by that I mean Spark) have to restrict our support to the versions where things work - even if the underlying API is exactly the same.

          zjshen Zhijie Shen added a comment -

          Spark needs both to put and read data

          It's again a vague statement. Can you share your design detail, so that we can evaluate whether it is really necessary?
          And what is the actual way of visualizing the data? Also, integration work is not just a single bug-fix patch; we can divide the work into a sequence of subtasks, and the first step is to enable Spark jobs to put their data into the timeline server. By doing this, not only can Spark's own web front end visualize job history, but third-party tools can also do Spark job analysis.

          I'm not sure why you say it's security-related since there is nothing security-related in the example code I posted.

          I said "According to the exception, the user doesn't pass the authorization for some reason." If you don't agree on it, please post your investigation on YARN-2444, YARN folks will help you on this issue.

          if something doesn't work in 2.5 but works in 2.6,

          No matter the integration with timeline service, Spark on YARN is picking Hadoop versions now. It doesn't make sense to ask for a feature by using an early version that hasn't it.

          vanzin Marcelo Vanzin added a comment -

          It's again a vague statement.

          I don't know what is vague about wanting to read the data you write.

          Can you share your design detail

          I already did way better than that, way earlier in this bug: I shared the actual code. For this particular question, here it is:
          https://github.com/vanzin/spark/blob/yarn-timeline/yarn/timeline/src/main/scala/org/apache/spark/deploy/yarn/timeline/YarnTimelineProvider.scala

          See how it reads data from the ATS? It feeds it into the Spark history server, where the data can be visualized. It's using Yarn internal APIs, which is generally bad practice.

          If you don't agree on it, please post your investigation on YARN-2444, YARN folks will help you on this issue.

          I posted the error and the code to reproduce it. I don't know what else you expect from me. If you think it's an authorization issue, test it with 2.6 and close the bug if you believe it's fixed.

          No matter the integration with timeline service, Spark on YARN is picking Hadoop versions now. It doesn't make sense to ask for a feature by using an early version that hasn't it.

          I'm not sure I really understood what you're trying to say here. Yes, we have to pick versions. We need a version that supports the features we need. Even if the API in 2.5 didn't change in 2.6, it seems to have bugs that prevent my current code from working, so there is no point in trying to integrate with 2.5 as far as I'm concerned. And as far as I know, 2.6 hasn't been released yet. (BTW, my code used to work with 2.4.)

          vanzin Marcelo Vanzin added a comment -

          I believe with YARN-2033 and YARN-2423 I can work around YARN-2444 even if it's still an issue, so I'll add the dependency accordingly.

          zzhan Zhan Zhang added a comment -

          I have sent a PR with WIP for people who are interested.
          https://github.com/apache/spark/pull/4683/files

          apachespark Apache Spark added a comment -

          User 'zhzhan' has created a pull request for this issue:
          https://github.com/apache/spark/pull/4683

          zzhan Zhan Zhang added a comment -

          Patch against v1.2.1

          zzhan Zhan Zhang added a comment -

          High level design doc for spark ATS integration.

          vanzin Marcelo Vanzin added a comment -

          Hi Zhan Zhang, thanks for uploading the document.

          Reading through it, I don't see anything that is really that much different from my initial proof-of-concept. The points I'd like to highlight are:

          • It still depends on YARN-2423, or at least on some effort to write a REST client that does not depend on internal Yarn classes.
          • What about overhead of the read code? Large jobs with lots of tasks, or really long jobs such as Spark Streaming jobs, will have a really large amount of events. Fetching them all in one batch would require a lot of memory for serializing the data on both sides (ATS and History Server).
          • Any security considerations? I haven't really kept up-to-date with the security changes in the ATS after I ran into issues with my p.o.c.; but mainly, does the Spark job need any special tokens to talk to the ATS when security is enabled? Does the ATS guarantee that only the job itself (or someone with the right credentials) can add events to its timeline? Or is that all handled transparently, somehow, by the client library?
          • Does YARN-2928 affect the design in any way? I took a quick look at the data model, so hopefully they'll keep things backwards compatible. But it would kinda suck to add support for an API with a limited shelf life if that's not the case.
          zzhan Zhan Zhang added a comment - - edited

          Marcelo Vanzin Thanks for the comments. I don't understand why you keep saying "my code does not have many differences from your code." We are working on an Apache project, and we all follow Apache policy. Here is the link for the Apache license details:
          http://www.apache.org/licenses/LICENSE-2.0.

          Since you think your prototype was ready half a year ago, as I have requested several times, why not post your workable patch and design and move forward? I will explain to you clearly "what's the major difference of the core design of my code from yours". The patch size is small, and the design is not so complicated, but I am sure I can show you where those core designs come from.

          After you post your design and code, we can start from there.

          Thanks.

          Zhan Zhang

          vanzin Marcelo Vanzin added a comment -

          Hi Zhan Zhang,

          I already posted the link to my code in this bug several times. The reason why I haven't sent a PR is the exact reason I raised about your spec and your patch: it uses private Yarn APIs. I've said this several times, and I really don't understand what part of it you don't understand. Pardon me if I haven't been clear about it.

          Also note how there's a Yarn bug in the list of blocker bugs for this one. That's because my p.o.c. code depends on that bug to be fixed before it can move forward. If you have a design that is not blocked by that code, and does not use internal APIs, feel free to remove the link and post it.

          Here's the link to the comment with the link to my code, dated August '14:
          https://issues.apache.org/jira/browse/SPARK-1537?focusedCommentId=14088438&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14088438

          A link you have already seen, since you used parts of that code in your patch.

          So please, can you reply to my actual comments instead of repeatedly going back to this issue? My comments have nothing to do with the fact that I've written a p.o.c. for this feature. They're issues that exist in your spec and your code independent of anything I've done.

          zzhan Zhan Zhang added a comment -

          Marcelo Vanzin If you don't have the bandwidth, or don't know how to move forward with this JIRA after such a long time, I don't mind taking it over.

          zzhan Zhan Zhang added a comment -

          Marcelo Vanzin I declared "integrate your code" from the first submission of the PR. Do you want to count how many times you keep saying this?

          "Here's the link to the comment with the link to my code, dated August '14". Now Spark is under the vote for 1.3, and today is 2/20/2015. Is it so difficult to submit a workable patch and design doc?

          vanzin Marcelo Vanzin added a comment -

          It's impossible to submit a patch when the implementation is currently blocked on a feature that doesn't exist in Yarn. Please check the "is blocked by" link at the top of this bug.

          If you're willing to write the code to work around that missing feature, please include that in your spec and patch. I am not and would rather wait for Yarn instead.

          srowen Sean Owen added a comment -

          Zhan Zhang I also can't figure out what you are suggesting here. You have proposed a patch, and you've been given feedback with specific reasons it shouldn't be committed to Spark. I agree with those, FWIW, though I think they can be overcome soon. I assume others agree, given the silence. You haven't responded to these specific points. As it stands I think that's your answer: these YARN issues need to be addressed – either fixed or agreed to be not an issue.

          Nobody needs to 'take over'. I'm not clear why you think you have been waiting on something or someone to give you code. Right now the only thing this is waiting on is for you or Zhijie Shen or anyone to address the YARN API issues. Rather than keep the broken record going, why not address the YARN API issues highlighted here? Sorry, the answer may be that you can't commit the patch you want by yourself, but that's just how OSS works.

          zzhan Zhan Zhang added a comment -

          Sean Owen From the whole context, I believe you understand what happened here. Let's be professional.

          My request is "if someone wants to try this alpha feature, we can at least provide a patch so that people can give it a try, even if it cannot go upstream for various reasons."

          Because of the Yarn blocker, we should discuss this with the Yarn community, instead of filing a bug and waiting forever.

          srowen Sean Owen added a comment -

          Zhan Zhang You have provided a patch as a PR, right? Anyone can try it. Request granted.

          Given the YARN JIRAs already referenced here, some of which have patches ready to go too, I think it has been discussed in YARN too? What isn't happening with YARN that should be, and, can you help with it? I'm not sure if that's where you are saying the waiting is. That is: hasn't this been blocked on YARN changes for a long time?

          I get it, one person's 'outstanding bug' is another's 'will not fix' but that's the give and take of OSS. If you want this feature in Spark, and people are asking that it should depend on some YARN changes – then what do you think about lobbying for those YARN changes? or do you disagree that they're necessary, and can you argue that here please?

          I don't understand your second reply. Yes, it sounds like two people have a similar solution with a similar problem with YARN APIs. You say you're not waiting on code now, but have repeatedly asked Marcelo to share some (other?) code. It's odd since, yes, it's very clear you acknowledge you've already seen his code and reused a bit, which is entirely fine. I hope we're done with that exchange.

          I sense some insinuation that code is being 'hidden' in bad faith, but I can't figure out the conspiracy. I see every willingness to make your change alone here, if you propose something that addresses the YARN issues raised here. You are not blocked on anyone else's patch. However all of us are 'blocked' on the consensus of community / committers that care about this issue, and it looks like the response is clear so far: not until YARN API stuff is sorted out one way or the other.

          Are you suggesting this patch should be committed without the YARN changes? or that you're working on the YARN changes? what do you want to take over and do next?

          zzhan Zhan Zhang added a comment -

          Sean Owen In JIRA, we share the code so that other people can comment and review. I am not waiting for patch. But It is hard to comment or review patch given a hyper-link.

          I never intended to make my change alone. Actually, from the beginning I acknowledged his contribution, and I don't mind at all closing my PR and helping to review his, if you follow the PR record. Do you agree?

          You mention you sense some insinuation and conspiracy. I didn't sense it. Can you please educate me if you figure it out?

          Let's go back to the technical points: Overall, this is early adoption of the timeline service. It is an alpha feature, but most functionality is working, although with some workarounds.

          REST client: Currently Timeline client does not provide retrieve API. So we work around this with an approach similar to the timeline client's own implementation. This needs to change once the timeline component provides a more mature API.

          Read overhead and scalability: The effort is in the roadmap in yarn timeline service. This is a critical feature to use timeline service. Current HDFS approach in spark may not scalable due to similar reason (point me out if I am wrong), and timeline service may be more promising, although it is not there yet.

          Security: The security is handled transparently in timeline client.

          ACL: Timeline has ACL control as in hadoop-2.6, and a client can create a domain and set its read/write access to control permissions.

          vanzin Marcelo Vanzin added a comment -

          Hi Zhan Zhang,

          But It is hard to comment or review patch given a hyper-link.

          Perhaps you're not familiar with all of Github's features, but you can click on each individual commit and comment on the code right there, just like you can on a PR created from those commits. Even if that doesn't sound very appealing, it's not hard to copy & paste the code and comment here if you really want to. Or generate a downloadable diff from the commits (just add ".diff" at the end of the commit URL, e.g. https://github.com/vanzin/spark/commit/c1365e0de264daa015c61a2248c80dfdea705786.diff).

          REST client: Currently Timeline client does not provide retrieve API.

          That's the main reason why this feature hasn't moved forward. Using internal APIs to achieve that is something we're not willing to do in Spark, because it exposes us to future breakages and makes compatibility harder to maintain (just look at what has been done for Hive). So we either need the new API in Yarn, or we need to invest time to create a client API that does not use Yarn's classes.

          ACL: Timeline has ACL control as in hadoop-2.6

          I'll believe you here since I haven't looked at that code yet. But it seems like it requires work on the client side, which is not currently covered in your spec.

          Read overhead and scalability: The effort is in the roadmap in yarn timeline service. This is a critical feature to use timeline service. Current HDFS approach in spark may not scalable due to similar reason

          I think we're talking about different things. What I'm referring to is that the current code that reads from the ATS reads all events of a particular entity at the same time. If that entity has a large number of events, that will require a lot of memory on the ATS side to serialize the data, and a lot of memory on the Spark History Server side to deserialize it. It's orthogonal to whether the backing store is scalable or not.
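To make the memory concern concrete, here is a minimal sketch of reading events one page at a time instead of in a single batch. It is not code from either patch: `fetch_page(offset, limit)` is a hypothetical callback standing in for whatever paged read the ATS ends up offering, and `make_fake_store` is an in-memory stand-in used only for illustration.

```python
from typing import Callable, Dict, Iterator, List, Sequence, Tuple


def iter_events(
    fetch_page: Callable[[int, int], Sequence[Dict]],
    page_size: int = 100,
) -> Iterator[Dict]:
    """Yield events page by page so that at most one page is held in memory.

    fetch_page(offset, limit) is a hypothetical callback; it is assumed to
    return up to `limit` events starting at position `offset`.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        for event in page:
            yield event
        # A short (or empty) page means the store has no more events.
        if len(page) < page_size:
            return
        offset += len(page)


def make_fake_store(n: int) -> Tuple[Callable[[int, int], List[Dict]], List[Tuple[int, int]]]:
    """Build an in-memory stand-in for the event store, recording each call."""
    events = [{"id": i} for i in range(n)]
    calls: List[Tuple[int, int]] = []

    def fetch_page(offset: int, limit: int) -> List[Dict]:
        calls.append((offset, limit))
        return events[offset:offset + limit]

    return fetch_page, calls
```

With this shape, both the serializing side and the deserializing side only ever handle `page_size` events per request, whatever the total number of events for the entity.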

          zzhan Zhan Zhang added a comment -

          Marcelo Vanzin We should centralize all comments and reviews in one place, instead of going to different links. Also, we want the reviewed code to be up to date, instead of based on some old version.

          Let's go to technical:

          1. We all agree on this one about timeline client, and this is why it is alpha feature. Hive is a good example, but nobody can deny its importance in spark.
          2. ACL is included in the patch, but not in the spec.
          3. I understand your question, but the scope of my response may be too big. To solve this, more work is needed on the entity design.

          Let's keep an eye on these issues.

          stevel@apache.org Steve Loughran added a comment -
          1. I've just tried to see where YARN-2444 stands; I can't replicate it in trunk but I've submitted the tests to verify that it isn't there.
          2. for YARN-2423 Spark seems kind of trapped. It needs an api tagged as public/stable; Robert's patch has the API, except it's being rejected on the basis that "ATSv2 will break it". So it can't be tagged as stable. So there's no API for GET operations until some undefined time t1 > now() —and then, only for Hadoop versions with it. Which implies it won't get picked up by Spark for a long time.

          I think we need to talk to the YARN dev team and see what can be done here. Even if there's no API client bundled into YARN, unless the v1 API and its paths beginning /ws/v1/timeline/ are going to go away, then a REST client is possible; it may just have to be done spark-side, where at least it can be made resilient to hadoop versions.
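As a rough illustration of what a Spark-side REST client would have to build, the sketch below constructs GET URLs under the `/ws/v1/timeline/` path mentioned above, using only the standard library and no Yarn classes. The host, the port, the entity type name `spark_event_v01`, and the `limit` query parameter in the usage note are assumptions made for the example, not details taken from any of the patches.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Root path of the v1 timeline REST API, as quoted in the comment above.
TIMELINE_ROOT = "/ws/v1/timeline/"


def entity_url(base: str, entity_type: str,
               entity_id: Optional[str] = None, **params) -> str:
    """Build a GET URL for an entity type, or for a single entity of that type.

    `base` is the timeline server address (scheme, host and port). Extra
    keyword arguments are appended as query parameters in sorted order.
    """
    url = base.rstrip("/") + TIMELINE_ROOT + quote(entity_type, safe="")
    if entity_id is not None:
        url += "/" + quote(entity_id, safe="")
    if params:
        url += "?" + urlencode(sorted(params.items()))
    return url
```

For example, `entity_url("http://ats.example.com:8188", "spark_event_v01", limit=10)` would address a listing of at most ten entities of that (hypothetical) type. Keeping URL construction in one small function is one way such a client could stay independent of the Hadoop version on the classpath.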

          apachespark Apache Spark added a comment -

          User 'steveloughran' has created a pull request for this issue:
          https://github.com/apache/spark/pull/5423

          stevel@apache.org Steve Loughran added a comment -

          HADOOP-11826 patches the hadoop compatibility document to add timeline server to the list of stable APIs.

          stevel@apache.org Steve Loughran added a comment -

          For people who've not been tracking the WiP

          1. the timeline API is pretty thoroughly documented with examples; very close to going in
            Latest TimelineServer.md
          2. the timeline server integration is in sync with trunk, especially the SPARK-4705 changes
          3. it has lots of tests. This includes: generating events from a spark context and verifying that they are served up by an in-VM timeline server instance, and retrievable by a REST client, bringing up a Spark History server and making GET requests against it to verify it hooks up to the server, and other cross-system tests. That's about as much as you can do in a standalone unit test suite.
          4. those tests all run happily on unix and windows, provided you set the -Phadoop-2.6 -Pyarn flags to request a Hadoop 2.6 profile.
          5. and I've tested against hadoop 2.6.0, 2.7.0 & branch-2; everything compiles and runs

          Can I get some reviews?

          stevel@apache.org Steve Loughran added a comment -

          + YARN-3539 is resolved; the v1 timeline is now defined and declared one of the supported REST APIs.

          I'm also removing YARN-2423 as a dependency; the latest patch does this itself

          apachespark Apache Spark added a comment -

          User 'steveloughran' has created a pull request for this issue:
          https://github.com/apache/spark/pull/8744

          apachespark Apache Spark added a comment -

          User 'steveloughran' has created a pull request for this issue:
          https://github.com/apache/spark/pull/9182

          apachespark Apache Spark added a comment -

          User 'steveloughran' has created a pull request for this issue:
          https://github.com/apache/spark/pull/10545


            People

            • Assignee:
              Unassigned
              Reporter:
              vanzin Marcelo Vanzin
            • Votes:
              11
              Watchers:
              44
