Bigtop
  1. Bigtop
  2. BIGTOP-1249

Umbrella JIRA: Pre-commit hooks and automated Patch validation

    Details

    • Type: Improvement Improvement
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 0.7.0
    • Fix Version/s: 0.9.0
    • Component/s: build
    • Labels:

      Description

      Lets automate some of the review process.

      • It will be good to add in some pre-commit hooks which check for ASF boilerplate on all files and trailing whitespace.
      • It will be also nice if the build server can run "mvn compile" and some other basic validation on the code base every time a patch is submitted, similar to the way apache hadoop is doing.

      I say lets start small: Maybe on the upcoming hack day in denver we can glue the basic JIRA monitoring together with a simple bash script, and iterate from there.

        Issue Links

          Activity

          jay vyas created issue -
          jay vyas made changes -
          Field Original Value New Value
          Description - It will be good to add in some pre-commit hooks which check for ASF boilerplate on all files and trailing whitespace.

          - It will be also nice if the build server can run "mvn compile" and some other basic validation on the code base every time a patch is submitted, similar to the way apache hadoop is doing.

          I say lets start small: Maybe on the upcoming hack day in denver we can glue the basic JIRA monitoring together with a simple bash script, and iterate from there.

          Lets automate some of the review process.

          - It will be good to add in some pre-commit hooks which check for ASF boilerplate on all files and trailing whitespace.

          - It will be also nice if the build server can run "mvn compile" and some other basic validation on the code base every time a patch is submitted, similar to the way apache hadoop is doing.

          I say lets start small: Maybe on the upcoming hack day in denver we can glue the basic JIRA monitoring together with a simple bash script, and iterate from there.

          Hide
          Konstantin Boudnik added a comment -

          For styling, license headers and such we should be able to use Apache RAT, IIRC.
          overall - great idea!

          Show
          Konstantin Boudnik added a comment - For styling, license headers and such we should be able to use Apache RAT, IIRC. overall - great idea!
          Konstantin Boudnik made changes -
          Component/s Build [ 12322208 ]
          Konstantin Boudnik made changes -
          Affects Version/s 0.7.0 [ 12324362 ]
          Konstantin Boudnik made changes -
          Fix Version/s 0.8.0 [ 12324841 ]
          Konstantin Boudnik made changes -
          Labels hackathon
          Konstantin Boudnik made changes -
          Issue Type Bug [ 1 ] Improvement [ 4 ]
          Hide
          jay vyas added a comment -

          thanks cos. So that solves the "how to validate" question. Now, the workflow stuff. how do apache projects normally hook in to evalute JIRA patches ?

          Show
          jay vyas added a comment - thanks cos. So that solves the "how to validate" question. Now, the workflow stuff. how do apache projects normally hook in to evalute JIRA patches ?
          Hide
          Konstantin Boudnik added a comment - - edited

          I believe in case of Hadoop builds there are builds that check the changes in the source code tree and then download/apply patch from a JIRA, and run the tests on the component. E.g https://builds.apache.org/job/PreCommit-HDFS-Build/ that is kicked by https://builds.apache.org/job/PreCommit-Admin/

          The latter is essentially what drives the patches validation.

          Show
          Konstantin Boudnik added a comment - - edited I believe in case of Hadoop builds there are builds that check the changes in the source code tree and then download/apply patch from a JIRA, and run the tests on the component. E.g https://builds.apache.org/job/PreCommit-HDFS-Build/ that is kicked by https://builds.apache.org/job/PreCommit-Admin/ The latter is essentially what drives the patches validation.
          Hide
          jay vyas added a comment -

          Lets also see if we can add some hooks to trigger incomplete JIRA metadata (version, component, ....)

          Show
          jay vyas added a comment - Lets also see if we can add some hooks to trigger incomplete JIRA metadata (version, component, ....)
          jay vyas made changes -
          Link This issue relates to BIGTOP-1286 [ BIGTOP-1286 ]
          Hide
          jay vyas added a comment -

          CI

          Show
          jay vyas added a comment - CI
          Hide
          jay vyas added a comment - - edited

          okay ! After stealing the trick from hadoop-common/dev-utils, we now got a way to automatically push comments to JIRA:

          1) Download the JIRA CLI From https://bobswift.atlassian.net/wiki/display/JCLI/Downloads, and unzip it.

          2) Issue a jira command like this ./jira.sh -s https://issues.apache.org/jira -a addcomment -u JIRA_USERNAME -p PASSWORD --comment "CI" --issue BIGTOP-1249

          This essentially gives us a simple way to run any commands to update comments by auto-evaluating patches in bigtop.

          Now I just need to put up a simple server and create a bigtop-qa user in JIRA to do basica analysis of patches.
          I think to start, I will just put some logic to evaluate patches for trailing whitespaces and confirm that patch -p 0 < BIGTOP-ABCD.patch works.

          Then from there we can iterate to improve.

          3) Also - i need to find a way to poll or trigger events from JIRA. any ideas there?... (UPDATE - a curl command will solve that problem:

          curl -D- -X POST -H "Content-Type: application/json" --data '{"jql":"project = BIGTOP AND updated > -320m AND attachments IS NOT EMPTY AND status = OPEN","startAt":0,"maxResults":10,"fields":["id","key"]}' "https://issues.apache.org/jira/rest/api/2/search"}}  ) 

          And then some simple logic can be used to poll, possibly just monitoring emails (like apache hadoop does), or doing it in a cron..

          Show
          jay vyas added a comment - - edited okay ! After stealing the trick from hadoop-common/dev-utils, we now got a way to automatically push comments to JIRA: 1) Download the JIRA CLI From https://bobswift.atlassian.net/wiki/display/JCLI/Downloads , and unzip it. 2) Issue a jira command like this ./jira.sh -s https://issues.apache.org/jira -a addcomment -u JIRA_USERNAME -p PASSWORD --comment "CI" --issue BIGTOP-1249 This essentially gives us a simple way to run any commands to update comments by auto-evaluating patches in bigtop. Now I just need to put up a simple server and create a bigtop-qa user in JIRA to do basica analysis of patches. I think to start, I will just put some logic to evaluate patches for trailing whitespaces and confirm that patch -p 0 < BIGTOP-ABCD.patch works. Then from there we can iterate to improve. 3) Also - i need to find a way to poll or trigger events from JIRA. any ideas there?... (UPDATE - a curl command will solve that problem: curl -D- -X POST -H "Content-Type: application/json" --data '{"jql":"project = BIGTOP AND updated > -320m AND attachments IS NOT EMPTY AND status = OPEN","startAt":0,"maxResults":10,"fields":["id","key"]}' "https://issues.apache.org/jira/rest/api/2/search"}} ) And then some simple logic can be used to poll, possibly just monitoring emails (like apache hadoop does), or doing it in a cron..
          Hide
          jay vyas added a comment -

          After some more playing, looks like the Java API 'net.rcarz:jira-client:0.4' Is really useful for this: Probably more concise than any other approach. Here is a very raw snippet of how to use it for this sort of thing in scala (im finding a bug with getAttachments, but otherwise, looks like a nice way to monitor JIRAs periodically and super easy to test / maintain.

           
           val creds = new BasicCredentials("jayunit100", "...........")
            val client = new JiraClient("https://issues.apache.org/jira", creds);
            val issueResult = 
              client.
                searchIssues("project=BIGTOP AND status=OPEN AND updated > -1m");
          //        searchIssues("project=BIGTOP AND status=CLOSED AND attachments IS NOT EMPTY");
            
            println(issueResult.total);
            println(issueResult.issues.size());
            (0 to issueResult.issues.size()-1).
                foreach((i:Int) =>
                    println(i + " attachements: " +
                        issueResult.issues.get(i).getKey()+" (key)"+
                        issueResult.issues.get(i).getComponents().size()+" (comp)"+
                        issueResult.issues.get(i).getAttachments().size() + "(attach)"))  
            issueResult.
                issues.
                   foreach((i: Issue) => 
                     i.getAttachments().
                       foreach(
                         (a: Attachment) => 
                           println(i + " DATE : "+ a.getCreatedDate())));
          
          
          Show
          jay vyas added a comment - After some more playing, looks like the Java API 'net.rcarz:jira-client:0.4' Is really useful for this: Probably more concise than any other approach. Here is a very raw snippet of how to use it for this sort of thing in scala (im finding a bug with getAttachments, but otherwise, looks like a nice way to monitor JIRAs periodically and super easy to test / maintain. val creds = new BasicCredentials("jayunit100", "...........") val client = new JiraClient("https://issues.apache.org/jira", creds); val issueResult = client. searchIssues("project=BIGTOP AND status=OPEN AND updated > -1m"); // searchIssues("project=BIGTOP AND status=CLOSED AND attachments IS NOT EMPTY"); println(issueResult.total); println(issueResult.issues.size()); (0 to issueResult.issues.size()-1). foreach((i:Int) => println(i + " attachements: " + issueResult.issues.get(i).getKey()+" (key)"+ issueResult.issues.get(i).getComponents().size()+" (comp)"+ issueResult.issues.get(i).getAttachments().size() + "(attach)")) issueResult. issues. foreach((i: Issue) => i.getAttachments(). foreach( (a: Attachment) => println(i + " DATE : "+ a.getCreatedDate())));
          Hide
          jay vyas added a comment - - edited

          Whats the best way to determine patch application ? At this point, we have 3 different types of patches:

          patch -p0 <: as in BIGTOP-1201
          git am : as in BIGTOP-1221 (patches made with git format-patch)
          patch -p1 <: like in BIGTOP-1316

          Its not clear the best way to determine which patch format is which programmatically. Any ideas here?

          Show
          jay vyas added a comment - - edited Whats the best way to determine patch application ? At this point, we have 3 different types of patches: patch -p0 < : as in BIGTOP-1201 git am : as in BIGTOP-1221 (patches made with git format-patch) patch -p1 < : like in BIGTOP-1316 Its not clear the best way to determine which patch format is which programmatically. Any ideas here?
          Hide
          Mark Grover added a comment -

          I think this is time where we establish a standard, I have been long vouching for people to upload patches as git format-patch.

          In any case, I think patch -p0/-p1 always start with diff --git (I think) and the git am patches have a "From:", "Subject:" etc. tags.

          I think it makes sense to standardize, my vote goes to git format-patch/git am

          Show
          Mark Grover added a comment - I think this is time where we establish a standard, I have been long vouching for people to upload patches as git format-patch. In any case, I think patch -p0/-p1 always start with diff --git (I think) and the git am patches have a "From:", "Subject:" etc. tags. I think it makes sense to standardize, my vote goes to git format-patch/git am
          Hide
          Konstantin Boudnik added a comment -

          Having a standard is great, but it means that any new contributor will have to first read into every single detail on how to contribute and if they won't we'll be forced to reject their patches, etc.
          As I said, standards are great, but they are usually not very flexible. I think checking git apply and - if failed - followed by {{git apply -p0 }}, is a more viable option. Agree?

          Show
          Konstantin Boudnik added a comment - Having a standard is great, but it means that any new contributor will have to first read into every single detail on how to contribute and if they won't we'll be forced to reject their patches, etc. As I said, standards are great, but they are usually not very flexible. I think checking git apply and - if failed - followed by {{git apply -p0 }}, is a more viable option. Agree?
          Hide
          Mark Grover added a comment -

          Fine by me.

          Show
          Mark Grover added a comment - Fine by me.
          Hide
          jay vyas added a comment - - edited

          We all agree we want to spur more contributions.... the question is wether the benefits of rigorous use of git format-patch + git --signoff spurs more contributions or makes it harder to contribute.

          I think it spurs more contributions : here is why.

          (1) it helps contributors by forcing them to get attributed for their work because git puts your name and data in the commit log and also .

          (2) makes it easier to validate patches. that makes it easier for contributors because there is a clear deterministic way to evaluate their patches.

          To Cos' point, it does make it harder for some people ... But I think the amount of people a standard *benefits* is >> then the amount it hinders.

          So how bout a comprimise - we agree that git format-patch is the *right* way to submit patches, we all agree to encourage it because of its obvious benefits - but commiters can choose to accept and review other patches if they want to... and maybe in time we can re-visit wether to strictly enforce a standard or not at a later date.

          ( Oh and yes – I think for the automated review, that double check of git apply followed by -p0 is just fine, as we discussed a couple days ago )

          Show
          jay vyas added a comment - - edited We all agree we want to spur more contributions.... the question is wether the benefits of rigorous use of git format-patch + git --signoff spurs more contributions or makes it harder to contribute. I think it spurs more contributions : here is why. (1) it helps contributors by forcing them to get attributed for their work because git puts your name and data in the commit log and also . (2) makes it easier to validate patches. that makes it easier for contributors because there is a clear deterministic way to evaluate their patches. To Cos' point, it does make it harder for some people ... But I think the amount of people a standard * benefits * is >> then the amount it hinders. So how bout a comprimise - we agree that git format-patch is the * right * way to submit patches, we all agree to encourage it because of its obvious benefits - but commiters can choose to accept and review other patches if they want to... and maybe in time we can re-visit wether to strictly enforce a standard or not at a later date. ( Oh and yes – I think for the automated review, that double check of git apply followed by -p0 is just fine, as we discussed a couple days ago )
          Hide
          Mark Grover added a comment -

          +1, exactly my thoughts.

          Show
          Mark Grover added a comment - +1, exactly my thoughts.
          Hide
          Konstantin Boudnik added a comment -

          A couple of the points to think about:

          • git format-patch exist solely for the purpose of sending patches over email. We don't do it.
          • the reason why I am using git diff --no-prefix is because it's about 20 times fast for me to double click on a file name inside of the patch and then paste it with a middle mouse click elsewhere. E.g. I am might want to look into that file or whatever. If a patch is formatted with prefixes, i.e. has these annoying 'a/' and 'b/' things - it makes my workflow painfully slow: now I need to jump around the string to remove the prefix. Any while I am very proficient with command line and bash this little things slows everything into molasses.

          I understand that all people are different: some are never learned Meta-B or ^-A and won't even for the sake of their lives. And that's ok by me. However, I don't want to be deliberately slowed down to a slowest common denominator.

          Does it sound heartless? Perhaps. But I don't care - I have a very limited time on this planet and I want to do what computers can't. And let them to take care about mechanical, mundane, and honestly boring stuff like detecting patch formats. So, I am strongly against the imposition of any patch formatting restrictions. And I don't by that having a standard patch format will ease the review process. If patch doesn't apply - noone will review it. If it does - who cares how it was formatted as far as the code is right?

          Show
          Konstantin Boudnik added a comment - A couple of the points to think about: git format-patch exist solely for the purpose of sending patches over email. We don't do it. the reason why I am using git diff --no-prefix is because it's about 20 times fast for me to double click on a file name inside of the patch and then paste it with a middle mouse click elsewhere. E.g. I am might want to look into that file or whatever. If a patch is formatted with prefixes, i.e. has these annoying 'a/' and 'b/' things - it makes my workflow painfully slow: now I need to jump around the string to remove the prefix. Any while I am very proficient with command line and bash this little things slows everything into molasses. I understand that all people are different: some are never learned Meta-B or ^-A and won't even for the sake of their lives. And that's ok by me. However, I don't want to be deliberately slowed down to a slowest common denominator. Does it sound heartless? Perhaps. But I don't care - I have a very limited time on this planet and I want to do what computers can't. And let them to take care about mechanical, mundane, and honestly boring stuff like detecting patch formats. So, I am strongly against the imposition of any patch formatting restrictions. And I don't by that having a standard patch format will ease the review process. If patch doesn't apply - noone will review it. If it does - who cares how it was formatted as far as the code is right?
          Hide
          jay vyas added a comment - - edited

          I guess you have a point - let people contribute however they want to. We don't want to slow down progress and there's tons of work to be done.

          But i must say – as an aside – hadoop was created for web-crawling... certainly it has broader uses nowadays... in the same sense the original motivation for git format-patch shouldn't tarnish its bright future

          Show
          jay vyas added a comment - - edited I guess you have a point - let people contribute however they want to. We don't want to slow down progress and there's tons of work to be done. But i must say – as an aside – hadoop was created for web-crawling... certainly it has broader uses nowadays... in the same sense the original motivation for git format-patch shouldn't tarnish its bright future
          Hide
          Roman Shaposhnik added a comment -

          Konstantin Boudnik and jay vyas I certainly agree with Cos' metapoint of optimizing for the ease of contribution. And to that point – I don't care what patch format committers use to post patches – all I do with them is review. I do care a great deal what format contributors post patches in. Why? Because I want it to be as quick as possible for me to commit their work to git in such a way that preserves ownership of patches. And git patch-format seems like the only game in town for that to be automated.

          Show
          Roman Shaposhnik added a comment - Konstantin Boudnik and jay vyas I certainly agree with Cos' metapoint of optimizing for the ease of contribution. And to that point – I don't care what patch format committers use to post patches – all I do with them is review. I do care a great deal what format contributors post patches in. Why? Because I want it to be as quick as possible for me to commit their work to git in such a way that preserves ownership of patches. And git patch-format seems like the only game in town for that to be automated.
          Hide
          Konstantin Boudnik added a comment -

          that preserves ownership of patches.

          for that we always were using --author option, weren't we?

          Show
          Konstantin Boudnik added a comment - that preserves ownership of patches. for that we always were using --author option, weren't we?
          Hide
          Roman Shaposhnik added a comment -

          Konstantin Boudnik you could, but that would be inconvenient compared to simply git am'ing the patch.

          Show
          Roman Shaposhnik added a comment - Konstantin Boudnik you could, but that would be inconvenient compared to simply git am'ing the patch.
          Hide
          Konstantin Boudnik added a comment - - edited

          True. With that, shall the proposed formatting be a recommendation, not the limiting factor on the contributions?

          Show
          Konstantin Boudnik added a comment - - edited True. With that, shall the proposed formatting be a recommendation, not the limiting factor on the contributions?
          Konstantin Boudnik made changes -
          Fix Version/s 0.9.0 [ 12326836 ]
          Fix Version/s 0.8.0 [ 12324841 ]
          Hide
          Konstantin Boudnik added a comment -

          BTW, there's a sorta working Precommit jobs on builds.apache.org. Here how it works

          The implementation reminds me of a contraption a little bit, but the idea is good: just use a filter to select project's ticket in "Patch Available" state and then do what needs to be done. Incidentally, I was playing with this idea of Ignite CI and this piece of Groovy seems be to be doing just what's needed:

          /**
           * Parsing a special filter from an Apache project JIRA and picking up latest by ID
           * attachments to process.
           */
          final ATTACHMENT_URL = "https://issues.apache.org/jira/secure/attachment"
          def jirasAttached = [:]
          def JIRA_FILTER =
              "https://issues.apache.org/jira/sr/jira.issueviews:searchrequest-xml/12330308/SearchRequest-12330308.xml?tempMax=100&field=key&field=attachments"
          def rss = new XmlSlurper().parse(JIRA_FILTER)
          rss.channel.item.each { jira ->
            println jira.key
            def latestAttr = jira.attachments[0].attachment.list().sort {
              it.@id.toInteger()
            }.reverse()[0]
            jirasAttached[jira.key] = "${latestAttr.@id}/${latestAttr.@name}"
          }
          

          given, that you need to provide your own filter.

          Show
          Konstantin Boudnik added a comment - BTW, there's a sorta working Precommit jobs on builds.apache.org. Here how it works The implementation reminds me of a contraption a little bit, but the idea is good: just use a filter to select project's ticket in "Patch Available" state and then do what needs to be done. Incidentally, I was playing with this idea of Ignite CI and this piece of Groovy seems be to be doing just what's needed: /** * Parsing a special filter from an Apache project JIRA and picking up latest by ID * attachments to process. */ final ATTACHMENT_URL = "https: //issues.apache.org/jira/secure/attachment" def jirasAttached = [:] def JIRA_FILTER = "https: //issues.apache.org/jira/sr/jira.issueviews:searchrequest-xml/12330308/SearchRequest-12330308.xml?tempMax=100&field=key&field=attachments" def rss = new XmlSlurper().parse(JIRA_FILTER) rss.channel.item.each { jira -> println jira.key def latestAttr = jira.attachments[0].attachment.list().sort { it.@id.toInteger() }.reverse()[0] jirasAttached[jira.key] = "${latestAttr.@id}/${latestAttr.@name}" } given, that you need to provide your own filter.

            People

            • Assignee:
              Unassigned
              Reporter:
              jay vyas
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:

                Development