INFRA-5064: Bloodhound issue tracker access to local svn repository

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: Initial Clearing
    • Component/s: Subversion
    • Labels:
      None

      Description

      I am currently looking for possible solutions to the fact that Bloodhound must run on the same server as any repository it tracks.

      I don't imagine for a second that we would be allowed to run Bloodhound on the main Apache svn server, so I suspect we are looking at whether we would be allowed to have a read-only mirror of the Bloodhound area of the Apache repositories using svnsync (possibly making use of svnrdump to get the relevant part of the repository).

      Is this a feasible and sustainable solution with Bloodhound currently running on bloodhound-vm.apache.org, or are there any other suggestions?

      Many thanks,
          Gary

        Activity

        Tony Stevenson added a comment -
        This has finally been done. You can now find a read-only NFS mount of the main ASF svn repo at /x1/svn/asf

        Enjoy.

        Apologies for the monumental amount of delay, which has only been equalled by the monumental amounts of pain in making it work. :)
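With the mount in place, a Trac-based install like Bloodhound can be pointed at it through its repository configuration. A minimal sketch, assuming the standard Trac [repositories] syntax; the "asf" alias is illustrative, and only the /x1/svn/asf path comes from the comment above:

```ini
; trac.ini sketch: register the read-only NFS mount as an svn repository.
; The "asf" alias is hypothetical; the path is the mount noted above.
[repositories]
asf.dir = /x1/svn/asf
asf.type = svn
```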
        Tony Stevenson added a comment -
        Sorry, there is still an underlying issue in getting the storage sorted. I'll get to it as soon as I can.
        Branko Čibej added a comment - edited
        I just reread Tony's comment about "you cannot use svnsync to sync a slice of a repo" and have to point out that nothing could be farther from the truth ... all we have to do is wait for the svn-1.8 release :) Then the easiest way to do this would be to use svnsync to mirror just the BH trunk, most likely triggering the sync with svnpubsub on the target VM.
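Under that assumption (svnsync 1.8+ accepting a subtree source URL), the setup might look roughly like this. This is a hypothetical sketch: the local paths are invented, and only the Bloodhound trunk URL and the svnpubsub trigger idea come from the comment above.

```shell
# Hypothetical sketch: mirror only the Bloodhound trunk with svnsync 1.8+.
# Create an empty local repository to receive the mirror.
svnadmin create /x1/svn/bh-mirror

# svnsync needs a pre-revprop-change hook that permits revprop changes.
cat > /x1/svn/bh-mirror/hooks/pre-revprop-change <<'EOF'
#!/bin/sh
exit 0
EOF
chmod +x /x1/svn/bh-mirror/hooks/pre-revprop-change

# Initialise against just the Bloodhound trunk (1.8+ supports subtree URLs).
svnsync init file:///x1/svn/bh-mirror \
    https://svn.apache.org/repos/asf/bloodhound/trunk

# Pull revisions; this is the step an svnpubsub-triggered job would re-run.
svnsync sync file:///x1/svn/bh-mirror
```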
        Gary Martin added a comment -
        Just raising the priority as it appears to be too low. This issue has not seen any progress for a long time and it may even be worth raising the priority even higher.
        Gary Martin added a comment -
        Are there any more questions that need to be answered at this point? Is the solution mentioned by Greg back in August likely to be implemented?

        Cheers,
            Gary
        Dave Brondsema added a comment -
        NFS is fine for Allura too. Allura initially only needs read access to its own git repo, so SVN isn't necessary for us at this point.
        Gary Martin added a comment -
        Perhaps I should have said this earlier. I think that the NFS solution looks like a very good idea. There may be challenges in triggering Bloodhound to update without a direct post-commit hook, but there should be a number of ways to do the job.

        Does the NFS solution look good to Infra?
        Greg Stein added a comment -
        For Infra to design a way for BH and Allura to locally access the svn (and git) repositories.

        As we talked about on IRC, one solution would be to mirror the entire ASF repository(-ies) to a box (I forget its name; it has "lots of disk space", somebody noted). The mirror would then be provided to the BH and Allura VMs, which are running on the same box, via local NFS serving/mounts.
        #asfinfra IRC Bot added a comment -
        <danielsh> ping. What needs to be done or what questions need to be answered, before this ticket can be closed?
        Greg Stein added a comment -
        A preload of empty revisions is a very good idea. And then copy over "real" revision data starting with the first BH commit.

        I'd also like to note that Allura (also incubating) is going to need a similar solution. I'll point allura-dev at this ticket.
        Gary Martin added a comment -
        Thanks for that Greg. I was going to respond but JIRA decided to reindex when I submitted, or something like that.

        The basic advice I have had was that we could use svnrdump to transfer the content (and, as we all seem to agree, restrict the scope to a directory and to only a diff per revision) and then pipe this into an svnadmin load. I haven't looked at how this should be triggered but I am sure you have a standard solution for that.

        I was also told that svnrdump should not be used with serf at the moment.

        Is there a sensible solution for initial loading of the data prior to the first relevant bloodhound commit or should we pipe 1229642 empty revisions into svnadmin load through a script? I am testing something like that out at the moment and, running with libeatmydata, it seems to be managing around 700,000 per hour.
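The empty-revision preload described above can be generated as an SVN dumpstream and piped straight into svnadmin load. A minimal sketch; the repository path is hypothetical, and the 1229642 figure is the revision count quoted above:

```shell
# Sketch: emit N empty revisions in SVN dumpfile format.  Each revision
# record carries only an empty property block ("PROPS-END\n" is 10 bytes),
# so revision numbers line up with the master repo while storing no content.
emit_empty_revs() {
  n="$1"
  printf 'SVN-fs-dump-format-version: 2\n\n'
  for rev in $(seq 1 "$n"); do
    printf 'Revision-number: %d\n' "$rev"
    printf 'Prop-content-length: 10\nContent-length: 10\n\nPROPS-END\n\n'
  done
}

# e.g. preload the gap before the first Bloodhound commit
# (run under libeatmydata, as above, to speed up the load):
#   emit_empty_revs 1229642 | svnadmin load /x1/svn/bh-mirror
```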
        Greg Stein added a comment -
        svnrdump can restrict the content placed into a dump file to a specific subdirectory. All revisions and revision properties will be carried over (even though they may be devoid of changes), so the revision numbering should stay consistent.

        This means the target repository will still have 1.3 million revisions, but the actual content will be MUCH smaller.

        Note that svn's FSFS storage system uses a single file per revision to store the revision properties (svn 1.8-dev has a fix for this, iirc). The actual revisions can be packed more tightly. Daniel would know more about this. If inodes are a problem, then a BDB backend may be more appropriate.
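The tighter packing of revisions mentioned here is FSFS revision packing, which svnadmin exposes directly. A one-line sketch; the repository path is hypothetical:

```shell
# Pack completed FSFS shards into single pack files, greatly reducing
# the number of per-revision files (and inodes) on disk.
svnadmin pack /x1/svn/bh-mirror
```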
        Tony Stevenson added a comment -
        Gary, ping ?
        #asfinfra IRC Bot added a comment -
        <pctony> Gary, you let us know what your preference is. Then we can implement it.
        Gary Martin added a comment -
        Well, if I remember correctly, svnsync can be restricted to a branch. However, svnrdump might provide an interesting alternative approach. In addition to being able to specify a part of the repository to dump, the --incremental switch changes the dump to provide a diff relative to the previous revision instead of the complete expansion of each revision.

        Assuming we can get this to work, does this begin to sound better?

        I can see the advantage to making plans to stop the activity if we can work out a foolproof method. I would like to be minimising the risk that we would have to implement those plans though!

        I am trying to find out more about the possible pitfalls from colleagues with more expertise in this area.

        Cheers,
            Gary
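The svnrdump approach described in this comment would, in sketch form, look something like the following. The trunk URL and the --incremental switch come from the discussion; the local repository path is hypothetical:

```shell
# Hypothetical sketch: incremental subtree dump piped into a local load.
# svnrdump accepts a subdirectory URL, and --incremental emits each
# revision as a diff against the previous one rather than a full tree.
svnrdump dump --incremental \
    https://svn.apache.org/repos/asf/bloodhound/trunk \
  | svnadmin load --ignore-uuid /x1/svn/bh-mirror
```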
        Tony Stevenson added a comment -
        Gary,

        I do have a concern about IOPS, yes. You're running your Bloodhound service on a VM, which shares access to the storage with about another 25 VMs. If you kill the IO, the other VMs will suffer or fail too, so I hope you can understand my reticence about just going down the TIAS-and-pray route. That said, I don't yet have a better idea to enable you.

        Do you think you can get svnsync to DTRT and only play with the bloodhound namespace? If so, I suggest we run a TIAS trial for a week or two and see how we get on. However, please update this JIRA ticket, and infrastructure-private, with a method to stop the activity should it overwhelm something. Otherwise our only alternative is to shut down the VM.

        Cheers,
        Tony
        #asfinfra IRC Bot added a comment -
        <danielsh> 'svn log -q | svnrdump'? Even if that works, it'll break the revision numbers.
        Gary Martin added a comment -
        Well, I certainly do not mind your assumption that I do not know all that much about svnsync. The fact is that it is not my area of expertise.

        I certainly did not think it would be a good idea to get the full history of the entire repo. I thought that we might be able to find a way of restricting any local copy to the project directories. In addition, I got the impression (though I haven't tested it) that the relatively new svnrdump command might give the opportunity to build the copy from a later revision.

        If this is still likely to be unreasonable with respect to disk IOPS then I clearly have to find other alternatives.

        I am not sure if this is any better but I seem to remember seeing something about git mirrors of apache projects. Perhaps that might provide an alternative route.

        Cheers,
            Gary
        Tony Stevenson added a comment -
        So I am informed you guys already know enough about how svnsync works, so apologies for that. Didn't put two and two together.
        My original issue with disk IOPS still stands.
        Tony Stevenson added a comment -
        Gary, so unless you come up with an alternative, I suspect we may have to say no on this occasion. I'll leave the ticket open for a week or so in this state. If we don't hear anything back I'll go ahead and close it, but we can always re-open if we have reason to.
        Tony Stevenson added a comment -
        Gary,

        AIUI you cannot use svnsync to sync a slice of a repo; it operates at the repo level only. This means you would need to sync the entire ASF main repo, and at 1.36 million revisions, currently 152GB native on disk, that is not something we will be looking at doing.

        Does it need to be a repo? Would a WC (working copy) not suffice? I might even suggest that this constraint of requiring a local repo is not such a hot one. Other ticketing systems tend to use the publicly available DAV data, though I appreciate that a lot of these (including the JIRA instance of such) are really quite bad.

        We just don't have the disk I/O available to even consider letting you sync the repo; while disk space is more freely available for us now, the IOPS would kill the host your VM sits on.

          People

          • Assignee: Tony Stevenson
          • Reporter: Gary Martin
          • Votes: 1
          • Watchers: 6
