
Add a way to publish static HTML content of huge size (outside of pelican CMS) without checking into Git


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: None
    • Component/s: Other/Misc
    • Environment:
      Lucene website using Pelican
    • Project:
      Lucene

      Description

      Hi,
      we are in the process of converting our website from the old CMS to Pelican. The CMS part is already done (see LUCENE-8987), and the results look fine:
      https://lucene.staged.apache.org

      But there is still an open point: The website uses the extpath functionality to publish the documentation (markdown+Javadocs with hundreds of files) and Solr Refguide for each released version on the website.

      Previously, we uploaded those files to the production tree in SVN and referenced them in the extpath of the old CMS. The total size of the files is about 16 to 20 GiB, consisting of hundreds to thousands of static HTML files. The files are pushed exactly once to a new folder (named with the version number) and never changed afterwards, since they are effectively a release of the documentation and therefore static. The committers used sparse SVN checkouts for that.

      Of course we could commit those 16 GiB of small files to Git, but that won't scale at all. Building the website would also take very long (as Pelican does not work incrementally; please correct me if I am wrong).

      I talked with [~ke4qqq] about this at ApacheCon Europe. He said that the CMS should only contain the Markdown CMS files (that is, the dynamic website). Static content can still be committed to Subversion (where you also have the cool functionality of sparse checkouts). With Git you have to copy everything.

      The same approach is used for distributions (tar.gz), so we'd like to keep the static HTML content (Javadocs, Refguide) in SVN and link it into the main web page using .htaccess or similar: e.g., http://lucene.apache.org/core/8_3_0/ is static content, while http://lucene.apache.org/core/ is built by the CMS. So we'd like to host the static content in another subdirectory using svnpubsub and add "Alias" directives for each released version (or maybe use a regex rewrite). The same applies to Solr and its Refguide: http://lucene.apache.org/solr/guide/8_3/ (everything below /solr/guide).
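
To make the per-version "Alias" idea concrete, here is a minimal sketch that generates one Apache `Alias` line per released core version plus a single line for everything below /solr/guide. The filesystem root `/srv/static-docs`, the function name `alias_lines`, and the assumption that core version directories always have three numeric parts (as in 8_3_0) are hypothetical and not taken from the actual infrastructure setup.

```python
import re

# Hypothetical filesystem root of the svnpubsub checkout; not from the issue.
STATIC_ROOT = "/srv/static-docs"

# Assumption: core version directories have three numeric parts, e.g. 8_3_0.
CORE_VERSION = re.compile(r"\d+_\d+_\d+")


def alias_lines(core_versions):
    """Return one Apache Alias directive per released core version."""
    lines = []
    for version in sorted(core_versions):
        if not CORE_VERSION.fullmatch(version):
            raise ValueError(f"unexpected version directory: {version!r}")
        lines.append(f"Alias /core/{version}/ {STATIC_ROOT}/core/{version}/")
    # Everything below /solr/guide is static, so one Alias covers it.
    lines.append(f"Alias /solr/guide/ {STATIC_ROOT}/solr/guide/")
    return lines


if __name__ == "__main__":
    print("\n".join(alias_lines(["8_3_0", "8_2_0"])))
```

Generating the lines from the list of version directories would mean a new release only adds a directory, at the cost of regenerating the configuration; a single regex-based rule would avoid that step but is harder to audit.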

      What options do we have for that? Should we keep the current SVN for that, and maybe just move the content somewhere else in the SVN tree after the switch to Pelican?

      We may move the content to another subdirectory so the Alias deployment works better, and add permanent redirects. It's important that the old URLs remain stable, because the release TARs refer to them, too.
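
If the static content were relocated, the redirect rule would need to map every old versioned URL to its new home while leaving CMS-built pages alone. The sketch below illustrates that mapping only; the new prefix `/static`, the function name `redirect_target`, and the assumed URL shapes (three-part versions under /core, two-part versions under /solr/guide) are hypothetical.

```python
import re

# Hypothetical new URL prefix for the relocated static docs; not from the issue.
NEW_PREFIX = "/static"

# Assumption: old versioned URLs look like /core/8_3_0/... or /solr/guide/8_3/...
OLD_URL = re.compile(r"^/(core/\d+_\d+_\d+|solr/guide/\d+_\d+)(/.*)?$")


def redirect_target(path):
    """Return the new location for an old static URL, or None for CMS pages."""
    match = OLD_URL.match(path)
    if match is None:
        return None
    # Keep the rest of the path unchanged so deep links stay valid.
    return f"{NEW_PREFIX}/{match.group(1)}{match.group(2) or '/'}"
```

For example, under these assumptions `redirect_target("/core/8_3_0/index.html")` yields `/static/core/8_3_0/index.html`, while `redirect_target("/core/")` yields `None` because that page is built by the CMS.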

      I have heard of other projects having the same problem: Maven, with websites for each release generated from their build system; OpenOffice.


              People

              • Assignee:
                humbedooh Daniel Gruno
                Reporter:
                uschindler Uwe Schindler
              • Votes:
                1 Vote for this issue
                Watchers:
                5

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  Not Specified
                  Remaining:
                  0h
                  Logged:
                  50m