  1. Apache Arrow
  2. ARROW-12236

[R][CI] Add check that all docs pages are listed in _pkgdown.yml




      Our (external) nightly R packaging and docs build is failing to render the pkgdown site: https://github.com/ursa-labs/arrow-r-nightly/runs/2266551062?check_suite_focus=true#step:9:55

      This is due to (1) a new-ish change in pkgdown that errors if topics are not included and (2) the recent addition of FragmentScanOptions, which did not get added to _pkgdown.yml.

      We should validate this on our regular CI in order to prevent future issues like this. We often have to add things to _pkgdown.yml right at release time, and it would be better to keep up as we go. Some ideas for how:

      • Add a step to an existing R workflow (e.g. https://github.com/apache/arrow/blob/master/.github/workflows/r.yml#L60) that does this check
      • Add a new workflow that is triggered only on changes to `r/man` and `r/_pkgdown.yml`
      • In either case, this could be done as a bash script, a Python script, or an R script. If using R, note that the Docker-based CI jobs won't have R installed, so you might want to tack it onto one of the Windows jobs (which use the setup-r action), but then you're on Windows.
      • You could install pkgdown and try to build the site, but that's a lot of dependency to download and install just to essentially compare some lines in a yaml file with a directory listing (i.e., make sure that all r/man/*.Rd have corresponding entries in the reference part of the yml), so python or even a bash script might be more efficient to run. And since this is going to run a lot, it's worth considering how to keep runtime down even if that means more work to set it up.
      • If you're scripting this standalone, I think you'll need to filter out Rd files that contain `\keyword{internal}`, as pkgdown excludes those from the reference list.
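
A minimal sketch of the standalone check described in the list above, using only the Python standard library so that the CI job needs nothing extra installed. It is an illustration under several assumptions, not the script that was adopted: the function names and the default paths `r/man` and `r/_pkgdown.yml` are hypothetical, the YAML is scanned line by line instead of being parsed, and an Rd file counts as listed if its file name, `\name{}`, or any `\alias{}` appears as a bare list item under `reference:`. Selector expressions such as `starts_with(...)` are not handled.

```python
import re
import sys
from pathlib import Path

# Rd files marked internal are skipped, since pkgdown leaves them out of the reference list.
INTERNAL_RE = re.compile(r"\\keyword\s*\{\s*internal\s*\}")
# Topic names declared inside an Rd file.
TOPIC_RE = re.compile(r"\\(?:name|alias)\s*\{([^}]*)\}")
# A bare YAML list item such as "- open_dataset" (not "- title: Foo").
ITEM_RE = re.compile(r"^\s*-\s+['\"]?([^'\":#\s]+)['\"]?\s*$")
# A top-level YAML key starting at column 0, such as "reference:".
TOP_KEY_RE = re.compile(r"^[A-Za-z_][\w-]*\s*:")


def listed_topics(yml_text):
    """Collect bare list items found under the top-level `reference:` key."""
    topics = set()
    in_reference = False
    for line in yml_text.splitlines():
        if TOP_KEY_RE.match(line):
            in_reference = line.startswith("reference:")
            continue
        if in_reference:
            match = ITEM_RE.match(line)
            if match:
                topics.add(match.group(1))
    return topics


def rd_topics(rd_path):
    """Return the names an Rd file can be listed under, or None if it is internal."""
    text = rd_path.read_text(encoding="utf-8", errors="replace")
    if INTERNAL_RE.search(text):
        return None
    names = {name.strip() for name in TOPIC_RE.findall(text)}
    names.add(rd_path.stem)
    return names


def find_unlisted(man_dir, yml_path):
    """Return the names of non-internal Rd files with no entry in the reference section."""
    listed = listed_topics(Path(yml_path).read_text(encoding="utf-8"))
    unlisted = []
    for rd_path in sorted(Path(man_dir).glob("*.Rd")):
        names = rd_topics(rd_path)
        if names is not None and not (names & listed):
            unlisted.append(rd_path.name)
    return unlisted


if __name__ == "__main__":
    # Default paths are assumptions based on the locations mentioned in this issue.
    man_dir = sys.argv[1] if len(sys.argv) > 1 else "r/man"
    yml_path = sys.argv[2] if len(sys.argv) > 2 else "r/_pkgdown.yml"
    missing = find_unlisted(man_dir, yml_path)
    for name in missing:
        print(f"Not listed in {yml_path}: {name}")
    sys.exit(1 if missing else 0)
```

Run from the repository root, the script exits non-zero when any topic is missing, which is enough to fail a CI step. Scanning lines instead of loading the YAML avoids a PyYAML dependency, at the cost of ignoring pkgdown's selector helpers and any unusual YAML layout.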





              thisisnic Nicola Crane
              npr Neal Richardson


