  1. Apache Arrow
  2. ARROW-5686

[R] Review R Windows CI build



    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.15.0
    • Component/s: R


      Follow-up to ARROW-3758 / https://github.com/apache/arrow/pull/4622. In that PR, I used the tools in https://github.com/r-windows/rtools-backports to set up CI for Arrow C++ and R on Windows using Appveyor. I was guided mainly by the steps described on the Arrow project wiki, and I iterated until I got a passing build.

      Despite getting it to "work", I'm certain I've missed some subtleties, and there may be better ways to accomplish this. Some specific questions:

      • I found that I could ignore rtools-backports/ci-library.sh and most of ci-build.sh because they are oriented around building possibly many packages, but there was a block of pacman setup that I did have to copy here: https://github.com/apache/arrow/pull/4622/files#diff-f4a8bedb9b0d3fe301a84914916f6d49R22. I'm not sure how likely those commands are to change, but if that's a concern, maybe that setup could be factored out into a separate shell script in rtools-backports, and the arrow CI could wget and source it as it does for some other resources. That way, our setup here wouldn't diverge.
      • I did not understand what I needed to do with rtools-packages, if anything. It seems that it's not used by R yet, so is it just important to have the PKGBUILD in place there for when it is ready? If I wanted to run both the rtools-backports and rtools-packages builds in the same job, is the difference only these environment variables?
      • The process of taking the Appveyor build artifacts, unzipping them, and merging them into the "rwinlib" directory layout seemed loosely defined on the wiki, at least as far as I could tell. I packaged up the process (as I understood it) in a shell script, and it produced a zip file that is the right shape (right enough that R could install the arrow R package with it and run tests). Does that script make sense? In particular,
        • Is there a good way to keep around the other dependencies (double-conversion, boost, thrift) from when the packages are built so that I don't have to re-download them from bintray? I see that they get pulled down at the beginning of each pkgbuild and then removed after, but I don't know where they are put such that I could keep them around and use them later.
        • Is the lib directory for other dependencies (e.g. libdouble-conversion.a) and lib-4.9.3 for the arrow and parquet binaries we build, as the wiki says? Or is lib for the Rtools4.0/gcc8 versions and lib-4.9.3 for the Rtools3.5/gcc4 versions? 
        • libdouble-conversion.a only seems to exist in the rtools-packages Rtools4.0 packages, but that nevertheless works on the R release version. However, if I used the versions of boost and thrift from the Rtools4.0 bintrays, the R package did not build (link) correctly.
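
      To make the merging question concrete, here is a rough sketch of the unzip-and-merge step as I understand it. It is written in Python for illustration only and is not the shell script from the PR. The function name merge_into_rwinlib, the destination layout (one include directory plus one lib directory per toolchain), and the default lib-4.9.3 name are assumptions taken from my reading above, not a confirmed description of what rwinlib expects.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def merge_into_rwinlib(artifact_zips, dest_root, lib_dir="lib-4.9.3"):
    """Merge headers and static libraries from build artifacts into one tree.

    Assumed (hypothetical) layout of the result:
      <dest_root>/include/...      headers from every artifact
      <dest_root>/<lib_dir>/*.a    static libraries from every artifact
    Returns the path of the zip archive written next to dest_root.
    """
    dest_root = Path(dest_root).resolve()
    include_dest = dest_root / "include"
    lib_dest = dest_root / lib_dir
    include_dest.mkdir(parents=True, exist_ok=True)
    lib_dest.mkdir(parents=True, exist_ok=True)

    for artifact in artifact_zips:
        with tempfile.TemporaryDirectory() as tmp:
            with zipfile.ZipFile(artifact) as zf:
                zf.extractall(tmp)
            unpacked = Path(tmp)
            # Merge every "include" directory found in the artifact.
            for inc in unpacked.rglob("include"):
                if inc.is_dir():
                    shutil.copytree(inc, include_dest, dirs_exist_ok=True)
            # Collect static libraries wherever they were unpacked.
            for lib in unpacked.rglob("*.a"):
                shutil.copy2(lib, lib_dest / lib.name)

    # Zip the merged tree as <dest_root>.zip.
    return Path(shutil.make_archive(str(dest_root), "zip", root_dir=dest_root))
```

      Calling merge_into_rwinlib(["arrow.zip", "deps.zip"], "build/arrow-x.y.z") (placeholder names) would write build/arrow-x.y.z.zip. The sketch needs Python 3.8 or later for dirs_exist_ok, and it deliberately ignores the 32-bit versus 64-bit split, which is part of what I'm asking about.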

      To be clear, it is not our intention to fork or otherwise avoid the supported Rtools toolchain that is maintained there; rather, we want to continuously integrate arrow to avoid breaking things and make it easier to submit updates to rtools-backports/packages/rwinlib when there's a new arrow release. We want as much as possible to use the supported tools and workflows and are willing to contribute to enhancing them, though we recognize that our needs (as a big C++ library under heavy active development) are probably not shared by many other projects that use rtools-packages et al.






              Assignee: Neal Richardson (npr)
              Reporter: Neal Richardson (npr)



                Time Tracking

                  Original Estimate - Not Specified
                  Remaining Estimate - 0h
                  Time Spent - 1h