Lucene - Core
  1. Lucene - Core
  2. LUCENE-3945

we should include checksums for every jar ivy fetches in svn & src releases to verify the jars are the ones we expect

    Details

    • Type: Task Task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.6, 4.0-ALPHA
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Conversation with rmuir last night got me thinking about the fact that one thing we lose by using ivy is confidence that every user of a release is compiling against (and likely using at run time) the same dependencies as every other user.

      Up to 3.5, users of src and binary releases could be confident that the jars included in the release were the same jars the lucene devs vetted and tested against when voting on the release candidate, but with ivy there is now the possibility that after the source release is published, the owner of a domain where these dependencies are hosted might change the jars in some way w/o anyone knowing. Likewise: we as developers could commit an ivy.xml file pointing to a specific URL which we then use for and test for months, and just prior to a release, the contents of the remote URL could change such that a JAR included in the binary artifacts might not match the ones we've vetted and tested leading up to that RC.

      So i propose that we include checksum files in svn and in our source releases that can be used by users to verify that the jars they get from ivy match the jars we tested against.

      1. LUCENE-3945_trunk_jar_sha1.patch
        40 kB
        Hoss Man
      2. LUCENE-3945_trunk_jar_sha1.patch
        38 kB
        Hoss Man
      3. LUCENE-3945_trunk_jar_sha1.patch
        34 kB
        Hoss Man
      4. LUCENE-3945.patch
        5 kB
        Hoss Man
      5. LUCENE-3945.patch
        5 kB
        Hoss Man
      6. LUCENE-3945.patch
        4 kB
        Hoss Man

        Activity

        Hide
        Robert Muir added a comment -

        Backported to 3.x

        Thanks Hoss!

        Show
        Robert Muir added a comment - Backported to 3.x Thanks Hoss!
        Hide
        Hoss Man added a comment -

        Committed revision 1309503. - trunk

        rmuir said on irc that he'd work on backporting to 3x for me (going to grab some lunch soon and then get on a plane)

        Show
        Hoss Man added a comment - Committed revision 1309503. - trunk rmuir said on irc that he'd work on backporting to 3x for me (going to grab some lunch soon and then get on a plane)
        Hide
        Robert Muir added a comment -

        +1

        I opened LUCENE-3952 and we can later look at an other similar minor
        improvements related to that on other issues (like compile-tools,
        called before validate, likely tries to use the ant.jar)...

        some of that we might just have to fix in a later release, at least
        it wont silently work wrong (worst case you get a zip error).

        Show
        Robert Muir added a comment - +1 I opened LUCENE-3952 and we can later look at an other similar minor improvements related to that on other issues (like compile-tools, called before validate, likely tries to use the ant.jar)... some of that we might just have to fix in a later release, at least it wont silently work wrong (worst case you get a zip error).
        Hide
        Hoss Man added a comment -

        1) fixes a few places ha1 files were still getting used in solr contrib test classpaths
        2) fixes the binary releases to exlcude the sha1 files
        3) verified that the src releases should work fine (they do an svn export so once the files are commited we should be good)
        4) package-local-src-tgz currently includes the sha1 files (but it also currently includes the jars if you don't do clean-jar first ... not really something i'm going to worry about here)

        I think we're good to go?

        Show
        Hoss Man added a comment - 1) fixes a few places ha1 files were still getting used in solr contrib test classpaths 2) fixes the binary releases to exlcude the sha1 files 3) verified that the src releases should work fine (they do an svn export so once the files are commited we should be good) 4) package-local-src-tgz currently includes the sha1 files (but it also currently includes the jars if you don't do clean-jar first ... not really something i'm going to worry about here) I think we're good to go?
        Hide
        Hoss Man added a comment -

        revised patch, switches the file names to sha1, and adds a "common.classpath.excludes" property thta i started using everywhere i could find that made sense.

        I've verified that this keeps those sha1 files out of the solr war, and so far it looks good for all the tests ... would be helpful if someone with a faster computer could sanity check that (Note: test won't fail if the sha1s are in the classpath – you'll just get a ZipException in the console that you have to grep for)

        Show
        Hoss Man added a comment - revised patch, switches the file names to sha1, and adds a "common.classpath.excludes" property thta i started using everywhere i could find that made sense. I've verified that this keeps those sha1 files out of the solr war, and so far it looks good for all the tests ... would be helpful if someone with a faster computer could sanity check that (Note: test won't fail if the sha1s are in the classpath – you'll just get a ZipException in the console that you have to grep for)
        Hide
        Robert Muir added a comment -

        and yeah, we need to test and likely modify the binary dist patterns as you hinted at.

        Show
        Robert Muir added a comment - and yeah, we need to test and likely modify the binary dist patterns as you hinted at.
        Hide
        Robert Muir added a comment -

        well ideally i think it would just be .sha1 (lowercase), this matches the extension used
        for signing our release artifacts...

        The reason for exclusion versus inclusion is in the case the jars don't actually exist yet,
        its important!

        but in lucene/common-build.xml i think we should just set up a single libexcludes ant property or something:

        **/*.txt,**/*.template,**/*.sha1
        

        and just use that for these filesets

        fileset dir="foo" excludes="${libexcludes}"
        

        or whatever. and yeah libexcludes is a crappy name, its just the idea.

        Show
        Robert Muir added a comment - well ideally i think it would just be .sha1 (lowercase), this matches the extension used for signing our release artifacts... The reason for exclusion versus inclusion is in the case the jars don't actually exist yet, its important! but in lucene/common-build.xml i think we should just set up a single libexcludes ant property or something: **/*.txt,**/*.template,**/*.sha1 and just use that for these filesets fileset dir="foo" excludes="${libexcludes}" or whatever. and yeah libexcludes is a crappy name, its just the idea.
        Hide
        Hoss Man added a comment -

        previous patch with the addition of SHA1 files for all jars currently used on trunk.

        SHA files were generated using...

        ant clean resolve 
        find -name \*.jar | xargs -L 1 sha1sum | awk '{ system("echo " $1 ">> " $2 ".SHA1") }'
        find -name \*.SHA1 | xargs svn add
        

        all tests pass these jars, but one thing i notice is that the classpath logic in most places seems to be based on exclusion instead of inclusion (ie: 'excludes=".txt,.template"' instead of 'includes="*.jar"') so with these SHA1 files added, we now get a metric shit ton of "ZipException: error in opening zip file" warnints logged by junit because it's trying to parse these files as jars

        should we fix all these classpaths, or rename the files "*.SHA1.txt" ?

        (note: i haven't even looked at the packaging yet to verify if the SHA1 files are being included in the src builds and excluded from the bin builds)

        Show
        Hoss Man added a comment - previous patch with the addition of SHA1 files for all jars currently used on trunk. SHA files were generated using... ant clean resolve find -name \*.jar | xargs -L 1 sha1sum | awk '{ system("echo " $1 ">> " $2 ".SHA1") }' find -name \*.SHA1 | xargs svn add all tests pass these jars, but one thing i notice is that the classpath logic in most places seems to be based on exclusion instead of inclusion (ie: 'excludes=" .txt, .template"' instead of 'includes="*.jar"') so with these SHA1 files added, we now get a metric shit ton of "ZipException: error in opening zip file" warnints logged by junit because it's trying to parse these files as jars should we fix all these classpaths, or rename the files "*.SHA1.txt" ? (note: i haven't even looked at the packaging yet to verify if the SHA1 files are being included in the src builds and excluded from the bin builds)
        Hide
        Robert Muir added a comment -

        Haven't tested but looks good: I also think its a good idea to keep the old name for now, we can rename later.

        unrelated but couldnt help but notice, this file has no license header. This is because the current rat tasks
        ignore tools/....grrrrr

        Show
        Robert Muir added a comment - Haven't tested but looks good: I also think its a good idea to keep the old name for now, we can rename later. unrelated but couldnt help but notice, this file has no license header. This is because the current rat tasks ignore tools/....grrrrr
        Hide
        Hoss Man added a comment -

        bah ... previous patches were broken in that they wouldn't actaully fail the build when checksum failures were detected.

        Show
        Hoss Man added a comment - bah ... previous patches were broken in that they wouldn't actaully fail the build when checksum failures were detected.
        Hide
        Hoss Man added a comment -

        Isn't any of these simpler?

        FWIW that code was basically verbatim from ant's Checksum.java, but yeah: your way is better

        Isn't this locale-sensitive? I think it should be explicit UTF-8

        Also verbatim, but it actually caught my eye and I thought it was intentional to deal with line ending differences in diff OSes, but your comment makes me realize thta makes no sense – and BufferedReader chomps all possible endings equally regardless of locale/encoding.

        we can just name it DependencyCheckTask.

        I think in the long run we should do that, and rename the taskdef, and rename the macrodef, ... but for 3.6 it might be better to keep the changes to a minimum and just add the logic to the existing class and add the SHA1 files but not rename anything.

        ...updated patch with those changes ... going to start compiling the list of all SHA1s files on trunk

        Show
        Hoss Man added a comment - Isn't any of these simpler? FWIW that code was basically verbatim from ant's Checksum.java, but yeah: your way is better Isn't this locale-sensitive? I think it should be explicit UTF-8 Also verbatim, but it actually caught my eye and I thought it was intentional to deal with line ending differences in diff OSes, but your comment makes me realize thta makes no sense – and BufferedReader chomps all possible endings equally regardless of locale/encoding. we can just name it DependencyCheckTask. I think in the long run we should do that, and rename the taskdef, and rename the macrodef, ... but for 3.6 it might be better to keep the changes to a minimum and just add the logic to the existing class and add the SHA1 files but not rename anything. ...updated patch with those changes ... going to start compiling the list of all SHA1s files on trunk
        Hide
        Steve Rowe added a comment -

        it can also detect ivy cache corrumption.

        I have encountered cache corrumption: AFAICT a truncated jar file (tika-parsers-1.0.jar); killing the cache fixed the problem. We could add a clean-ivy-cache target calling the target Ivy provides.

        Show
        Steve Rowe added a comment - it can also detect ivy cache corrumption. I have encountered cache corrumption: AFAICT a truncated jar file (tika-parsers-1.0.jar); killing the cache fixed the problem. We could add a clean-ivy-cache target calling the target Ivy provides .
        Hide
        Dawid Weiss added a comment -

        Btw. you can also avoid a recrawl by passing a refid of the same fileset to two tasks rather than constructing a new one in each. I don't mind renaming the class either.

        Show
        Dawid Weiss added a comment - Btw. you can also avoid a recrawl by passing a refid of the same fileset to two tasks rather than constructing a new one in each. I don't mind renaming the class either.
        Hide
        Dawid Weiss added a comment - - edited
        reader = new BufferedReader(new FileReader(f));
        

        Isn't this locale-sensitive? I think it should be explicit UTF-8 (or US-ASCII for that matter).

        +      String hexStr = Integer.toHexString(CHECKSUM_BYTE_MASK & digest[i]);
        +      if (hexStr.length() < 2) {
        +        checksum.append("0");
        +      }
        +      checksum.append(hexStr);
        

        Isn't any of these simpler?

        checksum.append(String.format(Locale.ENGLISH, "%02x", CHECKSUM_BYTE_MASK & digest[i]));
        

        or

        char [] HEX = "0123456789abcdef".toCharArray();
        int v = digest[i];
        checksum.append(HEX[(v >>> 4) & 0x0F]).append(HEX[v & 0x0F]);
        
        Show
        Dawid Weiss added a comment - - edited reader = new BufferedReader(new FileReader(f)); Isn't this locale-sensitive? I think it should be explicit UTF-8 (or US-ASCII for that matter). + String hexStr = Integer.toHexString(CHECKSUM_BYTE_MASK & digest[i]); + if (hexStr.length() < 2) { + checksum.append("0"); + } + checksum.append(hexStr); Isn't any of these simpler? checksum.append(String.format(Locale.ENGLISH, "%02x", CHECKSUM_BYTE_MASK & digest[i])); or char [] HEX = "0123456789abcdef".toCharArray(); int v = digest[i]; checksum.append(HEX[(v >>> 4) & 0x0F]).append(HEX[v & 0x0F]);
        Hide
        Robert Muir added a comment -

        If people want to pursue this, We can always refactor – either to generalize the name/description of of "LicenseCheckTask" or to refactor this out into it's own custom task (and do an extra jar crawl).

        I don't think we should do an extra crawl, we can just name it DependencyCheckTask.
        Dependencies need to have licensing information and sha checksums, and 'ant validate'
        fails if they don't.

        Show
        Robert Muir added a comment - If people want to pursue this, We can always refactor – either to generalize the name/description of of "LicenseCheckTask" or to refactor this out into it's own custom task (and do an extra jar crawl). I don't think we should do an extra crawl, we can just name it DependencyCheckTask. Dependencies need to have licensing information and sha checksums, and 'ant validate' fails if they don't.
        Hide
        Robert Muir added a comment -

        +1, we really need to do this. it can also detect ivy cache corrumption.

        its really unrelated to actually how we get the jars, thats an impl detail.
        we should be checking that its the jar we tested against, or fail hard.

        Show
        Robert Muir added a comment - +1, we really need to do this. it can also detect ivy cache corrumption. its really unrelated to actually how we get the jars, thats an impl detail. we should be checking that its the jar we tested against, or fail hard.
        Hide
        Hoss Man added a comment -

        Here's a strawman suggestion for adding a SHA1 check to our custom LicenseCheckTask, it was inspired by some of the code in ant's <checksum/> task, but it's not as configurable (only supports one SHA1 and only the simplest file format)

        I put this code in LicenseCheckTask because:

        • LicenseCheckTask already gets run when you use "ant validate" and that seems like the right place for it from a user perspective
        • ant's built in support for verifying checksums in the <checksum/> task will only set a property true/false if the checksum for a file doesn't match – even with -verbose, it won't tell you which file failed the checksum test, so it wasn't really suitable to adding to our "ant validate" target.
        • if i added a new ShaCheckTast we would have to configure it to run over all the same files LicenseCheckTask is already run over.

        If people want to pursue this, We can always refactor – either to generalize the name/description of of "LicenseCheckTask" or to refactor this out into it's own custom task (and do an extra jar crawl).

        (NOTE: current patch doesn't actually include the checksums themselves, so as is this will fail the build. If we trust what "ant resolve" is pulling down right now, the files would be trivial for anyone to generate, but i would suggest being a little more diligent and and generating the SHA1s against what was committed in svn up until last week – except obviously where we've actually changed which jar/version we use because of the ivy work)

        Comments?

        Show
        Hoss Man added a comment - Here's a strawman suggestion for adding a SHA1 check to our custom LicenseCheckTask, it was inspired by some of the code in ant's <checksum/> task, but it's not as configurable (only supports one SHA1 and only the simplest file format) I put this code in LicenseCheckTask because: LicenseCheckTask already gets run when you use "ant validate" and that seems like the right place for it from a user perspective ant's built in support for verifying checksums in the <checksum/> task will only set a property true/false if the checksum for a file doesn't match – even with -verbose, it won't tell you which file failed the checksum test, so it wasn't really suitable to adding to our "ant validate" target. if i added a new ShaCheckTast we would have to configure it to run over all the same files LicenseCheckTask is already run over. If people want to pursue this, We can always refactor – either to generalize the name/description of of "LicenseCheckTask" or to refactor this out into it's own custom task (and do an extra jar crawl). (NOTE: current patch doesn't actually include the checksums themselves, so as is this will fail the build. If we trust what "ant resolve" is pulling down right now, the files would be trivial for anyone to generate, but i would suggest being a little more diligent and and generating the SHA1s against what was committed in svn up until last week – except obviously where we've actually changed which jar/version we use because of the ivy work) Comments?
        Hide
        Hoss Man added a comment -

        #1: I know that Ivy attempts MD5 & SHA1 verification by default – but it does that verification against checksum files located on the server, so it only offers protection against corruption in transit, not against files deliberately modified on the server.

        #2 i realize that the maintainers of maven repos say "all files are immutable" and that this potential risk of malicious or accidental file changes exists for all maven users – but that's the choise of all maven users to accept that as a way of life. I'm raising this issue only to point out a discrepancy between the "confidence" we use to be able to give people who download src releases, vs what we have currently with ivy.

        Show
        Hoss Man added a comment - #1: I know that Ivy attempts MD5 & SHA1 verification by default – but it does that verification against checksum files located on the server, so it only offers protection against corruption in transit, not against files deliberately modified on the server. #2 i realize that the maintainers of maven repos say "all files are immutable" and that this potential risk of malicious or accidental file changes exists for all maven users – but that's the choise of all maven users to accept that as a way of life. I'm raising this issue only to point out a discrepancy between the "confidence" we use to be able to give people who download src releases, vs what we have currently with ivy.

          People

          • Assignee:
            Unassigned
            Reporter:
            Hoss Man
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development