CouchDB
  1. CouchDB
  2. COUCHDB-625

Pure Erlang alternative to crypto library

    Details

    • Type: Improvement Improvement
    • Status: Reopened
    • Priority: Minor Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Infrastructure
    • Labels:
      None
    • Skill Level:
      Committers Level (Medium to Hard)

      Description

      On some platforms (in my case a SheevaPlug running on armv5te) it may be difficult or impossible to obtain a version of Erlang built with support for the crypto standard library. I grepped the CouchDB source and have attempted to reproduce the used crypto calls in pure Erlang.

      I have reproduced the start/0, rand_uniform/2, rand_bytes/1, sha/1, and sha_mac/2 functions, along with test_sha/1 and test_sha_mac/1 functions to validate the pure Erlang results against the crypto library's results. The public non-test functions attempt to first call into crypto if available, as it is the preferred implementation.

      As I'm not familiar with the build system, app system, etc. of Erlang I am only attaching the library implementation. I'm sure more work would be required to fully integrate it into CouchDB if accepted.

      As far as licensing goes, SHA1 is defined in NIST FIPS 180-2 (http://csrc.nist.gov/publications/fips/fips180-2/fips180-2.pdf), and according to the IETF, the patent covering the algorithm has been made royalty-free (https://datatracker.ietf.org/ipr/858).

      1. ccrypto.erl
        6 kB
        Jonathan D. Knezek
      2. ccrypto.erl
        5 kB
        Jonathan D. Knezek

        Activity

        Hide
        Chris Anderson added a comment -

        On Wed, Jan 20, 2010 at 7:16 PM, Brian McCallister <brianm@skife.org> wrote:
        > On copyright front, pretty sure you are all clear – could be
        > copyright problem to cut and paste, not to learn from and use to
        > implement. My bigger concern is about the the crypto impl – avoiding
        > things like timing attacks on algorithms is very tricky, and unless
        > you really grok them, you probably have side channel weaknesses in the
        > impl

        Yes that's why I prefer to have a native erlang fallback instead of a
        lightweight c implementation. At least it will be clear for
        performance reasons that you shouldn't be running on less than a full
        OpenSSL-based library.

        I'd I think it's worth the cost of widespread adoption on embedded
        devices. Maybe it'd be smart to make a compile time dependency
        warning, so people know what they are getting into.

        Devs – what about this compile time warning if we drop back to native crypto? This might be more solid from a no-surprises standpoint than the current patch which switches at runtime.

        Show
        Chris Anderson added a comment - On Wed, Jan 20, 2010 at 7:16 PM, Brian McCallister <brianm@skife.org> wrote: > On copyright front, pretty sure you are all clear – could be > copyright problem to cut and paste, not to learn from and use to > implement. My bigger concern is about the the crypto impl – avoiding > things like timing attacks on algorithms is very tricky, and unless > you really grok them, you probably have side channel weaknesses in the > impl Yes that's why I prefer to have a native erlang fallback instead of a lightweight c implementation. At least it will be clear for performance reasons that you shouldn't be running on less than a full OpenSSL-based library. I'd I think it's worth the cost of widespread adoption on embedded devices. Maybe it'd be smart to make a compile time dependency warning, so people know what they are getting into. Devs – what about this compile time warning if we drop back to native crypto? This might be more solid from a no-surprises standpoint than the current patch which switches at runtime.
        Hide
        Chris Anderson added a comment -

        It looks like the consensus is that it's OK to use this code.

        http://mail-archives.apache.org/mod_mbox/www-legal-discuss/201001.mbox/%3c3d4032301001200405i3c5dde7fve4a13247e6b061ad@mail.gmail.com%3e

        With that out of the way, we should see if there are any technical concerns before we start importing this into the codebase.

        I think the module is small enough to avoid needing an explicit code grant: http://www.apache.org/licenses/software-grant.txt

        Question for other committers: should we import this as couch_crypto.erl or should it treated as it's own library and get a directory inside src?

        Show
        Chris Anderson added a comment - It looks like the consensus is that it's OK to use this code. http://mail-archives.apache.org/mod_mbox/www-legal-discuss/201001.mbox/%3c3d4032301001200405i3c5dde7fve4a13247e6b061ad@mail.gmail.com%3e With that out of the way, we should see if there are any technical concerns before we start importing this into the codebase. I think the module is small enough to avoid needing an explicit code grant: http://www.apache.org/licenses/software-grant.txt Question for other committers: should we import this as couch_crypto.erl or should it treated as it's own library and get a directory inside src?
        Hide
        Chris Anderson added a comment -

        I've sent a request to legal-discuss. Hopefully they've seen something like this before.

        Show
        Chris Anderson added a comment - I've sent a request to legal-discuss. Hopefully they've seen something like this before.
        Hide
        Jonathan D. Knezek added a comment -

        Right, I've never had one reported.

        Show
        Jonathan D. Knezek added a comment - Right, I've never had one reported.
        Hide
        Paul Joseph Davis added a comment -

        When you say reports the number of collisions that's always zero right? If not then I'd be a bit worried.

        Show
        Paul Joseph Davis added a comment - When you say reports the number of collisions that's always zero right? If not then I'd be a bit worried.
        Hide
        Jonathan D. Knezek added a comment -

        This new version exports the pure functions (prefixed with pure_) for testing, adds test_rand_bytes/1 which reports the number of collisions found in N generated UUIDs, and combines the word expansion and mixing steps for sha/1.

        The SHA1 implementation is about 15% faster than the previous version due to the elimination of list construction.

        Also, after actually checking the sites where sha/1 and sha_mac/2 were called, I'm much less concerned about speed. I thought they were being used to calculate the revisions, so I was worried about possibly handling several MB. All the uses are dealing with authentication, though, and this version can hash about 21kB/s on my 1.2 GHz SheevaPlug.

        I say give legal a ring. :]

        Show
        Jonathan D. Knezek added a comment - This new version exports the pure functions (prefixed with pure_) for testing, adds test_rand_bytes/1 which reports the number of collisions found in N generated UUIDs, and combines the word expansion and mixing steps for sha/1. The SHA1 implementation is about 15% faster than the previous version due to the elimination of list construction. Also, after actually checking the sites where sha/1 and sha_mac/2 were called, I'm much less concerned about speed. I thought they were being used to calculate the revisions, so I was worried about possibly handling several MB. All the uses are dealing with authentication, though, and this version can hash about 21kB/s on my 1.2 GHz SheevaPlug. I say give legal a ring. :]
        Hide
        Chris Anderson added a comment -

        Thanks for the links. If you are interested in sharing the new version, I'm interested in pursing the licensing questions.

        I'm not sure about the performance. I know it would be slower, but so much slower that it would matter for a single user running an app on an embedded device?

        The idea of making a C-based port version of it seems redundant to me: isn't that already basically what the stdlib crypto app does?

        The appeal of pure-Erlang crypto is that we can fall back on it in places where it's hard to build native code.

        Do others think this is worth pursing or are we just setting a bunch of GameBoy users up for UUID collisions in the year 2050?

        Show
        Chris Anderson added a comment - Thanks for the links. If you are interested in sharing the new version, I'm interested in pursing the licensing questions. I'm not sure about the performance. I know it would be slower, but so much slower that it would matter for a single user running an app on an embedded device? The idea of making a C-based port version of it seems redundant to me: isn't that already basically what the stdlib crypto app does? The appeal of pure-Erlang crypto is that we can fall back on it in places where it's hard to build native code. Do others think this is worth pursing or are we just setting a bunch of GameBoy users up for UUID collisions in the year 2050?
        Hide
        Jonathan D. Knezek added a comment -

        Sure.

        The SHA1 pseudo-code came from http://en.wikipedia.org/w/index.php?title=SHA_hash_functions&oldid=338273346#SHA-1_pseudocode. It was hand-typed as I attempted to convert from imperative to functional code. This is one of my first attempts at serious Erlang and functional programming in general so it was originally intended as an intellectual challenge.

        The currently posted version is very similar structurally. My latest (unposted) version has ~15% improved performance and is very different structurally (combined the extend and loop sections to avoid list construction). I can post this latest version if it'll help.

        The HMAC pseudo-code came from http://en.wikipedia.org/w/index.php?title=HMAC&oldid=336183553#Implementation which is actually incorrect. I corrected my implementation from the RFC description at http://tools.ietf.org/html/rfc2104. The code is very similar, but this is an algorithm I would argue can only really be implemented one way... It's only ten lines of pseudo-code.

        I unfortunately think it's a moot point because the pure Erlang performance is too poor to be useful and will likely have to be implemented as a native extension, external port, or something like that to be feasible...

        Show
        Jonathan D. Knezek added a comment - Sure. The SHA1 pseudo-code came from http://en.wikipedia.org/w/index.php?title=SHA_hash_functions&oldid=338273346#SHA-1_pseudocode . It was hand-typed as I attempted to convert from imperative to functional code. This is one of my first attempts at serious Erlang and functional programming in general so it was originally intended as an intellectual challenge. The currently posted version is very similar structurally. My latest (unposted) version has ~15% improved performance and is very different structurally (combined the extend and loop sections to avoid list construction). I can post this latest version if it'll help. The HMAC pseudo-code came from http://en.wikipedia.org/w/index.php?title=HMAC&oldid=336183553#Implementation which is actually incorrect. I corrected my implementation from the RFC description at http://tools.ietf.org/html/rfc2104 . The code is very similar, but this is an algorithm I would argue can only really be implemented one way... It's only ten lines of pseudo-code. I unfortunately think it's a moot point because the pure Erlang performance is too poor to be useful and will likely have to be implemented as a native extension, external port, or something like that to be feasible...
        Hide
        Chris Anderson added a comment -

        Jonathan, do you have a Wikipedia URL for the pseudocode?

        I'm guessing that a big part of this hinges on 2 questions:

        did you copy/paste or type from scratch?

        how different is the code.

        I'm working on an email to legal-discuss, but we should have the Wikipedia URLs before sending.

        Thanks!

        Show
        Chris Anderson added a comment - Jonathan, do you have a Wikipedia URL for the pseudocode? I'm guessing that a big part of this hinges on 2 questions: did you copy/paste or type from scratch? how different is the code. I'm working on an email to legal-discuss, but we should have the Wikipedia URLs before sending. Thanks!
        Hide
        Chris Anderson added a comment -

        Reopening, as this is a good idea.

        I think it's worth inquiring from the legal list about algorithms implemented based on wikipedia articles.

        My gut says we can use it. If we can't, it's worth redoing the work based on the original govt papers, or perhaps APR because we know it is clean.

        Show
        Chris Anderson added a comment - Reopening, as this is a good idea. I think it's worth inquiring from the legal list about algorithms implemented based on wikipedia articles. My gut says we can use it. If we can't, it's worth redoing the work based on the original govt papers, or perhaps APR because we know it is clean.
        Hide
        Jonathan D. Knezek added a comment -

        Though http://creativecommons.org/licenses/by-sa/3.0/ states that derivative works of SA may be distributed "under the same, similar or a compatible license" (which would lead me to believe there's a chance the Apache license could be compatible), http://creativecommons.org/about/licenses/ declares that one must "license their new creations under the identical terms."

        Sorry for the trouble. :[

        Show
        Jonathan D. Knezek added a comment - Though http://creativecommons.org/licenses/by-sa/3.0/ states that derivative works of SA may be distributed "under the same, similar or a compatible license" (which would lead me to believe there's a chance the Apache license could be compatible), http://creativecommons.org/about/licenses/ declares that one must "license their new creations under the identical terms." Sorry for the trouble. :[
        Hide
        Jonathan D. Knezek added a comment -

        Added notes about cryptographic insecurity of the rand_* functions and corrected licensing information thanks to clarification by Dw on the mailing list. The SHA-related functions are a derivative work under the CC-BY-SA license, so I'm not sure about their eligibility for inclusion under the Apache license.

        This may be a moot point however, as although the speed on small data sets is acceptable, on larger data sets it becomes a hindrance. For example, hashing 1MB of data on my Athlon64 X2 5000+ takes ~3.3s.

        Show
        Jonathan D. Knezek added a comment - Added notes about cryptographic insecurity of the rand_* functions and corrected licensing information thanks to clarification by Dw on the mailing list. The SHA-related functions are a derivative work under the CC-BY-SA license, so I'm not sure about their eligibility for inclusion under the Apache license. This may be a moot point however, as although the speed on small data sets is acceptable, on larger data sets it becomes a hindrance. For example, hashing 1MB of data on my Athlon64 X2 5000+ takes ~3.3s.

          People

          • Assignee:
            Chris Anderson
            Reporter:
            Jonathan D. Knezek
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:

              Development