Lucene - Core
  1. Lucene - Core
  2. LUCENE-4947

Java implementation (and improvement) of Levenshtein & associated lexicon automata

    Details

    • Type: Improvement Improvement
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 4.0-ALPHA, 4.0-BETA, 4.0, 4.1, 4.2, 4.2.1
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I was encouraged by Mike McCandless to open an issue concerning this after I contacted him privately about it. Thanks Mike!

      I'd like to submit my Java implementation of the Levenshtein Automaton as a homogenous replacement for the current heterogenous, multi-component implementation in Lucene.

      Benefits of upgrading include

      • Reduced code complexity
      • Better performance from components that were previously implemented in Python
      • Support for on-the-fly dictionary-automaton manipulation (if you wish to use my dictionary-automaton implementation)

      The code for all the components is well structured, easy to follow, and extensively commented. It has also been fully tested for correct functionality and performance.

      The levenshtein automaton implementation (along with the required MDAG reference) can be found in my LevenshteinAutomaton Java library here: https://github.com/klawson88/LevenshteinAutomaton.

      The minimalistic directed acyclic graph (MDAG) which the automaton code uses to store and step through word sets can be found here: https://github.com/klawson88/MDAG

      *Transpositions aren't currently implemented. I hope the comment filled, editing-friendly code combined with the fact that the section in the Mihov paper detailing transpositions is only 2 pages makes adding the functionality trivial. Update introduces transposition inclusion in edit distance calculations!

      *As a result of support for on-the-fly manipulation, the MDAG (dictionary-automaton) creation process incurs a slight speed penalty. In order to have the best of both worlds, i'd recommend the addition of a constructor which only takes sorted input. The complete, easy to follow pseudo-code for the simple procedure can be found in the first article I linked under the references section in the MDAG repository)

      1. LevenshteinAutomaton-master.zip
        1.31 MB
        Kevin Lawson
      2. LevenshteinAutomaton-master-update.zip
        1.32 MB
        Kevin Lawson
      3. MDAG-master.zip
        1.94 MB
        Kevin Lawson

        Activity

        Hide
        Dawid Weiss added a comment -

        Cool but this may be a showstopper: "MDAG is licensed under the GNU General Public License (version 3)."

        Show
        Dawid Weiss added a comment - Cool but this may be a showstopper: "MDAG is licensed under the GNU General Public License (version 3)."
        Hide
        Christian Moen added a comment -

        Thanks a lot for wishing to submit code!

        It's not possible to include your code in Lucene if it has a GPL license. Quite frankly, I don't think even think Lucene committers can even have a look at it to consider it for inclusion with a GPL license.

        If you have written all the code or otherwise own all copyrights, would you mind switching to Apache License 2.0? That way, I at least think it would be possible to have a close look to see if this is a good fit for Lucene.

        Show
        Christian Moen added a comment - Thanks a lot for wishing to submit code! It's not possible to include your code in Lucene if it has a GPL license. Quite frankly, I don't think even think Lucene committers can even have a look at it to consider it for inclusion with a GPL license. If you have written all the code or otherwise own all copyrights, would you mind switching to Apache License 2.0? That way, I at least think it would be possible to have a close look to see if this is a good fit for Lucene.
        Hide
        Kevin Lawson added a comment -

        Ah. It slipped my mind that the result of combining GPL and Apache licensed code must be GPL.

        I'm fully committed providing it to the Lucene project under the Apache License (version 2) if possible, or if it comes down to it, changing the license entirely.

        If the former is possible, consider this post one that formally licenses both LevenshteinAutomaton and MDAG, to the Lucene development team under Apache License (version 2), irregardless of any licenses that may be included with either project, and any license notices contained in any of their constituent files.

        If the former is not possible, I suppose I could push up versions of each project licensed appropriately to the general public.

        Just tell me what action(s) need to be taken from here.

        Show
        Kevin Lawson added a comment - Ah. It slipped my mind that the result of combining GPL and Apache licensed code must be GPL. I'm fully committed providing it to the Lucene project under the Apache License (version 2) if possible, or if it comes down to it, changing the license entirely. If the former is possible, consider this post one that formally licenses both LevenshteinAutomaton and MDAG, to the Lucene development team under Apache License (version 2), irregardless of any licenses that may be included with either project, and any license notices contained in any of their constituent files. If the former is not possible, I suppose I could push up versions of each project licensed appropriately to the general public. Just tell me what action(s) need to be taken from here.
        Hide
        Simon Willnauer added a comment -

        I'm fully committed providing it to the Lucene project under the Apache License (version 2) if possible, or if it comes down to it, changing the license entirely.

        do you own all the copyrights for this GPL licensed product? I haven't looked into it yet but what we need to do here in a nutshell is:

        usually the PMC Chair helps here a lot but just FYI this is roughly what you need to go through.

        Show
        Simon Willnauer added a comment - I'm fully committed providing it to the Lucene project under the Apache License (version 2) if possible, or if it comes down to it, changing the license entirely. do you own all the copyrights for this GPL licensed product? I haven't looked into it yet but what we need to do here in a nutshell is: issue a code grant http://www.apache.org/licenses/software-grant.txt having received a http://www.apache.org/licenses/icla.txt which is good for future contributions... run through IP clearance http://incubator.apache.org/ip-clearance/index.html usually the PMC Chair helps here a lot but just FYI this is roughly what you need to go through.
        Hide
        Uwe Schindler added a comment -

        Hi,
        My comments:
        Can this tool be used to replace the Code-Generator that creates the large int[] arrays? If this is not the case, the actual implementation of the Levensthein Automaton might be smaller, but there 2 backsides:

        • Licene issues with the external library (seems to be solved already)
        • The far bigger problem is: Levensthein is not the only automaton used in Lucene Queries. Fuzzy, Wildcard and Regex are all based on AutomatonTermsEnum that uses the Bricks library for the automaton API/representation. If we bring in here this new MDAC stuff, the whole atomaton code inside needs to be ported over or we have code duplication (and Fuzzy does no longer use the Bricks Automaton lib).

        Alternatively, can the code be ported away from MDAC to Bricks-automaton, so it can interact with Lucene's Automaton library? If this is not the case, we can no longer easily combine wildcards/prefix/fuzzy anymore.

        Show
        Uwe Schindler added a comment - Hi, My comments: Can this tool be used to replace the Code-Generator that creates the large int[] arrays? If this is not the case, the actual implementation of the Levensthein Automaton might be smaller, but there 2 backsides: Licene issues with the external library (seems to be solved already) The far bigger problem is: Levensthein is not the only automaton used in Lucene Queries. Fuzzy, Wildcard and Regex are all based on AutomatonTermsEnum that uses the Bricks library for the automaton API/representation. If we bring in here this new MDAC stuff, the whole atomaton code inside needs to be ported over or we have code duplication (and Fuzzy does no longer use the Bricks Automaton lib). Alternatively, can the code be ported away from MDAC to Bricks-automaton, so it can interact with Lucene's Automaton library? If this is not the case, we can no longer easily combine wildcards/prefix/fuzzy anymore.
        Hide
        Kevin Lawson added a comment - - edited

        Alternatively, can the code be ported away from MDAC to Bricks-automaton, so it can interact with Lucene's Automaton library? If this is not the case, we can no longer easily combine wildcards/prefix/fuzzy anymore.

        Of course! If you take a look at the tableFuzzySearch() method here, you'll see that it takes an MDAG (which is equivalent in structure to the automatons implemented in Brics) and simply transitions it in step with the LevenshteinAutomaton. The method can be modified easily to accept a Brics automaton, which i'm assuming has methods that implement typical automaton actions (namely transitioning and accept state determination).

        The main reason one might want to consider using MDAG is that typically libraries that implement the data structure (which is more widely known as a DAWG) only support creation with sorted input (and thus, do not allow for modification). I believe Brics is no exception. My MDAG library supports unsorted input and run-time modification of the structure. (The minor drawback concerning this has been addressed in the original post).

        Show
        Kevin Lawson added a comment - - edited Alternatively, can the code be ported away from MDAC to Bricks-automaton, so it can interact with Lucene's Automaton library? If this is not the case, we can no longer easily combine wildcards/prefix/fuzzy anymore. Of course! If you take a look at the tableFuzzySearch() method here , you'll see that it takes an MDAG (which is equivalent in structure to the automatons implemented in Brics) and simply transitions it in step with the LevenshteinAutomaton. The method can be modified easily to accept a Brics automaton, which i'm assuming has methods that implement typical automaton actions (namely transitioning and accept state determination). The main reason one might want to consider using MDAG is that typically libraries that implement the data structure (which is more widely known as a DAWG) only support creation with sorted input (and thus, do not allow for modification). I believe Brics is no exception . My MDAG library supports unsorted input and run-time modification of the structure. (The minor drawback concerning this has been addressed in the original post).
        Hide
        Kevin Lawson added a comment - - edited

        do you own all the copyrights for this GPL licensed product?

        Yes. All the materials I'm submitting were created solely by me and not on behalf of any company.

        I haven't looked into it yet but what we need to do here in a nutshell is...usually the PMC Chair helps here a lot but just FYI this is roughly what you need to go through.

        Great. From the looks of it I'd have no problem submitting those documents. Should I wait for the PMC Chair to come in here? Or can I just submit the grant and license agreement to secretary@apache.org now?

        Show
        Kevin Lawson added a comment - - edited do you own all the copyrights for this GPL licensed product? Yes. All the materials I'm submitting were created solely by me and not on behalf of any company. I haven't looked into it yet but what we need to do here in a nutshell is...usually the PMC Chair helps here a lot but just FYI this is roughly what you need to go through. Great. From the looks of it I'd have no problem submitting those documents. Should I wait for the PMC Chair to come in here? Or can I just submit the grant and license agreement to secretary@apache.org now?
        Hide
        Steve Rowe added a comment -

        I haven't looked into it yet but what we need to do here in a nutshell is...usually the PMC Chair helps here a lot but just FYI this is roughly what you need to go through.

        Great. From the looks of it I'd have no problem submitting those documents. Should I wait for the PMC Chair to come in here? Or can I just submit the grant and license agreement to secretary@apache.org now?

        PMC Chair here - I've never shepherded one of these things before, so I need to get up to speed. I glanced through the links Simon sent (thanks Simon), nothing seems too difficult. I'll read more thoroughly and get back to you here.

        One potential issue that will need to be resolved first: from my past experience, the threshold at which code grants need to be invoked seems fuzzy to me: my previous takeaway had been that the quantity of the contribution, both in number of files and in line count, is a consideration: only a couple of files, or only a couple hundred lines of code, don't warrant a code grant. I looked at the git repo you pointed to, Kevin, and it seems to have more than a couple of files, and more than a couple hundred lines of code, so I'm pretty sure Simon's right, the code grant process will have to be invoked.

        Show
        Steve Rowe added a comment - I haven't looked into it yet but what we need to do here in a nutshell is...usually the PMC Chair helps here a lot but just FYI this is roughly what you need to go through. Great. From the looks of it I'd have no problem submitting those documents. Should I wait for the PMC Chair to come in here? Or can I just submit the grant and license agreement to secretary@apache.org now? PMC Chair here - I've never shepherded one of these things before, so I need to get up to speed. I glanced through the links Simon sent (thanks Simon), nothing seems too difficult. I'll read more thoroughly and get back to you here. One potential issue that will need to be resolved first: from my past experience, the threshold at which code grants need to be invoked seems fuzzy to me: my previous takeaway had been that the quantity of the contribution, both in number of files and in line count, is a consideration: only a couple of files, or only a couple hundred lines of code, don't warrant a code grant. I looked at the git repo you pointed to, Kevin, and it seems to have more than a couple of files, and more than a couple hundred lines of code, so I'm pretty sure Simon's right, the code grant process will have to be invoked.
        Hide
        Christian Moen added a comment -

        It sounds proper to do a code grant also because the software currently has a GPL license. Thanks for following up, Steve.

        Show
        Christian Moen added a comment - It sounds proper to do a code grant also because the software currently has a GPL license. Thanks for following up, Steve.
        Hide
        Steve Rowe added a comment -

        Kevin,

        You can license your code under multiple licenses if you like. The simplest thing to do here is to license your code with a single license: the Apache License v2.

        For info on the Apache License v2, see http://www.staging.apache.org/licenses/. See also http://www.apache.org/legal/resolved.html for a list of licenses which can and cannot be included as dependencies in Apache products. AFAICT, though, once you have signed and submitted the software grant and provided a tarball of the code, your grant is under the terms of the Apache License v2, and the license headers in files committed to the Lucene codebase will be modified to include a reference to the ALv2, and exclude any other license information.

        FYI, attribution for individual contributions is located in a single file: lucene/CHANGES.txt, e.g. for trunk: http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/CHANGES.txt

        I'll be filling out the XML version of this (HTML) IP clearance form: http://incubator.apache.org/ip-clearance/ip-clearance-template.html - I encourage you to take a look, Kevin.

        Steve

        Show
        Steve Rowe added a comment - Kevin, You can license your code under multiple licenses if you like. The simplest thing to do here is to license your code with a single license: the Apache License v2. For info on the Apache License v2, see http://www.staging.apache.org/licenses/ . See also http://www.apache.org/legal/resolved.html for a list of licenses which can and cannot be included as dependencies in Apache products. AFAICT, though, once you have signed and submitted the software grant and provided a tarball of the code, your grant is under the terms of the Apache License v2, and the license headers in files committed to the Lucene codebase will be modified to include a reference to the ALv2, and exclude any other license information. FYI, attribution for individual contributions is located in a single file: lucene/CHANGES.txt, e.g. for trunk: http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/CHANGES.txt I'll be filling out the XML version of this (HTML) IP clearance form: http://incubator.apache.org/ip-clearance/ip-clearance-template.html - I encourage you to take a look, Kevin. Steve
        Hide
        Kevin Lawson added a comment -

        Just updating the thread to notify everyone that I've just e-mailed the ICA and code grant documents (and their GPG-related files) to secretary@apache.org.

        Is there anything else that needs to be done on my part?

        Show
        Kevin Lawson added a comment - Just updating the thread to notify everyone that I've just e-mailed the ICA and code grant documents (and their GPG-related files) to secretary@apache.org. Is there anything else that needs to be done on my part?
        Hide
        Steve Rowe added a comment -

        Is there anything else that needs to be done on my part?

        Hi Kevin Lawson,

        What did you send to secretary@apache.org? We need to know exactly what it is you're donating, so that we can start the vetting and header modification process (assuming you have not yet done any header modification yourself).

        This is generally done by transferring a compressed tarball to a public place, to provide access to those who wants to inspect it, and those who will do the header modification. But maybe this could be done via Git hash(es)? I looked at a bunch of the existing examples of this process <http://incubator.apache.org/ip-clearance/>, and all of them go the compressed tarball route.

        Steve

        Show
        Steve Rowe added a comment - Is there anything else that needs to be done on my part? Hi Kevin Lawson , What did you send to secretary@apache.org? We need to know exactly what it is you're donating, so that we can start the vetting and header modification process (assuming you have not yet done any header modification yourself). This is generally done by transferring a compressed tarball to a public place, to provide access to those who wants to inspect it, and those who will do the header modification. But maybe this could be done via Git hash(es)? I looked at a bunch of the existing examples of this process < http://incubator.apache.org/ip-clearance/ >, and all of them go the compressed tarball route. Steve
        Hide
        Dawid Weiss added a comment -

        Would it be possible to change the licensing directly on github and then make an import from a git revision? It's traceable after all (with parent etc) and it would make the process simpler I think .

        Show
        Dawid Weiss added a comment - Would it be possible to change the licensing directly on github and then make an import from a git revision? It's traceable after all (with parent etc) and it would make the process simpler I think .
        Hide
        Steve Rowe added a comment -

        Would it be possible to change the licensing directly on github and then make an import from a git revision? It's traceable after all (with parent etc) and it would make the process simpler I think .

        I think it's possible.

        Kevin in his paperwork will have had to have referred to the git revision, but he wrote he sent GPG-related files, which doesn't seem compatible to me, since detached signatures will AFAIK have been generated against a local copy, not the Git repository, and so may not be reproducible by third parties (e.g. are Git checkouts bit-for-bit identical on Linux vs. Windows?).

        I think the intent of the signature is just integrity/identity: a way to refer to the exact bits being donated. Git hash(es) should serve the same function.

        Show
        Steve Rowe added a comment - Would it be possible to change the licensing directly on github and then make an import from a git revision? It's traceable after all (with parent etc) and it would make the process simpler I think . I think it's possible. Kevin in his paperwork will have had to have referred to the git revision, but he wrote he sent GPG-related files, which doesn't seem compatible to me, since detached signatures will AFAIK have been generated against a local copy, not the Git repository, and so may not be reproducible by third parties (e.g. are Git checkouts bit-for-bit identical on Linux vs. Windows?). I think the intent of the signature is just integrity/identity: a way to refer to the exact bits being donated. Git hash(es) should serve the same function.
        Hide
        Dawid Weiss added a comment -

        Git checkouts may not be bit-for-bit identical (they may vary if you have custom EOL treatment in .gitattributes, for example) but a git revision hash uniquely identifies a revision in a repository (and this, to my understanding, cannot be easily forged since it includes all parent revisions). So if you git clone/export a given hash then it's essentially the same as if somebody sent you a tarball with file checksums?

        I don't know anything about the legal aspects (if it makes a difference whether you're pulling from an ASL-licensed copy compared to the author taking the initiative).

        Show
        Dawid Weiss added a comment - Git checkouts may not be bit-for-bit identical (they may vary if you have custom EOL treatment in .gitattributes, for example) but a git revision hash uniquely identifies a revision in a repository (and this, to my understanding, cannot be easily forged since it includes all parent revisions). So if you git clone/export a given hash then it's essentially the same as if somebody sent you a tarball with file checksums? I don't know anything about the legal aspects (if it makes a difference whether you're pulling from an ASL-licensed copy compared to the author taking the initiative).
        Hide
        Steve Rowe added a comment -

        Git checkouts may not be bit-for-bit identical

        Right, I mentioned this because Kevin wrote that he had sent "GPG-related files" to secretary@a.o, by which I assume he means detached signature(s), and if a checkout is not bit-for-bit identical, we will have to ignore those signatures, since they won't be reproducible, and that may have procedural implications: can't move beyond the step where you have to verify the signature.

        Of course, if the Git hash(es) are sufficient (and I agree with you, Dawid, that they seem to be), then it should be fine to ignore the signature(s) Kevin sent.

        Show
        Steve Rowe added a comment - Git checkouts may not be bit-for-bit identical Right, I mentioned this because Kevin wrote that he had sent "GPG-related files" to secretary@a.o, by which I assume he means detached signature(s), and if a checkout is not bit-for-bit identical, we will have to ignore those signatures, since they won't be reproducible, and that may have procedural implications: can't move beyond the step where you have to verify the signature. Of course, if the Git hash(es) are sufficient (and I agree with you, Dawid, that they seem to be), then it should be fine to ignore the signature(s) Kevin sent.
        Hide
        Kevin Lawson added a comment -

        Sorry for the ambiguity.

        When I wrote "and GP-related files", I meant the signatures for the ICA and software grant documents. The documents made no mention of signing the tarballs to be donated (the ip-clearance form only mentions it conditionally in step 5).

        I assumed you guys would simply grab the tarballs from the GitHub links I posted.

        Show
        Kevin Lawson added a comment - Sorry for the ambiguity. When I wrote "and GP-related files", I meant the signatures for the ICA and software grant documents. The documents made no mention of signing the tarballs to be donated (the ip-clearance form only mentions it conditionally in step 5). I assumed you guys would simply grab the tarballs from the GitHub links I posted.
        Hide
        Steve Rowe added a comment -

        I assumed you guys would simply grab the tarballs from the GitHub links I posted.

        Okay, cool, I think that will work.

        Do you intend for these two projects to live on after they've been incorporated into Lucene? If so, then I'll fork them on Github and start making license header changes -> ALv2; Kevin, do you give your consent for me to change the license headers in all files to point to ALv2?

        If you don't intend for these two projects to continue life separately from Lucene, then I think it will make sense for you to do the license changes in-place yourself, Kevin. Alternatively, you could grant write access to someone else to do the work. Please let us know.

        I have started the IP clearance form. It's online now at http://incubator.apache.org/ip-clearance/lucene-levenshtein-automaton-mdag.html.

        Show
        Steve Rowe added a comment - I assumed you guys would simply grab the tarballs from the GitHub links I posted. Okay, cool, I think that will work. Do you intend for these two projects to live on after they've been incorporated into Lucene? If so, then I'll fork them on Github and start making license header changes -> ALv2; Kevin, do you give your consent for me to change the license headers in all files to point to ALv2? If you don't intend for these two projects to continue life separately from Lucene, then I think it will make sense for you to do the license changes in-place yourself, Kevin. Alternatively, you could grant write access to someone else to do the work. Please let us know. I have started the IP clearance form. It's online now at http://incubator.apache.org/ip-clearance/lucene-levenshtein-automaton-mdag.html .
        Hide
        Christian Moen added a comment -

        Kevin,

        I think it's best that you do the license change yourself and that we don't have any active role in making the change since you are the only person entitled to make the change.

        This change can be done by using the below header on all the source code and other relevant text files:

        /*
         * Licensed to the Apache Software Foundation (ASF) under one or more
         * contributor license agreements.  See the NOTICE file distributed with
         * this work for additional information regarding copyright ownership.
         * The ASF licenses this file to You under the Apache License, Version 2.0
         * (the "License"); you may not use this file except in compliance with
         * the License.  You may obtain a copy of the License at
         *
         *     http://www.apache.org/licenses/LICENSE-2.0
         *
         * Unless required by applicable law or agreed to in writing, software
         * distributed under the License is distributed on an "AS IS" BASIS,
         * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
         * See the License for the specific language governing permissions and
         * limitations under the License.
         */
        

        After this has been done, please make a tarball and attach it to this JIRA and indicate that this is the code you wish to grant and also inform us about the MD5 hash of the tarball. (This will go into the IP-clearance document and will be used to identify the codebase.)

        It's a good idea to also use this MD5 hash as part of Exhibit A in the software-grant.txt agreement unless you have signed and submitted this already. (If you donate the code yourself by attaching it to the JIRA as described above, I believe the hashes not being part of Exhibit A is acceptable.)

        Please feel free to add your comments, Steve.

        Show
        Christian Moen added a comment - Kevin, I think it's best that you do the license change yourself and that we don't have any active role in making the change since you are the only person entitled to make the change. This change can be done by using the below header on all the source code and other relevant text files: /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ After this has been done, please make a tarball and attach it to this JIRA and indicate that this is the code you wish to grant and also inform us about the MD5 hash of the tarball. (This will go into the IP-clearance document and will be used to identify the codebase.) It's a good idea to also use this MD5 hash as part of Exhibit A in the software-grant.txt agreement unless you have signed and submitted this already. (If you donate the code yourself by attaching it to the JIRA as described above, I believe the hashes not being part of Exhibit A is acceptable.) Please feel free to add your comments, Steve.
        Hide
        Steve Rowe added a comment -

        I think it's best that you do the license change yourself and that we don't have any active role in making the change since you are the only person entitled to make the change.

        +1

        After this has been done, please make a tarball and attach it to this JIRA and indicate that this is the code you wish to grant and also inform us about the MD5 hash of the tarball. (This will go into the IP-clearance document and will be used to identify the codebase.)

        I and Dawid had been advocating using Github for this, but I agree with Christian: a tarball attached to this issue by you, Kevin Lawson, will remove all ambiguity about what is being donated and by whom. Also, Github is not under ASF control, and in the future if that business goes under, the ASF will lose the history of this donation.

        Show
        Steve Rowe added a comment - I think it's best that you do the license change yourself and that we don't have any active role in making the change since you are the only person entitled to make the change. +1 After this has been done, please make a tarball and attach it to this JIRA and indicate that this is the code you wish to grant and also inform us about the MD5 hash of the tarball. (This will go into the IP-clearance document and will be used to identify the codebase.) I and Dawid had been advocating using Github for this, but I agree with Christian: a tarball attached to this issue by you, Kevin Lawson , will remove all ambiguity about what is being donated and by whom. Also, Github is not under ASF control, and in the future if that business goes under, the ASF will lose the history of this donation.
        Hide
        Kevin Lawson added a comment - - edited

        Hi all.

        I've changed the license to the Apache License 2.0 and have modified all of the license notices. The checksums for the archives (which have been attached to this issue) are below:

        LevenshteinAutomaton-master.zip MD5 checksum: 081b417edbd7d2a562085e1c0dfb0a4c

        MDAG-master.zip MD5 checksum: 109e99dca700e02d1ad54306688472a5

        Show
        Kevin Lawson added a comment - - edited Hi all. I've changed the license to the Apache License 2.0 and have modified all of the license notices. The checksums for the archives (which have been attached to this issue) are below: LevenshteinAutomaton-master.zip MD5 checksum: 081b417edbd7d2a562085e1c0dfb0a4c MDAG-master.zip MD5 checksum: 109e99dca700e02d1ad54306688472a5
        Hide
        Steve Rowe added a comment -

        Just updating the thread to notify everyone that I've just e-mailed the ICA and code grant documents (and their GPG-related files) to secretary@apache.org.

        I monitor commits to the ICLA and code grants record files, and neither the ICLA nor the code grant document has been recorded yet. I'll post on this issue once the code grant has been recorded.

        Kevin Lawson, did you send the code grant to legal-archive@apache.org in addition to sending it to secretary@apache.org? This is mentioned as a requirement in step 3 of the process section in http://incubator.apache.org/ip-clearance/ip-clearance-template.html.

        Show
        Steve Rowe added a comment - Just updating the thread to notify everyone that I've just e-mailed the ICA and code grant documents (and their GPG-related files) to secretary@apache.org. I monitor commits to the ICLA and code grants record files, and neither the ICLA nor the code grant document has been recorded yet. I'll post on this issue once the code grant has been recorded. Kevin Lawson , did you send the code grant to legal-archive@apache.org in addition to sending it to secretary@apache.org? This is mentioned as a requirement in step 3 of the process section in http://incubator.apache.org/ip-clearance/ip-clearance-template.html .
        Hide
        Kevin Lawson added a comment -

        Kevin Lawson, did you send the code grant to legal-archive@apache.org in addition to sending it to secretary@apache.org? This is mentioned as a requirement in step 3 of the process section in http://incubator.apache.org/ip-clearance/ip-clearance-template.html.

        Ah, I definitely overlooked that. Sent! Was done a couple of hours ago, but I figured I'd notify you here.

        Show
        Kevin Lawson added a comment - Kevin Lawson, did you send the code grant to legal-archive@apache.org in addition to sending it to secretary@apache.org? This is mentioned as a requirement in step 3 of the process section in http://incubator.apache.org/ip-clearance/ip-clearance-template.html . Ah, I definitely overlooked that. Sent! Was done a couple of hours ago, but I figured I'd notify you here.
        Hide
        Steve Rowe added a comment -

        Ah, I definitely overlooked that. Sent! Was done a couple of hours ago, but I figured I'd notify you here.

        Kevin Lawson, I'm concerned that the Lucene PMC has not yet received notification from the Apache Secretary of either your code grant or your ICLA. We got notification earlier today for SooMyung Lee's paperwork (see LUCENE-4956), which was sent to secretary@apache.org after yours.

        Is it possible that you sent the paperwork to the wrong address, or that there was some other mail snafu?

        Show
        Steve Rowe added a comment - Ah, I definitely overlooked that. Sent! Was done a couple of hours ago, but I figured I'd notify you here. Kevin Lawson , I'm concerned that the Lucene PMC has not yet received notification from the Apache Secretary of either your code grant or your ICLA. We got notification earlier today for SooMyung Lee's paperwork (see LUCENE-4956 ), which was sent to secretary@apache.org after yours. Is it possible that you sent the paperwork to the wrong address, or that there was some other mail snafu?
        Hide
        Kevin Lawson added a comment -

        The address I sent it to is indeed correct (secretary@apache.org). Maybe it's something on their end (spam filter)? I've just re-sent it.

        Show
        Kevin Lawson added a comment - The address I sent it to is indeed correct (secretary@apache.org). Maybe it's something on their end (spam filter)? I've just re-sent it.
        Hide
        Steve Rowe added a comment -

        The Lucene PMC received notification today from the Apache Secretary that Kevin's code grant and ICLA paperwork have been received and recorded.

        Now that we have Kevin's code grant and ICLA, we can start verifying headers/licensing. Since it's not clear where this software will go, I don't think it makes sense to create a branch yet. People doing the header/license verification/modification can just attach modified tarballs to this issue if necessary.

        Show
        Steve Rowe added a comment - The Lucene PMC received notification today from the Apache Secretary that Kevin's code grant and ICLA paperwork have been received and recorded. Now that we have Kevin's code grant and ICLA, we can start verifying headers/licensing. Since it's not clear where this software will go, I don't think it makes sense to create a branch yet. People doing the header/license verification/modification can just attach modified tarballs to this issue if necessary.
        Hide
        Kevin Lawson added a comment -

        I had some free time yesterday and updated LevenshteinAutomaton to include transpositions in its edit distance determination methods. The included tests have also been updated to accommodate the modifications and ensure correct functionality.

        How do I go about submitting the updated code? Push an update to my github? Simply attach the new archive here? I'm not sure where the code donation/acceptance process is at now, so I'm unsure of how to do this.

        Show
        Kevin Lawson added a comment - I had some free time yesterday and updated LevenshteinAutomaton to include transpositions in its edit distance determination methods. The included tests have also been updated to accommodate the modifications and ensure correct functionality. How do I go about submitting the updated code? Push an update to my github? Simply attach the new archive here? I'm not sure where the code donation/acceptance process is at now, so I'm unsure of how to do this.
        Hide
        Kevin Lawson added a comment - - edited

        I've attached an archive of the most recent version of LevenshteinAutomaton, which includes transpositions in the edit distance calculations.

        LevenshteinAutomaton-master-update.zip MD5 checksum: 8187978b438435ac2d23981bb0133d34

        Show
        Kevin Lawson added a comment - - edited I've attached an archive of the most recent version of LevenshteinAutomaton, which includes transpositions in the edit distance calculations. LevenshteinAutomaton-master-update.zip MD5 checksum: 8187978b438435ac2d23981bb0133d34

          People

          • Assignee:
            Unassigned
            Reporter:
            Kevin Lawson
          • Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

            • Created:
              Updated:

              Development