Index: modules/analysis/README.txt =================================================================== --- modules/analysis/README.txt (revision 1242237) +++ modules/analysis/README.txt (working copy) @@ -41,6 +41,10 @@ An add-on analysis library that contains a universal algorithmic stemmer, including tables for the Polish language. +lucene-analyzers-uima-XX.jar + An add-on analysis library that contains tokenizers/analyzers using + Apache UIMA extracted annotations to identify tokens/types/etc. + common/src/java icu/src/java kuromoji/src/java @@ -48,6 +52,7 @@ phonetic/src/java smartcn/src/java stempel/src/java +uima/src/java The source code for the libraries. common/src/test @@ -57,4 +62,5 @@ phonetic/src/test smartcn/src/test stempel/src/test +uima/src/test Unit tests for the libraries. Property changes on: modules/analysis/uima ___________________________________________________________________ Added: svn:ignore + *.iml Index: modules/analysis/uima/lib/uima-an-wst-NOTICE.txt =================================================================== --- modules/analysis/uima/lib/uima-an-wst-NOTICE.txt (revision 0) +++ modules/analysis/uima/lib/uima-an-wst-NOTICE.txt (revision 0) @@ -0,0 +1,7 @@ + +UIMA Annotator: WhitespaceTokenizer +Copyright 2006-2010 The Apache Software Foundation + +This product includes software developed at +The Apache Software Foundation (http://www.apache.org/). + Index: modules/analysis/uima/lib/uimaj-core-2.3.1.jar =================================================================== Cannot display: file marked as a binary type. 
svn:mime-type = application/octet-stream Property changes on: modules/analysis/uima/lib/uimaj-core-2.3.1.jar ___________________________________________________________________ Added: svn:mime-type + application/octet-stream Index: modules/analysis/uima/lib/uimaj-core-LICENSE-ASL.txt =================================================================== --- modules/analysis/uima/lib/uimaj-core-LICENSE-ASL.txt (revision 0) +++ modules/analysis/uima/lib/uimaj-core-LICENSE-ASL.txt (revision 0) @@ -0,0 +1,202 @@ + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. 
+ + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. 
Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. 
You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. 
+ + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. 
In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. 
+ + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. Index: modules/analysis/uima/lib/uima-an-tagger-2.3.1.jar =================================================================== Cannot display: file marked as a binary type. svn:mime-type = application/octet-stream Property changes on: modules/analysis/uima/lib/uima-an-tagger-2.3.1.jar ___________________________________________________________________ Added: svn:mime-type + application/octet-stream Index: modules/analysis/uima/lib/uimaj-core-NOTICE.txt =================================================================== --- modules/analysis/uima/lib/uimaj-core-NOTICE.txt (revision 0) +++ modules/analysis/uima/lib/uimaj-core-NOTICE.txt (revision 0) @@ -0,0 +1,13 @@ + +UIMA Base: uimaj-core +Copyright 2006-2010 The Apache Software Foundation + +This product includes software developed at +The Apache Software Foundation (http://www.apache.org/). + +Portions of Apache UIMA were originally developed by +International Business Machines Corporation and are +licensed to the Apache Software Foundation under the +"Software Grant License Agreement", informally known as the +"IBM UIMA License Agreement". +Copyright (c) 2003, 2006 IBM Corporation. 
Index: modules/analysis/uima/lib/uima-an-tagger-LICENSE-ASL.txt =================================================================== --- modules/analysis/uima/lib/uima-an-tagger-LICENSE-ASL.txt (revision 0) +++ modules/analysis/uima/lib/uima-an-tagger-LICENSE-ASL.txt (revision 0) @@ -0,0 +1,202 @@ + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). 
+ + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. 
Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative 
Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. 
Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. 
+ + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. Index: modules/analysis/uima/lib/uima-an-tagger-NOTICE.txt =================================================================== --- modules/analysis/uima/lib/uima-an-tagger-NOTICE.txt (revision 0) +++ modules/analysis/uima/lib/uima-an-tagger-NOTICE.txt (revision 0) @@ -0,0 +1,7 @@ + +UIMA Annotator: Tagger +Copyright 2006-2010 The Apache Software Foundation + +This product includes software developed at +The Apache Software Foundation (http://www.apache.org/). + Index: modules/analysis/uima/lib/uima-an-wst-2.3.1.jar =================================================================== Cannot display: file marked as a binary type. 
svn:mime-type = application/octet-stream Property changes on: modules/analysis/uima/lib/uima-an-wst-2.3.1.jar ___________________________________________________________________ Added: svn:mime-type + application/octet-stream Index: modules/analysis/uima/lib/uima-an-wst-LICENSE-ASL.txt =================================================================== --- modules/analysis/uima/lib/uima-an-wst-LICENSE-ASL.txt (revision 0) +++ modules/analysis/uima/lib/uima-an-wst-LICENSE-ASL.txt (revision 0) @@ -0,0 +1,202 @@ + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. 
+ + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. 
Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. 
You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. 
+ + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. 
In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. 
+ + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. Index: modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/UIMATypeAwareAnalyzerTest.java =================================================================== --- modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/UIMATypeAwareAnalyzerTest.java (revision 0) +++ modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/UIMATypeAwareAnalyzerTest.java (revision 0) @@ -0,0 +1,66 @@ +package org.apache.lucene.analysis.uima; + +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +import org.apache.lucene.analysis.BaseTokenStreamTestCase; +import org.apache.lucene.analysis.TokenStream; +import org.junit.After; +import org.junit.Before; +import org.junit.Test; + +import java.io.StringReader; + +/** + * Testcase for {@link UIMATypeAwareAnalyzer} + */ +public class UIMATypeAwareAnalyzerTest extends BaseTokenStreamTestCase { + + private UIMATypeAwareAnalyzer analyzer; + + @Before + public void setUp() throws Exception { + super.setUp(); + analyzer = new UIMATypeAwareAnalyzer("/uima/AggregateSentenceAE.xml", + "org.apache.uima.TokenAnnotation", "posTag"); + } + + @After + public void tearDown() throws Exception { + analyzer.close(); + super.tearDown(); + } + + @Test + public void baseUIMATypeAwareAnalyzerStreamTest() throws Exception { + + // create a token stream + TokenStream ts = analyzer.tokenStream("text", new StringReader("the big brown fox jumped on the wood")); + + // check that 'the big brown fox jumped on the wood' tokens have the expected PoS types + assertTokenStreamContents(ts, + new String[]{"the", "big", "brown", "fox", "jumped", "on", "the", "wood"}, + new String[]{"at", "jj", "jj", "nn", "vbd", "in", "at", "nn"}); + + } + + @Test + public void testRandomStrings() throws Exception { + checkRandomData(random, analyzer, 10000 * RANDOM_MULTIPLIER); + } + +} Index: modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ae/BasicAEProviderTest.java =================================================================== --- modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ae/BasicAEProviderTest.java (revision 0) +++ modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ae/BasicAEProviderTest.java (revision 0) @@ -0,0 +1,36 @@ +package org.apache.lucene.analysis.uima.ae; + +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. 
+ * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import org.apache.uima.analysis_engine.AnalysisEngine; +import org.junit.Test; + +import static org.junit.Assert.assertNotNull; + +/** + * TestCase for {@link BasicAEProvider} + */ +public class BasicAEProviderTest { + + @Test + public void testBasicInitialization() throws Exception { + AEProvider basicAEProvider = new BasicAEProvider("/uima/DummyEntityAEDescriptor.xml"); + AnalysisEngine analysisEngine = basicAEProvider.getAE(); + assertNotNull(analysisEngine); + } +} Index: modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ae/OverridingParamsAEProviderTest.java =================================================================== --- modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ae/OverridingParamsAEProviderTest.java (revision 0) +++ modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ae/OverridingParamsAEProviderTest.java (revision 0) @@ -0,0 +1,61 @@ +package org.apache.lucene.analysis.uima.ae; + +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License.
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import org.apache.uima.analysis_engine.AnalysisEngine; +import org.apache.uima.resource.ResourceInitializationException; +import org.junit.Test; + +import java.util.HashMap; +import java.util.Map; + +import static org.junit.Assert.*; + +/** + * TestCase for {@link OverridingParamsAEProvider} + */ +public class OverridingParamsAEProviderTest { + + @Test + public void testNullMapInitialization() throws Exception { + try { + AEProvider aeProvider = new OverridingParamsAEProvider("/uima/DummyEntityAEDescriptor.xml", null); + aeProvider.getAE(); + fail("should fail due to null Map passed"); + } catch (ResourceInitializationException e) { + // expected: a null parameter Map must be rejected + } + } + + @Test + public void testEmptyMapInitialization() throws Exception { + AEProvider aeProvider = new OverridingParamsAEProvider("/uima/DummyEntityAEDescriptor.xml", new HashMap<String, Object>()); + AnalysisEngine analysisEngine = aeProvider.getAE(); + assertNotNull(analysisEngine); + } + + @Test + public void testOverridingParamsInitialization() throws Exception { + Map<String, Object> runtimeParameters = new HashMap<String, Object>(); + runtimeParameters.put("ngramsize", "3"); + AEProvider aeProvider = new OverridingParamsAEProvider("/uima/AggregateSentenceAE.xml", runtimeParameters); + AnalysisEngine analysisEngine = aeProvider.getAE(); + assertNotNull(analysisEngine); + assertEquals(3, analysisEngine.getConfigParameterValue("ngramsize")); + } +} Index: modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ts/SentimentAnnotation.java =================================================================== ---
modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ts/SentimentAnnotation.java (revision 0) +++ modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ts/SentimentAnnotation.java (revision 0) @@ -0,0 +1,79 @@ + + +/* First created by JCasGen Fri Mar 04 13:08:40 CET 2011 */ +package org.apache.lucene.analysis.uima.ts; + +import org.apache.uima.jcas.JCas; +import org.apache.uima.jcas.JCasRegistry; +import org.apache.uima.jcas.cas.TOP_Type; +import org.apache.uima.jcas.tcas.Annotation; + + +/** + * Updated by JCasGen Fri Mar 04 13:08:40 CET 2011 + * XML source: /Users/tommasoteofili/Documents/workspaces/lucene_workspace/lucene_dev/solr/contrib/uima/src/test/resources/DummySentimentAnalysisAEDescriptor.xml + * @generated */ +public class SentimentAnnotation extends Annotation { + /** @generated + * @ordered + */ + public final static int typeIndexID = JCasRegistry.register(SentimentAnnotation.class); + /** @generated + * @ordered + */ + public final static int type = typeIndexID; + /** @generated */ + public int getTypeIndexID() {return typeIndexID;} + + /** Never called. 
Disable default constructor + * @generated */ + protected SentimentAnnotation() {} + + /** Internal - constructor used by generator + * @generated */ + public SentimentAnnotation(int addr, TOP_Type type) { + super(addr, type); + readObject(); + } + + /** @generated */ + public SentimentAnnotation(JCas jcas) { + super(jcas); + readObject(); + } + + /** @generated */ + public SentimentAnnotation(JCas jcas, int begin, int end) { + super(jcas); + setBegin(begin); + setEnd(end); + readObject(); + } + + /** + * Write your own initialization here + * + @generated modifiable */ + private void readObject() {} + + + + //*--------------* + //* Feature: mood + + /** getter for mood - gets + * @generated */ + public String getMood() { + if (SentimentAnnotation_Type.featOkTst && ((SentimentAnnotation_Type)jcasType).casFeat_mood == null) + jcasType.jcas.throwFeatMissing("mood", "org.apache.solr.uima.ts.SentimentAnnotation"); + return jcasType.ll_cas.ll_getStringValue(addr, ((SentimentAnnotation_Type)jcasType).casFeatCode_mood);} + + /** setter for mood - sets + * @generated */ + public void setMood(String v) { + if (SentimentAnnotation_Type.featOkTst && ((SentimentAnnotation_Type)jcasType).casFeat_mood == null) + jcasType.jcas.throwFeatMissing("mood", "org.apache.solr.uima.ts.SentimentAnnotation"); + jcasType.ll_cas.ll_setStringValue(addr, ((SentimentAnnotation_Type)jcasType).casFeatCode_mood, v);} + } + + \ No newline at end of file Index: modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ts/SentimentAnnotation_Type.java =================================================================== --- modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ts/SentimentAnnotation_Type.java (revision 0) +++ modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ts/SentimentAnnotation_Type.java (revision 0) @@ -0,0 +1,79 @@ + +/* First created by JCasGen Fri Mar 04 13:08:40 CET 2011 */ +package org.apache.lucene.analysis.uima.ts; + +import 
org.apache.uima.cas.Feature; +import org.apache.uima.cas.FeatureStructure; +import org.apache.uima.cas.Type; +import org.apache.uima.cas.impl.CASImpl; +import org.apache.uima.cas.impl.FSGenerator; +import org.apache.uima.cas.impl.FeatureImpl; +import org.apache.uima.cas.impl.TypeImpl; +import org.apache.uima.jcas.JCas; +import org.apache.uima.jcas.JCasRegistry; +import org.apache.uima.jcas.tcas.Annotation_Type; + +/** + * Updated by JCasGen Fri Mar 04 13:08:40 CET 2011 + * @generated */ +public class SentimentAnnotation_Type extends Annotation_Type { + /** @generated */ + protected FSGenerator getFSGenerator() {return fsGenerator;} + /** @generated */ + private final FSGenerator fsGenerator = + new FSGenerator() { + public FeatureStructure createFS(int addr, CASImpl cas) { + if (SentimentAnnotation_Type.this.useExistingInstance) { + // Return eq fs instance if already created + FeatureStructure fs = SentimentAnnotation_Type.this.jcas.getJfsFromCaddr(addr); + if (null == fs) { + fs = new SentimentAnnotation(addr, SentimentAnnotation_Type.this); + SentimentAnnotation_Type.this.jcas.putJfsFromCaddr(addr, fs); + return fs; + } + return fs; + } else return new SentimentAnnotation(addr, SentimentAnnotation_Type.this); + } + }; + /** @generated */ + public final static int typeIndexID = SentimentAnnotation.typeIndexID; + /** @generated + @modifiable */ + public final static boolean featOkTst = JCasRegistry.getFeatOkTst("org.apache.solr.uima.ts.SentimentAnnotation"); + + /** @generated */ + final Feature casFeat_mood; + /** @generated */ + final int casFeatCode_mood; + /** @generated */ + public String getMood(int addr) { + if (featOkTst && casFeat_mood == null) + jcas.throwFeatMissing("mood", "org.apache.solr.uima.ts.SentimentAnnotation"); + return ll_cas.ll_getStringValue(addr, casFeatCode_mood); + } + /** @generated */ + public void setMood(int addr, String v) { + if (featOkTst && casFeat_mood == null) + jcas.throwFeatMissing("mood", 
"org.apache.solr.uima.ts.SentimentAnnotation"); + ll_cas.ll_setStringValue(addr, casFeatCode_mood, v);} + + + + + + /** initialize variables to correspond with Cas Type and Features + * @generated */ + public SentimentAnnotation_Type(JCas jcas, Type casType) { + super(jcas, casType); + casImpl.getFSClassRegistry().addGeneratorForType((TypeImpl)this.casType, getFSGenerator()); + + + casFeat_mood = jcas.getRequiredFeatureDE(casType, "mood", "uima.cas.String", featOkTst); + casFeatCode_mood = (null == casFeat_mood) ? JCas.INVALID_FEATURE_CODE : ((FeatureImpl)casFeat_mood).getCode(); + + } +} + + + + \ No newline at end of file Index: modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ts/EntityAnnotation.java =================================================================== --- modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ts/EntityAnnotation.java (revision 0) +++ modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ts/EntityAnnotation.java (revision 0) @@ -0,0 +1,97 @@ + + +/* First created by JCasGen Sat May 07 22:33:38 JST 2011 */ +package org.apache.lucene.analysis.uima.ts; + +import org.apache.uima.jcas.JCas; +import org.apache.uima.jcas.JCasRegistry; +import org.apache.uima.jcas.cas.TOP_Type; +import org.apache.uima.jcas.tcas.Annotation; + + +/** + * Updated by JCasGen Sat May 07 22:33:38 JST 2011 + * XML source: /Users/koji/Documents/workspace/DummyEntityAnnotator/desc/DummyEntityAEDescriptor.xml + * @generated */ +public class EntityAnnotation extends Annotation { + /** @generated + * @ordered + */ + public final static int typeIndexID = JCasRegistry.register(EntityAnnotation.class); + /** @generated + * @ordered + */ + public final static int type = typeIndexID; + /** @generated */ + public int getTypeIndexID() {return typeIndexID;} + + /** Never called. 
Disable default constructor + * @generated */ + protected EntityAnnotation() {} + + /** Internal - constructor used by generator + * @generated */ + public EntityAnnotation(int addr, TOP_Type type) { + super(addr, type); + readObject(); + } + + /** @generated */ + public EntityAnnotation(JCas jcas) { + super(jcas); + readObject(); + } + + /** @generated */ + public EntityAnnotation(JCas jcas, int begin, int end) { + super(jcas); + setBegin(begin); + setEnd(end); + readObject(); + } + + /** + * Write your own initialization here + * + @generated modifiable */ + private void readObject() {} + + + + //*--------------* + //* Feature: name + + /** getter for name - gets + * @generated */ + public String getName() { + if (EntityAnnotation_Type.featOkTst && ((EntityAnnotation_Type)jcasType).casFeat_name == null) + jcasType.jcas.throwFeatMissing("name", "org.apache.solr.uima.ts.EntityAnnotation"); + return jcasType.ll_cas.ll_getStringValue(addr, ((EntityAnnotation_Type)jcasType).casFeatCode_name);} + + /** setter for name - sets + * @generated */ + public void setName(String v) { + if (EntityAnnotation_Type.featOkTst && ((EntityAnnotation_Type)jcasType).casFeat_name == null) + jcasType.jcas.throwFeatMissing("name", "org.apache.solr.uima.ts.EntityAnnotation"); + jcasType.ll_cas.ll_setStringValue(addr, ((EntityAnnotation_Type)jcasType).casFeatCode_name, v);} + + + //*--------------* + //* Feature: entity + + /** getter for entity - gets + * @generated */ + public String getEntity() { + if (EntityAnnotation_Type.featOkTst && ((EntityAnnotation_Type)jcasType).casFeat_entity == null) + jcasType.jcas.throwFeatMissing("entity", "org.apache.solr.uima.ts.EntityAnnotation"); + return jcasType.ll_cas.ll_getStringValue(addr, ((EntityAnnotation_Type)jcasType).casFeatCode_entity);} + + /** setter for entity - sets + * @generated */ + public void setEntity(String v) { + if (EntityAnnotation_Type.featOkTst && ((EntityAnnotation_Type)jcasType).casFeat_entity == null) + 
jcasType.jcas.throwFeatMissing("entity", "org.apache.solr.uima.ts.EntityAnnotation"); + jcasType.ll_cas.ll_setStringValue(addr, ((EntityAnnotation_Type)jcasType).casFeatCode_entity, v);} + } + + \ No newline at end of file Index: modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ts/EntityAnnotation_Type.java =================================================================== --- modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ts/EntityAnnotation_Type.java (revision 0) +++ modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ts/EntityAnnotation_Type.java (revision 0) @@ -0,0 +1,118 @@ + +/* First created by JCasGen Sat May 07 22:33:38 JST 2011 */ +package org.apache.lucene.analysis.uima.ts; + +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +import org.apache.uima.cas.Feature; +import org.apache.uima.cas.FeatureStructure; +import org.apache.uima.cas.Type; +import org.apache.uima.cas.impl.CASImpl; +import org.apache.uima.cas.impl.FSGenerator; +import org.apache.uima.cas.impl.FeatureImpl; +import org.apache.uima.cas.impl.TypeImpl; +import org.apache.uima.jcas.JCas; +import org.apache.uima.jcas.JCasRegistry; +import org.apache.uima.jcas.tcas.Annotation_Type; + +/** + * Updated by JCasGen Sat May 07 22:33:38 JST 2011 + * @generated */ +public class EntityAnnotation_Type extends Annotation_Type { + /** @generated */ + protected FSGenerator getFSGenerator() {return fsGenerator;} + /** @generated */ + private final FSGenerator fsGenerator = + new FSGenerator() { + public FeatureStructure createFS(int addr, CASImpl cas) { + if (EntityAnnotation_Type.this.useExistingInstance) { + // Return eq fs instance if already created + FeatureStructure fs = EntityAnnotation_Type.this.jcas.getJfsFromCaddr(addr); + if (null == fs) { + fs = new EntityAnnotation(addr, EntityAnnotation_Type.this); + EntityAnnotation_Type.this.jcas.putJfsFromCaddr(addr, fs); + return fs; + } + return fs; + } else return new EntityAnnotation(addr, EntityAnnotation_Type.this); + } + }; + /** @generated */ + public final static int typeIndexID = EntityAnnotation.typeIndexID; + /** @generated + @modifiable */ + public final static boolean featOkTst = JCasRegistry.getFeatOkTst("org.apache.solr.uima.ts.EntityAnnotation"); + + /** @generated */ + final Feature casFeat_name; + /** @generated */ + final int casFeatCode_name; + /** @generated */ + public String getName(int addr) { + if (featOkTst && casFeat_name == null) + jcas.throwFeatMissing("name", "org.apache.solr.uima.ts.EntityAnnotation"); + return ll_cas.ll_getStringValue(addr, casFeatCode_name); + } + /** @generated */ + public void setName(int addr, String v) { + if (featOkTst && casFeat_name == null) + jcas.throwFeatMissing("name", "org.apache.solr.uima.ts.EntityAnnotation"); + 
ll_cas.ll_setStringValue(addr, casFeatCode_name, v);} + + + + /** @generated */ + final Feature casFeat_entity; + /** @generated */ + final int casFeatCode_entity; + /** @generated */ + public String getEntity(int addr) { + if (featOkTst && casFeat_entity == null) + jcas.throwFeatMissing("entity", "org.apache.solr.uima.ts.EntityAnnotation"); + return ll_cas.ll_getStringValue(addr, casFeatCode_entity); + } + /** @generated */ + public void setEntity(int addr, String v) { + if (featOkTst && casFeat_entity == null) + jcas.throwFeatMissing("entity", "org.apache.solr.uima.ts.EntityAnnotation"); + ll_cas.ll_setStringValue(addr, casFeatCode_entity, v);} + + + + + + /** initialize variables to correspond with Cas Type and Features + * @generated */ + public EntityAnnotation_Type(JCas jcas, Type casType) { + super(jcas, casType); + casImpl.getFSClassRegistry().addGeneratorForType((TypeImpl)this.casType, getFSGenerator()); + + + casFeat_name = jcas.getRequiredFeatureDE(casType, "name", "uima.cas.String", featOkTst); + casFeatCode_name = (null == casFeat_name) ? JCas.INVALID_FEATURE_CODE : ((FeatureImpl)casFeat_name).getCode(); + + + casFeat_entity = jcas.getRequiredFeatureDE(casType, "entity", "uima.cas.String", featOkTst); + casFeatCode_entity = (null == casFeat_entity) ? JCas.INVALID_FEATURE_CODE : ((FeatureImpl)casFeat_entity).getCode(); + + } +} + + + + \ No newline at end of file Index: modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/UIMABaseAnalyzerTest.java =================================================================== --- modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/UIMABaseAnalyzerTest.java (revision 0) +++ modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/UIMABaseAnalyzerTest.java (revision 0) @@ -0,0 +1,110 @@ +package org.apache.lucene.analysis.uima; + +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. 
See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import org.apache.lucene.analysis.BaseTokenStreamTestCase; +import org.apache.lucene.analysis.TokenStream; +import org.apache.lucene.document.Document; +import org.apache.lucene.document.Field; +import org.apache.lucene.document.TextField; +import org.apache.lucene.index.DirectoryReader; +import org.apache.lucene.index.IndexWriter; +import org.apache.lucene.index.IndexWriterConfig; +import org.apache.lucene.search.IndexSearcher; +import org.apache.lucene.search.MatchAllDocsQuery; +import org.apache.lucene.search.ScoreDoc; +import org.apache.lucene.search.TopDocs; +import org.apache.lucene.store.Directory; +import org.apache.lucene.store.RAMDirectory; +import org.apache.lucene.util.Version; +import org.junit.After; +import org.junit.Before; +import org.junit.Test; + +import java.io.StringReader; + +/** + * Testcase for {@link UIMABaseAnalyzer} + */ +public class UIMABaseAnalyzerTest extends BaseTokenStreamTestCase { + + private UIMABaseAnalyzer analyzer; + + @Before + public void setUp() throws Exception { + super.setUp(); + analyzer = new UIMABaseAnalyzer("/uima/AggregateSentenceAE.xml", "org.apache.uima.TokenAnnotation"); + } + + @After + public void tearDown() throws Exception { + analyzer.close(); + super.tearDown(); + } + + @Test + public void 
baseUIMAAnalyzerStreamTest() throws Exception { + TokenStream ts = analyzer.tokenStream("text", new StringReader("the big brown fox jumped on the wood")); + assertTokenStreamContents(ts, new String[]{"the", "big", "brown", "fox", "jumped", "on", "the", "wood"}); + } + + @Test + public void baseUIMAAnalyzerIntegrationTest() throws Exception { + Directory dir = new RAMDirectory(); + IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(Version.LUCENE_40, analyzer)); + // add the first doc + Document doc = new Document(); + doc.add(new Field("title", "this is a dummy title ", TextField.TYPE_STORED)); + doc.add(new Field("contents", "there is some content written here", TextField.TYPE_STORED)); + writer.addDocument(doc, analyzer); + writer.commit(); + + // search the index, which now holds exactly the first doc + IndexSearcher indexSearcher = new IndexSearcher(DirectoryReader.open(dir)); + TopDocs result = indexSearcher.search(new MatchAllDocsQuery(), 10); + assertEquals(1, result.totalHits); + Document d = indexSearcher.doc(result.scoreDocs[0].doc); + assertNotNull(d); + assertNotNull(d.getField("title")); + assertNotNull(d.getField("contents")); + + // add a second doc + doc = new Document(); + doc.add(new Field("title", "dogmas", TextField.TYPE_STORED)); + doc.add(new Field("contents", "white men can't jump", TextField.TYPE_STORED)); + writer.addDocument(doc, analyzer); + writer.commit(); + + // run a MatchAllDocsQuery on a fresh reader and check that both docs come back + indexSearcher = new IndexSearcher(DirectoryReader.open(dir)); + result = indexSearcher.search(new MatchAllDocsQuery(), 10); + assertEquals(2, result.totalHits); + for (ScoreDoc di : result.scoreDocs) { + d = indexSearcher.doc(di.doc); + assertNotNull(d); + assertNotNull(d.getField("title")); + assertNotNull(d.getField("contents")); + } + } + + @Test + public void testRandomStrings() throws Exception { + checkRandomData(random, analyzer, 10000 * RANDOM_MULTIPLIER); + } + +} Index:
modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/an/DummyEntityAnnotator.java =================================================================== --- modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/an/DummyEntityAnnotator.java (revision 0) +++ modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/an/DummyEntityAnnotator.java (revision 0) @@ -0,0 +1,48 @@ +package org.apache.lucene.analysis.uima.an; + +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +import org.apache.lucene.analysis.uima.ts.EntityAnnotation; +import org.apache.uima.TokenAnnotation; +import org.apache.uima.analysis_component.JCasAnnotator_ImplBase; +import org.apache.uima.analysis_engine.AnalysisEngineProcessException; +import org.apache.uima.jcas.JCas; +import org.apache.uima.jcas.tcas.Annotation; + +public class DummyEntityAnnotator extends JCasAnnotator_ImplBase { + + @Override + public void process(JCas jcas) throws AnalysisEngineProcessException { + for (Annotation annotation : jcas.getAnnotationIndex(TokenAnnotation.type)) { + String tokenPOS = ((TokenAnnotation) annotation).getPosTag(); + if ("np".equals(tokenPOS) || "nps".equals(tokenPOS)) { + EntityAnnotation entityAnnotation = new EntityAnnotation(jcas); + entityAnnotation.setBegin(annotation.getBegin()); + entityAnnotation.setEnd(annotation.getEnd()); + String entityString = annotation.getCoveredText(); + entityAnnotation.setEntity(entityString); + String name = "OTHER"; // placeholder category; a real annotator would assign e.g. "PERSON", "COUNTRY", "E-MAIL" + if ("Apache".equals(entityString)) + name = "ORGANIZATION"; + entityAnnotation.setName(name); + entityAnnotation.addToIndexes(); + } + } + } + +} Index: modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/BaseUIMATokenizer.java =================================================================== --- modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/BaseUIMATokenizer.java (revision 0) +++ modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/BaseUIMATokenizer.java (revision 0) @@ -0,0 +1,85 @@ +package org.apache.lucene.analysis.uima; + +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import org.apache.lucene.analysis.Tokenizer; +import org.apache.uima.analysis_engine.AnalysisEngine; +import org.apache.uima.analysis_engine.AnalysisEngineProcessException; +import org.apache.uima.cas.CAS; +import org.apache.uima.cas.FSIterator; +import org.apache.uima.cas.text.AnnotationFS; +import org.apache.uima.resource.ResourceInitializationException; + +import java.io.BufferedReader; +import java.io.IOException; +import java.io.Reader; + +/** + * Abstract base implementation of a {@link Tokenizer} which analyzes its input with a + * UIMA {@link AnalysisEngine}. + */ +public abstract class BaseUIMATokenizer extends Tokenizer { + + protected FSIterator<AnnotationFS> iterator; + + protected BaseUIMATokenizer(Reader reader) { + super(reader); + } + + /** + * Analyzes the tokenizer input using the given analysis engine. + * + * @param analysisEngine the AE to use for analyzing the tokenizer input + * @return CAS with extracted metadata (UIMA annotations, feature structures) + * @throws ResourceInitializationException if a new CAS cannot be created + * + * @throws AnalysisEngineProcessException if the AE fails to process the input + * @throws IOException if the input cannot be read + */ + protected CAS analyzeInput(AnalysisEngine analysisEngine) throws ResourceInitializationException, + AnalysisEngineProcessException, IOException { + CAS cas = analysisEngine.newCAS(); + cas.setDocumentText(toString(input)); + analysisEngine.process(cas); + // do not destroy the AE here: it is cached and reused by its AEProvider + return cas;
+ } + + private String toString(Reader reader) throws IOException { + // copy the input verbatim (no line-separator rewriting) so that UIMA annotation offsets match the original text + BufferedReader bufferedReader = new BufferedReader(reader); + StringBuilder stringBuilder = new StringBuilder(); + char[] buffer = new char[1024]; + int read; + while ((read = bufferedReader.read(buffer)) != -1) { + stringBuilder.append(buffer, 0, read); + } + return stringBuilder.toString(); + } + + @Override + public void reset(Reader input) throws IOException { + super.reset(input); + iterator = null; + } + + @Override + public void end() throws IOException { + iterator = null; + } +} Index: modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMATypeAwareAnalyzer.java =================================================================== --- modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMATypeAwareAnalyzer.java (revision 0) +++ modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMATypeAwareAnalyzer.java (revision 0) @@ -0,0 +1,42 @@ +package org.apache.lucene.analysis.uima; + +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
+ */ + +import org.apache.lucene.analysis.Analyzer; + +import java.io.Reader; + +/** + * {@link Analyzer} which uses the {@link UIMATypeAwareAnnotationsTokenizer} for the tokenization phase + */ +public final class UIMATypeAwareAnalyzer extends Analyzer { + private String descriptorPath; + private String tokenType; + private String featurePath; + + public UIMATypeAwareAnalyzer(String descriptorPath, String tokenType, String featurePath) { + this.descriptorPath = descriptorPath; + this.tokenType = tokenType; + this.featurePath = featurePath; + } + + @Override + protected TokenStreamComponents createComponents(String fieldName, Reader reader) { + return new TokenStreamComponents(new UIMATypeAwareAnnotationsTokenizer(descriptorPath, tokenType, featurePath, reader)); + } +} Index: modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMAAnnotationsTokenizer.java =================================================================== --- modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMAAnnotationsTokenizer.java (revision 0) +++ modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMAAnnotationsTokenizer.java (revision 0) @@ -0,0 +1,87 @@ +package org.apache.lucene.analysis.uima; + +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import org.apache.lucene.analysis.Tokenizer; +import org.apache.lucene.analysis.tokenattributes.CharTermAttribute; +import org.apache.lucene.analysis.tokenattributes.OffsetAttribute; +import org.apache.lucene.analysis.uima.ae.AEProviderFactory; +import org.apache.uima.analysis_engine.AnalysisEngine; +import org.apache.uima.analysis_engine.AnalysisEngineProcessException; +import org.apache.uima.cas.CAS; +import org.apache.uima.cas.Type; +import org.apache.uima.cas.text.AnnotationFS; +import org.apache.uima.resource.ResourceInitializationException; +import org.apache.uima.util.InvalidXMLException; + +import java.io.IOException; +import java.io.Reader; + +/** + * A {@link Tokenizer} which creates tokens from UIMA Annotations. + */ +public final class UIMAAnnotationsTokenizer extends BaseUIMATokenizer { + + private CharTermAttribute termAttr; + + private OffsetAttribute offsetAttr; + + private String tokenTypeString; + + private String descriptorPath; + + private int finalOffset = 0; + + public UIMAAnnotationsTokenizer(String descriptorPath, String tokenType, Reader input) { + super(input); + this.tokenTypeString = tokenType; + this.termAttr = addAttribute(CharTermAttribute.class); + this.offsetAttr = addAttribute(OffsetAttribute.class); + this.descriptorPath = descriptorPath; + } + + private void analyzeText(String descriptorPath) throws InvalidXMLException, + IOException, ResourceInitializationException, AnalysisEngineProcessException { + AnalysisEngine ae = AEProviderFactory.getInstance().getAEProvider("", descriptorPath).getAE(); + CAS cas = analyzeInput(ae); + Type tokenType = cas.getTypeSystem().getType(tokenTypeString); + iterator = cas.getAnnotationIndex(tokenType).iterator(); + } + + @Override + public boolean incrementToken() throws IOException { + if (iterator == null) { + try { + analyzeText(descriptorPath); + } catch (Exception e) { + throw new
IOException(e); + } + } + if (iterator.hasNext()) { + clearAttributes(); + AnnotationFS next = iterator.next(); + termAttr.append(next.getCoveredText()); + offsetAttr.setOffset(correctOffset(next.getBegin()), correctOffset(next.getEnd())); + + return true; + } else { + return false; + } + } + +} Index: modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/ae/AEProvider.java =================================================================== --- modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/ae/AEProvider.java (revision 0) +++ modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/ae/AEProvider.java (revision 0) @@ -0,0 +1,36 @@ +package org.apache.lucene.analysis.uima.ae; + +/** + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +import org.apache.uima.analysis_engine.AnalysisEngine; +import org.apache.uima.resource.ResourceInitializationException; + +/** + * Provides an Apache UIMA {@link AnalysisEngine}. + * + */ +public interface AEProvider { + + /** + * Returns the {@link AnalysisEngine} to be used for analysis. + * @return a ready-to-use, possibly cached, {@link AnalysisEngine} + * @throws ResourceInitializationException if the {@link AnalysisEngine} cannot be created + */ + public AnalysisEngine getAE() throws ResourceInitializationException; + +} Index: modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/ae/BasicAEProvider.java =================================================================== --- modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/ae/BasicAEProvider.java (revision 0) +++ modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/ae/BasicAEProvider.java (revision 0) @@ -0,0 +1,65 @@ +package org.apache.lucene.analysis.uima.ae; + +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
+ */ + +import org.apache.uima.UIMAFramework; +import org.apache.uima.analysis_engine.AnalysisEngine; +import org.apache.uima.analysis_engine.AnalysisEngineDescription; +import org.apache.uima.resource.ResourceInitializationException; +import org.apache.uima.util.XMLInputSource; + +import java.lang.annotation.Inherited; +import java.net.URL; + +/** + * Basic {@link AEProvider} which just instantiates a UIMA {@link AnalysisEngine} with no additional metadata, + * parameters or resources + */ +public class BasicAEProvider implements AEProvider { + + private String aePath; + private AnalysisEngine cachedAE; + + public BasicAEProvider(String aePath) { + this.aePath = aePath; + } + + @Override + public synchronized AnalysisEngine getAE() throws ResourceInitializationException { + try { + if (cachedAE == null) { + // get Resource Specifier from XML file + URL url = getClass().getResource(aePath); + XMLInputSource in = new XMLInputSource(url); + + // get AE description + AnalysisEngineDescription desc = UIMAFramework.getXMLParser() + .parseAnalysisEngineDescription(in); + + // create AE here + cachedAE = UIMAFramework.produceAnalysisEngine(desc); + } else { + cachedAE.reconfigure(); + } + } catch (Exception e) { + cachedAE = null; + throw new ResourceInitializationException(e); + } + return cachedAE; + } +} Index: modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/ae/AEProviderFactory.java =================================================================== --- modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/ae/AEProviderFactory.java (revision 0) +++ modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/ae/AEProviderFactory.java (revision 0) @@ -0,0 +1,73 @@ +package org.apache.lucene.analysis.uima.ae; + +/** + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. 
+ * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import java.util.HashMap; +import java.util.Map; + +/** + * Singleton factory class responsible for creating (and caching) {@link AEProvider}s. + * + */ +public class AEProviderFactory { + + private static AEProviderFactory instance; + + private Map<String, AEProvider> providerCache = new HashMap<String, AEProvider>(); + + private AEProviderFactory() { + // Singleton + } + + public static synchronized AEProviderFactory getInstance() { + if (instance == null) { + instance = new AEProviderFactory(); + } + return instance; + } + + /** + * Returns the cached {@link BasicAEProvider} for the given descriptor, creating it on first use. + * @param keyPrefix prefix of the cache key + * @param aePath classpath resource path of the AE descriptor + * @return the {@link AEProvider} for the given descriptor + */ + public synchronized AEProvider getAEProvider(String keyPrefix, String aePath) { + String key = new StringBuilder(keyPrefix).append(aePath).append(BasicAEProvider.class).toString(); + if (providerCache.get(key) == null) { + providerCache.put(key, new BasicAEProvider(aePath)); + } + return providerCache.get(key); + } + + /** + * Returns the cached {@link OverridingParamsAEProvider} for the given descriptor, creating it on first use. + * @param keyPrefix prefix of the cache key + * @param aePath classpath resource path of the AE descriptor + * @param runtimeParameters parameter values overriding those declared in the descriptor + * @return the {@link AEProvider} for the given descriptor + */ + public synchronized AEProvider getAEProvider(String keyPrefix, String aePath, + Map<String, Object> runtimeParameters) { + String key = new StringBuilder(keyPrefix).append(aePath).append(OverridingParamsAEProvider.class).toString(); + if (providerCache.get(key) == null) { + providerCache.put(key, new OverridingParamsAEProvider(aePath, runtimeParameters)); + } + return providerCache.get(key); + } +} Index:
modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/ae/OverridingParamsAEProvider.java =================================================================== --- modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/ae/OverridingParamsAEProvider.java (revision 0) +++ modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/ae/OverridingParamsAEProvider.java (revision 0) @@ -0,0 +1,104 @@ +package org.apache.lucene.analysis.uima.ae; + +/** + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +import org.apache.uima.UIMAFramework; +import org.apache.uima.analysis_engine.AnalysisEngine; +import org.apache.uima.analysis_engine.AnalysisEngineDescription; +import org.apache.uima.resource.ResourceInitializationException; +import org.apache.uima.util.XMLInputSource; + +import java.net.URL; +import java.util.Map; + +/** + * {@link AEProvider} implementation that creates an Aggregate AE from the given path, also + * injecting the given runtime parameters and assigning them as overriding parameter values + * of the aggregate AE. + * + */ +public class OverridingParamsAEProvider implements AEProvider { + + private String aeFilePath; + + private AnalysisEngine cachedAE; + + private Map<String, Object> runtimeParameters; + + public OverridingParamsAEProvider(String aeFilePath, Map<String, Object> runtimeParameters) { + this.aeFilePath = aeFilePath; + this.runtimeParameters = runtimeParameters; + } + + @Override + public synchronized AnalysisEngine getAE() throws ResourceInitializationException { + try { + if (cachedAE == null) { + // get Resource Specifier from XML file + URL url = this.getClass().getResource(aeFilePath); + XMLInputSource in = new XMLInputSource(url); + + // get AE description + AnalysisEngineDescription desc = UIMAFramework.getXMLParser() + .parseAnalysisEngineDescription(in); + + /* set each runtime parameter as an overriding value in the AE description */ + for (String attributeName : runtimeParameters.keySet()) { + Object val = getRuntimeValue(desc, attributeName); + desc.getAnalysisEngineMetaData().getConfigurationParameterSettings().setParameterValue( + attributeName, val); + } + // create AE here + cachedAE = UIMAFramework.produceAnalysisEngine(desc); + } else { + cachedAE.reconfigure(); + } + } catch (Exception e) { + cachedAE = null; + throw new ResourceInitializationException(e); + } + return cachedAE; + } + + /* create the value to inject in the runtime parameter depending on its declared type */ + private Object
getRuntimeValue(AnalysisEngineDescription desc, String attributeName) + throws ClassNotFoundException { + String type = desc.getAnalysisEngineMetaData().getConfigurationParameterDeclarations(). + getConfigurationParameter(null, attributeName).getType(); + // TODO : do it via reflection ? i.e. Class paramType = Class.forName(type)... + Object val = null; + Object runtimeValue = runtimeParameters.get(attributeName); + if (runtimeValue!=null) { + if ("String".equals(type)) { + val = String.valueOf(runtimeValue); + } + else if ("Integer".equals(type)) { + val = Integer.valueOf(runtimeValue.toString()); + } + else if ("Boolean".equals(type)) { + val = Boolean.valueOf(runtimeValue.toString()); + } + else if ("Float".equals(type)) { + val = Float.valueOf(runtimeValue.toString()); + } + } + + return val; + } + +} \ No newline at end of file Index: modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMATypeAwareAnnotationsTokenizer.java =================================================================== --- modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMATypeAwareAnnotationsTokenizer.java (revision 0) +++ modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMATypeAwareAnnotationsTokenizer.java (revision 0) @@ -0,0 +1,100 @@ +package org.apache.lucene.analysis.uima; + +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import org.apache.lucene.analysis.Tokenizer; +import org.apache.lucene.analysis.tokenattributes.CharTermAttribute; +import org.apache.lucene.analysis.tokenattributes.OffsetAttribute; +import org.apache.lucene.analysis.tokenattributes.TypeAttribute; +import org.apache.lucene.analysis.uima.ae.AEProviderFactory; +import org.apache.uima.analysis_engine.AnalysisEngine; +import org.apache.uima.analysis_engine.AnalysisEngineProcessException; +import org.apache.uima.cas.CAS; +import org.apache.uima.cas.CASException; +import org.apache.uima.cas.FeaturePath; +import org.apache.uima.cas.Type; +import org.apache.uima.cas.text.AnnotationFS; +import org.apache.uima.resource.ResourceInitializationException; +import org.apache.uima.util.InvalidXMLException; + +import java.io.IOException; +import java.io.Reader; + +/** + * A {@link Tokenizer} which creates token from UIMA Annotations filling also their {@link TypeAttribute} according to + * {@link org.apache.uima.cas.FeaturePath}s specified + */ +public final class UIMATypeAwareAnnotationsTokenizer extends BaseUIMATokenizer { + + private TypeAttribute typeAttr; + + private CharTermAttribute termAttr; + + private OffsetAttribute offsetAttr; + + private String tokenTypeString; + + private String descriptorPath; + + private String typeAttributeFeaturePath; + + private FeaturePath featurePath; + + public UIMATypeAwareAnnotationsTokenizer(String descriptorPath, String tokenType, String typeAttributeFeaturePath, Reader input) { + super(input); + this.tokenTypeString = tokenType; + this.termAttr = 
addAttribute(CharTermAttribute.class); + this.typeAttr = addAttribute(TypeAttribute.class); + this.offsetAttr = addAttribute(OffsetAttribute.class); + this.typeAttributeFeaturePath = typeAttributeFeaturePath; + this.descriptorPath = descriptorPath; + } + + private void analyzeText() throws InvalidXMLException, + IOException, ResourceInitializationException, AnalysisEngineProcessException, CASException { + AnalysisEngine ae = AEProviderFactory.getInstance().getAEProvider("", descriptorPath).getAE(); + CAS cas = analyzeInput(ae); + Type tokenType = cas.getTypeSystem().getType(tokenTypeString); + iterator = cas.getAnnotationIndex(tokenType).iterator(); + featurePath = cas.createFeaturePath(); + featurePath.initialize(typeAttributeFeaturePath); + } + + @Override + public boolean incrementToken() throws IOException { + if (iterator == null) { + try { + analyzeText(); + } catch (Exception e) { + throw new IOException(e); + } + } + if (iterator.hasNext()) { + clearAttributes(); + AnnotationFS next = iterator.next(); + termAttr.append(next.getCoveredText()); + offsetAttr.setOffset(correctOffset(next.getBegin()), correctOffset(next.getEnd())); + typeAttr.setType(featurePath.getValueAsString(next)); + return true; + } else { + return false; + } + } + + +} Index: modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMABaseAnalyzer.java =================================================================== --- modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMABaseAnalyzer.java (revision 0) +++ modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMABaseAnalyzer.java (revision 0) @@ -0,0 +1,42 @@ +package org.apache.lucene.analysis.uima; + +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. 
+ * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import org.apache.lucene.analysis.Analyzer; + +import java.io.Reader; + +/** + * An {@link Analyzer} which uses the {@link UIMAAnnotationsTokenizer} for creating tokens. + */ +public final class UIMABaseAnalyzer extends Analyzer { + + private String descriptorPath; + private String tokenType; + + public UIMABaseAnalyzer(String descriptorPath, String tokenType) { + this.descriptorPath = descriptorPath; + this.tokenType = tokenType; + } + + @Override + protected TokenStreamComponents createComponents(String fieldName, Reader reader) { + return new TokenStreamComponents(new UIMAAnnotationsTokenizer(descriptorPath, tokenType, reader)); + } + +} Index: modules/analysis/uima/src/resources/uima/AggregateSentenceAE.xml =================================================================== --- modules/analysis/uima/src/resources/uima/AggregateSentenceAE.xml (revision 0) +++ modules/analysis/uima/src/resources/uima/AggregateSentenceAE.xml (revision 0) @@ -0,0 +1,70 @@ + + + + org.apache.uima.java + false + + + + + + + + + + AggregateSentenceAE + + 1.0 + + + + ngramsize + Integer + false + false + + HmmTagger/NGRAM_SIZE + + + + + + + WhitespaceTokenizer + HmmTagger + + + + + + + + org.apache.uima.SentenceAnnotation + org.apache.uima.TokenAnnotation + + + + + + true + true + false + + + + Index: modules/analysis/uima/src/resources/uima/DummyEntityAEDescriptor.xml
=================================================================== --- modules/analysis/uima/src/resources/uima/DummyEntityAEDescriptor.xml (revision 0) +++ modules/analysis/uima/src/resources/uima/DummyEntityAEDescriptor.xml (revision 0) @@ -0,0 +1,68 @@ + + + + org.apache.uima.java + true + org.apache.lucene.analysis.uima.an.DummyEntityAnnotator + + DummyEntityAEDescriptor + + 1.0 + ASF + + + + + + org.apache.solr.uima.ts.EntityAnnotation + + uima.tcas.Annotation + + + name + + uima.cas.String + + + entity + + uima.cas.String + + + + + + + + + + + + org.apache.solr.uima.ts.EntityAnnotation + + + + + + true + true + false + + + + Index: modules/analysis/uima/build.xml =================================================================== --- modules/analysis/uima/build.xml (revision 0) +++ modules/analysis/uima/build.xml (revision 0) @@ -0,0 +1,43 @@ + + + + + + + + UIMA Analysis module + + + + + + + + + + + + + + + + + + + + Index: modules/analysis/build.xml =================================================================== --- modules/analysis/build.xml (revision 1242237) +++ modules/analysis/build.xml (working copy) @@ -27,6 +27,7 @@ - morfologik: Morfologik Stemmer - smartcn: Smart Analyzer for Simplified Chinese Text - stempel: Algorithmic Stemmer for Polish + - uima: UIMA Analysis module @@ -57,8 +58,12 @@ + + + + - + @@ -68,6 +73,7 @@ + @@ -77,6 +83,7 @@ + @@ -86,6 +93,7 @@ + @@ -95,6 +103,7 @@ + @@ -104,6 +113,7 @@ + @@ -116,6 +126,7 @@ + @@ -126,6 +137,7 @@ + @@ -136,6 +148,7 @@ + Index: lucene/contrib/contrib-build.xml =================================================================== --- lucene/contrib/contrib-build.xml (revision 1242237) +++ lucene/contrib/contrib-build.xml (working copy) @@ -162,6 +162,17 @@ + + + + + + + + + + + Index: dev-tools/idea/solr/contrib/uima/uima-solr.iml =================================================================== --- dev-tools/idea/solr/contrib/uima/uima-solr.iml (revision 0) +++ 
dev-tools/idea/solr/contrib/uima/uima-solr.iml (revision 0) @@ -0,0 +1,29 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Index: dev-tools/idea/solr/contrib/uima/uima.iml =================================================================== --- dev-tools/idea/solr/contrib/uima/uima.iml (revision 1242237) +++ dev-tools/idea/solr/contrib/uima/uima.iml (working copy) @@ -1,29 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Index: dev-tools/idea/modules/analysis/uima/uima-analysis.iml =================================================================== --- dev-tools/idea/modules/analysis/uima/uima-analysis.iml (revision 0) +++ dev-tools/idea/modules/analysis/uima/uima-analysis.iml (revision 0) @@ -0,0 +1,27 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + Index: dev-tools/maven/solr/contrib/uima/pom.xml.template =================================================================== --- dev-tools/maven/solr/contrib/uima/pom.xml.template (revision 1242237) +++ dev-tools/maven/solr/contrib/uima/pom.xml.template (working copy) @@ -84,6 +84,11 @@ org.apache.uima uimaj-core + + org.apache.lucene + lucene-analyzers-uima + ${project.version} + ${build-directory} Index: dev-tools/maven/modules/analysis/uima/pom.xml.template =================================================================== --- dev-tools/maven/modules/analysis/uima/pom.xml.template (revision 0) +++ dev-tools/maven/modules/analysis/uima/pom.xml.template (revision 0) @@ -0,0 +1,89 @@ + + + 4.0.0 + + org.apache.lucene + lucene-parent + @version@ + ../../../lucene/pom.xml + + org.apache.lucene + lucene-analyzers-uima + jar + Lucene UIMA analysis library + + An Apache UIMA enabled set of tokenizers/analyzers + + + modules/analysis/uima + ../build/uima + + + + + ${project.groupId} + lucene-test-framework + ${project.version} + test + + + ${project.groupId} + lucene-core + ${project.version} + + + ${project.groupId} + lucene-analyzers-common + ${project.version} + + + org.apache.uima + 
uimaj-core + 2.3.1 + + + org.apache.uima + Tagger + 2.3.1 + + + org.apache.uima + WhitespaceTokenizer + 2.3.1 + + + + ${build-directory} + ${build-directory}/classes/java + ${build-directory}/classes/test + src/java + src/test + + + ${project.build.testSourceDirectory} + + **/*.java + + + + +
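BaseUIMATokenizer hands UIMA a document text built from the tokenizer's Reader, and the begin/end values of the resulting annotations are passed straight to correctOffset() as token offsets. The following is a minimal, self-contained sketch using only the JDK; the class and method names are illustrative and are not part of this patch. It shows why rebuilding that text line by line with BufferedReader.readLine() plus a line separator can shift offsets for CRLF input, while copying the characters verbatim keeps them aligned with the original input. A fixed "\n" separator is passed in, as an assumption, so that the output does not depend on the platform's line.separator property.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Illustrative comparison of two ways to turn a Reader into UIMA document text (names are hypothetical). */
public class DocumentTextDemo {

  /** Rebuilds the text line by line: original line terminators are dropped and lineSeparator is appended after every line. */
  static String viaReadLine(Reader reader, String lineSeparator) {
    try {
      BufferedReader bufferedReader = new BufferedReader(reader);
      StringBuilder sb = new StringBuilder();
      String line;
      while ((line = bufferedReader.readLine()) != null) {
        sb.append(line).append(lineSeparator);
      }
      return sb.toString();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Copies every character unchanged, so offsets in the result equal offsets in the input. */
  static String verbatim(Reader reader) {
    try {
      StringBuilder sb = new StringBuilder();
      char[] buffer = new char[1024];
      int read;
      while ((read = reader.read(buffer)) != -1) {
        sb.append(buffer, 0, read);
      }
      return sb.toString();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    String input = "Apache\r\nLucene"; // CRLF line ending, e.g. text written on Windows
    String rebuilt = viaReadLine(new StringReader(input), "\n");
    String copied = verbatim(new StringReader(input));
    System.out.println("original offset of Lucene: " + input.indexOf("Lucene"));   // 8
    System.out.println("readLine offset of Lucene: " + rebuilt.indexOf("Lucene")); // 7
    System.out.println("verbatim offset of Lucene: " + copied.indexOf("Lucene"));  // 8
  }
}
```

With the line-based rebuild, "Lucene" moves from offset 8 in the input to offset 7 in the text UIMA sees, so offsets computed from its annotations would be off by one for every CRLF that precedes them. The verbatim copy avoids this without needing any knowledge of line endings.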