Index: lucene/contrib/contrib-build.xml
===================================================================
--- lucene/contrib/contrib-build.xml (revision 1240026)
+++ lucene/contrib/contrib-build.xml (working copy)
@@ -207,6 +207,17 @@
+
+
+
+
+
+
+
+
+
+
+
Index: modules/analysis/README.txt
===================================================================
--- modules/analysis/README.txt (revision 1240026)
+++ modules/analysis/README.txt (working copy)
@@ -41,6 +41,10 @@
An add-on analysis library that contains a universal algorithmic stemmer,
including tables for the Polish language.
+lucene-analyzers-uima-XX.jar
+ An add-on analysis library that contains tokenizers/analyzers that use
+ annotations extracted by Apache UIMA to identify tokens, types, etc.
+
common/src/java
icu/src/java
kuromoji/src/java
@@ -48,6 +52,7 @@
phonetic/src/java
smartcn/src/java
stempel/src/java
+uima/src/java
The source code for the libraries.
common/src/test
@@ -57,4 +62,5 @@
phonetic/src/test
smartcn/src/test
stempel/src/test
+uima/src/test
Unit tests for the libraries.
Property changes on: modules/analysis/uima
___________________________________________________________________
Added: svn:ignore
+ *.iml
Index: modules/analysis/uima/lib/uima-an-wst-NOTICE.txt
===================================================================
--- modules/analysis/uima/lib/uima-an-wst-NOTICE.txt (revision 0)
+++ modules/analysis/uima/lib/uima-an-wst-NOTICE.txt (revision 0)
@@ -0,0 +1,7 @@
+
+UIMA Annotator: WhitespaceTokenizer
+Copyright 2006-2010 The Apache Software Foundation
+
+This product includes software developed at
+The Apache Software Foundation (http://www.apache.org/).
+
Index: modules/analysis/uima/lib/uimaj-core-2.3.1.jar
===================================================================
Cannot display: file marked as a binary type.
svn:mime-type = application/octet-stream
Property changes on: modules/analysis/uima/lib/uimaj-core-2.3.1.jar
___________________________________________________________________
Added: svn:mime-type
+ application/octet-stream
Index: modules/analysis/uima/lib/uimaj-core-LICENSE-ASL.txt
===================================================================
--- modules/analysis/uima/lib/uimaj-core-LICENSE-ASL.txt (revision 0)
+++ modules/analysis/uima/lib/uimaj-core-LICENSE-ASL.txt (revision 0)
@@ -0,0 +1,202 @@
+
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+ APPENDIX: How to apply the Apache License to your work.
+
+ To apply the Apache License to your work, attach the following
+ boilerplate notice, with the fields enclosed by brackets "[]"
+ replaced with your own identifying information. (Don't include
+ the brackets!) The text should be enclosed in the appropriate
+ comment syntax for the file format. We also recommend that a
+ file or class name and description of purpose be included on the
+ same "printed page" as the copyright notice for easier
+ identification within third-party archives.
+
+ Copyright [yyyy] [name of copyright owner]
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
Index: modules/analysis/uima/lib/uima-an-tagger-2.3.1.jar
===================================================================
Cannot display: file marked as a binary type.
svn:mime-type = application/octet-stream
Property changes on: modules/analysis/uima/lib/uima-an-tagger-2.3.1.jar
___________________________________________________________________
Added: svn:mime-type
+ application/octet-stream
Index: modules/analysis/uima/lib/uimaj-core-NOTICE.txt
===================================================================
--- modules/analysis/uima/lib/uimaj-core-NOTICE.txt (revision 0)
+++ modules/analysis/uima/lib/uimaj-core-NOTICE.txt (revision 0)
@@ -0,0 +1,13 @@
+
+UIMA Base: uimaj-core
+Copyright 2006-2010 The Apache Software Foundation
+
+This product includes software developed at
+The Apache Software Foundation (http://www.apache.org/).
+
+Portions of Apache UIMA were originally developed by
+International Business Machines Corporation and are
+licensed to the Apache Software Foundation under the
+"Software Grant License Agreement", informally known as the
+"IBM UIMA License Agreement".
+Copyright (c) 2003, 2006 IBM Corporation.
Index: modules/analysis/uima/lib/uima-an-tagger-LICENSE-ASL.txt
===================================================================
--- modules/analysis/uima/lib/uima-an-tagger-LICENSE-ASL.txt (revision 0)
+++ modules/analysis/uima/lib/uima-an-tagger-LICENSE-ASL.txt (revision 0)
@@ -0,0 +1,202 @@
+
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+ APPENDIX: How to apply the Apache License to your work.
+
+ To apply the Apache License to your work, attach the following
+ boilerplate notice, with the fields enclosed by brackets "[]"
+ replaced with your own identifying information. (Don't include
+ the brackets!) The text should be enclosed in the appropriate
+ comment syntax for the file format. We also recommend that a
+ file or class name and description of purpose be included on the
+ same "printed page" as the copyright notice for easier
+ identification within third-party archives.
+
+ Copyright [yyyy] [name of copyright owner]
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
Index: modules/analysis/uima/lib/uima-an-tagger-NOTICE.txt
===================================================================
--- modules/analysis/uima/lib/uima-an-tagger-NOTICE.txt (revision 0)
+++ modules/analysis/uima/lib/uima-an-tagger-NOTICE.txt (revision 0)
@@ -0,0 +1,7 @@
+
+UIMA Annotator: Tagger
+Copyright 2006-2010 The Apache Software Foundation
+
+This product includes software developed at
+The Apache Software Foundation (http://www.apache.org/).
+
Index: modules/analysis/uima/lib/uima-an-wst-2.3.1.jar
===================================================================
Cannot display: file marked as a binary type.
svn:mime-type = application/octet-stream
Property changes on: modules/analysis/uima/lib/uima-an-wst-2.3.1.jar
___________________________________________________________________
Added: svn:mime-type
+ application/octet-stream
Index: modules/analysis/uima/lib/uima-an-wst-LICENSE-ASL.txt
===================================================================
--- modules/analysis/uima/lib/uima-an-wst-LICENSE-ASL.txt (revision 0)
+++ modules/analysis/uima/lib/uima-an-wst-LICENSE-ASL.txt (revision 0)
@@ -0,0 +1,202 @@
+
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+ APPENDIX: How to apply the Apache License to your work.
+
+ To apply the Apache License to your work, attach the following
+ boilerplate notice, with the fields enclosed by brackets "[]"
+ replaced with your own identifying information. (Don't include
+ the brackets!) The text should be enclosed in the appropriate
+ comment syntax for the file format. We also recommend that a
+ file or class name and description of purpose be included on the
+ same "printed page" as the copyright notice for easier
+ identification within third-party archives.
+
+ Copyright [yyyy] [name of copyright owner]
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
Index: modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/UIMATypeAwareAnalyzerTest.java
===================================================================
--- modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/UIMATypeAwareAnalyzerTest.java (revision 0)
+++ modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/UIMATypeAwareAnalyzerTest.java (revision 0)
@@ -0,0 +1,60 @@
+package org.apache.lucene.analysis.uima;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import org.apache.lucene.analysis.BaseTokenStreamTestCase;
+import org.apache.lucene.analysis.TokenStream;
+import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
+import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
+import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
+import org.apache.lucene.analysis.tokenattributes.TypeAttribute;
+import org.junit.Test;
+
+import java.io.StringReader;
+
+/**
+ * Testcase for {@link UIMATypeAwareAnalyzer}
+ */
+public class UIMATypeAwareAnalyzerTest extends BaseTokenStreamTestCase {
+
+ @Test
+ public void baseUIMATypeAwareAnalyzerStreamTest() throws Exception {
+ // create the analyzer
+ UIMATypeAwareAnalyzer analyzer = new UIMATypeAwareAnalyzer("/uima/AggregateSentenceAE.xml",
+ "org.apache.uima.TokenAnnotation", "posTag");
+
+ // create a token stream
+ TokenStream ts = analyzer.tokenStream("text", new StringReader("the big brown fox jumped on the wood"));
+ // set attributes
+ CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
+ OffsetAttribute offsetAtt = ts.addAttribute(OffsetAttribute.class);
+ TypeAttribute typeAttr = ts.addAttribute(TypeAttribute.class);
+
+ // check that 'the big brown fox jumped on the wood' tokens have the expected PoS types
+ String[] expectedPos = new String[]{"at", "jj", "jj", "nn", "vbd", "in", "at", "nn"};
+ int i = 0;
+ while (ts.incrementToken()) {
+ assertNotNull(offsetAtt);
+ assertNotNull(termAtt);
+ assertNotNull(typeAttr);
+ assertEquals(expectedPos[i], typeAttr.type());
+ i++;
+ }
+ }
+
+}
Index: modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ae/BasicAEProviderTest.java
===================================================================
--- modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ae/BasicAEProviderTest.java (revision 0)
+++ modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ae/BasicAEProviderTest.java (revision 0)
@@ -0,0 +1,36 @@
+package org.apache.lucene.analysis.uima.ae;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import org.apache.uima.analysis_engine.AnalysisEngine;
+import org.junit.Test;
+
+import static org.junit.Assert.assertNotNull;
+
+/**
+ * TestCase for {@link BasicAEProvider}
+ */
+public class BasicAEProviderTest {
+
+ @Test
+ public void testBasicInitialization() throws Exception {
+ AEProvider basicAEProvider = new BasicAEProvider("/uima/DummyEntityAEDescriptor.xml");
+ AnalysisEngine analysisEngine = basicAEProvider.getAE();
+ assertNotNull(analysisEngine);
+ }
+}
Index: modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ae/OverridingParamsAEProviderTest.java
===================================================================
--- modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ae/OverridingParamsAEProviderTest.java (revision 0)
+++ modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ae/OverridingParamsAEProviderTest.java (revision 0)
@@ -0,0 +1,61 @@
+package org.apache.lucene.analysis.uima.ae;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import org.apache.uima.analysis_engine.AnalysisEngine;
+import org.apache.uima.resource.ResourceInitializationException;
+import org.junit.Test;
+
+import java.util.HashMap;
+import java.util.Map;
+
+import static org.junit.Assert.*;
+
+/**
+ * TestCase for {@link OverridingParamsAEProvider}
+ */
+public class OverridingParamsAEProviderTest {
+
+ @Test
+ public void testNullMapInitialization() throws Exception {
+ try {
+ AEProvider aeProvider = new OverridingParamsAEProvider("/uima/DummyEntityAEDescriptor.xml", null);
+ aeProvider.getAE();
+ fail("should fail due to null Map passed");
+ } catch (ResourceInitializationException e) {
+ // everything ok
+ }
+ }
+
+ @Test
+ public void testEmptyMapInitialization() throws Exception {
+ AEProvider aeProvider = new OverridingParamsAEProvider("/uima/DummyEntityAEDescriptor.xml", new HashMap());
+ AnalysisEngine analysisEngine = aeProvider.getAE();
+ assertNotNull(analysisEngine);
+ }
+
+ @Test
+ public void testOverridingParamsInitialization() throws Exception {
+ Map runtimeParameters = new HashMap();
+ runtimeParameters.put("ngramsize", "3");
+ AEProvider aeProvider = new OverridingParamsAEProvider("/uima/AggregateSentenceAE.xml", runtimeParameters);
+ AnalysisEngine analysisEngine = aeProvider.getAE();
+ assertNotNull(analysisEngine);
+ assertEquals(3, analysisEngine.getConfigParameterValue("ngramsize"));
+ }
+}
Index: modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ts/SentimentAnnotation.java
===================================================================
--- modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ts/SentimentAnnotation.java (revision 0)
+++ modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ts/SentimentAnnotation.java (revision 0)
@@ -0,0 +1,79 @@
+
+
+/* First created by JCasGen Fri Mar 04 13:08:40 CET 2011 */
+package org.apache.lucene.analysis.uima.ts;
+
+import org.apache.uima.jcas.JCas;
+import org.apache.uima.jcas.JCasRegistry;
+import org.apache.uima.jcas.cas.TOP_Type;
+import org.apache.uima.jcas.tcas.Annotation;
+
+
+/**
+ * Updated by JCasGen Fri Mar 04 13:08:40 CET 2011
+ * XML source: /Users/tommasoteofili/Documents/workspaces/lucene_workspace/lucene_dev/solr/contrib/uima/src/test/resources/DummySentimentAnalysisAEDescriptor.xml
+ * @generated */
+public class SentimentAnnotation extends Annotation {
+ /** @generated
+ * @ordered
+ */
+ public final static int typeIndexID = JCasRegistry.register(SentimentAnnotation.class);
+ /** @generated
+ * @ordered
+ */
+ public final static int type = typeIndexID;
+ /** @generated */
+ public int getTypeIndexID() {return typeIndexID;}
+
+ /** Never called. Disable default constructor
+ * @generated */
+ protected SentimentAnnotation() {}
+
+ /** Internal - constructor used by generator
+ * @generated */
+ public SentimentAnnotation(int addr, TOP_Type type) {
+ super(addr, type);
+ readObject();
+ }
+
+ /** @generated */
+ public SentimentAnnotation(JCas jcas) {
+ super(jcas);
+ readObject();
+ }
+
+ /** @generated */
+ public SentimentAnnotation(JCas jcas, int begin, int end) {
+ super(jcas);
+ setBegin(begin);
+ setEnd(end);
+ readObject();
+ }
+
+ /**
+ * Write your own initialization here
+ *
+ @generated modifiable */
+ private void readObject() {}
+
+
+
+ //*--------------*
+ //* Feature: mood
+
+ /** getter for mood - gets
+ * @generated */
+ public String getMood() {
+ if (SentimentAnnotation_Type.featOkTst && ((SentimentAnnotation_Type)jcasType).casFeat_mood == null)
+ jcasType.jcas.throwFeatMissing("mood", "org.apache.solr.uima.ts.SentimentAnnotation");
+ return jcasType.ll_cas.ll_getStringValue(addr, ((SentimentAnnotation_Type)jcasType).casFeatCode_mood);}
+
+ /** setter for mood - sets
+ * @generated */
+ public void setMood(String v) {
+ if (SentimentAnnotation_Type.featOkTst && ((SentimentAnnotation_Type)jcasType).casFeat_mood == null)
+ jcasType.jcas.throwFeatMissing("mood", "org.apache.solr.uima.ts.SentimentAnnotation");
+ jcasType.ll_cas.ll_setStringValue(addr, ((SentimentAnnotation_Type)jcasType).casFeatCode_mood, v);}
+ }
+
+
\ No newline at end of file
Index: modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ts/SentimentAnnotation_Type.java
===================================================================
--- modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ts/SentimentAnnotation_Type.java (revision 0)
+++ modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ts/SentimentAnnotation_Type.java (revision 0)
@@ -0,0 +1,79 @@
+
+/* First created by JCasGen Fri Mar 04 13:08:40 CET 2011 */
+package org.apache.lucene.analysis.uima.ts;
+
+import org.apache.uima.cas.Feature;
+import org.apache.uima.cas.FeatureStructure;
+import org.apache.uima.cas.Type;
+import org.apache.uima.cas.impl.CASImpl;
+import org.apache.uima.cas.impl.FSGenerator;
+import org.apache.uima.cas.impl.FeatureImpl;
+import org.apache.uima.cas.impl.TypeImpl;
+import org.apache.uima.jcas.JCas;
+import org.apache.uima.jcas.JCasRegistry;
+import org.apache.uima.jcas.tcas.Annotation_Type;
+
+/**
+ * Updated by JCasGen Fri Mar 04 13:08:40 CET 2011
+ * @generated */
+public class SentimentAnnotation_Type extends Annotation_Type {
+ /** @generated */
+ protected FSGenerator getFSGenerator() {return fsGenerator;}
+ /** @generated */
+ private final FSGenerator fsGenerator =
+ new FSGenerator() {
+ public FeatureStructure createFS(int addr, CASImpl cas) {
+ if (SentimentAnnotation_Type.this.useExistingInstance) {
+ // Return eq fs instance if already created
+ FeatureStructure fs = SentimentAnnotation_Type.this.jcas.getJfsFromCaddr(addr);
+ if (null == fs) {
+ fs = new SentimentAnnotation(addr, SentimentAnnotation_Type.this);
+ SentimentAnnotation_Type.this.jcas.putJfsFromCaddr(addr, fs);
+ return fs;
+ }
+ return fs;
+ } else return new SentimentAnnotation(addr, SentimentAnnotation_Type.this);
+ }
+ };
+ /** @generated */
+ public final static int typeIndexID = SentimentAnnotation.typeIndexID;
+ /** @generated
+ @modifiable */
+ public final static boolean featOkTst = JCasRegistry.getFeatOkTst("org.apache.solr.uima.ts.SentimentAnnotation");
+
+ /** @generated */
+ final Feature casFeat_mood;
+ /** @generated */
+ final int casFeatCode_mood;
+ /** @generated */
+ public String getMood(int addr) {
+ if (featOkTst && casFeat_mood == null)
+ jcas.throwFeatMissing("mood", "org.apache.solr.uima.ts.SentimentAnnotation");
+ return ll_cas.ll_getStringValue(addr, casFeatCode_mood);
+ }
+ /** @generated */
+ public void setMood(int addr, String v) {
+ if (featOkTst && casFeat_mood == null)
+ jcas.throwFeatMissing("mood", "org.apache.solr.uima.ts.SentimentAnnotation");
+ ll_cas.ll_setStringValue(addr, casFeatCode_mood, v);}
+
+
+
+
+
+ /** initialize variables to correspond with Cas Type and Features
+ * @generated */
+ public SentimentAnnotation_Type(JCas jcas, Type casType) {
+ super(jcas, casType);
+ casImpl.getFSClassRegistry().addGeneratorForType((TypeImpl)this.casType, getFSGenerator());
+
+
+ casFeat_mood = jcas.getRequiredFeatureDE(casType, "mood", "uima.cas.String", featOkTst);
+ casFeatCode_mood = (null == casFeat_mood) ? JCas.INVALID_FEATURE_CODE : ((FeatureImpl)casFeat_mood).getCode();
+
+ }
+}
+
+
+
+
\ No newline at end of file
Index: modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ts/EntityAnnotation.java
===================================================================
--- modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ts/EntityAnnotation.java (revision 0)
+++ modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ts/EntityAnnotation.java (revision 0)
@@ -0,0 +1,97 @@
+
+
+/* First created by JCasGen Sat May 07 22:33:38 JST 2011 */
+package org.apache.lucene.analysis.uima.ts;
+
+import org.apache.uima.jcas.JCas;
+import org.apache.uima.jcas.JCasRegistry;
+import org.apache.uima.jcas.cas.TOP_Type;
+import org.apache.uima.jcas.tcas.Annotation;
+
+
+/**
+ * Updated by JCasGen Sat May 07 22:33:38 JST 2011
+ * XML source: /Users/koji/Documents/workspace/DummyEntityAnnotator/desc/DummyEntityAEDescriptor.xml
+ * @generated */
+public class EntityAnnotation extends Annotation {
+ /** @generated
+ * @ordered
+ */
+ public final static int typeIndexID = JCasRegistry.register(EntityAnnotation.class);
+ /** @generated
+ * @ordered
+ */
+ public final static int type = typeIndexID;
+ /** @generated */
+ public int getTypeIndexID() {return typeIndexID;}
+
+ /** Never called. Disable default constructor
+ * @generated */
+ protected EntityAnnotation() {}
+
+ /** Internal - constructor used by generator
+ * @generated */
+ public EntityAnnotation(int addr, TOP_Type type) {
+ super(addr, type);
+ readObject();
+ }
+
+ /** @generated */
+ public EntityAnnotation(JCas jcas) {
+ super(jcas);
+ readObject();
+ }
+
+ /** @generated */
+ public EntityAnnotation(JCas jcas, int begin, int end) {
+ super(jcas);
+ setBegin(begin);
+ setEnd(end);
+ readObject();
+ }
+
+ /**
+ * Write your own initialization here
+ *
+ @generated modifiable */
+ private void readObject() {}
+
+
+
+ //*--------------*
+ //* Feature: name
+
+ /** getter for name - gets
+ * @generated */
+ public String getName() {
+ if (EntityAnnotation_Type.featOkTst && ((EntityAnnotation_Type)jcasType).casFeat_name == null)
+ jcasType.jcas.throwFeatMissing("name", "org.apache.solr.uima.ts.EntityAnnotation");
+ return jcasType.ll_cas.ll_getStringValue(addr, ((EntityAnnotation_Type)jcasType).casFeatCode_name);}
+
+ /** setter for name - sets
+ * @generated */
+ public void setName(String v) {
+ if (EntityAnnotation_Type.featOkTst && ((EntityAnnotation_Type)jcasType).casFeat_name == null)
+ jcasType.jcas.throwFeatMissing("name", "org.apache.solr.uima.ts.EntityAnnotation");
+ jcasType.ll_cas.ll_setStringValue(addr, ((EntityAnnotation_Type)jcasType).casFeatCode_name, v);}
+
+
+ //*--------------*
+ //* Feature: entity
+
+ /** getter for entity - gets
+ * @generated */
+ public String getEntity() {
+ if (EntityAnnotation_Type.featOkTst && ((EntityAnnotation_Type)jcasType).casFeat_entity == null)
+ jcasType.jcas.throwFeatMissing("entity", "org.apache.solr.uima.ts.EntityAnnotation");
+ return jcasType.ll_cas.ll_getStringValue(addr, ((EntityAnnotation_Type)jcasType).casFeatCode_entity);}
+
+ /** setter for entity - sets
+ * @generated */
+ public void setEntity(String v) {
+ if (EntityAnnotation_Type.featOkTst && ((EntityAnnotation_Type)jcasType).casFeat_entity == null)
+ jcasType.jcas.throwFeatMissing("entity", "org.apache.solr.uima.ts.EntityAnnotation");
+ jcasType.ll_cas.ll_setStringValue(addr, ((EntityAnnotation_Type)jcasType).casFeatCode_entity, v);}
+ }
+
+
\ No newline at end of file
Index: modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ts/EntityAnnotation_Type.java
===================================================================
--- modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ts/EntityAnnotation_Type.java (revision 0)
+++ modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/ts/EntityAnnotation_Type.java (revision 0)
@@ -0,0 +1,118 @@
+
+/* First created by JCasGen Sat May 07 22:33:38 JST 2011 */
+package org.apache.lucene.analysis.uima.ts;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import org.apache.uima.cas.Feature;
+import org.apache.uima.cas.FeatureStructure;
+import org.apache.uima.cas.Type;
+import org.apache.uima.cas.impl.CASImpl;
+import org.apache.uima.cas.impl.FSGenerator;
+import org.apache.uima.cas.impl.FeatureImpl;
+import org.apache.uima.cas.impl.TypeImpl;
+import org.apache.uima.jcas.JCas;
+import org.apache.uima.jcas.JCasRegistry;
+import org.apache.uima.jcas.tcas.Annotation_Type;
+
+/**
+ * Updated by JCasGen Sat May 07 22:33:38 JST 2011
+ * @generated */
+public class EntityAnnotation_Type extends Annotation_Type {
+ /** @generated */
+ protected FSGenerator getFSGenerator() {return fsGenerator;}
+ /** @generated */
+ private final FSGenerator fsGenerator =
+ new FSGenerator() {
+ public FeatureStructure createFS(int addr, CASImpl cas) {
+ if (EntityAnnotation_Type.this.useExistingInstance) {
+ // Return eq fs instance if already created
+ FeatureStructure fs = EntityAnnotation_Type.this.jcas.getJfsFromCaddr(addr);
+ if (null == fs) {
+ fs = new EntityAnnotation(addr, EntityAnnotation_Type.this);
+ EntityAnnotation_Type.this.jcas.putJfsFromCaddr(addr, fs);
+ return fs;
+ }
+ return fs;
+ } else return new EntityAnnotation(addr, EntityAnnotation_Type.this);
+ }
+ };
+ /** @generated */
+ public final static int typeIndexID = EntityAnnotation.typeIndexID;
+ /** @generated
+ @modifiable */
+ public final static boolean featOkTst = JCasRegistry.getFeatOkTst("org.apache.solr.uima.ts.EntityAnnotation");
+
+ /** @generated */
+ final Feature casFeat_name;
+ /** @generated */
+ final int casFeatCode_name;
+ /** @generated */
+ public String getName(int addr) {
+ if (featOkTst && casFeat_name == null)
+ jcas.throwFeatMissing("name", "org.apache.solr.uima.ts.EntityAnnotation");
+ return ll_cas.ll_getStringValue(addr, casFeatCode_name);
+ }
+ /** @generated */
+ public void setName(int addr, String v) {
+ if (featOkTst && casFeat_name == null)
+ jcas.throwFeatMissing("name", "org.apache.solr.uima.ts.EntityAnnotation");
+ ll_cas.ll_setStringValue(addr, casFeatCode_name, v);}
+
+
+
+ /** @generated */
+ final Feature casFeat_entity;
+ /** @generated */
+ final int casFeatCode_entity;
+ /** @generated */
+ public String getEntity(int addr) {
+ if (featOkTst && casFeat_entity == null)
+ jcas.throwFeatMissing("entity", "org.apache.solr.uima.ts.EntityAnnotation");
+ return ll_cas.ll_getStringValue(addr, casFeatCode_entity);
+ }
+ /** @generated */
+ public void setEntity(int addr, String v) {
+ if (featOkTst && casFeat_entity == null)
+ jcas.throwFeatMissing("entity", "org.apache.solr.uima.ts.EntityAnnotation");
+ ll_cas.ll_setStringValue(addr, casFeatCode_entity, v);}
+
+
+
+
+
+ /** initialize variables to correspond with Cas Type and Features
+ * @generated */
+ public EntityAnnotation_Type(JCas jcas, Type casType) {
+ super(jcas, casType);
+ casImpl.getFSClassRegistry().addGeneratorForType((TypeImpl)this.casType, getFSGenerator());
+
+
+ casFeat_name = jcas.getRequiredFeatureDE(casType, "name", "uima.cas.String", featOkTst);
+ casFeatCode_name = (null == casFeat_name) ? JCas.INVALID_FEATURE_CODE : ((FeatureImpl)casFeat_name).getCode();
+
+
+ casFeat_entity = jcas.getRequiredFeatureDE(casType, "entity", "uima.cas.String", featOkTst);
+ casFeatCode_entity = (null == casFeat_entity) ? JCas.INVALID_FEATURE_CODE : ((FeatureImpl)casFeat_entity).getCode();
+
+ }
+}
+
+
+
+
\ No newline at end of file
Index: modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/UIMABaseAnalyzerTest.java
===================================================================
--- modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/UIMABaseAnalyzerTest.java (revision 0)
+++ modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/UIMABaseAnalyzerTest.java (revision 0)
@@ -0,0 +1,121 @@
+package org.apache.lucene.analysis.uima;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import org.apache.lucene.analysis.BaseTokenStreamTestCase;
+import org.apache.lucene.analysis.TokenStream;
+import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
+import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
+import org.apache.lucene.document.Document;
+import org.apache.lucene.document.Field;
+import org.apache.lucene.document.TextField;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.index.IndexWriter;
+import org.apache.lucene.index.IndexWriterConfig;
+import org.apache.lucene.search.IndexSearcher;
+import org.apache.lucene.search.MatchAllDocsQuery;
+import org.apache.lucene.search.ScoreDoc;
+import org.apache.lucene.search.TopDocs;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.RAMDirectory;
+import org.apache.lucene.util.Version;
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.StringReader;
+
+/**
+ * Testcase for {@link UIMABaseAnalyzer}
+ */
+public class UIMABaseAnalyzerTest extends BaseTokenStreamTestCase {
+
+ private Directory dir;
+
+ private UIMABaseAnalyzer analyzer;
+
+ private IndexWriter writer;
+
+ @Before
+ public void setUp() throws Exception {
+ super.setUp();
+ dir = new RAMDirectory();
+ analyzer = new UIMABaseAnalyzer("/uima/AggregateSentenceAE.xml", "org.apache.uima.TokenAnnotation");
+ writer = new IndexWriter(dir, new IndexWriterConfig(Version.LUCENE_40, analyzer));
+ }
+
+ @After
+ public void tearDown() throws Exception {
+ writer.close();
+ analyzer.close();
+ dir.close();
+ super.tearDown();
+ }
+
+ @Test
+ public void baseUIMAAnalyzerStreamTest() throws Exception {
+ TokenStream ts = analyzer.tokenStream("text", new StringReader("the big brown fox jumped on the wood"));
+ CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
+ OffsetAttribute offsetAtt = ts.addAttribute(OffsetAttribute.class);
+ int i = 0;
+ while (ts.incrementToken()) {
+ assertNotNull(offsetAtt);
+ assertNotNull(termAtt);
+ i++;
+ }
+ assertEquals(8, i);
+ }
+
+ @Test
+ public void baseUIMAAnalyzerIntegrationTest() throws Exception {
+ // add the first doc
+ Document doc = new Document();
+ doc.add(new Field("title", "this is a dummy title ", TextField.TYPE_STORED));
+ doc.add(new Field("contents", "there is some content written here", TextField.TYPE_STORED));
+ writer.addDocument(doc, analyzer);
+ writer.commit();
+
+ // try the search over the first doc
+ IndexSearcher indexSearcher = new IndexSearcher(DirectoryReader.open(dir));
+ TopDocs result = indexSearcher.search(new MatchAllDocsQuery(), 10);
+ assertTrue(result.totalHits > 0);
+ Document d = indexSearcher.doc(result.scoreDocs[0].doc);
+ assertNotNull(d);
+ assertNotNull(d.getField("title"));
+ assertNotNull(d.getField("contents"));
+
+ // add a second doc
+ doc = new Document();
+ doc.add(new Field("title", "dogmas", TextField.TYPE_STORED));
+ doc.add(new Field("contents", "white men can't jump", TextField.TYPE_STORED));
+ writer.addDocument(doc, analyzer);
+ writer.commit();
+
+ // do a matchalldocs query to retrieve both docs
+ indexSearcher = new IndexSearcher(DirectoryReader.open(dir));
+ result = indexSearcher.search(new MatchAllDocsQuery(), 10);
+ assertTrue(result.totalHits > 0);
+ for (ScoreDoc di : result.scoreDocs) {
+ d = indexSearcher.doc(di.doc);
+ assertNotNull(d);
+ assertNotNull(d.getField("title"));
+ assertNotNull(d.getField("contents"));
+ }
+ }
+
+}
Index: modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/an/DummyEntityAnnotator.java
===================================================================
--- modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/an/DummyEntityAnnotator.java (revision 0)
+++ modules/analysis/uima/src/test/org/apache/lucene/analysis/uima/an/DummyEntityAnnotator.java (working copy)
@@ -1,13 +1,6 @@
-package org.apache.solr.uima.processor.an;
+package org.apache.lucene.analysis.uima.an;
-import org.apache.solr.uima.ts.EntityAnnotation;
-import org.apache.uima.TokenAnnotation;
-import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
-import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
-import org.apache.uima.jcas.JCas;
-import org.apache.uima.jcas.tcas.Annotation;
-
-/**
+/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
@@ -24,6 +17,13 @@
* limitations under the License.
*/
+import org.apache.lucene.analysis.uima.ts.EntityAnnotation;
+import org.apache.uima.TokenAnnotation;
+import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
+import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
+import org.apache.uima.jcas.JCas;
+import org.apache.uima.jcas.tcas.Annotation;
+
public class DummyEntityAnnotator extends JCasAnnotator_ImplBase{
@Override
Index: modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/BaseUIMATokenizer.java
===================================================================
--- modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/BaseUIMATokenizer.java (revision 0)
+++ modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/BaseUIMATokenizer.java (revision 0)
@@ -0,0 +1,71 @@
+package org.apache.lucene.analysis.uima;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import org.apache.lucene.analysis.Tokenizer;
+import org.apache.uima.analysis_engine.AnalysisEngine;
+import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
+import org.apache.uima.cas.CAS;
+import org.apache.uima.resource.ResourceInitializationException;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.Reader;
+
+/**
+ * Abstract base implementation of a {@link Tokenizer} which is able to analyze the given input with a
+ * UIMA {@link AnalysisEngine}
+ */
+public abstract class BaseUIMATokenizer extends Tokenizer {
+
+ protected BaseUIMATokenizer(Reader reader) {
+ super(reader);
+ }
+
+ /**
+ * Analyzes the tokenizer input using the given analysis engine.
+ *
+ * @param analysisEngine the AE to use for analyzing the tokenizer input
+ * @return CAS with extracted metadata (UIMA annotations, feature structures)
+ * @throws ResourceInitializationException if the AE cannot create a new CAS
+ *
+ * @throws AnalysisEngineProcessException if the AE fails while processing the CAS
+ * @throws IOException if the tokenizer input cannot be read
+ protected CAS analyzeInput(AnalysisEngine analysisEngine) throws ResourceInitializationException,
+ AnalysisEngineProcessException, IOException {
+ CAS cas = analysisEngine.newCAS();
+ cas.setDocumentText(toString(input));
+ analysisEngine.process(cas);
+ analysisEngine.destroy();
+ return cas;
+ }
+
+ private String toString(Reader reader) throws IOException {
+ BufferedReader bufferedReader = new BufferedReader(reader);
+ StringBuilder stringBuilder = new StringBuilder();
+ String ls = System.getProperty("line.separator");
+ String line;
+ while ((line = bufferedReader.readLine()) != null) {
+ stringBuilder.append(line);
+ stringBuilder.append(ls);
+ }
+ return stringBuilder.toString();
+ }
+
+}
Index: modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMATypeAwareAnalyzer.java
===================================================================
--- modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMATypeAwareAnalyzer.java (revision 0)
+++ modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMATypeAwareAnalyzer.java (revision 0)
@@ -0,0 +1,42 @@
+package org.apache.lucene.analysis.uima;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import org.apache.lucene.analysis.Analyzer;
+
+import java.io.Reader;
+
+/**
+ * {@link Analyzer} which uses the {@link UIMATypeAwareAnnotationsTokenizer} for the tokenization phase
+ */
+public final class UIMATypeAwareAnalyzer extends Analyzer {
+ private String descriptorPath;
+ private String tokenType;
+ private String featurePath;
+
+ public UIMATypeAwareAnalyzer(String descriptorPath, String tokenType, String featurePath) {
+ this.descriptorPath = descriptorPath;
+ this.tokenType = tokenType;
+ this.featurePath = featurePath;
+ }
+
+ @Override
+ protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
+ return new TokenStreamComponents(new UIMATypeAwareAnnotationsTokenizer(descriptorPath, tokenType, featurePath, reader));
+ }
+}
Index: modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMAAnnotationsTokenizer.java
===================================================================
--- modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMAAnnotationsTokenizer.java (revision 0)
+++ modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMAAnnotationsTokenizer.java (revision 0)
@@ -0,0 +1,89 @@
+package org.apache.lucene.analysis.uima;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import org.apache.lucene.analysis.Tokenizer;
+import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
+import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
+import org.apache.lucene.analysis.uima.ae.AEProviderFactory;
+import org.apache.uima.analysis_engine.AnalysisEngine;
+import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
+import org.apache.uima.cas.CAS;
+import org.apache.uima.cas.FSIterator;
+import org.apache.uima.cas.Type;
+import org.apache.uima.cas.text.AnnotationFS;
+import org.apache.uima.resource.ResourceInitializationException;
+import org.apache.uima.util.InvalidXMLException;
+
+import java.io.IOException;
+import java.io.Reader;
+
+/**
+ * A {@link Tokenizer} which creates tokens from UIMA Annotations
+ */
+public final class UIMAAnnotationsTokenizer extends BaseUIMATokenizer {
+
+ private CharTermAttribute termAttr;
+
+ private OffsetAttribute offsetAttr;
+
+ private FSIterator<AnnotationFS> iterator;
+
+ private String tokenTypeString;
+
+ private String descriptorPath;
+
+ public UIMAAnnotationsTokenizer(String descriptorPath, String tokenType, Reader input) {
+ super(input);
+ this.tokenTypeString = tokenType;
+ this.termAttr = addAttribute(CharTermAttribute.class);
+ this.offsetAttr = addAttribute(OffsetAttribute.class);
+ this.descriptorPath = descriptorPath;
+ }
+
+ private void analyzeText(String descriptorPath) throws InvalidXMLException,
+ IOException, ResourceInitializationException, AnalysisEngineProcessException {
+ AnalysisEngine ae = AEProviderFactory.getInstance().getAEProvider("", descriptorPath).getAE();
+ CAS cas = analyzeInput(ae);
+ Type tokenType = cas.getTypeSystem().getType(tokenTypeString);
+ iterator = cas.getAnnotationIndex(tokenType).iterator();
+ }
+
+ @Override
+ public boolean incrementToken() throws IOException {
+ if (iterator == null) {
+ try {
+ analyzeText(descriptorPath);
+ } catch (Exception e) {
+ throw new IOException(e);
+ }
+ }
+ if (iterator.hasNext()) {
+ clearAttributes();
+ AnnotationFS next = iterator.next();
+ termAttr.setEmpty();
+ termAttr.append(next.getCoveredText());
+ termAttr.setLength(next.getCoveredText().length());
+ offsetAttr.setOffset(correctOffset(next.getBegin()), correctOffset(next.getEnd()));
+ return true;
+ } else {
+ return false;
+ }
+ }
+
+}
Index: modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/ae/AEProvider.java
===================================================================
--- modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/ae/AEProvider.java (revision 0)
+++ modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/ae/AEProvider.java (revision 0)
@@ -0,0 +1,36 @@
+package org.apache.lucene.analysis.uima.ae;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import org.apache.uima.analysis_engine.AnalysisEngine;
+import org.apache.uima.resource.ResourceInitializationException;
+
+/**
+ * Provides an Apache UIMA {@link AnalysisEngine}
+ *
+ */
+public interface AEProvider {
+
+ /**
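+ * Returns the {@link AnalysisEngine} managed by this provider.
+ * @return the {@link AnalysisEngine} to use for analysis
+ * @throws ResourceInitializationException if the AE cannot be created or reconfigured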
+ */
+ public AnalysisEngine getAE() throws ResourceInitializationException;
+
+}
Index: modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/ae/BasicAEProvider.java
===================================================================
--- modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/ae/BasicAEProvider.java (revision 0)
+++ modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/ae/BasicAEProvider.java (revision 0)
@@ -0,0 +1,65 @@
+package org.apache.lucene.analysis.uima.ae;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import org.apache.uima.UIMAFramework;
+import org.apache.uima.analysis_engine.AnalysisEngine;
+import org.apache.uima.analysis_engine.AnalysisEngineDescription;
+import org.apache.uima.resource.ResourceInitializationException;
+import org.apache.uima.util.XMLInputSource;
+
+import java.lang.annotation.Inherited;
+import java.net.URL;
+
+/**
+ * Basic {@link AEProvider} which just instantiates a UIMA {@link AnalysisEngine} with no additional metadata,
+ * parameters or resources
+ */
+public class BasicAEProvider implements AEProvider {
+
+ private String aePath;
+ private AnalysisEngine cachedAE;
+
+ public BasicAEProvider(String aePath) {
+ this.aePath = aePath;
+ }
+
+ @Override
+ public synchronized AnalysisEngine getAE() throws ResourceInitializationException {
+ try {
+ if (cachedAE == null) {
+ // get Resource Specifier from XML file
+ URL url = getClass().getResource(aePath);
+ XMLInputSource in = new XMLInputSource(url);
+
+ // get AE description
+ AnalysisEngineDescription desc = UIMAFramework.getXMLParser()
+ .parseAnalysisEngineDescription(in);
+
+ // create AE here
+ cachedAE = UIMAFramework.produceAnalysisEngine(desc);
+ } else {
+ cachedAE.reconfigure();
+ }
+ } catch (Exception e) {
+ cachedAE = null;
+ throw new ResourceInitializationException(e);
+ }
+ return cachedAE;
+ }
+}
Index: modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/ae/AEProviderFactory.java
===================================================================
--- modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/ae/AEProviderFactory.java (revision 0)
+++ modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/ae/AEProviderFactory.java (revision 0)
@@ -0,0 +1,73 @@
+package org.apache.lucene.analysis.uima.ae;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Singleton factory class responsible for creating {@link AEProvider}s
+ *
+ */
+public class AEProviderFactory {
+
+ private static AEProviderFactory instance;
+
+ private Map<String, AEProvider> providerCache = new HashMap<String, AEProvider>();
+
+ private AEProviderFactory() {
+ // Singleton
+ }
+
+ public static synchronized AEProviderFactory getInstance() {
+ if (instance == null) {
+ instance = new AEProviderFactory();
+ }
+ return instance;
+ }
+
+ /**
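+ * Returns the cached {@link BasicAEProvider} for the given descriptor, creating it on first use.
+ * @param keyPrefix prefix prepended to the cache key
+ * @param aePath classpath location of the AE descriptor
+ * @return the {@link AEProvider} for the given descriptor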
+ */
+ public synchronized AEProvider getAEProvider(String keyPrefix, String aePath) {
+ String key = new StringBuilder(keyPrefix).append(aePath).append(BasicAEProvider.class).toString();
+ if (providerCache.get(key) == null) {
+ providerCache.put(key, new BasicAEProvider(aePath));
+ }
+ return providerCache.get(key);
+ }
+
+ /**
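+ * Returns the cached {@link OverridingParamsAEProvider} for the given descriptor, creating it on first use.
+ * @param keyPrefix prefix prepended to the cache key
+ * @param aePath classpath location of the AE descriptor
+ * @param runtimeParameters parameter values that override those declared in the descriptor
+ * @return the {@link AEProvider} for the given descriptor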
+ */
+ public synchronized AEProvider getAEProvider(String keyPrefix, String aePath,
+ Map<String, Object> runtimeParameters) {
+ String key = new StringBuilder(keyPrefix).append(aePath).append(OverridingParamsAEProvider.class).toString();
+ if (providerCache.get(key) == null) {
+ providerCache.put(key, new OverridingParamsAEProvider(aePath, runtimeParameters));
+ }
+ return providerCache.get(key);
+ }
+}
Index: modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/ae/OverridingParamsAEProvider.java
===================================================================
--- modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/ae/OverridingParamsAEProvider.java (revision 0)
+++ modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/ae/OverridingParamsAEProvider.java (revision 0)
@@ -0,0 +1,104 @@
+package org.apache.lucene.analysis.uima.ae;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import org.apache.uima.UIMAFramework;
+import org.apache.uima.analysis_engine.AnalysisEngine;
+import org.apache.uima.analysis_engine.AnalysisEngineDescription;
+import org.apache.uima.resource.ResourceInitializationException;
+import org.apache.uima.util.XMLInputSource;
+
+import java.net.URL;
+import java.util.Map;
+
+/**
+ * {@link AEProvider} implementation that creates an aggregate AE from the given path and
+ * injects the runtime parameters passed to the constructor, assigning them as overriding
+ * parameters in the aggregate AE
+ *
+ */
+public class OverridingParamsAEProvider implements AEProvider {
+
+ private String aeFilePath;
+
+ private AnalysisEngine cachedAE;
+
+ private Map<String, Object> runtimeParameters;
+
+ public OverridingParamsAEProvider(String aeFilePath, Map<String, Object> runtimeParameters) {
+ this.aeFilePath = aeFilePath;
+ this.runtimeParameters = runtimeParameters;
+ }
+
+ @Override
+ public synchronized AnalysisEngine getAE() throws ResourceInitializationException {
+ try {
+ if (cachedAE == null) {
+ // get Resource Specifier from XML file
+ URL url = this.getClass().getResource(aeFilePath);
+ XMLInputSource in = new XMLInputSource(url);
+
+ // get AE description
+ AnalysisEngineDescription desc = UIMAFramework.getXMLParser()
+ .parseAnalysisEngineDescription(in);
+
+ /* iterate over each runtime parameter (to override the value set in the descriptor) */
+ for (String attributeName : runtimeParameters.keySet()) {
+ Object val = getRuntimeValue(desc, attributeName);
+ desc.getAnalysisEngineMetaData().getConfigurationParameterSettings().setParameterValue(
+ attributeName, val);
+ }
+ // create AE here
+ cachedAE = UIMAFramework.produceAnalysisEngine(desc);
+ } else {
+ cachedAE.reconfigure();
+ }
+ } catch (Exception e) {
+ cachedAE = null;
+ throw new ResourceInitializationException(e);
+ }
+ return cachedAE;
+ }
+
+ /* create the value to inject in the runtime parameter depending on its declared type */
+ private Object getRuntimeValue(AnalysisEngineDescription desc, String attributeName)
+ throws ClassNotFoundException {
+ String type = desc.getAnalysisEngineMetaData().getConfigurationParameterDeclarations().
+ getConfigurationParameter(null, attributeName).getType();
+ // TODO : do it via reflection ? i.e. Class paramType = Class.forName(type)...
+ Object val = null;
+ Object runtimeValue = runtimeParameters.get(attributeName);
+ if (runtimeValue!=null) {
+ if ("String".equals(type)) {
+ val = String.valueOf(runtimeValue);
+ }
+ else if ("Integer".equals(type)) {
+ val = Integer.valueOf(runtimeValue.toString());
+ }
+ else if ("Boolean".equals(type)) {
+ val = Boolean.valueOf(runtimeValue.toString());
+ }
+ else if ("Float".equals(type)) {
+ val = Float.valueOf(runtimeValue.toString());
+ }
+ }
+
+ return val;
+ }
+
+}
\ No newline at end of file
Index: modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMATypeAwareAnnotationsTokenizer.java
===================================================================
--- modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMATypeAwareAnnotationsTokenizer.java (revision 0)
+++ modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMATypeAwareAnnotationsTokenizer.java (revision 0)
@@ -0,0 +1,102 @@
+package org.apache.lucene.analysis.uima;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import org.apache.lucene.analysis.Tokenizer;
+import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
+import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
+import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
+import org.apache.lucene.analysis.tokenattributes.TypeAttribute;
+import org.apache.lucene.analysis.uima.ae.AEProviderFactory;
+import org.apache.uima.analysis_engine.AnalysisEngine;
+import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
+import org.apache.uima.cas.*;
+import org.apache.uima.cas.text.AnnotationFS;
+import org.apache.uima.resource.ResourceInitializationException;
+import org.apache.uima.util.InvalidXMLException;
+
+import java.io.IOException;
+import java.io.Reader;
+
+/**
+ * A {@link Tokenizer} which creates tokens from UIMA Annotations, also filling their {@link TypeAttribute} according to
+ * the specified {@link org.apache.uima.cas.FeaturePath}
+ */
+public final class UIMATypeAwareAnnotationsTokenizer extends BaseUIMATokenizer {
+
+ private TypeAttribute typeAttr;
+
+ private CharTermAttribute termAttr;
+
+ private OffsetAttribute offsetAttr;
+
+ private FSIterator<AnnotationFS> iterator;
+
+ private String tokenTypeString;
+
+ private String descriptorPath;
+
+ private String typeAttributeFeaturePath;
+
+ private FeaturePath featurePath;
+
+ public UIMATypeAwareAnnotationsTokenizer(String descriptorPath, String tokenType, String typeAttributeFeaturePath, Reader input) {
+ super(input);
+ this.tokenTypeString = tokenType;
+ this.termAttr = addAttribute(CharTermAttribute.class);
+ this.typeAttr = addAttribute(TypeAttribute.class);
+ this.offsetAttr = addAttribute(OffsetAttribute.class);
+ this.typeAttributeFeaturePath = typeAttributeFeaturePath;
+ this.descriptorPath = descriptorPath;
+ }
+
+ private void analyzeText() throws InvalidXMLException,
+ IOException, ResourceInitializationException, AnalysisEngineProcessException, CASException {
+ AnalysisEngine ae = AEProviderFactory.getInstance().getAEProvider("", descriptorPath).getAE();
+ CAS cas = analyzeInput(ae);
+ Type tokenType = cas.getTypeSystem().getType(tokenTypeString);
+ iterator = cas.getAnnotationIndex(tokenType).iterator();
+ featurePath = cas.createFeaturePath();
+ featurePath.initialize(typeAttributeFeaturePath);
+ }
+
+ @Override
+ public boolean incrementToken() throws IOException {
+ if (iterator == null) {
+ try {
+ analyzeText();
+ } catch (Exception e) {
+ throw new IOException(e);
+ }
+ }
+ if (iterator.hasNext()) {
+ clearAttributes();
+ AnnotationFS next = iterator.next();
+ termAttr.setEmpty();
+ termAttr.append(next.getCoveredText());
+ termAttr.setLength(next.getCoveredText().length());
+ offsetAttr.setOffset(correctOffset(next.getBegin()), correctOffset(next.getEnd()));
+ typeAttr.setType(featurePath.getValueAsString(next));
+ return true;
+ } else {
+ iterator = null;
+ return false;
+ }
+ }
+
+}
Index: modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMABaseAnalyzer.java
===================================================================
--- modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMABaseAnalyzer.java (revision 0)
+++ modules/analysis/uima/src/java/org/apache/lucene/analysis/uima/UIMABaseAnalyzer.java (revision 0)
@@ -0,0 +1,42 @@
+package org.apache.lucene.analysis.uima;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import org.apache.lucene.analysis.Analyzer;
+
+import java.io.Reader;
+
+/**
+ * An {@link Analyzer} which uses the {@link UIMAAnnotationsTokenizer} for creating tokens
+ */
+public final class UIMABaseAnalyzer extends Analyzer {
+
+ private String descriptorPath;
+ private String tokenType;
+
+ public UIMABaseAnalyzer(String descriptorPath, String tokenType) {
+ this.descriptorPath = descriptorPath;
+ this.tokenType = tokenType;
+ }
+
+ @Override
+ protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
+ return new TokenStreamComponents(new UIMAAnnotationsTokenizer(descriptorPath, tokenType, reader));
+ }
+
+}
Index: modules/analysis/uima/src/resources/uima/AggregateSentenceAE.xml
===================================================================
--- modules/analysis/uima/src/resources/uima/AggregateSentenceAE.xml (revision 0)
+++ modules/analysis/uima/src/resources/uima/AggregateSentenceAE.xml (revision 0)
@@ -0,0 +1,70 @@
+
+
+
+ org.apache.uima.java
+ false
+
+
+
+
+
+
+
+
+
+ AggregateSentenceAE
+
+ 1.0
+
+
+
+ ngramsize
+ Integer
+ false
+ false
+
+ HmmTagger/NGRAM_SIZE
+
+
+
+
+
+
+ WhitespaceTokenizer
+ HmmTagger
+
+
+
+
+
+
+
+ org.apache.uima.SentenceAnnotation
+ org.apache.uima.TokenAnnotation
+
+
+
+
+
+ true
+ true
+ false
+
+
+
+
Index: modules/analysis/uima/src/resources/uima/DummyEntityAEDescriptor.xml
===================================================================
--- modules/analysis/uima/src/resources/uima/DummyEntityAEDescriptor.xml (revision 0)
+++ modules/analysis/uima/src/resources/uima/DummyEntityAEDescriptor.xml (working copy)
@@ -18,7 +18,7 @@
org.apache.uima.java
true
- org.apache.solr.uima.processor.an.DummyEntityAnnotator
+ org.apache.lucene.analysis.uima.an.DummyEntityAnnotator
DummyEntityAEDescriptor
Index: modules/analysis/uima/build.xml
===================================================================
--- modules/analysis/uima/build.xml (revision 0)
+++ modules/analysis/uima/build.xml (working copy)
@@ -17,122 +17,27 @@
limitations under the License.
-->
-
+
- Analyzers
+ UIMA Analysis module
-
-
+
+
-
-
-
+
+
+
-
+
-
-
-
-
-
+
-
-
-
-
-
-
-
-
+
+
+
+
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+
Index: modules/analysis/build.xml
===================================================================
--- modules/analysis/build.xml (revision 1240026)
+++ modules/analysis/build.xml (working copy)
@@ -27,6 +27,7 @@
- morfologik: Morfologik Stemmer
- smartcn: Smart Analyzer for Simplified Chinese Text
- stempel: Algorithmic Stemmer for Polish
+ - uima: UIMA Analysis module
@@ -57,8 +58,12 @@
+
+
+
+
-
+
@@ -68,6 +73,7 @@
+
@@ -77,6 +83,7 @@
+
@@ -86,6 +93,7 @@
+
@@ -95,6 +103,7 @@
+
@@ -104,6 +113,7 @@
+
@@ -116,6 +126,7 @@
+
@@ -126,6 +137,7 @@
+
@@ -136,6 +148,7 @@
+
Property changes on: dev-tools/idea/solr/contrib/uima/uima-solr.iml
___________________________________________________________________
Added: svn:eol-style
+ native
Index: dev-tools/idea/solr/contrib/uima/uima.iml
===================================================================
--- dev-tools/idea/solr/contrib/uima/uima.iml (revision 1240026)
+++ dev-tools/idea/solr/contrib/uima/uima.iml (working copy)
@@ -1,29 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
Index: dev-tools/idea/modules/analysis/uima/uima-analysis.iml
===================================================================
--- dev-tools/idea/modules/analysis/uima/uima-analysis.iml (revision 0)
+++ dev-tools/idea/modules/analysis/uima/uima-analysis.iml (revision 0)
@@ -0,0 +1,27 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Index: dev-tools/maven/solr/contrib/uima/pom.xml.template
===================================================================
--- dev-tools/maven/solr/contrib/uima/pom.xml.template (revision 1240026)
+++ dev-tools/maven/solr/contrib/uima/pom.xml.template (working copy)
@@ -84,6 +84,10 @@
org.apache.uima
uimaj-core
+
+ org.apache.lucene
+ lucene-analyzers-uima
+
${build-directory}
Index: dev-tools/maven/modules/analysis/uima/pom.xml.template
===================================================================
--- dev-tools/maven/modules/analysis/uima/pom.xml.template (revision 0)
+++ dev-tools/maven/modules/analysis/uima/pom.xml.template (revision 0)
@@ -0,0 +1,89 @@
+
+
+ 4.0.0
+
+ org.apache.lucene
+ lucene-parent
+ @version@
+ ../../../lucene/pom.xml
+
+ org.apache.lucene
+ lucene-analyzers-uima
+ jar
+ Lucene UIMA analysis library
+
+ An Apache UIMA enabled set of tokenizers/analyzers
+
+
+ modules/analysis/uima
+ ../build/uima
+
+
+
+
+ ${project.groupId}
+ lucene-test-framework
+ ${project.version}
+ test
+
+
+ ${project.groupId}
+ lucene-core
+ ${project.version}
+
+
+ ${project.groupId}
+ lucene-analyzers-common
+ ${project.version}
+
+
+ org.apache.uima
+ uimaj-core
+ 2.3.1
+
+
+ org.apache.uima
+ Tagger
+ 2.3.1
+
+
+ org.apache.uima
+ WhitespaceTokenizer
+ 2.3.1
+
+
+
+ ${build-directory}
+ ${build-directory}/classes/java
+ ${build-directory}/classes/test
+ src/java
+ src/test
+
+
+ ${project.build.testSourceDirectory}
+
+ **/*.java
+
+
+
+
+