Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0ruta
    • Fix Version/s: 2.3.1ruta
    • Component/s: Ruta
    • Labels:
      None
    • Environment:

      OS X 10.9.1, Java v8u45, Eclipse Luna
      Windows 7, Java v8u45, Eclipse Luna

      Description

      The newly available UIMA Ruta Runtime 2.7.0 & Workbench 2.3.0 for Eclipse has broken the MARKTABLE action: it no longer annotates all of the words from a csv file. I noticed that the problem occurs only for entries written in Cyrillic that contain spaces; for Latin it works fine. Please use the sample outlined below to reproduce the problem.

      1. script/main.ruta
        WORDTABLE Dict = 'dict.csv';
        DECLARE Annotation Test (STRING meaning);
        Document {-> MARKTABLE(Test,1,Dict, "meaning" = 2)};
      2. resources/dict.csv
        від;from
        с какой стати;why
        с которой;fromWhich
        сюда;here
        по какому;which
        сюди;here
        как нибудь;somehow
        сколько;howMuch
      3. input/test.txt
        від с какой стати с которой сюда по какому сюди как нибудь сколько

      After executing the main.ruta script, not everything in test.txt is annotated. It is worth mentioning that a Cyrillic letter such as 'с' at the beginning of an entry somehow affects the processing behavior. Moreover, removing the lines that contain spaces makes the issue described above go away.

        Activity

        pkluegl Peter Klügl added a comment -

        Thanks for reporting this. I will take a look at it.

        I've seen that you changed the fixVersion. I'm afraid we have to specify the version of the release in which this bug is fixed. We use this field in the release process for collecting the fixed issues.

        submedia Oleg Fedoriaka added a comment -

        At first I specified the wrong release by mistake; I then noticed this and corrected it to the current one.
        I can also confirm that in Eclipse IDE for Java Developers 4.4.2.20150219-0708 with UIMA Runtime 2.6.0 & UIMA Ruta Workbench 2.2.1, on the system configurations mentioned above, the following code works normally:
        WORDTABLE Dict = 'dict.csv';
        DECLARE Annotation Test (STRING meaning);
        Document {-> RETAINTYPE(SPACE)};
        Document {-> MARKTABLE(Test,1,Dict, "meaning" = 2)};

        pkluegl Peter Klügl added a comment -

        Yes, there were some changes in the MARKTABLE logic and in how to use it. I have to take a closer look, but I think this problem can be fixed quite quickly.
        I was referring to the versions mentioned in this ticket/issue. There are two of them: affectsVersion and fixVersion. affectsVersion is the version of the release where this problem has been observed (2.3.0ruta). fixVersion is the version of the release where this problem is fixed. So, I will have to change the fixVersion (currently 2.1.0ruta) to 2.3.1ruta.

        pkluegl Peter Klügl added a comment -

        The problem is caused by the combination of the filtering settings in the rule script and the entries in the table. The table lookup is not able to see whitespace since it is filtered by default. However, the table contains entries with spaces. This can cause problems since the table uses a trie structure for representing the column data, and there is no lookahead when spaces in the entries are skipped automatically. Therefore, matching fails for entries whose characters also occur after whitespace in other entries.

        There are several ways to solve or avoid this problem.

        • remove the whitespace in dict.csv (the best way right now; I just tested it, but it makes the table hard to read)
        • activate a special configuration parameter (it is currently missing in BasicEngine.xml and in the generated descriptors, and probably still has some problems in your use case; this should be the clean way to achieve the first point)
        • make the lookup process sensitive to whitespace (this is often not wanted and needs a different configuration of the table call and rules)

        The difference between UIMA Ruta 2.2.1 and 2.3.0 is caused by UIMA-4079, where a problem with whitespace in tables was fixed.
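
To make the "no lookahead" explanation more concrete, here is a small toy model in Python. It is not the UIMA Ruta implementation: the functions `build_trie`, `greedy_lookup` and `failing_entries` are hypothetical, and the matching strategy (prefer a direct child, skip a space edge only when no direct child exists, never backtrack) is an assumption chosen to illustrate the described effect. Which entries fail in the real engine may differ.

```python
END = "\0"  # marker key for "a complete table entry ends at this node"


def build_trie(entries):
    """Build a character trie (nested dicts) from the table entries, spaces included."""
    root = {}
    for entry in entries:
        node = root
        for ch in entry:
            node = node.setdefault(ch, {})
        node[END] = entry
    return root


def greedy_lookup(trie, text):
    """Match text (whitespace already filtered out) against the trie.

    Assumed strategy: follow a direct child if one exists; only otherwise
    skip a single space edge. There is no lookahead and no backtracking.
    """
    node = trie
    for ch in text:
        if ch in node:
            node = node[ch]
        elif " " in node and ch in node[" "]:
            node = node[" "][ch]
        else:
            return None
    return node.get(END)


def failing_entries(entries):
    """Return the entries that are not found once their spaces are filtered out."""
    trie = build_trie(entries)
    return [e for e in entries if greedy_lookup(trie, e.replace(" ", "")) != e]


ENTRIES = [
    "від", "с какой стати", "с которой", "сюда",
    "по какому", "сюди", "как нибудь", "сколько",
]

if __name__ == "__main__":
    print(failing_entries(ENTRIES))
    print(failing_entries([e for e in ENTRIES if e != "сколько"]))
```

In this toy model, the two entries starting with "с " fail only while "сколько" is in the table: after "с" the matcher commits to the direct "к" edge of "сколько" and never tries the branch behind the space.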

        submedia Oleg Fedoriaka added a comment -

        Thanks a lot for the explanation. The first suggested way to overcome the problem works, but has shortcomings.
        Could you please describe the second method in detail: what changes should be made in BasicEngine.xml and other files in order to achieve the needed result? By the way, what kind of problems did you mean I might face in my case?
        Please let me test it out, because it might be a better solution for me. Thanks.

        pkluegl Peter Klügl added a comment - - edited

        The problems are not on your side but in UIMA Ruta. I assume the functionality has some minor bugs. I tested the parameter, but the results were not as expected. I will take a look at it in the next few days and get back to you.

        pkluegl Peter Klügl added a comment -

        There was a small bug, which is now fixed in the current trunk. The analysis engine has a parameter "dictRemoveWS". If set to true, all whitespace is removed when simple text files are loaded. I tested the changes with your example and everything works fine now.

        In case you want to use these changes, let me know if you need help.
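
For anyone who cannot switch to the trunk, a similar effect can be approximated by preprocessing the table file, which is the first workaround mentioned earlier. The following is a minimal Python sketch, not part of UIMA Ruta; the function name `strip_lookup_whitespace` is hypothetical, and it assumes a semicolon-separated table whose lookup column is the first one, as in the dict.csv of this issue. Only the lookup column is touched, so feature values keep their spacing; whether "dictRemoveWS" treats the other columns the same way is not stated here.

```python
import csv
import io


def strip_lookup_whitespace(csv_text, column=0, delimiter=";"):
    """Return csv_text with all whitespace removed from one column of every row."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row in csv.reader(io.StringIO(csv_text), delimiter=delimiter):
        if len(row) > column:
            # str.split() without arguments splits on any run of whitespace.
            row[column] = "".join(row[column].split())
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    sample = "с какой стати;why\nсюда;here\n"
    print(strip_lookup_whitespace(sample), end="")
```

Keeping the readable dict.csv as the source and generating the stripped copy from it avoids the drawback that a table without spaces is hard to read.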

        submedia Oleg Fedoriaka added a comment -

        Yes, I need it. Could you please tell me how to get it to work in Eclipse? Should I replace something in the existing installation of UIMA Ruta or just get everything from the repo?

        pkluegl Peter Klügl added a comment -

        Sorry for the late response.

        You need to use the current trunk.

        For normal Maven builds:
        You could add the snapshot repository to your Maven repositories. Or you could install the snapshot version in your local m2 folder:

        • Check out the current sources.
        • mvn clean install on the root project: ruta

        For updating the UIMA Ruta Workbench:

        • Check out the current sources.
        • mvn clean install on the root project: ruta
        • change the versions for the eclipse bundles in ruta-eclipse-update-site
          <item-maven-release-version>2.3.0</item-maven-release-version>
          <item-eclipse-release-version>2.3.0</item-eclipse-release-version>
          

          to something like

          <item-maven-release-version>2.3.1-SNAPSHOT</item-maven-release-version>
          <item-eclipse-release-version>2.3.0</item-eclipse-release-version>
          

          You could also use 2.3.1 or 2.3.1.SNAPSHOT instead of 2.3.0 for item-eclipse-release-version

        In case of 2.3.0: go to ruta-eclipse-update-site/target/eclipse-update-site/ruta/plugins and copy the jars to the plugins folder of your Eclipse installation with the UIMA Ruta Workbench. You may need to start Eclipse with the -clean option.

        In case of 2.3.1 or 2.3.1.SNAPSHOT, you can add a local update site and install the complete feature updating the old UIMA Ruta Workbench.

        Let me know if this helps and if you need a more detailed explanation for some steps.

        submedia Oleg Fedoriaka added a comment -

        I executed the following commands:
        $ svn export https://svn.apache.org/repos/asf/uima/ruta/trunk/ ruta/ && cd ruta/
        $ mvn clean install -Dmaven.javadoc.skip=true

        ruta-2.3.1-SNAPSHOT-bin.zip has been created at ruta/target dir which contains these jars:
        eclipsePlugins/org.apache.uima.ruta.addons_2.3.1.SNAPSHOT.jar
        eclipsePlugins/org.apache.uima.ruta.caseditor_2.3.1.SNAPSHOT.jar
        eclipsePlugins/org.apache.uima.ruta.engine_2.3.1.SNAPSHOT.jar
        eclipsePlugins/org.apache.uima.ruta.ide_2.3.1.SNAPSHOT.jar
        eclipsePlugins/org.apache.uima.ruta.textruler_2.3.1.SNAPSHOT.jar
        lib/ruta-core-2.3.1-SNAPSHOT.jar

        My issue is that I cannot find the path ruta-eclipse-update-site/target/eclipse-update-site/ruta/plugins either in ruta/ or in the local Maven repository. I also tried simply copying the jars listed above into /Applications/Eclipse.app/Contents/Eclipse/plugins/ on OS X, without success. So I would appreciate it if you could explain in more detail what I have to do next, starting from this point.

        pkluegl Peter Klügl added a comment -

        It is best to ignore the bin.zip; it's not a maintained artifact. I assume that there is even a plugin missing (org.apache.uima.ruta.ide.ui).

        There are two ways: creating an update site for installing new software or manual installation.

        I mentioned copying jars to the plugins folder. This was probably bad advice. It would only work if the plugins exactly replace the previous ones. In your case, they have a different version and there is no feature bundling them. Therefore, they are ignored. For manually installing plugins in your Eclipse, there is the dropins folder (http://stackoverflow.com/questions/2763843/eclipse-plugins-vs-features-vs-dropins). You could try to copy the plugin jars to the dropins folder (don't forget org.apache.uima.ruta.ide.ui). You may need to uninstall the old Ruta feature first and/or restart your Eclipse with the -clean option.

        Now to the update site (my preferred way):
        In the ruta folder there should be a folder named ruta-eclipse-update-site. That's the update site with the pom you need to change. It's not a module of the reactor/root project. Execute mvn clean install on it separately. Then, there should be a folder /target/eclipse-update-site/ruta, which can be used in Eclipse for installing new software.

        Much depends on the version properties set in the pom. If the versions point to 2.3.0, then you cannot use the update site to update your existing installation. (You could use the plugins to replace the old ones, but you should only do that if you know what you are doing.) If the versions point to 2.3.1 or 2.3.1.SNAPSHOT, you can update, but not to the next release.

        Let me know if this helps.

        submedia Oleg Fedoriaka added a comment -

        Thanks, the info above helped me. I ran mvn clean install on ruta-eclipse-update-site and ruta-eclipse-update-site/target/eclipse-update-site/ruta was built successfully (attached below).

        Worth mentioning, I didn't change:
        <item-maven-release-version>2.3.0</item-maven-release-version>
        <item-eclipse-release-version>2.3.0</item-eclipse-release-version>
        in ruta-eclipse-update-site/pom.xml because of problems with building.

        After that I started Eclipse, removed the UIMA Ruta Workbench and restarted. Then I made sure that no Ruta Workbench was installed, and installed the Workbench built previously by specifying ruta-eclipse-update-site/target/eclipse-update-site/ruta as the path to the update site. Once the installation completed successfully, I closed the IDE. Finally, I added the -clean option to /Applications/Eclipse.app/Contents/Eclipse/eclipse.ini and started Eclipse again. Just in case, I recreated the Ruta project outlined above for the test, which still failed.

        The same problem still persists. So the question is whether I did something wrong or whether other changes are needed.

        pkluegl Peter Klügl added a comment - - edited

        Yes, the additional configuration parameter is set to false by default and needs to be activated. In your use case, the best approach to do that is probably changing its value in the BasicEngine.xml. This descriptor is applied for generating all descriptors and therefore the values specified there are reused.

        To do this, you have to:

        • open the file descriptor/BasicEngine.xml with the "Component Descriptor Editor" in Eclipse
        • switch to the "Parameter Settings" tab
        • select the parameter "dictRemoveWS" (normally in the left part) and set its value to true (normally in the right part)
        • save the descriptor and rebuild all descriptors, especially the one that is used to create the actual analysis engine, e.g., by changing the rule file.

        If there is no "dictRemoveWS" parameter, then you need to update your Ruta project, e.g., by UIMA Ruta -> Convert to UIMA Ruta project in the popup menu of a project.

        There is a possibility that this basic descriptor is not applied for building the new descriptors. In case the above does not work, you could try to set the parameter value directly in the descriptor of your rule script. However, the value will be overridden when you store the script file.
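
If the parameter has to be set directly in a generated descriptor, the edit can also be scripted instead of using the Component Descriptor Editor. The following Python sketch is an illustration only and not a UIMA tool: the function name `set_boolean_parameter` is hypothetical, and it assumes the usual descriptor layout in which the setting is stored as a nameValuePair with a boolean value inside the http://uima.apache.org/resourceSpecifier namespace. As noted above, a value changed this way will be overridden when the script file is stored again.

```python
import xml.etree.ElementTree as ET

NS = "http://uima.apache.org/resourceSpecifier"
# Serialize with a default namespace instead of generated "ns0:" prefixes.
ET.register_namespace("", NS)


def set_boolean_parameter(xml_text, name, value):
    """Return descriptor XML with the boolean setting called name set to value.

    Raises KeyError if the descriptor contains no such boolean setting,
    e.g. because the project has not been updated yet.
    """
    root = ET.fromstring(xml_text)
    changed = False
    for pair in root.iter(f"{{{NS}}}nameValuePair"):
        name_el = pair.find(f"{{{NS}}}name")
        bool_el = pair.find(f"{{{NS}}}value/{{{NS}}}boolean")
        if name_el is None or bool_el is None:
            continue
        if (name_el.text or "").strip() == name:
            bool_el.text = "true" if value else "false"
            changed = True
    if not changed:
        raise KeyError(f"no boolean parameter setting named {name!r}")
    return ET.tostring(root, encoding="unicode")
```

A typical use would be to read the descriptor of the rule script as text, call `set_boolean_parameter(text, "dictRemoveWS", True)`, and write the result back. Note that ElementTree drops XML comments and may change the formatting of the file.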

        submedia Oleg Fedoriaka added a comment -

        I've tried everything above, but nothing works. If there is something else I can do, please tell me.
        Anyway, thank you for your assistance. I'll use the approach of removing spaces from the table until a new update is available, hopefully with the problem fixed.

        pkluegl Peter Klügl added a comment -

        Sorry about that. Yes, this is probably too much effort compared to the simple workaround until the next release.

        In case you still want to try something:
        You have to change at least one version in the pom of the update site: item-maven-release-version.
        Right now, you have just built the update site of the current release, which is missing the fix. (Sorry, I didn't look at your attachment right away.)

        submedia Oleg Fedoriaka added a comment - - edited

        I'm glad to report that the challenge has been overcome!
        For anyone concerned, keep in mind that for some reason setting the dictRemoveWS parameter in descriptor/BasicEngine.xml does not take effect in other scripts when the project is updated. Therefore, you have to set it manually where it's needed.
        Please find attached the correct Ruta build of svn revision 1688318. ruta/ruta-eclipse-update-site/pom.xml has been changed as follows:
        • <artifactId>ruta-eclipse-update-site</artifactId> <packaging>pom</packaging><version>2.3.0</version> replaced with <version>2.3.1-SNAPSHOT</version>
        • <item-maven-release-version>2.3.0</item-maven-release-version> replaced with <item-maven-release-version>2.3.1-SNAPSHOT</item-maven-release-version>
        Tested on OS X, Eclipse Mars, JDK1.8.0_45, UIMA Ruta Workbench 2.3.1-SNAPSHOT: works like a charm.
        Thanks Peter.

        pkluegl Peter Klügl added a comment - - edited

        I'm glad to hear that. I don't know what went wrong, but I created a new issue in order to investigate that.

        Btw, I personally always work with the latest sources, also in the UIMA Ruta Workbench. I do not create new update sites, but start a complete Eclipse (with all plugins) from within Eclipse (the workspace contains the trunk) by "run as Eclipse Application". This way, I have the current functionality of the trunk and hot code replacement.


          People

          • Assignee:
            pkluegl Peter Klügl
            Reporter:
            submedia Oleg Fedoriaka
          • Votes:
            0
            Watchers:
            2

              Time Tracking

              Estimated: 96h
              Remaining: 96h
              Logged: Not Specified