  1. Tika
  2. TIKA-1425

Automatic batching of Microsoft service calls

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.6
    • Fix Version/s: 1.17, 2.0.0-BETA, 2.1.0
    • Component/s: translation
    • Labels: None

    Description

      Right now, when I use the following code, I get the stack trace at the bottom of this description. This seems to be because the request URI is too long for the service to accept (the server answers with HTTP 414). We need a mechanism within the call to Tika.translate that determines, on a service-by-service basis, the maximum request URI length that can be sent. I believe this should be handled on the Tika side; otherwise, how is a caller meant to know the maximum request size?

      translator.java
      +    Translator translate = new MicrosoftTranslator();
      +    ((MicrosoftTranslator) translate).setId("...");
      +    ((MicrosoftTranslator) translate).setSecret("...");
           for (java.util.Map.Entry<Text, Parse> entry : parseResult) {
             Parse parse = entry.getValue();
             LOG.info("---------\nUrl\n---------------\n");
      @@ -201,7 +207,7 @@
             System.out.print(parse.getData().toString());
             if (dumpText) {
               LOG.info("---------\nParseText\n---------\n");
      -        System.out.print(parse.getText());
      +        System.out.print(translate.translate(parse.getText(), "fr"));
             }
      
      stacktrace.log
      Exception in thread "main" java.lang.Exception: [microsoft-translator-api] Error retrieving translation : Server returned HTTP response code: 414 for URL: http://api.microsofttranslator.com/V2/Ajax.svc/Translate?&from=&to=fr&text=%D0%A4%D0...
      ...
      	at com.memetix.mst.MicrosoftTranslatorAPI.retrieveString(MicrosoftTranslatorAPI.java:202)
      	at com.memetix.mst.translate.Translate.execute(Translate.java:61)
      	at com.memetix.mst.translate.Translate.execute(Translate.java:76)
      	at org.apache.tika.language.translate.MicrosoftTranslator.translate(MicrosoftTranslator.java:104)
      	at org.apache.nutch.parse.ParserChecker.run(ParserChecker.java:210)
      	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
      	at org.apache.nutch.parse.ParserChecker.main(ParserChecker.java:228)
      Caused by: java.io.IOException: Server returned HTTP response code: 414 for URL: http://api.microsofttranslator.com/V2/Ajax.svc/Translate?&from=&to=fr&text=%D0%A4%D0%BE%D1%80%D1%83%D0%B...
      ...
      	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
      	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
      	at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1675)
      	at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1673)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1671)
      	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1244)
      	at com.memetix.mst.MicrosoftTranslatorAPI.retrieveResponse(MicrosoftTranslatorAPI.java:178)
      	at com.memetix.mst.MicrosoftTranslatorAPI.retrieveString(MicrosoftTranslatorAPI.java:199)
      	... 6 more
      Caused by: java.io.IOException: Server returned HTTP response code: 414 for URL: http://api.microsofttranslator.com/V2/Ajax.svc/Translate?&from=&to=fr&text=%D0%A4%D0%BE...
      ...
      	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1626)
      	at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
      	at com.memetix.mst.MicrosoftTranslatorAPI.retrieveResponse(MicrosoftTranslatorAPI.java:177)
      	... 7 more
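
One possible shape for the batching requested above, as a minimal sketch rather than a proposed patch: split the input into chunks whose URL-encoded length stays under a per-service limit, then translate each chunk separately. The class `TranslationBatcher`, its methods, and the limit passed in are all hypothetical; the actual limit of the Microsoft service is not stated in this issue and would have to be determined per service.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Tika.
final class TranslationBatcher {

    // Length of s once percent-encoded as UTF-8, i.e. what it costs in the request URI.
    static int encodedLength(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8").length();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Splits text into consecutive chunks whose encoded length is at most
    // maxEncodedLength, preferring to cut right after whitespace.
    // Concatenating the chunks yields the original text.
    static List<String> split(String text, int maxEncodedLength) {
        // A single code point can encode to 4 bytes * 3 characters ("%XX") = 12.
        if (maxEncodedLength < 12) {
            throw new IllegalArgumentException("limit too small for one encoded code point");
        }
        List<String> chunks = new ArrayList<>();
        int start = 0;      // index where the current chunk begins
        int cost = 0;       // encoded length of text[start, i)
        int lastBreak = -1; // index just after the last whitespace in the current chunk
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            int next = i + Character.charCount(cp);
            int cpCost = encodedLength(text.substring(i, next));
            // Close chunks until the next code point fits; the second pass
            // (if any) cuts mid-word because no whitespace is left to use.
            while (cost + cpCost > maxEncodedLength) {
                int end = (lastBreak > start) ? lastBreak : i;
                chunks.add(text.substring(start, end));
                start = end;
                cost = encodedLength(text.substring(start, i));
                lastBreak = -1;
            }
            cost += cpCost;
            if (Character.isWhitespace(cp)) {
                lastBreak = next;
            }
            i = next;
        }
        if (start < text.length()) {
            chunks.add(text.substring(start));
        }
        return chunks;
    }
}
```

Each chunk could then be passed to `translate.translate(chunk, "fr")` in a loop and the results concatenated. The limit must also leave room for the rest of the URI (host, path, and the other query parameters), and cutting at whitespace rather than at sentence boundaries may reduce translation quality.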
      

          People

            Assignee: Lewis John McGibbney (lewismc)
            Reporter: Lewis John McGibbney (lewismc)
            Votes: 0
            Watchers: 3
