  1. Tika
  2. TIKA-870

Allow calling parseToString with an additional maxStringLength parameter, so it can be changed per call


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2
    • Component/s: None
    • Labels: None

    Description

      It would be great to be able to call parseToString with an additional maxStringLength parameter, instead of having to set it on the Tika instance. This would allow the limit to be set per parse call. Sample code:

      public String parseToString(InputStream stream, Metadata metadata, int maxStringLength)
              throws IOException, TikaException {
          WriteOutContentHandler handler =
              new WriteOutContentHandler(maxStringLength);
          try {
              ParseContext context = new ParseContext();
              context.set(Parser.class, parser);
              parser.parse(
                      stream, new BodyContentHandler(handler), metadata, context);
          } catch (SAXException e) {
              if (!handler.isWriteLimitReached(e)) {
                  // This should never happen with BodyContentHandler...
                  throw new TikaException("Unexpected SAX processing failure", e);
              }
          } finally {
              stream.close();
          }
          return handler.toString();
      }
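
      The difference between an instance-wide limit and a per-call limit can be illustrated with a small, library-free sketch. Everything here is an assumption made for illustration: PerCallLimitSketch and readToString are hypothetical stand-ins and not Tika classes or methods, and a plain Reader takes the place of the parser and content handler used in the sample above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical stand-in for a facade with a configurable output limit.
// This is NOT Tika code; it only illustrates the per-call override idea.
final class PerCallLimitSketch {
    // Instance-wide default, analogous to a limit set once on the facade.
    private final int defaultMaxLength;

    PerCallLimitSketch(int defaultMaxLength) {
        this.defaultMaxLength = defaultMaxLength;
    }

    // Existing style: the limit comes from instance state.
    String readToString(Reader reader) throws IOException {
        return readToString(reader, defaultMaxLength);
    }

    // Requested style: the caller supplies the limit for this call only,
    // so shared instance state does not need to be modified.
    String readToString(Reader reader, int maxLength) throws IOException {
        StringBuilder out = new StringBuilder();
        char[] buffer = new char[1024];
        try {
            while (out.length() < maxLength) {
                // Never request more characters than the remaining budget.
                int wanted = Math.min(buffer.length, maxLength - out.length());
                int read = reader.read(buffer, 0, wanted);
                if (read < 0) {
                    break; // end of input
                }
                out.append(buffer, 0, read);
            }
        } finally {
            reader.close();
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        PerCallLimitSketch sketch = new PerCallLimitSketch(5);
        // Uses the instance-wide default of 5 characters.
        System.out.println(sketch.readToString(new StringReader("hello world")));
        // Overrides the limit for this single call.
        System.out.println(sketch.readToString(new StringReader("hello world"), 8));
    }
}
```

      With a per-call parameter, one shared instance can serve callers that need different limits, without each caller changing the instance-wide setting before its call.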
      


          People

            Assignee: Michael McCandless (mikemccand)
            Reporter: Shay Banon (kimchy)
            Votes: 0
            Watchers: 0
