TIKA-2901

Tika extracting points data from Chart


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.21
    • Fix Version/s: None
    • Component/s: app
    • Labels: None

    Description

      I am using Tika to extract content from *.docx and other files. I have noticed that Tika extracts the data points from charts and appends them to the end of the extracted content.
      I am using the following code for extraction:

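              import java.io.BufferedInputStream;
              import java.io.FileInputStream;
              import java.io.InputStream;
              import org.apache.tika.metadata.Metadata;
              import org.apache.tika.parser.AutoDetectParser;
              import org.apache.tika.parser.ParseContext;
              import org.apache.tika.parser.Parser;
              import org.apache.tika.parser.RecursiveParserWrapper;
              import org.apache.tika.parser.microsoft.OfficeParserConfig;
              import org.apache.tika.sax.BasicContentHandlerFactory;
              import org.apache.tika.sax.ContentHandlerFactory;
              import org.xml.sax.helpers.DefaultHandler;

              StringBuilder fileContent = new StringBuilder();
              Parser parser = new AutoDetectParser();
              // Collect each document's content as HTML; -1 disables the write limit.
              ContentHandlerFactory factory = new BasicContentHandlerFactory(
                      BasicContentHandlerFactory.HANDLER_TYPE.HTML, -1);
              // inputFileName is the path of the document to parse (defined elsewhere).
              InputStream inputStream = new BufferedInputStream(new FileInputStream(inputFileName));
              RecursiveParserWrapper wrapper = new RecursiveParserWrapper(parser, factory);
              Metadata metadata = new Metadata();

              // Office-specific options: SAX-based docx extractor, no deleted or moved-from text, no headers/footers.
              ParseContext parseContext = new ParseContext();
              OfficeParserConfig officeParserConfig = new OfficeParserConfig();
              officeParserConfig.setUseSAXDocxExtractor(true);
              officeParserConfig.setIncludeDeletedContent(false);
              officeParserConfig.setIncludeMoveFromContent(false);
              officeParserConfig.setIncludeHeadersAndFooters(false);
              parseContext.set(OfficeParserConfig.class, officeParserConfig);

              wrapper.parse(inputStream, new DefaultHandler(), metadata, parseContext);
              String contents = metadata.get(RecursiveParserWrapper.TIKA_CONTENT);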

      Please find attached the input file and the corresponding output from Tika.
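
      To confirm which of the trailing values originate from charts, the .docx can be inspected directly, since it is a ZIP archive whose chart parts normally live under word/charts/. The following is a minimal diagnostic sketch and not part of Tika: the class name ChartPointLister and the method chartPointValues are hypothetical, it assumes Java 9+ (for readAllBytes) and the standard OOXML layout, and it simply lists the text of every c:v element (cached category and point values) in those chart parts.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical diagnostic helper (an assumption for illustration, not a Tika API).
class ChartPointLister {

    // DrawingML chart namespace used by OOXML chart parts.
    static final String CHART_NS = "http://schemas.openxmlformats.org/drawingml/2006/chart";

    /** Returns the text of every c:v element found in word/charts/*.xml parts. */
    static List<String> chartPointValues(InputStream docx) throws IOException, XMLStreamException {
        XMLInputFactory xml = XMLInputFactory.newInstance();
        // Disable DTDs and external entities; the input may be untrusted.
        xml.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        xml.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

        List<String> values = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (!name.startsWith("word/charts/") || !name.endsWith(".xml")) {
                    continue;
                }
                // Copy the part first so the XML reader never touches the ZIP stream.
                byte[] part = zip.readAllBytes();
                XMLStreamReader reader = xml.createXMLStreamReader(new ByteArrayInputStream(part));
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "v".equals(reader.getLocalName())
                            && CHART_NS.equals(reader.getNamespaceURI())) {
                        values.add(reader.getElementText());
                    }
                }
                reader.close();
            }
        }
        return values;
    }

    // Usage: java ChartPointLister path/to/file.docx
    public static void main(String[] args) throws Exception {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            chartPointValues(in).forEach(System.out::println);
        }
    }
}
```

      If the values printed for the attached input match the text that Tika appends at the end of its output, that would support the reading that the extra content comes from the chart parts rather than from the document body.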

      Attachments

        Activity

          People

            Assignee: Unassigned
            Reporter: mdasadul Md
            Votes: 0
            Watchers: 1
