  1. Tika
  2. TIKA-2794

Tika extracts text from PDF on MacBook, but not on Windows Server

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.19.1
    • Fix Version/s: 2.0.0-BETA
    • Component/s: parser
    • Labels: None
    • Environment: MacBook Pro and Windows Server 2012

      This code works on the enclosed PDF file on a MacBook, but not on Windows Server.

    • Flags: Patch, Important

    Description

      try:
          headers = {'X-Tika-PDFextractInlineImages': 'true'}
          data = parser.from_file(pathtofile, serverEndpoint=self.TIKA_SERVER, headers=headers)
          charstoreturn = data['content'].strip().split()[:limit]
          charstoreturn = ' '.join(charstoreturn).replace("\n", " ").replace('"', "'").replace(",","").replace("'","'")
          return True, charstoreturn
      except Exception as err:
          return False, "error {} on file: {}.\n".format(str(err), pathtofile)
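
The snippet above calls .strip() on data['content'] directly. If the server extracts no text, that value may be None (or the key may be missing), so the call raises and the except branch reports only a generic error. The following is a minimal sketch, not the reporter's code, of a helper that separates "no text extracted" from other failures. The name first_words is hypothetical, and the 'status' key is an assumption about the dict returned by parser.from_file.

```python
def first_words(data, limit):
    """Return (ok, text) for a tika-python style result dict.

    Assumption: `data` is the dict returned by parser.from_file, and
    data.get('content') may be None or absent when no text was extracted.
    """
    data = data or {}
    content = data.get("content")
    if not content or not content.strip():
        # Report the server status (assumed key) instead of failing on None.
        return False, "no text extracted (status: {})".format(data.get("status"))
    # split() with no arguments already collapses newlines and repeated spaces.
    words = content.split()[:limit]
    text = " ".join(words).replace('"', "'").replace(",", "")
    return True, text
```

Passing the result of parser.from_file to this helper on both machines would show whether the Windows server returns an empty body or fails earlier in the request.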

      Attachments

        1. test2.pdf
          184 kB
          Paul Hallett

          People

            Assignee: Unassigned
            Reporter: Paul Hallett (phallett)
            Votes: 0
            Watchers: 3