Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 1.6.0
    • Fix Version/s: None
    • Component/s: Parsing
    • Labels:
    • Environment:
      Windows 7, running inside Eclipse.

      Description

      Parsing Error, Skipping Object
      java.io.IOException: expected='endstream' actual='' org.apache.pdfbox.io.PushBackInputStream@38011d45
      at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:439)
      at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:552)
      at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:184)
      at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1088)
      at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1053)
      at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:74)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
      at org.apache.tika.Tika.parseToString(Tika.java:357)
      at edu.uci.ics.crawler4j.crawler.BinaryParser.parse(BinaryParser.java:37)
      at edu.uci.ics.crawler4j.crawler.WebCrawler.handleBinary(WebCrawler.java:223)
      at edu.uci.ics.crawler4j.crawler.WebCrawler.processPage(WebCrawler.java:462)
      at edu.uci.ics.crawler4j.crawler.WebCrawler.run(WebCrawler.java:129)
      at java.lang.Thread.run(Thread.java:662)
      Did not found XRef object at specified startxref position 0

      This is a sample URL where the problem occurs:
      http://www.qualcomm.com/documents/files/rev-b-enhanced-mobile-broadband-for-all.pdf

      Any suggestions as to why this is happening? Or is it a bug?
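
      One way to narrow this down: the messages above (an empty token where 'endstream' was expected, and no XRef object found at startxref position 0) are consistent with the parser receiving an incomplete file, for example if the download was cut short before it reached Tika. This is an assumption, not something confirmed in this issue. The sketch below uses only the JDK to check whether the downloaded bytes still end with a PDF trailer; the class name PdfTruncationCheck, the method looksComplete, and the 1024-byte tail window are hypothetical choices made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of PDFBox, Tika, or crawler4j.
final class PdfTruncationCheck {

    // Assumed window size: a well-formed PDF normally keeps "startxref"
    // and "%%EOF" within the last bytes of the file.
    private static final int TAIL_BYTES = 1024;

    /**
     * Returns true if the tail of the data contains "startxref"
     * followed later by "%%EOF".
     */
    static boolean looksComplete(byte[] pdf) {
        int tailStart = Math.max(0, pdf.length - TAIL_BYTES);
        // ISO-8859-1 maps every byte to exactly one char, so binary data is safe here.
        String tail = new String(pdf, tailStart, pdf.length - tailStart,
                StandardCharsets.ISO_8859_1);
        int xref = tail.lastIndexOf("startxref");
        int eof = tail.lastIndexOf("%%EOF");
        return xref >= 0 && eof > xref;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java PdfTruncationCheck <file.pdf>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(looksComplete(data)
                ? "trailer present"
                : "trailer missing: the file may be truncated");
    }
}
```

      If the bytes handed to the parser fail this check while a fresh manual download of the same URL passes, the fetch step is the more likely culprit than the parser. A passing check does not prove the file is valid; it only rules out the simplest form of truncation.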

            People

            • Assignee:
              Andreas Lehmkühler (lehmi)
              Reporter:
              Raihan Jamal (raihan26)
            • Votes:
              0
              Watchers:
              4

                Time Tracking

                Estimated:
                Original Estimate - 336h
                Remaining:
                Remaining Estimate - 336h
                Logged:
                Time Spent - Not Specified