ManifoldCF / CONNECTORS-1312

jcifs.smb.SmbException: Connection reset by peer: socket write error


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version: ManifoldCF 2.5
    • Fix Version: ManifoldCF 2.5
    • Component: JCIFS connector
    • Labels: None
    • Environment: Windows x64, java 1.8.x

    Description

      Hi Karl,

      We've found another JCIFS exception: jobs crawling a Windows share stop when they encounter a "Connection reset by peer" error, e.g.:

      ERROR 2016-05-03 15:29:24,209 (Worker thread '80') - JCIFS: SmbException tossed processing smb://server.domain.com/path/file.ppt
      jcifs.smb.SmbException: Connection reset by peer: socket write error
      java.net.SocketException: Connection reset by peer: socket write error
      	at java.net.SocketOutputStream.socketWrite0(Native Method)
      	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
      	at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
      	at jcifs.smb.SmbTransport.doSend(SmbTransport.java:453)
      	at jcifs.util.transport.Transport.sendrecv(Transport.java:67)
      	at jcifs.smb.SmbTransport.send(SmbTransport.java:655)
      	at jcifs.smb.SmbSession.send(SmbSession.java:238)
      	at jcifs.smb.SmbTree.send(SmbTree.java:119)
      	at jcifs.smb.SmbFile.send(SmbFile.java:775)
      	at jcifs.smb.SmbFileInputStream.readDirect(SmbFileInputStream.java:181)
      	at jcifs.smb.SmbFileInputStream.read(SmbFileInputStream.java:142)
      	at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
      	at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
      	at java.io.FilterInputStream.read(FilterInputStream.java:107)
      	at java.nio.file.Files.copy(Files.java:2908)
      	at java.nio.file.Files.copy(Files.java:3027)
      	at org.apache.tika.io.TikaInputStream.getPath(TikaInputStream.java:587)
      	at org.apache.tika.io.TikaInputStream.getFile(TikaInputStream.java:615)
      	at org.apache.tika.parser.microsoft.POIFSContainerDetector.getTopLevelNames(POIFSContainerDetector.java:358)
      	at org.apache.tika.parser.microsoft.POIFSContainerDetector.detect(POIFSContainerDetector.java:424)
      	at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:77)
      	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112)
      	at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:48)
      	at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:227)
      	at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3224)
      	at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3075)
      	at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2706)
      	at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
      	at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
      	at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
      	at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:979)
      	at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
      

      The current workaround is to restart the job (manually or via the scheduler).
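One way to keep a transient socket error like this from aborting the whole job would be a bounded retry with backoff around the fetch. A minimal sketch, assuming a hypothetical withRetries helper (not an existing ManifoldCF or jcifs API):

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical sketch: retry an operation a few times before giving
// up, so a transient "Connection reset by peer" does not abort the
// whole crawl. Names here are illustrative, not ManifoldCF APIs.
public class RetryingFetch {
    static <T> T withRetries(Callable<T> op, int maxAttempts, long backoffMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {           // treat I/O errors as transient
                last = e;
                Thread.sleep(backoffMs * attempt);  // linear backoff
            }
        }
        throw last;                             // attempts exhausted: surface it
    }

    public static void main(String[] args) throws Exception {
        // Simulate a fetch that fails twice with a reset, then succeeds.
        int[] calls = {0};
        String result = withRetries(() -> {
            if (++calls[0] < 3) {
                throw new IOException("Connection reset by peer");
            }
            return "document-content";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Only IOException is retried here; anything else still propagates immediately, so a fatal error would continue to stop the job.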

      It is clear that for many errors it makes no sense to skip the failed URL and continue the job, e.g.:

      Error: SmbAuthException thrown: Logon failure: unknown user name or bad password.
      

      I'm thinking about a general solution: defining a list (through the UI or properties.xml) of non-severe exceptions, such as "file busy" or "symlink detected", so that admins could specify when the crawler should stop and when it should retry or skip and continue.
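The proposed severity list could be sketched roughly like this; the pattern lists, the Disposition enum, and the classify method are all hypothetical illustrations, not existing ManifoldCF configuration:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the proposed severity list: an exception
// whose message matches a configured pattern is treated as non-severe
// (retry or skip); anything unrecognized stops the job.
public class ExceptionPolicy {
    enum Disposition { RETRY, SKIP, STOP }

    // Could be loaded from properties.xml or edited through the UI.
    static final List<String> RETRYABLE = Arrays.asList(
        "Connection reset by peer");
    static final List<String> SKIPPABLE = Arrays.asList(
        "file busy", "symlink detected");

    static Disposition classify(Exception e) {
        String msg = e.getMessage() == null ? "" : e.getMessage();
        for (String p : RETRYABLE) {
            if (msg.contains(p)) return Disposition.RETRY;
        }
        for (String p : SKIPPABLE) {
            if (msg.contains(p)) return Disposition.SKIP;
        }
        return Disposition.STOP;  // e.g. "Logon failure: unknown user name..."
    }

    public static void main(String[] args) {
        System.out.println(classify(new java.io.IOException(
            "Connection reset by peer: socket write error")));
        System.out.println(classify(new RuntimeException(
            "Logon failure: unknown user name or bad password.")));
    }
}
```

Matching on message substrings is fragile; a real implementation would more likely key on exception class or, for jcifs, the SMB status code, but the shape of the configurable policy is the same.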

      What do you think?
      Thank you!


          People

            Assignee: Karl Wright (kwright@metacarta.com)
            Reporter: Konstantin Avdeev (kavdeev)
