Solr / SOLR-2961

DIH with threads and TikaEntityProcessor JDBC issue

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 3.4, 3.5
    • Fix Version/s: None
    • Labels:
    • Environment:

      Windows Server 2008, Apache Tomcat 6, Oracle 11g, ojdbc 11.2.0.1

      Description

      I have a DIH configuration that works great when I don't specify threads="X" in the root entity. As soon as I give a value for threads, I get the following error messages in the stack trace. Please advise.

      SEVERE: JdbcDataSource was not closed prior to finalize(), indicates a bug – POSSIBLE RESOURCE LEAK!!!
      Dec 10, 2011 1:18:33 PM org.apache.solr.handler.dataimport.JdbcDataSource closeConnection
      SEVERE: Ignoring Error when closing connection
      java.sql.SQLRecoverableException: IO Error: Socket closed
      at oracle.jdbc.driver.T4CConnection.logoff(T4CConnection.java:511)
      at oracle.jdbc.driver.PhysicalConnection.close(PhysicalConnection.java:3931)
      at org.apache.solr.handler.dataimport.JdbcDataSource.closeConnection(JdbcDataSource.java:401)
      at org.apache.solr.handler.dataimport.JdbcDataSource.close(JdbcDataSource.java:392)
      at org.apache.solr.handler.dataimport.JdbcDataSource.finalize(JdbcDataSource.java:380)
      at java.lang.ref.Finalizer.invokeFinalizeMethod(Native Method)
      at java.lang.ref.Finalizer.runFinalizer(Unknown Source)
      at java.lang.ref.Finalizer.access$100(Unknown Source)
      at java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)
      Caused by: java.net.SocketException: Socket closed
      at java.net.SocketOutputStream.socketWrite(Unknown Source)
      at java.net.SocketOutputStream.write(Unknown Source)
      at oracle.net.ns.DataPacket.send(DataPacket.java:199)
      at oracle.net.ns.NetOutputStream.flush(NetOutputStream.java:211)
      at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:227)
      at oracle.net.ns.NetInputStream.read(NetInputStream.java:175)
      at oracle.net.ns.NetInputStream.read(NetInputStream.java:100)
      at oracle.net.ns.NetInputStream.read(NetInputStream.java:85)
      at oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:123)
      at oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:79)
      at oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1122)
      at oracle.jdbc.driver.T4CMAREngine.unmarshalSB1(T4CMAREngine.java:1099)
      at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:288)
      at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:191)
      at oracle.jdbc.driver.T4C7Ocommoncall.doOLOGOFF(T4C7Ocommoncall.java:61)
      at oracle.jdbc.driver.T4CConnection.logoff(T4CConnection.java:498)
      ... 8 more
      Dec 10, 2011 1:18:34 PM org.apache.solr.handler.dataimport.ThreadedEntityProcessorWrapper nextRow
      SEVERE: Exception in entity : null
      org.apache.solr.handler.dataimport.DataImportHandlerException: Failed to initialize DataSource: f2
      at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
      at org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:333)
      at org.apache.solr.handler.dataimport.ContextImpl.getDataSource(ContextImpl.java:99)
      at org.apache.solr.handler.dataimport.ThreadedContext.getDataSource(ThreadedContext.java:66)
      at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:101)
      at org.apache.solr.handler.dataimport.ThreadedEntityProcessorWrapper.nextRow(ThreadedEntityProcessorWrapper.java:84)
      at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.runAThread(DocBuilder.java:446)
      at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.run(DocBuilder.java:399)
      at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.runAThread(DocBuilder.java:466)
      at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.run(DocBuilder.java:399)
      at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.runAThread(DocBuilder.java:466)
      at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.access$000(DocBuilder.java:353)
      at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner$1.run(DocBuilder.java:406)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      at java.lang.Thread.run(Unknown Source)
      Caused by: java.lang.ClassCastException: org.apache.solr.handler.dataimport.TikaEntityProcessor cannot be cast to org.apache.solr.handler.dataimport.EntityProcessorWrapper
      at org.apache.solr.handler.dataimport.FieldStreamDataSource.init(FieldStreamDataSource.java:58)
      at org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:331)
      ... 14 more
      Dec 10, 2011 1:18:34 PM org.apache.solr.handler.dataimport.ThreadedEntityProcessorWrapper nextRow
      SEVERE: Exception in entity : null
      org.apache.solr.handler.dataimport.DataImportHandlerException: Failed to initialize DataSource: f2
      at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
      at org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:333)
      at org.apache.solr.handler.dataimport.ContextImpl.getDataSource(ContextImpl.java:99)
      at org.apache.solr.handler.dataimport.ThreadedContext.getDataSource(ThreadedContext.java:66)
      at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:101)
      at org.apache.solr.handler.dataimport.ThreadedEntityProcessorWrapper.nextRow(ThreadedEntityProcessorWrapper.java:84)
      at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.runAThread(DocBuilder.java:446)
      at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.run(DocBuilder.java:399)
      at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.runAThread(DocBuilder.java:466)
      at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.run(DocBuilder.java:399)
      at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.runAThread(DocBuilder.java:466)
      at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.access$000(DocBuilder.java:353)
      at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner$1.run(DocBuilder.java:406)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      at java.lang.Thread.run(Unknown Source)
      Caused by: java.lang.ClassCastException: org.apache.solr.handler.dataimport.TikaEntityProcessor cannot be cast to org.apache.solr.handler.dataimport.EntityProcessorWrapper
      at org.apache.solr.handler.dataimport.FieldStreamDataSource.init(FieldStreamDataSource.java:58)
      at org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:331)
      ... 14 more
      Dec 10, 2011 1:18:34 PM org.apache.solr.handler.dataimport.JdbcDataSource finalize
      SEVERE: JdbcDataSource was not closed prior to finalize(), indicates a bug – POSSIBLE RESOURCE LEAK!!!
      Dec 10, 2011 1:18:34 PM org.apache.solr.handler.dataimport.ThreadedEntityProcessorWrapper nextRow
      SEVERE: Exception in entity : null

      1. SOLR-2961.patch
        8 kB
        Mikhail Khludnev
      2. data-config.xml
        0.9 kB
        David Webb

        Activity

        James Dyer added a comment -

        The "threads" feature was removed from DIH in Trunk/4.x (see SOLR-3262). Some "threads" bugs were fixed in version 3.6, the last release in which "threads" is available. (see SOLR-3011).

        James Dyer added a comment -

        Mikhail Khludnev commented on SOLR-3011:
        ----------------------------------------

        Is SOLR-2961 just for Tika?

        yep. it seems so. Why do you ask, we don't need to support it further?

        I don't think we have to support threads with everything. (This is one reason why I want to remove "threads" on Trunk. It's going to be very difficult to support every use-case.) On the other hand, if you or someone else puts up a good patch in the very near-term I will try to get it into 3.6.

        Mikhail Khludnev added a comment -

        A multithreading patch is attached to SOLR-2947; once it is committed I'll be able to solve this one.

        David Webb added a comment -

        Additional test confirms that the process does not die when specifying threads="1" on the root entity.

        David Webb added a comment - - edited

        The ClassCastException is resolved like you said, but it looks like the FieldStreamDataSource isn't multithreaded. Or my configuration is wrong. Thank you for the quick help.

        Dec 13, 2011 11:59:49 AM org.apache.solr.handler.dataimport.ThreadedEntityProcessorWrapper nextRow
        SEVERE: Exception in entity : null
        java.lang.NullPointerException
                        at org.apache.solr.handler.dataimport.ThreadedContext.getVariableResolver(ThreadedContext.java:43)
                        at org.apache.solr.handler.dataimport.ThreadedContext.resolve(ThreadedContext.java:89)
                        at org.apache.solr.handler.dataimport.FieldStreamDataSource.getData(FieldStreamDataSource.java:63)
                        at org.apache.solr.handler.dataimport.FieldStreamDataSource.getData(FieldStreamDataSource.java:49)
                        at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:102)
                        at org.apache.solr.handler.dataimport.ThreadedEntityProcessorWrapper.nextRow(ThreadedEntityProcessorWrapper.java:84)
                        at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.runAThread(DocBuilder.java:437)
                        at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.run(DocBuilder.java:389)
                        at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.runAThread(DocBuilder.java:457)
                        at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.run(DocBuilder.java:389)
                        at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.runAThread(DocBuilder.java:457)
                        at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner.access$100(DocBuilder.java:353)
                        at org.apache.solr.handler.dataimport.DocBuilder$EntityRunner$1.run(DocBuilder.java:396)
                        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
                        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                        at java.lang.Thread.run(Unknown Source)
        
        Mikhail Khludnev added a comment -

        Here is the fix for

        java.lang.ClassCastException: org.apache.solr.handler.dataimport.TikaEntityProcessor cannot be cast to org.apache.solr.handler.dataimport.EntityProcessorWrapper

        It amends FieldReaderDataSource and FieldStreamDataSource to use "context" for resolving instead of downcasting to obtain the resolver.

        But multithreading support is a separate issue.
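
To illustrate the kind of change described here, a minimal sketch follows. It does not reproduce the patch or the real DIH classes: `Context`, `SimpleContext`, `WrapperProcessor`, `PlainProcessor` and both `resolve...` helpers are hypothetical stand-ins, written on the assumption that the threaded code path hands the data source a processor of a different type than the one it downcasts to.

```java
import java.util.HashMap;
import java.util.Map;

public class ContextResolveSketch {

    /** Hypothetical stand-in for the import context seen by a data source. */
    interface Context {
        Object resolve(String variable);
        Object getEntityProcessor();
    }

    /** Stand-in for a wrapper type that exposes its own resolver. */
    static class WrapperProcessor {
        private final Map<String, Object> row;
        WrapperProcessor(Map<String, Object> row) { this.row = row; }
        Object resolve(String variable) { return row.get(variable); }
    }

    /** Stand-in for a bare processor that is NOT a WrapperProcessor. */
    static class PlainProcessor { }

    /** Context that can resolve variables itself, whatever the processor type is. */
    static class SimpleContext implements Context {
        private final Map<String, Object> row;
        private final Object processor;
        SimpleContext(Map<String, Object> row, Object processor) {
            this.row = row;
            this.processor = processor;
        }
        public Object resolve(String variable) { return row.get(variable); }
        public Object getEntityProcessor() { return processor; }
    }

    /** Fragile style: downcast the processor to reach its resolver. */
    static Object resolveByDowncast(Context ctx, String variable) {
        // Throws ClassCastException when the processor is not a WrapperProcessor.
        WrapperProcessor wrapper = (WrapperProcessor) ctx.getEntityProcessor();
        return wrapper.resolve(variable);
    }

    /** Robust style: ask the context, which works for any processor type. */
    static Object resolveViaContext(Context ctx, String variable) {
        return ctx.resolve(variable);
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("f1.blob", "binary-content");
        Context threaded = new SimpleContext(row, new PlainProcessor());

        try {
            resolveByDowncast(threaded, "f1.blob");
        } catch (ClassCastException e) {
            System.out.println("downcast failed with ClassCastException");
        }
        System.out.println("via context: " + resolveViaContext(threaded, "f1.blob"));
    }
}
```

The design point is only that a data source depending on the context interface does not care which concrete processor class is in play, so this particular cast failure cannot occur. It says nothing about thread safety.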

        David Webb added a comment -

        Weird note: when threads="2", processing continues even though the stack traces are written to the logs. When threads="6", the DIH process stops immediately when the error occurs and performs a rollback.

        This is preventing me from using DIH to load and maintain my production index. Any help is greatly appreciated since I am now at the 11th hour.

        Solr and all components have been stellar up to this point. Great project!

        David Webb added a comment -

        My trimmed-down data config


          People

          • Assignee:
            Unassigned
            Reporter:
            David Webb
          • Votes:
            0
            Watchers:
            2

            Dates

            • Created:
              Updated:
              Resolved:
