Solr
  1. Solr
  2. SOLR-1262

DIH needs support for callable statements

    Details

    • Type: Improvement Improvement
    • Status: Reopened
    • Priority: Minor Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.3
    • Fix Version/s: None
    • Labels:
      None
    • Environment:

      linux
      mysql

      Description

      During an indexing run we noticed that we were spending a lot of time creating and tearing down queries in mysql

      The queries we are using are complex and involve joins spanning across multiple tables.

      We should support prepared statements in the data import handler via the data-config.xml file - for those databases that support prepared statements.

      We could add a new attribute to the entity entity in dataConfig - say - pquery or preparedQuery and then pass the prepared statement and have values filled in by the actual queries for each row using a placeholder - like a ? or something else.

      I would probably start by hacking class JdbcDataSource to try a test but was wondering if anyone had experienced this or had any suggestions or if there is something in the works that I missed - I couldn't find any other bugs mentioning using prepared statements for performance.

        Activity

        Hide
        Noble Paul added a comment -

        do you mean prepared statement or callablestatement (stored procedures)? . I guess there is not much difference between the semantics of prepared statements and sql statements

        Show
        Noble Paul added a comment - do you mean prepared statement or callablestatement (stored procedures)? . I guess there is not much difference between the semantics of prepared statements and sql statements
        Hide
        Abdul Chaudhry added a comment -

        I could try to use stored procedures instead of trying to get prepared stements to work in the DIH - however that would mean most of the logic would need to live in SQL and I hate SQL.

        Show
        Abdul Chaudhry added a comment - I could try to use stored procedures instead of trying to get prepared stements to work in the DIH - however that would mean most of the logic would need to live in SQL and I hate SQL.
        Hide
        Noble Paul added a comment -

        however that would mean most of the logic would need to live in SQL and I hate SQL.

        prepared statement is essentially an SQL statement with placeholders . I do not see any value add by supporting PreparedStatements

        But stored procedure support is something we can look into

        Show
        Noble Paul added a comment - however that would mean most of the logic would need to live in SQL and I hate SQL. prepared statement is essentially an SQL statement with placeholders . I do not see any value add by supporting PreparedStatements But stored procedure support is something we can look into
        Hide
        Abdul Chaudhry added a comment -

        That's fine. Support for stored procedures would be great. I am not sure what code changes would be involved in the DIH - I guess you would need a CDATA section in the XML data-config file - we are currently using MYSQL 5.0.something

        Show
        Abdul Chaudhry added a comment - That's fine. Support for stored procedures would be great. I am not sure what code changes would be involved in the DIH - I guess you would need a CDATA section in the XML data-config file - we are currently using MYSQL 5.0.something
        Hide
        Noble Paul added a comment - - edited

        The syntax of a callable statement is distinct from that of a select . So we can just use that aspect to figure out if it is a callable statement

        we can support callable statements with a few restrictions

        • Only IN args support
        • no return types
        • probably no multiple resultsets too
        query="{call getResult( ${x.somevar1} ,${y.somevar2})}"
        

        Before invoking the callable statements the placeholder locations will be replaced with '?'

        as

        {call getResult( ?,?)}
        

        and setxxx() can be called with the values of $

        {x.somevar1}

        and $

        {y.somevar2}
        Show
        Noble Paul added a comment - - edited The syntax of a callable statement is distinct from that of a select . So we can just use that aspect to figure out if it is a callable statement we can support callable statements with a few restrictions Only IN args support no return types probably no multiple resultsets too query= "{call getResult( ${x.somevar1} ,${y.somevar2})}" Before invoking the callable statements the placeholder locations will be replaced with '?' as {call getResult( ?,?)} and setxxx() can be called with the values of $ {x.somevar1} and $ {y.somevar2}
        Hide
        Hoss Man added a comment -

        Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...

        http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

        Selection criteria was "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. email notifications were suppressed.

        A unique token for finding these 240 issues in the future: hossversioncleanup20100527

        Show
        Hoss Man added a comment - Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email... http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E Selection criteria was "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. email notifications were suppressed. A unique token for finding these 240 issues in the future: hossversioncleanup20100527
        Hide
        Chris Book added a comment -

        I think the original comment about prepared statements is more valid. For a one-to-many, dataimporthandler is generating the same query many times with just a different key param. A prepared statement allows the db server (ie mysql) to keep the query alive and just return different results per key each time. You save on the overhead of the db parsing the query and other setup.

        I agree that callable statements move the logic into sql which is not ideal. Using prepared statements by default should require no change to the xml schema.

        Show
        Chris Book added a comment - I think the original comment about prepared statements is more valid. For a one-to-many, dataimporthandler is generating the same query many times with just a different key param. A prepared statement allows the db server (ie mysql) to keep the query alive and just return different results per key each time. You save on the overhead of the db parsing the query and other setup. I agree that callable statements move the logic into sql which is not ideal. Using prepared statements by default should require no change to the xml schema.
        Hide
        Joachim Martin added a comment -

        I have been told by my Oracle DBA that DIH's use of Statement vs Prepared Statement is causing serious problems on the database side. There is a performance gain by not having to re-parse a prepared statement, but more importantly, each Statement that needs to be re-parsed takes up space in the cache. If you have repeating related entities (e.g. Author->Books[]), each related query is a unique statement.

        Many developers, myself included, would never consider writing a database app without Prepared Statements for performance reasons. I think it's even more important in a batch update situation where you are running N additional related entity queries.

        I like the syntax of MyBatis' mapped statements:

        select field1, field2 from related_table where entity_id = #

        {id, jdbcType=NUMERIC}
        Show
        Joachim Martin added a comment - I have been told by my Oracle DBA that DIH's use of Statement vs Prepared Statement is causing serious problems on the database side. There is a performance gain by not having to re-parse a prepared statement, but more importantly, each Statement that needs to be re-parsed takes up space in the cache. If you have repeating related entities (e.g. Author->Books[]), each related query is a unique statement. Many developers, myself included, would never consider writing a database app without Prepared Statements for performance reasons. I think it's even more important in a batch update situation where you are running N additional related entity queries. I like the syntax of MyBatis' mapped statements: select field1, field2 from related_table where entity_id = # {id, jdbcType=NUMERIC}
        Hide
        Robert Muir added a comment -

        Bulk move 3.2 -> 3.3

        Show
        Robert Muir added a comment - Bulk move 3.2 -> 3.3
        Hide
        Robert Muir added a comment -

        3.4 -> 3.5

        Show
        Robert Muir added a comment - 3.4 -> 3.5
        Hide
        Hoss Man added a comment -

        bulk fixing the version info for 4.0-ALPHA and 4.0 all affected issues have "hoss20120711-bulk-40-change" in comment

        Show
        Hoss Man added a comment - bulk fixing the version info for 4.0-ALPHA and 4.0 all affected issues have "hoss20120711-bulk-40-change" in comment
        Hide
        Robert Muir added a comment -

        rmuir20120906-bulk-40-change

        Show
        Robert Muir added a comment - rmuir20120906-bulk-40-change
        Hide
        Hoss Man added a comment -

        There is no indication that anyone is actively working on this issue, and it has no current patch, so removing 4.0 from the fixVersion.

        Show
        Hoss Man added a comment - There is no indication that anyone is actively working on this issue, and it has no current patch, so removing 4.0 from the fixVersion.
        Hide
        Erick Erickson added a comment -

        2013 Old JIRA cleanup

        Show
        Erick Erickson added a comment - 2013 Old JIRA cleanup

          People

          • Assignee:
            Noble Paul
            Reporter:
            Abdul Chaudhry
          • Votes:
            8 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

            • Created:
              Updated:

              Development