Solr / SOLR-6644

DataImportHandler holds on to each DB connection until the end

      Description

      DataImportHandler with a JDBC data source opens one DB connection per entity and then keeps that connection, with its transaction still open, after it has finished processing the entity, right until the whole DataImportHandler operation is finished.

      So this can mean dozens of DB connections tied up for hours, unnecessarily, with each connection staying in the "idle in transaction" state and holding (in PostgreSQL) an AccessShareLock on each relation it has looked at. Not ideal for production operations, of course.

      Here are the connections from Solr to the DB when a large import has been running for a while:

       backend_start | xact_start | query_start |        state        | minutes idle 
      ---------------+------------+-------------+---------------------+--------------
       20:03:20      | 20:03:20   | 20:03:21    | idle in transaction |           32
       20:03:22      | 20:03:22   | 20:03:22    | idle in transaction |           32
       20:03:22      | 20:03:22   | 20:03:22    | idle in transaction |           32
       20:03:22      | 20:03:22   | 20:03:23    | idle in transaction |           32
       20:03:21      | 20:03:21   | 20:16:35    | idle in transaction |           19
       20:03:21      | 20:03:21   | 20:16:35    | idle in transaction |           19
       20:03:22      | 20:03:22   | 20:16:35    | idle in transaction |           19
       20:03:22      | 20:03:22   | 20:16:35    | idle in transaction |           19
       20:03:22      | 20:03:22   | 20:16:35    | idle in transaction |           19
       20:16:37      | 20:16:37   | 20:16:38    | idle in transaction |           19
       20:03:21      | 20:03:21   | 20:16:35    | idle in transaction |           19
       20:03:21      | 20:03:21   | 20:16:35    | idle in transaction |           19
       20:03:21      | 20:03:21   | 20:16:35    | idle in transaction |           19
       20:16:36      | 20:16:36   | 20:16:37    | idle in transaction |           19
       20:03:20      | 20:03:20   | 20:16:35    | idle in transaction |           19
       20:16:36      | 20:16:36   | 20:35:49    | idle in transaction |            0
       20:16:36      | 20:16:36   | 20:35:49    | idle in transaction |            0
       20:16:37      | 20:16:37   | 20:35:49    | idle in transaction |            0
       20:16:35      | 20:16:35   | 20:35:41    | idle in transaction |            0
       20:16:36      | 20:16:36   | 20:35:49    | idle in transaction |            0
       20:16:37      | 20:16:37   | 20:35:49    | active              |            0
      

      Most of these haven't been touched for a long time, and will not be needed again (because DataImportHandler is done with that top-level entity). They should be released as soon as possible.
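
      A minimal sketch of the behaviour being asked for, not DataImportHandler's actual code: each top-level entity gets its own connection, and that connection is closed as soon as the entity's rows have been consumed. It uses Python's standard sqlite3 module as a stand-in for a JDBC data source; the names import_entities, connect, handle_row and demo are hypothetical and exist only for this illustration.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def import_entities(connect, entity_queries, handle_row):
    """Run one query per top-level entity, releasing each connection
    as soon as that entity has been fully processed.

    Returns the highest number of connections open at the same time.
    """
    open_now = 0
    peak_open = 0
    for entity, query in entity_queries.items():
        # closing() calls conn.close() when the block exits, even on error.
        with closing(connect()) as conn:
            open_now += 1
            peak_open = max(peak_open, open_now)
            # conn.execute() is a sqlite3 shortcut, not part of the DB-API.
            for row in conn.execute(query):
                handle_row(entity, row)
            # Read-only work: end any open transaction before closing.
            conn.rollback()
        open_now -= 1
    return peak_open


def demo():
    """Build a throwaway database with two entities and import both."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        with closing(sqlite3.connect(path)) as setup:
            setup.execute("CREATE TABLE authors (name TEXT)")
            setup.execute("CREATE TABLE books (title TEXT)")
            setup.execute("INSERT INTO authors VALUES ('a1')")
            setup.executemany(
                "INSERT INTO books VALUES (?)", [("b1",), ("b2",)]
            )
            setup.commit()

        rows = []
        peak = import_entities(
            lambda: sqlite3.connect(path),
            {
                "author": "SELECT name FROM authors ORDER BY name",
                "book": "SELECT title FROM books ORDER BY title",
            },
            lambda entity, row: rows.append((entity, row[0])),
        )
        return peak, rows
    finally:
        os.remove(path)
```

      With this shape, the number of simultaneously open connections stays at one regardless of how many top-level entities the import walks through, instead of growing by one per entity as in the listing above.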

      Noticed in production in Solr 4.7.0, then reproduced in 4.10.1 (so probably also true of all versions in between).

            People

            • Assignee: Unassigned
            • Reporter: gthb Gunnlaugur Thor Briem
