SOLR-3319

Improve DataImportHandler status response

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.5, 4.0-ALPHA
    • Fix Version/s: 4.9, Trunk
    • Labels:
      None

      Description

      The DataImportHandler has some oddities and inconsistencies in its status response that make it difficult to write code that parses DIH status, especially if both full-import and delta-import are required. See SOLR-2729.

      I would like to have a discussion where we come up with a well-defined and consistent format that can be used programmatically as well as be human readable. I can then implement it, or someone else can if they really want to. I think it would be very useful if the status response included all parameters that went into the import request, like echoParams in the query interface.

          Activity

          Shawn Heisey added a comment -

          I personally would like to see this included in 3x, since that's what I use. How do the rest of you feel about that?

          James Dyer added a comment -

          I don't think this can be done for 3.x as the branch is in bug-fixes-only mode. Also, this will create backwards-incompatible changes for users' scheduling programs, so this kind of thing is better suited for a new major release.

          Shawn Heisey added a comment -

          Here's an idea, at least for 3x, assuming it's not unilaterally killed by the bug-fix-only mode: A configuration knob to use the old response or the new response. It would default to old.

          For 4.0, that configuration knob seems like a good idea, defaulting to the new response. In 4.1 or 5.0, the old response gets removed.

          Shawn Heisey added a comment -

          Here are some general ideas, preliminary because I have not taken a close look at the code yet. For reference, here is a completed status response on a full-import from 3.5.0:

          <?xml version="1.0" encoding="UTF-8"?>
          <response>
          
          <lst name="responseHeader">
            <int name="status">0</int>
            <int name="QTime">0</int>
          </lst>
          <lst name="initArgs">
            <lst name="defaults">
              <str name="config">dih-config.xml</str>
            </lst>
          </lst>
          <str name="status">idle</str>
          <str name="importResponse"/>
          <lst name="statusMessages">
            <str name="Total Requests made to DataSource">1</str>
            <str name="Total Rows Fetched">11287894</str>
            <str name="Total Documents Skipped">0</str>
            <str name="Full Dump Started">2012-04-03 17:38:01</str>
            <str name="">Indexing completed. Added/Updated: 11287894 documents. Deleted 0 documents.</str>
            <str name="Committed">2012-04-03 20:16:32</str>
            <str name="Total Documents Processed">11287894</str>
            <str name="Time taken ">2:38:31.314</str>
          </lst>
          <str name="WARNING">This response format is experimental.  It is likely to change in the future.</str>
          </response>
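
A minimal sketch of what client code has to do with this response today, using Python's standard `xml.etree.ElementTree`. The function name `parse_dih_status` and the fallback key `"message"` are hypothetical choices for illustration, not part of Solr. The sample is an abbreviated copy of the response above.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the 3.5.0 status response quoted above.
SAMPLE = """<response>
<lst name="responseHeader">
  <int name="status">0</int>
</lst>
<str name="status">idle</str>
<lst name="statusMessages">
  <str name="Total Rows Fetched">11287894</str>
  <str name="Full Dump Started">2012-04-03 17:38:01</str>
  <str name="">Indexing completed. Added/Updated: 11287894 documents. Deleted 0 documents.</str>
  <str name="Time taken ">2:38:31.314</str>
</lst>
</response>"""


def parse_dih_status(xml_text):
    """Return (status, messages) from a DIH status response.

    Hypothetical helper: it works around two quirks visible in the sample,
    an entry whose name is empty and a name with a trailing space.
    """
    root = ET.fromstring(xml_text)
    # Top-level <str name="status">, not the <int> inside responseHeader.
    status = root.findtext("str[@name='status']")
    messages = {}
    for el in root.findall("lst[@name='statusMessages']/str"):
        key = el.get("name", "").strip()
        # The free-text summary has an empty name; store it under "message".
        messages[key or "message"] = el.text or ""
    return status, messages


status, messages = parse_dih_status(SAMPLE)
rows_fetched = int(messages["Total Rows Fetched"])  # counts arrive as strings
```

Every count is a string that the caller must convert, and the number of deleted documents is only available by parsing the free-text summary.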
          

          I was thinking it might be a good idea to have two response sections in addition to the echoParams section already mentioned: one for a human-readable response and one for a relatively terse machine-readable response. The human-readable version would be fairly open to change, and could include extra verbiage so it's very understandable for a person.

          The machine-readable version would have more elements, each of which is very simple, probably just a numeric value or a true/false indicator. A design decision needs to be made early: do we include all elements in every response (with the value set to zero, blank, or false), even if they don't apply to the current status? My first instinct is to include all elements, but maybe that's wrong.
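
To make the "include all elements" option concrete, here is a rough sketch in Python. Every field name below is hypothetical and invented for illustration; nothing here has been agreed on in this issue. The point is only that a fixed schema with defaults gives a parser the same keys in every response.

```python
from dataclasses import dataclass, asdict


@dataclass
class ImportStatus:
    # Hypothetical field names. Every field has a default, so every
    # response carries every element, set to zero or false when unused.
    busy: bool = False
    full_import: bool = False
    delta_import: bool = False
    rows_fetched: int = 0
    documents_processed: int = 0
    documents_skipped: int = 0
    documents_deleted: int = 0
    committed: bool = False


def to_response(status):
    """Flatten the status into a plain dict for serialization."""
    return asdict(status)


# A handler that has never run an import still reports every element.
idle = to_response(ImportStatus())

# Values taken from the completed full-import shown earlier.
done = to_response(ImportStatus(
    full_import=True,
    rows_fetched=11287894,
    documents_processed=11287894,
    committed=True,
))
```

With this shape a client can read `done["documents_deleted"]` without checking whether the key exists, at the cost of a slightly larger response.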

          Shawn Heisey added a comment -

          I have closed older issues SOLR-2728 and SOLR-2729; any work on those issues can continue in this one. SOLR-2729 has a patch attached. I haven't checked to see if this issue is a duplicate, but I would not be surprised if it is.

          This is part of an effort to close old issues that I have reported. Search tag: elyograg2013springclean

          Shawn Heisey added a comment -

          I do have some interest in working on this, but it's not currently on my radar. Implementing SOLR-4241 would illustrate the issues that need fixing ... although if this is tackled first, writing SOLR-4241 would be much easier.

          Steve Rowe added a comment -

          Bulk move 4.4 issues to 4.5 and 5.0

          Uwe Schindler added a comment -

          Move issue to Solr 4.9.


            People

            • Assignee: Unassigned
            • Reporter: Shawn Heisey
            • Votes: 0
            • Watchers: 1
