Solr / SOLR-828

A RequestProcessor to support updates

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      This is the same as SOLR-139. A new issue has been opened so that the UpdateProcessor approach is highlighted and we can easily focus on that solution.

        Issue Links

          Activity

          Noble Paul created issue -
          Noble Paul made changes -
          Field Original Value New Value
          Link This issue is related to SOLR-139 [ SOLR-139 ]
          Noble Paul made changes -
          Description: This is the same as SOLR-139. A new issue has been opened so that the UpdateProcessor approach is highlighted and we can easily focus on that solution.


          The new {{UpdateProcessor}}, called {{UpdateableIndexProcessor}}, must be inserted before {{RunUpdateProcessor}}.

          * The {{UpdateProcessor}} must add an update method.
          * {{AddUpdateCommand}} gets a new boolean field, {{append}}. If append=true, the values of multivalued fields are appended; otherwise the old values are removed and the new ones are added (see the chain sketch after this list).
          * The schema must have a {{<uniqueKey>}}.
          * {{UpdateableIndexProcessor}} registers {{postCommit}}/{{postOptimize}} listeners.
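          As a minimal, illustrative sketch (not the actual patch) of how such a processor could hook into the update chain: {{UpdateRequestProcessor}}, {{UpdateRequestProcessorFactory}} and {{AddUpdateCommand}} are existing Solr classes (imports follow recent Solr versions), while the backup-store call and the proposed {{append}} flag are assumptions taken from this issue.
          {code:java}
import java.io.IOException;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

// Hypothetical factory; it would be listed in the updateRequestProcessorChain in
// solrconfig.xml before RunUpdateProcessorFactory so the backup copy is written first.
public class UpdateableIndexProcessorFactory extends UpdateRequestProcessorFactory {

  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
                                            UpdateRequestProcessor next) {
    return new UpdateableIndexProcessor(next);
  }

  static class UpdateableIndexProcessor extends UpdateRequestProcessor {

    UpdateableIndexProcessor(UpdateRequestProcessor next) {
      super(next);
    }

    @Override
    public void processAdd(AddUpdateCommand cmd) throws IOException {
      // Keep a copy of the incoming document in the backup store, then let the rest of
      // the chain (ultimately RunUpdateProcessor) index it as usual. The proposed
      // boolean "append" would be an extra field on AddUpdateCommand.
      writeToBackupStore(cmd.getSolrInputDocument());
      super.processAdd(cmd);
    }

    private void writeToBackupStore(SolrInputDocument doc) {
      // Backup-store implementation (side Lucene index or DB) would go here.
    }
  }
}
          {code}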

          h1.Implementation
          {{UpdateableIndexProcessor}} maintains two separate Lucene indexes for the backup (a sketch of the document layout follows this list):
           * *temp.backup.index*: stores, without indexing, all the fields of the document, except the uniqueKey, which is both stored and indexed.
           * *backup.index*: stores, without indexing, the fields that are not stored in the main index and the fields that are targets of copyField, except the uniqueKey, which is both stored and indexed.
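          A rough sketch of what a *temp.backup.index* document could look like, written against the current Lucene API rather than the Lucene version of 2008; the field names, values and index path are made up for illustration:
          {code:java}
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

public class TempBackupIndexSketch {
  public static void main(String[] args) throws Exception {
    try (FSDirectory dir = FSDirectory.open(Paths.get("temp.backup.index"));
         IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {

      Document backupDoc = new Document();
      // uniqueKey: stored AND indexed, so the document can later be found or deleted by id.
      backupDoc.add(new StringField("id", "doc-1", Field.Store.YES));
      // Every other field: stored only, never indexed; it exists purely as a copy
      // from which missing values can be restored during an update.
      backupDoc.add(new StoredField("title", "A RequestProcessor to support updates"));
      backupDoc.add(new StoredField("category", "search"));

      writer.addDocument(backupDoc);
      writer.commit();
    }
  }
}
          {code}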
          h1.Implementation of various methods

          h2.{{processAdd()}}
          {{UpdateableIndexProcessor}} writes the document to *temp.backup.index*, then calls the next {{UpdateProcessor}}.

          h2.{{processDelete()}}
          {{UpdateableIndexProcessor}} gets a Searcher from the core, finds the documents that match the query and deletes them from *backup.index*. If it is a delete by id, the document with that id is deleted from *temp.backup.index*. The next {{UpdateProcessor}} is then called.

          h2.{{processCommit()}}
          {{UpdateableIndexProcessor}} calls the next {{UpdateProcessor}}.

          h2. On {{postCommit/postOptimize}}
          {{UpdateableIndexProcessor}} commits *temp.backup.index*, then reads its documents one by one. If a document is present in the main index it is copied to *backup.index*; otherwise it is thrown away, because a deleteByQuery would have deleted it. Finally *backup.index* is committed and *temp.backup.index* is destroyed. A new *temp.backup.index* is created when new documents are added.

          h2.{{processUpdate()}}
          {{UpdateableIndexProcessor}} commits *temp.backup.index* and checks there for the document first. If it is present, the document is read. If it is not present, *backup.index* is checked; if the document is present there, a searcher is obtained from the main index, all the missing fields are read from it, and the backup document is prepared.

          Single-valued fields are taken from the incoming document when present; the rest are filled in from the backup document. If append=true, all the multivalued values from the backup document are added to the incoming document; otherwise, values from the backup document are not used for fields that are also present in the incoming document.
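          A minimal sketch of this merge rule, assuming the incoming and backup documents are available as {{SolrInputDocument}}s and the schema is used to tell single-valued from multivalued fields. The {{merge}} method and its {{append}} parameter are illustrative names from this proposal, not existing Solr API.
          {code:java}
import java.util.Collection;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.schema.SchemaField;

public class UpdateMergeSketch {

  public static SolrInputDocument merge(SolrInputDocument incoming, SolrInputDocument backup,
                                        IndexSchema schema, boolean append) {
    for (String field : backup.getFieldNames()) {
      SchemaField sf = schema.getFieldOrNull(field);
      boolean multiValued = sf != null && sf.multiValued();
      Collection<Object> backupValues = backup.getFieldValues(field);

      if (!multiValued) {
        // Single-valued: the incoming value wins; fill from the backup only if missing.
        if (incoming.getFieldValue(field) == null) {
          incoming.setField(field, backup.getFieldValue(field));
        }
      } else if (append) {
        // append=true: every multivalued value from the backup is added to the incoming doc.
        for (Object v : backupValues) {
          incoming.addField(field, v);
        }
      } else if (incoming.getFieldValue(field) == null) {
        // append=false: backup values are used only for fields the incoming doc does not carry.
        for (Object v : backupValues) {
          incoming.addField(field, v);
        }
      }
    }
    return incoming;
  }
}
          {code}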

          h2. new {{BackupIndexRequestHandler}} registered automatically at {{/backup}}
          This exposes the data present in the backup indexes. The user must be able to get any document by id by invoking {{/backup?id=<value>}} (multiple id values can be sent, e.g. id=1&id=2&id=4). This lets the user query the backup index and construct the new document if he wishes to do so. {{BackupIndexRequestHandler}} commits *temp.backup.index*, then searches it for the id; if the document is not found there, it searches *backup.index* and returns the document(s) it finds.
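          For illustration only (the handler is proposed, not existing), a SolrJ client could call it roughly like this; the core URL and ids are made up:
          {code:java}
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.GenericSolrRequest;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.util.NamedList;

public class BackupHandlerClientSketch {
  public static void main(String[] args) throws Exception {
    try (SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/collection1").build()) {
      ModifiableSolrParams params = new ModifiableSolrParams();
      params.add("id", "1");
      params.add("id", "2");
      params.add("id", "4");
      // Ask the (proposed) /backup handler for the backed-up copies of these documents.
      GenericSolrRequest req = new GenericSolrRequest(SolrRequest.METHOD.GET, "/backup", params);
      NamedList<Object> response = client.request(req);
      System.out.println(response);
    }
  }
}
          {code}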

          Noble Paul added a comment -

          The old approach is more work compared to the DB approach. It was not good for very fast updates/commits

          Noble Paul made changes -
          Issue Type Improvement [ 4 ] New Feature [ 2 ]
          Description: This is the same as SOLR-139. A new issue has been opened so that the UpdateProcessor approach is highlighted and we can easily focus on that solution.


          The new {{UpdateProcessor}}, called {{UpdateableIndexProcessor}}, must be inserted before {{RunUpdateProcessor}}.

          * The {{UpdateProcessor}} must add an update method.
          * {{AddUpdateCommand}} gets a new boolean field, {{append}}. If append=true, the values of multivalued fields are appended; otherwise the old values are removed and the new ones are added.
          * The schema must have a {{<uniqueKey>}}.
          * {{UpdateableIndexProcessor}} registers {{postCommit}}/{{postOptimize}} listeners.

          h1.Implementation
          {{UpdateableIndexProcessor}} uses a DB (JDBC? Berkeley DB Java Edition?) to store the data. Each document becomes a row in the DB, with the uniqueKey of the document as the primary key. The data is written as a BLOB into a DB column, in {{javabin}} serialized format. The {{javabin}} format in its current form is inefficient, but it can be enhanced (SOLR-810).

          The schema of the table would be (see the JDBC sketch below):
          * ID: VARCHAR - the primary key of the document, as a string
          * DATA: LONGVARBINARY - a {{javabin}}-serialized SolrInputDocument
          * COMMITTED: BOOL
          * BOOST: DOUBLE
          * FIELD_BOOSTS: VARBINARY - {{javabin}}-serialized data with the boost of each field
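          A hedged JDBC sketch of this table, using H2 purely as an example embedded database; the JDBC URL, table name and the exact binary column types are illustrative assumptions, since the proposal leaves the choice of store open:
          {code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class BackupTableSetupSketch {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection("jdbc:h2:./solr-backup");
         Statement stmt = conn.createStatement()) {
      stmt.executeUpdate(
          "CREATE TABLE IF NOT EXISTS BACKUP_DOCS ("
              + "ID VARCHAR(256) PRIMARY KEY, "  // uniqueKey of the document, as a string
              + "DATA BLOB, "                    // javabin-serialized SolrInputDocument
              + "COMMITTED BOOLEAN, "            // set to true once seen in the main index
              + "BOOST DOUBLE, "                 // document boost
              + "FIELD_BOOSTS BLOB)");           // javabin-serialized per-field boosts
    }
  }
}
          {code}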

          h1.Implementation of various methods

          h2.{{processAdd()}}
          {{UpdateableIndexProcessor}} writes the serialized document to the DB with COMMITTED=false, then calls {{processAdd()}} on the next {{UpdateProcessor}}.
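          A sketch of that step, assuming the illustrative table from the previous sketch and using Solr's {{JavaBinCodec}} (the class behind the javabin format); the MERGE statement is H2-specific syntax, not part of the proposal:
          {code:java}
import java.io.ByteArrayOutputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.common.util.JavaBinCodec;

public class BackupStoreWriterSketch {

  public static void store(Connection conn, String id, SolrInputDocument doc) throws Exception {
    // Serialize the whole document in javabin format (the codec this issue originally
    // referred to as NamedListCodec).
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    new JavaBinCodec().marshal(doc, bytes);

    // Insert or update the row for this uniqueKey, marked as not yet committed.
    try (PreparedStatement ps = conn.prepareStatement(
        "MERGE INTO BACKUP_DOCS (ID, DATA, COMMITTED) KEY (ID) VALUES (?, ?, FALSE)")) {
      ps.setString(1, id);
      ps.setBytes(2, bytes.toByteArray());
      ps.executeUpdate();
    }
  }
}
          {code}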

          h2.{{processDelete()}}
          {{UpdateableIndexProcessor}} gets a Searcher from the core, finds the documents that match the query and deletes them from the data table. If it is a delete by id, the document with that id is deleted from the data table. The next {{UpdateProcessor}} is then called.

          h2.{{processCommit()}}
          Calls the next {{UpdateProcessor}}.

          h2. On {{postCommit/postOptimize}}
          {{UpdateableIndexProcessor}} reads all the rows from the data table that have COMMITTED=false. If the document is present in the main index, the row is marked COMMITTED=true; otherwise it is deleted, because a deleteByQuery would have removed it.
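          For the bookkeeping part, a hedged sketch against the same illustrative table; the main-index check itself is elided, and {{markCommitted}} is a made-up helper name:
          {code:java}
import java.sql.Connection;
import java.sql.PreparedStatement;

public class PostCommitSweepSketch {

  // Called for each row with COMMITTED=false once the main index has been checked for its id.
  public static void markCommitted(Connection conn, String id, boolean presentInMainIndex) throws Exception {
    String sql = presentInMainIndex
        ? "UPDATE BACKUP_DOCS SET COMMITTED = TRUE WHERE ID = ?"  // document survived the commit
        : "DELETE FROM BACKUP_DOCS WHERE ID = ?";                 // a deleteByQuery removed it
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
      ps.setString(1, id);
      ps.executeUpdate();
    }
  }
}
          {code}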

          h2.{{processUpdate()}}
          {{UpdateableIndexProcessor}} checks the data table for the document first. If it is present, the document is read. If it is not present, a searcher is obtained from the main index and the missing fields are read from there; the backup document is then prepared.

          Single-valued fields are taken from the incoming document when present; the rest are filled in from the backup document. If append=true, all the multivalued values from the backup document are added to the incoming document; otherwise, values from the backup document are not used for fields that are also present in the incoming document.

          {{processAdd()}} is called on the next {{UpdateProcessor}}

          h2. new {{BackupIndexRequestHandler}} registered automatically at {{/backup}}
          This exposes the data present in the backup store. The user must be able to get any document by id by invoking {{/backup?id=<value>}} (multiple id values can be sent, e.g. id=1&id=2&id=4). This lets the user query the backup store and construct the new document if he wishes to do so.

          h2.Next steps
          The datastore can be optimized by not storing the already-stored fields in the DB. This means that on {{postCommit/postOptimize}} we must read back the data, remove the fields that are stored in the main index, and write it back. That can be another iteration.

          Noble Paul made changes -
          Link This issue is blocked by SOLR-810 [ SOLR-810 ]
          Noble Paul made changes -
          Description This is same as SOLR-139. A new issue is opened so that the UpdateProcessor approach is highlighted and we can easily focus on that solution.


          The new {{UpdateProcessor}} called ({{UpdateableIndexProcessor}}) must be inserted before {{RunUpdateProcessor}}.

          * The {{UpdateProcessor}} must add an update method.
          * the {{AddUpdateCommand}} has a new boolean field append. If append= true multivalued fields will be appended else old ones are removed and new ones are added
          * The schema must have a {{<uniqueKey>}}
          * {{UpdateableIndexProcessor}} registers {{postCommit/postOptimize}} listeners.

          h1.Implementation
          {{UpdateableIndexProcessor}} uses a DB (JDBC / Berkley DB java?) to store the data. Each document will be a row in the DB . The uniqueKey of the document will be used as the primary key. The data will be written as a BLOB into a DB column . The format will be NamedListCodec serialized format. The NamedListCodec in the current form is inefficient but it is possible to enhance it (SOLR-810)

          The schema of the table would be
          DATA : LONGVARBINARY : A NamedListCodec Serialized data
          COMMITTED:BOOL
          BOOST:DOUBLE
          FIELD_BOOSTS:VARBINARY A NamedListCodec serialized boosts of each fields

          h1.Implementation of various methods

          h2.{{processAdd()}}
          {{UpdateableIndexProcessor}} writes the serialized document to the DB (COMMITTED=false) . Call next {{UpdateProcessor#add()}}

          h2.{{processDelete()}}
          {{UpdateableIndexProcessor}} gets the Searcher from a core query and find the documents which matches the query and delete from the data table . If it is a delete by id delete the document with that id from data table. Call next {{UpdateProcessor}}

          h2.{{processCommit()}}
          Call next {{UpdateProcessor}}

          h2.on {{postCommit/postOmptimize}}
          {{UpdateableIndexProcessor}} gets all the documents from the data table which is committed =false. If the document is present in the main index it is marked as COMMITTED=true, else it is deleted because a deletebyquery would have deleted it .

          h2.{{processUpdate()}}
          {{UpdateableIndexProcessor}} check the document first in data table. If it is present read the document . If it is not present , read all the missing fields from there, and the backup document is prepared

          The single valued fields are used from the incoming document (if present) others are fillled from backup doc . If append=true all the multivalues values from backup document are added to the incoming document else the values from backup document is not used if they are present in incoming document also.

          {{processAdd()}} is called on the next {{UpdateProcessor}}

          h2. new {{BackupIndexRequestHandler}} registered automatically at {{/backup}}
          This exposes the data present in the backup indexes. The user must be able to get any document by id by invoking {{/backup?id=<value>}} (multiple id values can be sent eg:id=1&id=2&id=4). This helps the user to query the backup index and construct the new doc if he wishes to do so. The {{BackupIndexRequestHandler}} does a commit on *temp.backup.index* .It first searches the *temp.backup.index* with the id .If the document is not found, then it searches the *backup.index* . If it finds the document(s) it is returned

          h2.Next steps
          The datastore can be optimized by not storing the stored fields in the DB. That can be another iteration

          This is same as SOLR-139. A new issue is opened so that the UpdateProcessor approach is highlighted and we can easily focus on that solution.


          The new {{UpdateProcessor}} called ({{UpdateableIndexProcessor}}) must be inserted before {{RunUpdateProcessor}}.

          * The {{UpdateProcessor}} must add an update method.
          * the {{AddUpdateCommand}} has a new boolean field append. If append= true multivalued fields will be appended else old ones are removed and new ones are added
          * The schema must have a {{<uniqueKey>}}
          * {{UpdateableIndexProcessor}} registers {{postCommit/postOptimize}} listeners.

          h1.Implementation
          {{UpdateableIndexProcessor}} uses a DB (JDBC / Berkley DB java?) to store the data. Each document will be a row in the DB . The uniqueKey of the document will be used as the primary key. The data will be written as a BLOB into a DB column . The format will be {{javabin}} serialized format. The {{javabin}} format in the current form is inefficient but it is possible to enhance it (SOLR-810)

          The schema of the table would be
          DATA : LONGVARBINARY : A NamedListCodec Serialized data
          COMMITTED:BOOL
          BOOST:DOUBLE
          FIELD_BOOSTS:VARBINARY A {{javabin}} serialized data with boosts of each fields

          h1.Implementation of various methods

          h2.{{processAdd()}}
          {{UpdateableIndexProcessor}} writes the serialized document to the DB (COMMITTED=false) . Call next {{UpdateProcessor#add()}}

          h2.{{processDelete()}}
          {{UpdateableIndexProcessor}} gets the Searcher from a core query and find the documents which matches the query and delete from the data table . If it is a delete by id delete the document with that id from data table. Call next {{UpdateProcessor}}

          h2.{{processCommit()}}
          Call next {{UpdateProcessor}}

          h2.on {{postCommit/postOmptimize}}
          {{UpdateableIndexProcessor}} gets all the documents from the data table which is committed =false. If the document is present in the main index it is marked as COMMITTED=true, else it is deleted because a deletebyquery would have deleted it .

          h2.{{processUpdate()}}
          {{UpdateableIndexProcessor}} check the document first in data table. If it is present read the document . If it is not present , read all the missing fields from there, and the backup document is prepared

          The single valued fields are used from the incoming document (if present) others are filled from backup doc . If append=true all the multivalues values from backup document are added to the incoming document else the values from backup document is not used if they are present in incoming document also.

          {{processAdd()}} is called on the next {{UpdateProcessor}}

          h2. new {{BackupIndexRequestHandler}} registered automatically at {{/backup}}
          This exposes the data present in the backup indexes. The user must be able to get any document by id by invoking {{/backup?id=<value>}} (multiple id values can be sent eg:id=1&id=2&id=4). This helps the user to query the backup index and construct the new doc if he wishes to do so. The {{BackupIndexRequestHandler}} does a commit on *temp.backup.index* .It first searches the *temp.backup.index* with the id .If the document is not found, then it searches the *backup.index* . If it finds the document(s) it is returned

          h2.Next steps
          The datastore can be optimized by not storing the stored fields in the DB. This means on {{postCommit/postOptimize}} we must read back the data and remove the already stored fields and store it back. That can be another iteration

          Noble Paul made changes -
          Description This is same as SOLR-139. A new issue is opened so that the UpdateProcessor approach is highlighted and we can easily focus on that solution.


          The new {{UpdateProcessor}} called ({{UpdateableIndexProcessor}}) must be inserted before {{RunUpdateProcessor}}.

          * The {{UpdateProcessor}} must add an update method.
          * the {{AddUpdateCommand}} has a new boolean field append. If append= true multivalued fields will be appended else old ones are removed and new ones are added
          * The schema must have a {{<uniqueKey>}}
          * {{UpdateableIndexProcessor}} registers {{postCommit/postOptimize}} listeners.

          h1.Implementation
          {{UpdateableIndexProcessor}} uses a DB (JDBC / Berkley DB java?) to store the data. Each document will be a row in the DB . The uniqueKey of the document will be used as the primary key. The data will be written as a BLOB into a DB column . The format will be {{javabin}} serialized format. The {{javabin}} format in the current form is inefficient but it is possible to enhance it (SOLR-810)

          The schema of the table would be
          DATA : LONGVARBINARY : A NamedListCodec Serialized data
          COMMITTED:BOOL
          BOOST:DOUBLE
          FIELD_BOOSTS:VARBINARY A {{javabin}} serialized data with boosts of each fields

          h1.Implementation of various methods

          h2.{{processAdd()}}
          {{UpdateableIndexProcessor}} writes the serialized document to the DB (COMMITTED=false) . Call next {{UpdateProcessor#add()}}

          h2.{{processDelete()}}
          {{UpdateableIndexProcessor}} gets the Searcher from a core query and find the documents which matches the query and delete from the data table . If it is a delete by id delete the document with that id from data table. Call next {{UpdateProcessor}}

          h2.{{processCommit()}}
          Call next {{UpdateProcessor}}

          h2.on {{postCommit/postOmptimize}}
          {{UpdateableIndexProcessor}} gets all the documents from the data table which is committed =false. If the document is present in the main index it is marked as COMMITTED=true, else it is deleted because a deletebyquery would have deleted it .

          h2.{{processUpdate()}}
          {{UpdateableIndexProcessor}} check the document first in data table. If it is present read the document . If it is not present , read all the missing fields from there, and the backup document is prepared

          The single valued fields are used from the incoming document (if present) others are filled from backup doc . If append=true all the multivalues values from backup document are added to the incoming document else the values from backup document is not used if they are present in incoming document also.

          {{processAdd()}} is called on the next {{UpdateProcessor}}

          h2. new {{BackupIndexRequestHandler}} registered automatically at {{/backup}}
          This exposes the data present in the backup indexes. The user must be able to get any document by id by invoking {{/backup?id=<value>}} (multiple id values can be sent eg:id=1&id=2&id=4). This helps the user to query the backup index and construct the new doc if he wishes to do so. The {{BackupIndexRequestHandler}} does a commit on *temp.backup.index* .It first searches the *temp.backup.index* with the id .If the document is not found, then it searches the *backup.index* . If it finds the document(s) it is returned

          h2.Next steps
          The datastore can be optimized by not storing the stored fields in the DB. This means on {{postCommit/postOptimize}} we must read back the data and remove the already stored fields and store it back. That can be another iteration

          This is same as SOLR-139. A new issue is opened so that the UpdateProcessor approach is highlighted and we can easily focus on that solution.


          The new {{UpdateProcessor}} called ({{UpdateableIndexProcessor}}) must be inserted before {{RunUpdateProcessor}}.

          * The {{UpdateProcessor}} must add an update method.
          * the {{AddUpdateCommand}} has a new boolean field append. If append= true multivalued fields will be appended else old ones are removed and new ones are added
          * The schema must have a {{<uniqueKey>}}
          * {{UpdateableIndexProcessor}} registers {{postCommit/postOptimize}} listeners.

          h1.Implementation
          {{UpdateableIndexProcessor}} uses a DB (JDBC / Berkley DB java?) to store the data. Each document will be a row in the DB . The uniqueKey of the document will be used as the primary key. The data will be written as a BLOB into a DB column . The format will be {{javabin}} serialized format. The {{javabin}} format in the current form is inefficient but it is possible to enhance it (SOLR-810)

          The schema of the table would be
          DATA : LONGVARBINARY : A {{javabin}} Serialized SolrInputDocument
          COMMITTED:BOOL
          BOOST:DOUBLE
          FIELD_BOOSTS:VARBINARY A {{javabin}} serialized data with boosts of each fields

          h1.Implementation of various methods

          h2.{{processAdd()}}
          {{UpdateableIndexProcessor}} writes the serialized document to the DB (COMMITTED=false) . Call next {{UpdateProcessor#add()}}

          h2.{{processDelete()}}
          {{UpdateableIndexProcessor}} gets the Searcher from a core query and find the documents which matches the query and delete from the data table . If it is a delete by id delete the document with that id from data table. Call next {{UpdateProcessor}}

          h2.{{processCommit()}}
          Call next {{UpdateProcessor}}

          h2.on {{postCommit/postOmptimize}}
          {{UpdateableIndexProcessor}} gets all the documents from the data table which is committed =false. If the document is present in the main index it is marked as COMMITTED=true, else it is deleted because a deletebyquery would have deleted it .

          h2.{{processUpdate()}}
          {{UpdateableIndexProcessor}} check the document first in data table. If it is present read the document . If it is not present , read all the missing fields from there, and the backup document is prepared

          The single valued fields are used from the incoming document (if present) others are filled from backup doc . If append=true all the multivalues values from backup document are added to the incoming document else the values from backup document is not used if they are present in incoming document also.

          {{processAdd()}} is called on the next {{UpdateProcessor}}

          h2. new {{BackupIndexRequestHandler}} registered automatically at {{/backup}}
          This exposes the data present in the backup indexes. The user must be able to get any document by id by invoking {{/backup?id=<value>}} (multiple id values can be sent eg:id=1&id=2&id=4). This helps the user to query the backup index and construct the new doc if he wishes to do so. The {{BackupIndexRequestHandler}} does a commit on *temp.backup.index* .It first searches the *temp.backup.index* with the id .If the document is not found, then it searches the *backup.index* . If it finds the document(s) it is returned

          h2.Next steps
          The datastore can be optimized by not storing the stored fields in the DB. This means on {{postCommit/postOptimize}} we must read back the data and remove the already stored fields and store it back. That can be another iteration

          Noble Paul made changes -
          Description This is same as SOLR-139. A new issue is opened so that the UpdateProcessor approach is highlighted and we can easily focus on that solution.


          The new {{UpdateProcessor}} called ({{UpdateableIndexProcessor}}) must be inserted before {{RunUpdateProcessor}}.

          * The {{UpdateProcessor}} must add an update method.
          * the {{AddUpdateCommand}} has a new boolean field append. If append= true multivalued fields will be appended else old ones are removed and new ones are added
          * The schema must have a {{<uniqueKey>}}
          * {{UpdateableIndexProcessor}} registers {{postCommit/postOptimize}} listeners.

          h1.Implementation
          {{UpdateableIndexProcessor}} uses a DB (JDBC / Berkley DB java?) to store the data. Each document will be a row in the DB . The uniqueKey of the document will be used as the primary key. The data will be written as a BLOB into a DB column . The format will be {{javabin}} serialized format. The {{javabin}} format in the current form is inefficient but it is possible to enhance it (SOLR-810)

          The schema of the table would be
          DATA : LONGVARBINARY : A {{javabin}} Serialized SolrInputDocument
          COMMITTED:BOOL
          BOOST:DOUBLE
          FIELD_BOOSTS:VARBINARY A {{javabin}} serialized data with boosts of each fields

          h1.Implementation of various methods

          h2.{{processAdd()}}
          {{UpdateableIndexProcessor}} writes the serialized document to the DB (COMMITTED=false) . Call next {{UpdateProcessor#add()}}

          h2.{{processDelete()}}
          {{UpdateableIndexProcessor}} gets the Searcher from a core query and find the documents which matches the query and delete from the data table . If it is a delete by id delete the document with that id from data table. Call next {{UpdateProcessor}}

          h2.{{processCommit()}}
          Call next {{UpdateProcessor}}

          h2.on {{postCommit/postOmptimize}}
          {{UpdateableIndexProcessor}} gets all the documents from the data table which is committed =false. If the document is present in the main index it is marked as COMMITTED=true, else it is deleted because a deletebyquery would have deleted it .

          h2.{{processUpdate()}}
          {{UpdateableIndexProcessor}} check the document first in data table. If it is present read the document . If it is not present , read all the missing fields from there, and the backup document is prepared

          The single valued fields are used from the incoming document (if present) others are filled from backup doc . If append=true all the multivalues values from backup document are added to the incoming document else the values from backup document is not used if they are present in incoming document also.

          {{processAdd()}} is called on the next {{UpdateProcessor}}

          h2. new {{BackupIndexRequestHandler}} registered automatically at {{/backup}}
          This exposes the data present in the backup indexes. The user must be able to get any document by id by invoking {{/backup?id=<value>}} (multiple id values can be sent eg:id=1&id=2&id=4). This helps the user to query the backup index and construct the new doc if he wishes to do so. The {{BackupIndexRequestHandler}} does a commit on *temp.backup.index* .It first searches the *temp.backup.index* with the id .If the document is not found, then it searches the *backup.index* . If it finds the document(s) it is returned

          h2.Next steps
          The datastore can be optimized by not storing the stored fields in the DB. This means on {{postCommit/postOptimize}} we must read back the data and remove the already stored fields and store it back. That can be another iteration

          This is same as SOLR-139. A new issue is opened so that the UpdateProcessor approach is highlighted and we can easily focus on that solution.


          The new {{UpdateProcessor}} called ({{UpdateableIndexProcessor}}) must be inserted before {{RunUpdateProcessor}}.

          * The {{UpdateProcessor}} must add an update method.
          * the {{AddUpdateCommand}} has a new boolean field append. If append= true multivalued fields will be appended else old ones are removed and new ones are added
          * The schema must have a {{<uniqueKey>}}
          * {{UpdateableIndexProcessor}} registers {{postCommit/postOptimize}} listeners.

          h1.Implementation
          {{UpdateableIndexProcessor}} uses a DB (JDBC / Berkley DB java?) to store the data. Each document will be a row in the DB . The uniqueKey of the document will be used as the primary key. The data will be written as a BLOB into a DB column . The format will be {{javabin}} serialized format. The {{javabin}} format in the current form is inefficient but it is possible to enhance it (SOLR-810)

          The schema of the table would be
          ID : VARCHAR The primarykey of the document as string
          DATA : LONGVARBINARY : A {{javabin}} Serialized SolrInputDocument
          COMMITTED:BOOL
          BOOST:DOUBLE
          FIELD_BOOSTS:VARBINARY A {{javabin}} serialized data with boosts of each fields

          h1.Implementation of various methods

          h2.{{processAdd()}}
          {{UpdateableIndexProcessor}} writes the serialized document to the DB (COMMITTED=false) . Call next {{UpdateProcessor#add()}}

          h2.{{processDelete()}}
          {{UpdateableIndexProcessor}} gets the Searcher from a core query and find the documents which matches the query and delete from the data table . If it is a delete by id delete the document with that id from data table. Call next {{UpdateProcessor}}

          h2.{{processCommit()}}
          Call next {{UpdateProcessor}}

          h2.on {{postCommit/postOmptimize}}
          {{UpdateableIndexProcessor}} gets all the documents from the data table which is committed =false. If the document is present in the main index it is marked as COMMITTED=true, else it is deleted because a deletebyquery would have deleted it .

          h2.{{processUpdate()}}
          {{UpdateableIndexProcessor}} check the document first in data table. If it is present read the document . If it is not present , read all the missing fields from there, and the backup document is prepared

          The single valued fields are used from the incoming document (if present) others are filled from backup doc . If append=true all the multivalues values from backup document are added to the incoming document else the values from backup document is not used if they are present in incoming document also.

          {{processAdd()}} is called on the next {{UpdateProcessor}}

          h2. new {{BackupIndexRequestHandler}} registered automatically at {{/backup}}
          This exposes the data present in the backup indexes. The user must be able to get any document by id by invoking {{/backup?id=<value>}} (multiple id values can be sent eg:id=1&id=2&id=4). This helps the user to query the backup index and construct the new doc if he wishes to do so. The {{BackupIndexRequestHandler}} does a commit on *temp.backup.index* .It first searches the *temp.backup.index* with the id .If the document is not found, then it searches the *backup.index* . If it finds the document(s) it is returned

          h2.Next steps
          The datastore can be optimized by not storing the stored fields in the DB. This means on {{postCommit/postOptimize}} we must read back the data and remove the already stored fields and store it back. That can be another iteration

          Noble Paul made changes -
          Description This is same as SOLR-139. A new issue is opened so that the UpdateProcessor approach is highlighted and we can easily focus on that solution.


          The new {{UpdateProcessor}} called ({{UpdateableIndexProcessor}}) must be inserted before {{RunUpdateProcessor}}.

          * The {{UpdateProcessor}} must add an update method.
          * the {{AddUpdateCommand}} has a new boolean field append. If append= true multivalued fields will be appended else old ones are removed and new ones are added
          * The schema must have a {{<uniqueKey>}}
          * {{UpdateableIndexProcessor}} registers {{postCommit/postOptimize}} listeners.

          h1.Implementation
          {{UpdateableIndexProcessor}} uses a DB (JDBC / Berkley DB java?) to store the data. Each document will be a row in the DB . The uniqueKey of the document will be used as the primary key. The data will be written as a BLOB into a DB column . The format will be {{javabin}} serialized format. The {{javabin}} format in the current form is inefficient but it is possible to enhance it (SOLR-810)

          The schema of the table would be
          ID : VARCHAR The primarykey of the document as string
          DATA : LONGVARBINARY : A {{javabin}} Serialized SolrInputDocument
          COMMITTED:BOOL
          BOOST:DOUBLE
          FIELD_BOOSTS:VARBINARY A {{javabin}} serialized data with boosts of each fields

          h1.Implementation of various methods

          h2.{{processAdd()}}
          {{UpdateableIndexProcessor}} writes the serialized document to the DB (COMMITTED=false) . Call next {{UpdateProcessor#add()}}

          h2.{{processDelete()}}
          {{UpdateableIndexProcessor}} gets the Searcher from a core query and find the documents which matches the query and delete from the data table . If it is a delete by id delete the document with that id from data table. Call next {{UpdateProcessor}}

          h2.{{processCommit()}}
          Call next {{UpdateProcessor}}

          h2.on {{postCommit/postOmptimize}}
          {{UpdateableIndexProcessor}} gets all the documents from the data table which is committed =false. If the document is present in the main index it is marked as COMMITTED=true, else it is deleted because a deletebyquery would have deleted it .

          h2.{{processUpdate()}}
          {{UpdateableIndexProcessor}} check the document first in data table. If it is present read the document . If it is not present , read all the missing fields from there, and the backup document is prepared

          The single valued fields are used from the incoming document (if present) others are filled from backup doc . If append=true all the multivalues values from backup document are added to the incoming document else the values from backup document is not used if they are present in incoming document also.

          {{processAdd()}} is called on the next {{UpdateProcessor}}

          h2. new {{BackupIndexRequestHandler}} registered automatically at {{/backup}}
          This exposes the data present in the backup indexes. The user must be able to get any document by id by invoking {{/backup?id=<value>}} (multiple id values can be sent eg:id=1&id=2&id=4). This helps the user to query the backup index and construct the new doc if he wishes to do so. The {{BackupIndexRequestHandler}} does a commit on *temp.backup.index* .It first searches the *temp.backup.index* with the id .If the document is not found, then it searches the *backup.index* . If it finds the document(s) it is returned

          h2.Next steps
          The datastore can be optimized by not storing the stored fields in the DB. This means on {{postCommit/postOptimize}} we must read back the data and remove the already stored fields and store it back. That can be another iteration

          This is same as SOLR-139. A new issue is opened so that the UpdateProcessor approach is highlighted and we can easily focus on that solution.


          The new {{UpdateProcessor}} called ({{UpdateableIndexProcessor}}) must be inserted before {{RunUpdateProcessor}}.

          * The {{UpdateProcessor}} must add an update method.
          * the {{AddUpdateCommand}} has a new boolean field append. If append= true multivalued fields will be appended else old ones are removed and new ones are added
          * The schema must have a {{<uniqueKey>}}
          * {{UpdateableIndexProcessor}} registers {{postCommit/postOptimize}} listeners.

          h1.Implementation
          {{UpdateableIndexProcessor}} uses a DB (JDBC / Berkley DB java?) to store the data. Each document will be a row in the DB . The uniqueKey of the document will be used as the primary key. The data will be written as a BLOB into a DB column . The format will be {{javabin}} serialized format. The {{javabin}} format in the current form is inefficient but it is possible to enhance it (SOLR-810)

          The schema of the table would be:
          * ID : VARCHAR : the primary key of the document, as a string
          * DATA : LONGVARBINARY : a {{javabin}}-serialized SolrInputDocument
          * COMMITTED : BOOLEAN : whether the document has been committed to the main index
          * BOOST : DOUBLE : the document-level boost
          * FIELD_BOOSTS : VARBINARY : {{javabin}}-serialized data with the boost of each field
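
          As a concrete illustration, the table could be created over plain JDBC roughly as below. This is only a sketch: the column types and sizes are indicative, the table name {{backup_store}} is made up here, and the exact DDL depends on which DB is chosen.

          {code:java}
          import java.sql.Connection;
          import java.sql.Statement;
          import javax.sql.DataSource;

          // Sketch only: creates the proposed backup table. Types/sizes are indicative.
          class BackupStoreSchema {
            static void ensureTable(DataSource ds) throws Exception {
              String ddl = "CREATE TABLE backup_store ("
                  + " ID VARCHAR(256) PRIMARY KEY,"   // the document's uniqueKey
                  + " DATA LONGVARBINARY,"            // javabin-serialized SolrInputDocument
                  + " COMMITTED BOOLEAN,"             // false until confirmed in the main index
                  + " BOOST DOUBLE,"                  // document-level boost
                  + " FIELD_BOOSTS VARBINARY(8192))"; // javabin-serialized per-field boosts
              try (Connection c = ds.getConnection(); Statement st = c.createStatement()) {
                st.executeUpdate(ddl);
              }
            }
          }
          {code}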

          h1.Implementation of various methods

          h2.{{processAdd()}}
          {{UpdateableIndexProcessor}} writes the serialized document to the DB with COMMITTED=false, then calls {{processAdd()}} on the next {{UpdateProcessor}}.
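
          A minimal sketch of what this could look like, assuming a plain JDBC {{DataSource}} and {{javabin}} serialization via Solr's {{JavaBinCodec}}. The class is the proposal, not existing Solr code, and the table/column names simply mirror the schema above.

          {code:java}
          import java.io.ByteArrayOutputStream;
          import java.io.IOException;
          import java.sql.Connection;
          import java.sql.PreparedStatement;
          import java.sql.SQLException;
          import javax.sql.DataSource;
          import org.apache.solr.common.SolrInputDocument;
          import org.apache.solr.common.util.JavaBinCodec;
          import org.apache.solr.update.AddUpdateCommand;
          import org.apache.solr.update.processor.UpdateRequestProcessor;

          // Sketch only: the proposed processor, inserted in the chain before RunUpdateProcessor.
          public class UpdateableIndexProcessor extends UpdateRequestProcessor {
            private final DataSource dataSource;
            private final String uniqueKeyField;

            public UpdateableIndexProcessor(DataSource dataSource, String uniqueKeyField,
                                            UpdateRequestProcessor next) {
              super(next);
              this.dataSource = dataSource;
              this.uniqueKeyField = uniqueKeyField;
            }

            @Override
            public void processAdd(AddUpdateCommand cmd) throws IOException {
              SolrInputDocument doc = cmd.solrDoc;
              String id = doc.getFieldValue(uniqueKeyField).toString();

              ByteArrayOutputStream bos = new ByteArrayOutputStream();
              new JavaBinCodec().marshal(doc, bos);        // javabin-serialize the whole document

              try (Connection c = dataSource.getConnection();
                   PreparedStatement ps = c.prepareStatement(
                       // upsert syntax varies by DB; MERGE ... KEY is e.g. the H2 form
                       "MERGE INTO backup_store (ID, DATA, COMMITTED) KEY (ID) VALUES (?, ?, FALSE)")) {
                ps.setString(1, id);
                ps.setBytes(2, bos.toByteArray());
                ps.executeUpdate();
              } catch (SQLException e) {
                throw new IOException(e);
              }
              super.processAdd(cmd);                       // hand the add on to the next processor
            }
          }
          {code}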

          h2.{{processDelete()}}
          For a delete-by-query, {{UpdateableIndexProcessor}} gets a Searcher from the core, finds the documents that match the query, and deletes them from the data table. For a delete-by-id, it deletes the document with that id from the data table. It then calls the next {{UpdateProcessor}}.
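
          Continuing the sketch, the delete-by-id branch might look as follows; the delete-by-query branch is only outlined as a comment, since it first needs a searcher pass to collect the matching ids. {{DeleteUpdateCommand}}'s {{id}}/{{query}} fields are used here as in the 1.x-era API.

          {code:java}
          // Sketch only (same class as above).
          @Override
          public void processDelete(DeleteUpdateCommand cmd) throws IOException {
            try (Connection c = dataSource.getConnection()) {
              if (cmd.id != null) {                        // delete by id
                try (PreparedStatement ps =
                         c.prepareStatement("DELETE FROM backup_store WHERE ID = ?")) {
                  ps.setString(1, cmd.id);
                  ps.executeUpdate();
                }
              } else {
                // delete by query: run cmd.query against the index searcher, collect the
                // uniqueKey of every hit, and delete those rows from backup_store.
              }
            } catch (SQLException e) {
              throw new IOException(e);
            }
            super.processDelete(cmd);                      // let the real delete happen downstream
          }
          {code}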

          h2.{{processCommit()}}
          Calls the next {{UpdateProcessor}}.

          h2.On {{postCommit/postOptimize}}
          {{UpdateableIndexProcessor}} reads all documents from the data table that have COMMITTED=false. If a document is present in the main index it is marked COMMITTED=true; otherwise it is deleted from the data table, because a delete-by-query must have removed it from the index.
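
          A sketch of this housekeeping step, assuming it is invoked from a listener registered for the {{postCommit}}/{{postOptimize}} events and that {{SolrIndexSearcher#getFirstMatch}} is used to test whether a document made it into the main index. The method and table names are illustrative.

          {code:java}
          // Sketch only (same class as above); invoked from a registered postCommit/postOptimize listener.
          void reconcileUncommitted(SolrCore core) throws Exception {
            RefCounted<SolrIndexSearcher> ref = core.getSearcher();
            try (Connection c = dataSource.getConnection();
                 Statement st = c.createStatement();
                 ResultSet rs = st.executeQuery(
                     "SELECT ID FROM backup_store WHERE COMMITTED = FALSE")) {
              SolrIndexSearcher searcher = ref.get();
              while (rs.next()) {
                String id = rs.getString(1);
                boolean inIndex = searcher.getFirstMatch(new Term(uniqueKeyField, id)) != -1;
                String sql = inIndex
                    ? "UPDATE backup_store SET COMMITTED = TRUE WHERE ID = ?"
                    : "DELETE FROM backup_store WHERE ID = ?";   // a deleteByQuery removed it
                try (PreparedStatement ps = c.prepareStatement(sql)) {
                  ps.setString(1, id);
                  ps.executeUpdate();
                }
              }
            } finally {
              ref.decref();
            }
          }
          {code}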

          h2.{{processUpdate()}}
          {{UpdateableIndexProcessor}} first checks for the document in the data table. If it is present, the document is read. If it is not present, the missing fields are read from the stored fields in the main index, and the backup document is prepared from them.

          Single-valued fields are taken from the incoming document when present; the rest are filled from the backup document. If append=true, all multivalued values from the backup document are added to the incoming document; otherwise the backup document's values are ignored for any field that also appears in the incoming document (a sketch of this merge rule follows below).

          {{processAdd()}} is called on the next {{UpdateProcessor}}.
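
          The merge rule could be implemented roughly like this; the helper class and its placement are illustrative, and the schema is consulted only to tell single-valued fields from multivalued ones.

          {code:java}
          import java.util.HashSet;
          import java.util.Set;
          import org.apache.solr.common.SolrInputDocument;
          import org.apache.solr.schema.IndexSchema;

          // Sketch only: merge an incoming partial document with the backup document.
          final class DocumentMerger {
            static SolrInputDocument merge(SolrInputDocument incoming, SolrInputDocument backup,
                                           IndexSchema schema, boolean append) {
              SolrInputDocument merged = new SolrInputDocument();
              Set<String> names = new HashSet<String>(backup.getFieldNames());
              names.addAll(incoming.getFieldNames());
              for (String name : names) {
                boolean multiValued = schema.getField(name).multiValued();
                boolean inIncoming = incoming.containsKey(name);
                if (!multiValued) {
                  // single-valued: the incoming value wins when present, else keep the backup value
                  merged.setField(name, (inIncoming ? incoming : backup).getFieldValue(name));
                } else if (append) {
                  // append=true: concatenate backup values and incoming values
                  if (backup.containsKey(name)) {
                    for (Object v : backup.getFieldValues(name)) merged.addField(name, v);
                  }
                  if (inIncoming) {
                    for (Object v : incoming.getFieldValues(name)) merged.addField(name, v);
                  }
                } else {
                  // append=false: the incoming values replace the backup values for this field
                  SolrInputDocument src = inIncoming ? incoming : backup;
                  for (Object v : src.getFieldValues(name)) merged.addField(name, v);
                }
              }
              return merged;
            }
          }
          {code}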

          h2.New {{BackupIndexRequestHandler}} registered automatically at {{/backup}}
          This exposes the data present in the backup store. The user must be able to fetch any document by id by invoking {{/backup?id=<value>}} (multiple id values can be sent, e.g. {{id=1&id=2&id=4}}). This lets the user query the backup store and construct a new document if desired.
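
          The core of such a handler's {{handleRequestBody}} might look like the fragment below: read the {{id}} parameters, look each one up in the backup store, deserialize the stored {{javabin}} blob, and add the documents to the response. The handler class itself is hypothetical.

          {code:java}
          // Sketch only: the lookup at the heart of a hypothetical BackupIndexRequestHandler.
          void handleBackupLookup(SolrQueryRequest req, SolrQueryResponse rsp,
                                  DataSource dataSource) throws Exception {
            String[] ids = req.getParams().getParams("id");      // /backup?id=1&id=2&id=4
            List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
            if (ids != null) {
              try (Connection c = dataSource.getConnection();
                   PreparedStatement ps = c.prepareStatement(
                       "SELECT DATA FROM backup_store WHERE ID = ?")) {
                for (String id : ids) {
                  ps.setString(1, id);
                  try (ResultSet rs = ps.executeQuery()) {
                    if (rs.next()) {
                      byte[] data = rs.getBytes(1);
                      docs.add((SolrInputDocument) new JavaBinCodec()
                          .unmarshal(new ByteArrayInputStream(data)));
                    }
                  }
                }
              }
            }
            rsp.add("docs", docs);                               // returned in the response
          }
          {code}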

          h2.Next steps
          The datastore can be optimized by not keeping fields in the DB that are already stored in the index. This means that on {{postCommit/postOptimize}} we must read the data back, strip the fields that are already stored, and write it back. That can be another iteration.


          Noble Paul added a comment - edited

          The new UpdateProcessor, called UpdateableIndexProcessor, must be inserted before RunUpdateProcessor.

          • The UpdateProcessor API must add an update method.
          • AddUpdateCommand gets a new boolean field, append. If append=true, multivalued fields are appended; otherwise the old values are removed and the new ones are added.
          • The schema must have a <uniqueKey>.
          • UpdateableIndexProcessor registers postCommit/postOptimize listeners.

          Implementation

          UpdateableIndexProcessor uses a DB (JDBC / Berkeley DB Java Edition?) to store the data. Each document is a row in the DB, with the document's uniqueKey as the primary key. The document data is written as a BLOB into a DB column in javabin serialized format. The javabin format in its current form is inefficient, but it can be enhanced (SOLR-810).

          The schema of the table would be:
          ID : VARCHAR : the primary key of the document, as a string
          DATA : LONGVARBINARY : a javabin-serialized SolrInputDocument
          STATUS : ENUM (COMMITTED = 0, UNCOMMITTED = 1, UNCOMMITTED_MARKED_FOR_DELETE = 2, COMMITTED_MARKED_FOR_DELETE = 3)
          BOOST : DOUBLE
          FIELD_BOOSTS : VARBINARY : javabin-serialized data with the boost of each field

          Implementation of various methods

          processAdd()

          UpdateableIndexProcessor writes the serialized document to the DB with COMMITTED=false, then calls processAdd() on the next UpdateProcessor.

          processDelete()

          For a delete-by-query, UpdateableIndexProcessor gets a Searcher from the core, finds the documents that match the query, and deletes them from the data table. For a delete-by-id, it deletes the document with that id from the data table. It then calls the next UpdateProcessor.

          processCommit()

          Calls the next UpdateProcessor.

          On postCommit/postOptimize

          UpdateableIndexProcessor reads all documents from the data table that have COMMITTED=false. If a document is present in the main index it is marked COMMITTED=true; otherwise it is deleted from the data table, because a delete-by-query would have removed it from the index.

          processUpdate()

          UpdateableIndexProcessor first checks for the document in the data table. If it is present, the document is read. If it is not present, the missing fields are read from the stored fields in the main index, and the backup document is prepared from them.

          Single-valued fields are taken from the incoming document when present; the rest are filled from the backup document. If append=true, all multivalued values from the backup document are added to the incoming document; otherwise the backup document's values are ignored for any field that also appears in the incoming document.

          processAdd() is called on the next UpdateProcessor.

          New BackupIndexRequestHandler registered automatically at /backup

          This exposes the data present in the backup store. The user must be able to fetch any document by id by invoking /backup?id=<value> (multiple id values can be sent, e.g. id=1&id=2&id=4). This lets the user query the backup store and construct a new document if desired.

          Next steps

          The datastore can be optimized by not keeping fields in the DB that are already stored in the index. This means that on postCommit/postOptimize we must read the data back, strip the fields that are already stored, and write it back. That can be another iteration.

          Noble Paul added a comment -

          A lot of useful comments on the mail thread

          http://markmail.org/message/57dpsbz3z6dam7q7

          Shalin Shekhar Mangar added a comment -

          Marking for 1.5

          Shalin Shekhar Mangar made changes -
          Fix Version/s 1.5 [ 12313566 ]
          Fix Version/s 1.4 [ 12313351 ]
          Hoss Man added a comment -

          Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...

          http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

          Selection criteria was "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. email notifications were suppressed.

          A unique token for finding these 240 issues in the future: hossversioncleanup20100527

          Hoss Man made changes -
          Fix Version/s Next [ 12315093 ]
          Fix Version/s 1.5 [ 12313566 ]
          Hoss Man made changes -
          Fix Version/s 3.2 [ 12316172 ]
          Fix Version/s Next [ 12315093 ]
          Robert Muir added a comment -

          Bulk move 3.2 -> 3.3

          Robert Muir made changes -
          Fix Version/s 3.3 [ 12316471 ]
          Fix Version/s 3.2 [ 12316172 ]
          Robert Muir made changes -
          Fix Version/s 3.3 [ 12316471 ]
          Fix Version/s 3.4 [ 12316683 ]
          Fix Version/s 4.0 [ 12314992 ]
          Robert Muir added a comment -

          3.4 -> 3.5

          Robert Muir made changes -
          Fix Version/s 3.5 [ 12317876 ]
          Fix Version/s 3.4 [ 12316683 ]
          Simon Willnauer made changes -
          Fix Version/s 3.6 [ 12319065 ]
          Fix Version/s 3.5 [ 12317876 ]
          Hoss Man added a comment -

          Bulk of fixVersion=3.6 -> fixVersion=4.0 for issues that have no assignee and have not been updated recently.

          email notification suppressed to prevent mass-spam
          pseudo-unique token identifying these issues: hoss20120321nofix36

          Hoss Man made changes -
          Fix Version/s 3.6 [ 12319065 ]
          Robert Muir made changes -
          Fix Version/s 4.1 [ 12321141 ]
          Fix Version/s 4.0 [ 12314992 ]
          Mark Miller made changes -
          Fix Version/s 4.2 [ 12323893 ]
          Fix Version/s 5.0 [ 12321664 ]
          Fix Version/s 4.1 [ 12321141 ]
          Robert Muir made changes -
          Fix Version/s 4.3 [ 12324128 ]
          Fix Version/s 5.0 [ 12321664 ]
          Fix Version/s 4.2 [ 12323893 ]
          Uwe Schindler made changes -
          Fix Version/s 4.4 [ 12324324 ]
          Fix Version/s 4.3 [ 12324128 ]
          Steve Rowe added a comment -

          Bulk move 4.4 issues to 4.5 and 5.0

          Steve Rowe made changes -
          Fix Version/s 5.0 [ 12321664 ]
          Fix Version/s 4.5 [ 12324743 ]
          Fix Version/s 4.4 [ 12324324 ]
          Adrien Grand made changes -
          Fix Version/s 4.6 [ 12325000 ]
          Fix Version/s 5.0 [ 12321664 ]
          Fix Version/s 4.5 [ 12324743 ]
          Uwe Schindler made changes -
          Fix Version/s 4.7 [ 12325573 ]
          Fix Version/s 4.6 [ 12325000 ]
          David Smiley made changes -
          Fix Version/s 4.8 [ 12326254 ]
          Fix Version/s 4.7 [ 12325573 ]
          Shalin Shekhar Mangar added a comment -

          I think this is redundant now that we have atomic updates via stored fields and transaction logs.

          Shalin Shekhar Mangar made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Fix Version/s 4.8 [ 12326254 ]
          Resolution Won't Fix [ 2 ]

            People

            • Assignee:
              Unassigned
              Reporter:
              Noble Paul
            • Votes:
              1
              Watchers:
              7
