Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I've been reviewing the ideas for updatable fields and have an alternative
      proposal that I think would address my biggest concern:

      • not slowing down searching

      When I look at what Solr and Elasticsearch do here, by basically reindexing from stored fields, I think they solve a lot of the problem: users don't have to "rebuild" their document from scratch just to update one tiny piece.
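
The stored-fields approach can be sketched roughly as follows. This is a toy Python model, not how Solr or Elasticsearch are actually implemented: `analyze`, `index_document`, and `update_from_stored` are hypothetical names, and the "index" is just a dict mapping each field to its term positions. The point it tries to show is that the caller supplies only the changed field, yet every field still goes through analysis again.

```python
from collections import defaultdict

analyze_calls = []  # records which fields were (re)analyzed, for illustration


def analyze(field, text):
    """Hypothetical analyzer: lowercase, whitespace-split, term -> positions."""
    analyze_calls.append(field)
    postings = defaultdict(list)
    for position, token in enumerate(text.lower().split()):
        postings[token].append(position)
    return dict(postings)


def index_document(stored):
    """Index every field of a document from its stored (raw) values."""
    return {field: analyze(field, text) for field, text in stored.items()}


def update_from_stored(stored, changes):
    """Rebuild the whole document from stored fields plus the changed ones."""
    new_stored = {**stored, **changes}
    return new_stored, index_document(new_stored)


stored = {"title": "Updatable fields", "body": "Term vectors are big today"}
index_document(stored)
analyze_calls.clear()  # ignore the initial indexing pass

stored, indexed = update_from_stored(stored, {"title": "Updatable fields proposal"})
print(analyze_calls)  # every field was re-analyzed, not just the changed one
```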

      But I think we can do this more efficiently: by avoiding reindexing of the unaffected fields.

      The basic idea is that we would require term vectors for this approach (as they already store a serialized indexed version of the doc), and so we could just take the other pieces from the existing vectors for the doc.
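
A minimal sketch of this idea, as a toy Python model rather than real Lucene code: a document's term vectors are modeled as a dict of field -> {term: positions}, and `analyze` and `update_from_vectors` are hypothetical names, not Lucene APIs. Only the changed field is analyzed; the already-inverted entries for the other fields are taken over from the existing vectors.

```python
analyzed_fields = []  # records which fields went through analysis


def analyze(field, text):
    """Hypothetical analyzer: lowercase, whitespace-split, term -> positions."""
    analyzed_fields.append(field)
    vector = {}
    for position, token in enumerate(text.lower().split()):
        vector.setdefault(token, []).append(position)
    return vector


def update_from_vectors(term_vectors, changes):
    """Return new term vectors, re-analyzing only the fields in `changes`."""
    updated = {}
    for field, vector in term_vectors.items():
        if field not in changes:
            # Unaffected field: reuse its already-inverted form as-is.
            updated[field] = vector
    for field, text in changes.items():
        updated[field] = analyze(field, text)
    return updated


# Term vectors as they might have been recorded when the doc was first indexed.
term_vectors = {
    "title": {"updatable": [0], "fields": [1]},
    "body": {"term": [0], "vectors": [1], "are": [2], "big": [3], "today": [4]},
}

new_vectors = update_from_vectors(term_vectors, {"title": "Updatable fields proposal"})
print(analyzed_fields)  # only the changed field was analyzed
```

The raw text of `body` is never needed in this sketch, which is what distinguishes it from rebuilding the whole document from stored fields.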

      I don't think we should discard the idea just because vectors are slow/big today; this seems like something we could fix.

      Personally, I like the idea of not slowing down search performance to solve the problem. I think we should really start from that angle and work towards making the indexing side more efficient, not vice versa.

        Issue Links

          Activity

          Gavin made changes -
          Link This issue depends upon LUCENE-1888 [ LUCENE-1888 ]
          Gavin made changes -
          Link This issue depends on LUCENE-1888 [ LUCENE-1888 ]
          Adrien Grand made changes -
          Link This issue relates to LUCENE-4599 [ LUCENE-4599 ]
          Robert Muir made changes -
          Description (the original value matched the current description above, except for this additional paragraph, which the new value drops):

          I think we would have to extend vectors to also store the norm (so we don't recompute that), and payloads, but it seems feasible at a glance.
          Michael McCandless made changes -
          Link This issue is related to LUCENE-4258 [ LUCENE-4258 ]
          Robert Muir made changes -
          Field Original Value New Value
          Link This issue depends on LUCENE-1888 [ LUCENE-1888 ]
          Robert Muir created issue -

            People

            • Assignee:
              Unassigned
              Reporter:
              Robert Muir
            • Votes:
              4
              Watchers:
              9

              Dates

              • Created:
                Updated:
