Lucene - Core
  LUCENE-2863

Updating a document loses its fields that are only indexed; NumericField trie terms are also completely lost

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Not A Problem
    • Affects Version/s: 3.0.2, 3.0.3
    • Fix Version/s: None
    • Component/s: core/store
    • Labels:
      None
    • Environment:

      Windows XP, Java 1.6.20, using a RAMDirectory

    • Lucene Fields:
      New

      Description

      I have a code snippet (see below) that creates a new document with standard (stored and indexed) fields, indexed-only (not stored) fields, and some NumericFields. It then updates the document by adding a new string field. As a result, all fields that are indexed but not stored, and especially the trie tokens of the NumericFields, are completely lost from the index after the update (or after a delete followed by an add).

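      import org.apache.lucene.analysis.WhitespaceAnalyzer;
      import org.apache.lucene.document.Document;
      import org.apache.lucene.document.Field;
      import org.apache.lucene.document.Field.Index;
      import org.apache.lucene.document.Field.Store;
      import org.apache.lucene.document.NumericField;
      import org.apache.lucene.index.IndexWriter;
      import org.apache.lucene.index.IndexWriter.MaxFieldLength;
      import org.apache.lucene.index.Term;
      import org.apache.lucene.search.IndexSearcher;
      import org.apache.lucene.search.Query;
      import org.apache.lucene.search.TermQuery;
      import org.apache.lucene.search.TopDocs;
      import org.apache.lucene.store.Directory;
      import org.apache.lucene.store.RAMDirectory;

      Directory ramDir = new RAMDirectory();
      IndexWriter writer = new IndexWriter(ramDir, new WhitespaceAnalyzer(), MaxFieldLength.UNLIMITED);

      // Document 1: stored ID, indexed-only PATTERN, stored + indexed numeric LAT/LNG
      Document doc = new Document();
      doc.add(new Field("ID", "HO1234", Store.YES, Index.NOT_ANALYZED_NO_NORMS));
      doc.add(new Field("PATTERN", "HELLO", Store.NO, Index.NOT_ANALYZED_NO_NORMS));
      doc.add(new NumericField("LAT", Store.YES, true).setDoubleValue(51.488266037777066d));
      doc.add(new NumericField("LNG", Store.YES, true).setDoubleValue(-0.08913399651646614d));
      writer.addDocument(doc);

      // Document 2: same field layout, different values
      doc = new Document();
      doc.add(new Field("ID", "HO2222", Store.YES, Index.NOT_ANALYZED_NO_NORMS));
      doc.add(new Field("PATTERN", "BELLO", Store.NO, Index.NOT_ANALYZED_NO_NORMS));
      doc.add(new NumericField("LAT", Store.YES, true).setDoubleValue(101.488266037777066d));
      doc.add(new NumericField("LNG", Store.YES, true).setDoubleValue(-100.08913399651646614d));
      writer.addDocument(doc);

      // Find document 1 by ID, add a FINAL field to the retrieved copy and write it back
      Term t = new Term("ID", "HO1234");
      Query q = new TermQuery(t);
      IndexSearcher searcher = new IndexSearcher(writer.getReader());
      TopDocs hits = searcher.search(q, 1);
      if (hits.scoreDocs.length > 0) {
            Document ndoc = searcher.doc(hits.scoreDocs[0].doc);
            ndoc.add(new Field("FINAL", "FINAL", Store.YES, Index.NOT_ANALYZED_NO_NORMS));
            writer.updateDocument(t, ndoc);
      //      equivalent delete + add variant, which behaves the same:
      //      writer.deleteDocuments(q);
      //      writer.addDocument(ndoc);
      } else {
            System.out.println("Couldn't find the document via the query");
      }

      // Search for document 1's indexed-only PATTERN value on a fresh near-real-time reader
      searcher = new IndexSearcher(writer.getReader());
      hits = searcher.search(new TermQuery(new Term("PATTERN", "HELLO")), 1);
      System.out.println("_____hits HELLO:" + hits.totalHits); // should be 1 but it's 0

      writer.close();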
      

      I also have a bounding-box query based on NumericRangeQuery. After the document update, it no longer returns any hits.
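      To see why a bounding-box query can stop matching, here is a minimal Java sketch of a range lookup over indexed numeric values. It is a toy model under stated assumptions: the class ToyNumericIndex and its methods are hypothetical, and the real NumericRangeQuery works on trie-encoded terms at several precisions rather than on a TreeMap. The only point it shows is that a range query can reach a document solely through numeric values that were indexed for it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model, NOT Lucene's trie encoding: for each numeric field,
// indexed values map to the IDs of the documents that indexed them.
class ToyNumericIndex {
    private final Map<String, TreeMap<Double, List<String>>> fields = new HashMap<>();

    // Record an indexed numeric value; values that are only stored never get here.
    void indexNumeric(String docId, String field, double value) {
        fields.computeIfAbsent(field, f -> new TreeMap<>())
              .computeIfAbsent(value, v -> new ArrayList<>())
              .add(docId);
    }

    // Remove every indexed value of a document (the "delete" half of an update).
    void delete(String docId) {
        for (TreeMap<Double, List<String>> postings : fields.values()) {
            for (List<String> ids : postings.values()) {
                ids.remove(docId);
            }
        }
    }

    // Inclusive range lookup over the indexed values of one field.
    List<String> range(String field, double min, double max) {
        List<String> ids = new ArrayList<>();
        TreeMap<Double, List<String>> postings = fields.get(field);
        if (postings != null) {
            for (List<String> docIds : postings.subMap(min, true, max, true).values()) {
                ids.addAll(docIds);
            }
        }
        return ids;
    }

    // Bounding box = LAT range AND LNG range, i.e. the intersection of both ID lists.
    List<String> boundingBox(double minLat, double maxLat, double minLng, double maxLng) {
        List<String> hits = range("LAT", minLat, maxLat);
        hits.retainAll(range("LNG", minLng, maxLng));
        return hits;
    }
}
```

      A document retrieved with searcher.doc(...) carries LAT and LNG back as plain stored string fields, not as NumericFields, so writing it back does not recreate the numeric terms a range query needs. In this model that corresponds to delete("HO1234") with no matching indexNumeric(...) calls, after which boundingBox(...) returns an empty list.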

        Activity

        Transition        Time In Source Status   Execution Times   Last Executer   Last Execution Date
        Open -> Closed    9h 17m                  1                 Shai Erera      13/Jan/11 07:29
        Mark Thomas made changes:
          Workflow: Default workflow, editable Closed status [ 12564194 ] -> jira [ 12585622 ]
        Mark Thomas made changes:
          Workflow: jira [ 12542332 ] -> Default workflow, editable Closed status [ 12564194 ]
        Shai Erera made changes:
          Status: Open [ 1 ] -> Closed [ 6 ]
          Resolution: set to Not A Problem [ 8 ]
        Shai Erera added a comment -

        This is not the sort of discussion we should be having in JIRA; that's why we have the user list. Closing, as it's neither a bug nor a feature/enhancement proposal.

        Shai Erera added a comment -

        If you want to update documents, you should store them in their entirety somewhere: in the Lucene index as stored fields (all of them), in a DB, or someplace else. This is how updateDocument currently works.
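        The advice above can be sketched in Java as a small toy model. Everything here is hypothetical: "records" stands in for whatever external system of record (a DB or similar) keeps every field of every document, and "terms" is a toy inverted index rather than Lucene's. The key property is that an update rebuilds the document from the full record instead of from what the index hands back.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: `records` is the system of record holding every field,
// stored or not; `terms` is a toy inverted index of "field:value" -> document IDs.
class ExternalRecordUpdater {
    private final Map<String, Map<String, String>> records = new HashMap<>();
    private final Map<String, Set<String>> terms = new HashMap<>();

    void add(String id, Map<String, String> fields) {
        records.put(id, new LinkedHashMap<>(fields));
        index(id, fields);
    }

    // Update: change the full record, drop the document's old terms, then
    // re-index every field from the record. Nothing is read back from the
    // index, so indexed-only fields survive the update.
    void update(String id, String field, String value) {
        Map<String, String> record = records.get(id);
        if (record == null) {
            throw new IllegalArgumentException("unknown id: " + id);
        }
        record.put(field, value);
        for (Set<String> ids : terms.values()) {
            ids.remove(id);
        }
        index(id, record);
    }

    // Number of documents whose indexed terms include field:value.
    int hits(String field, String value) {
        Set<String> ids = terms.get(field + ":" + value);
        return ids == null ? 0 : ids.size();
    }

    private void index(String id, Map<String, String> fields) {
        for (Map.Entry<String, String> e : fields.entrySet()) {
            terms.computeIfAbsent(e.getKey() + ":" + e.getValue(), k -> new HashSet<>()).add(id);
        }
    }
}
```

        With Lucene itself, the same idea means building a fresh Document from the record (each field with its original Store/Index options, and NumericFields for LAT/LNG) and passing it to writer.updateDocument(new Term("ID", id), doc).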

        Tamas Sandor added a comment -

        Yeah, but how can I add the indexed fields back (the LAT and LNG tries and the PATTERN field)?
        document.getFields() would give me my old fields back in the form of a List<Fieldable>, but the Javadoc comment says:

        Note that fields which are not stored are not available in documents retrieved from the index, e.g. Searcher.doc(int) or IndexReader.document(int).

        So this won't work either:

        doc = searcher.doc(hits.scoreDocs[0].doc);
        Document ndoc = new Document();
        for (Fieldable field : doc.getFields()) {
            ndoc.add(field);
        }
        ndoc.add(new Field("FINAL", "FINAL", Store.YES, Index.NOT_ANALYZED_NO_NORMS));
        writer.updateDocument(t, ndoc);
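        The Javadoc note quoted in this comment comes down to the split between searchable terms and stored data. The following Java toy model is a sketch under that assumption; ToyField and ToyStoredIndex are hypothetical classes that only mimic the idea that indexing feeds an inverted index while retrieval (like Searcher.doc(int)) returns only stored fields. They are not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy field: a name/value pair plus the two flags that matter here.
class ToyField {
    final String name;
    final String value;
    final boolean stored;
    final boolean indexed;

    ToyField(String name, String value, boolean stored, boolean indexed) {
        this.name = name;
        this.value = value;
        this.stored = stored;
        this.indexed = indexed;
    }
}

// Toy index: indexed fields go into an inverted index, stored fields into a per-document store.
class ToyStoredIndex {
    private final Map<String, List<Integer>> postings = new HashMap<>(); // "field:value" -> doc numbers
    private final List<List<ToyField>> storedDocs = new ArrayList<>();    // doc number -> stored fields

    int addDocument(List<ToyField> fields) {
        int docNum = storedDocs.size();
        List<ToyField> stored = new ArrayList<>();
        for (ToyField f : fields) {
            if (f.indexed) {
                postings.computeIfAbsent(f.name + ":" + f.value, k -> new ArrayList<>()).add(docNum);
            }
            if (f.stored) {
                stored.add(f);
            }
        }
        storedDocs.add(stored);
        return docNum;
    }

    // Analogous to retrieving a document: only the stored fields come back.
    List<ToyField> document(int docNum) {
        return new ArrayList<>(storedDocs.get(docNum));
    }

    // Number of documents that indexed field:value.
    int countHits(String field, String value) {
        List<Integer> docs = postings.get(field + ":" + value);
        return docs == null ? 0 : docs.size();
    }
}
```

        In this model, copying document(docNum) into a new document, as the loop above does, can carry over only ID: PATTERN is searchable but is never among the returned fields.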
        
        Earwin Burrfoot added a comment -

        updateDocument() is an atomic version of deleteDocuments() + addDocument(), nothing more,
        and it's not surprising that you lose your fields if you delete the doc and don't add them back later.
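        The semantics described in this comment can be sketched as a toy Java model. ToyWriter is hypothetical: a document is just a field-to-value map, and atomicity is modeled with synchronized methods, which is only an analogy for how the delete and the add become visible together, not Lucene's mechanism.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy writer modeling "update = atomic delete + add".
class ToyWriter {
    private final List<Map<String, String>> docs = new ArrayList<>();

    synchronized void addDocument(Map<String, String> doc) {
        docs.add(new HashMap<>(doc));
    }

    // Delete every document whose `field` equals `value`.
    synchronized void deleteDocuments(String field, String value) {
        docs.removeIf(d -> value.equals(d.get(field)));
    }

    // Both steps run under the same lock, so a synchronized reader never sees
    // the state in between. Fields missing from newDoc are simply gone.
    synchronized void updateDocument(String field, String value, Map<String, String> newDoc) {
        deleteDocuments(field, value);
        addDocument(newDoc);
    }

    // Number of documents whose `field` equals `value`.
    synchronized int count(String field, String value) {
        int n = 0;
        for (Map<String, String> d : docs) {
            if (value.equals(d.get(field))) {
                n++;
            }
        }
        return n;
    }
}
```

        Updating with a document that contains only ID and FINAL leaves exactly one document for that ID, and nothing in the update brings PATTERN back.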

        Tamas Sandor made changes:
          Priority: Major [ 3 ] -> Blocker [ 1 ]
        Tamas Sandor created issue

          People

          • Assignee: Unassigned
          • Reporter: Tamas Sandor
          • Votes: 0
          • Watchers: 1
