Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0
    • Fix Version/s: None
    • Component/s: modules/other
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      We found it useful to be able to rename a field.
      It can save a lot of reindexing time/cost when used in conjunction with ParallelReader to partially update a field.

      1. RenameField.java
        3 kB
        John Wang
      2. RenameField.java
        2 kB
        John Wang

        Activity

        mikemccand Michael McCandless added a comment -

        > Looked at the file format wiki more closely, I see front-coding applies to all the terms in all fields. So my above comment would not work.

        Yeah I think you'll be in trouble in general if your new name doesn't "fit" in the same sort; you could check and allow a rename as long as it doesn't change the sort order?

        > Do you think it makes sense to have a tii,tis file for each indexed field? Would the new codec allow for it?

        A new codec would definitely be free to store however it wanted...

        But even the standard codec (default codec for flex, most similar to the current index format) actually shouldn't mind if the fields are not in sorted order (though I haven't tested this!). It separately stores the seek position of each field... though we may need to fix the FieldsEnum to do the sorting if the index is no longer sorted.

        john.wang@gmail.com John Wang added a comment -

        Looked at the file format wiki more closely, I see front-coding applies to all the terms in all fields. So my above comment would not work.
        Do you think it makes sense to have a tii,tis file for each indexed field? Would the new codec allow for it?

        -John

        john.wang@gmail.com John Wang added a comment -

        Did some more digging into the field ordering issue. Would it be possible to change how the FieldInfo file is stored, so that a field's number in the byNumber ArrayList is updated along with the byName HashMap, and then update the file? Or is the field number already assumed to follow the sort order of the tii file?

        john.wang@gmail.com John Wang added a comment -

        Fixed a problem with cfs files.

        john.wang@gmail.com John Wang added a comment -

        Just did a test:

        You are right, IndexReader.terms(Term) would no longer find the renamed field if the new field name is out of order. If the order is preserved, it is OK; e.g., with the list of fields "a", "c", "f", renaming "c" -> "d" would be fine.
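
        The order-preservation condition can be sketched as a small standalone check. This is only an illustration: RenameOrderCheck and preservesOrder are hypothetical names, not part of Lucene or of the attached RenameField.java, and the sketch assumes field names are compared with plain String.compareTo.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not Lucene API.
final class RenameOrderCheck {

    // Returns true if replacing oldName with newName keeps the (already sorted)
    // field list in strictly ascending String.compareTo order.
    static boolean preservesOrder(List<String> sortedFields, String oldName, String newName) {
        int i = Collections.binarySearch(sortedFields, oldName);
        if (i < 0) {
            throw new IllegalArgumentException("unknown field: " + oldName);
        }
        // The new name must sort after the previous field, if there is one...
        boolean afterPrev = i == 0
                || sortedFields.get(i - 1).compareTo(newName) < 0;
        // ...and before the next field, if there is one.
        boolean beforeNext = i == sortedFields.size() - 1
                || newName.compareTo(sortedFields.get(i + 1)) < 0;
        return afterPrev && beforeNext;
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("a", "c", "f");
        System.out.println(preservesOrder(fields, "c", "d"));     // true
        System.out.println(preservesOrder(fields, "c", "g"));     // false
        System.out.println(preservesOrder(fields, "c", "c_bak")); // true
    }
}
```

        With only "a", "c", "f" present, "c_bak" still sorts between "a" and "f", so the check passes; a field such as "c_a" sitting between "c" and "c_bak" would make it fail.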

        Our use case is however this:

        We messed up our data in, say, field "c". We rename it to "c_bak" and create a parallel index with a single field named "c", then merge the indexes. c_bak is then never accessed.

        Would this work?

        john.wang@gmail.com John Wang added a comment -

        Good point. But do you ever sort across fields?

        mikemccand Michael McCandless added a comment -

        Hmm... isn't there a danger here that a field rename would change the term sort order?

        I.e., terms in the terms dict are sorted first by field and then by term text. Seems like this tool could break that in the index?
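
        A toy model of the concern, for illustration only: the class and method names below are made up, the term list is a plain in-memory ArrayList, and binary search stands in for the real term-dictionary seek (Lucene's on-disk lookup works differently). It shows that renaming a field in place, without re-sorting, can leave the list out of (field, text) order so that a sorted lookup misses a term that is actually present.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Toy model for illustration; not Lucene code.
final class TermOrderDemo {

    // Simplified stand-in for an indexed term: field name plus term text.
    static final class T {
        final String field;
        final String text;
        T(String field, String text) { this.field = field; this.text = text; }
    }

    // Sort first by field, then by term text.
    static final Comparator<T> ORDER = new Comparator<T>() {
        public int compare(T x, T y) {
            int c = x.field.compareTo(y.field);
            return c != 0 ? c : x.text.compareTo(y.text);
        }
    };

    // A small term list that is already in (field, text) order.
    static List<T> sample() {
        List<T> terms = new ArrayList<T>();
        terms.add(new T("a", "x"));
        terms.add(new T("c", "apple"));
        terms.add(new T("f", "u"));
        terms.add(new T("f", "v"));
        terms.add(new T("f", "w"));
        return terms;
    }

    // Renames a field in place, keeping every term at its old position.
    static List<T> renameInPlace(List<T> terms, String from, String to) {
        List<T> out = new ArrayList<T>();
        for (T t : terms) {
            out.add(t.field.equals(from) ? new T(to, t.text) : t);
        }
        return out;
    }

    static boolean isSorted(List<T> terms) {
        for (int i = 1; i < terms.size(); i++) {
            if (ORDER.compare(terms.get(i - 1), terms.get(i)) > 0) return false;
        }
        return true;
    }

    // Lookup that relies on the list being sorted.
    static boolean found(List<T> terms, String field, String text) {
        return Collections.binarySearch(terms, new T(field, text), ORDER) >= 0;
    }

    public static void main(String[] args) {
        List<T> ok = renameInPlace(sample(), "c", "d");
        System.out.println(isSorted(ok) + " " + found(ok, "d", "apple"));   // true true

        List<T> bad = renameInPlace(sample(), "c", "z");
        System.out.println(isSorted(bad) + " " + found(bad, "z", "apple")); // false false
    }
}
```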

        john.wang@gmail.com John Wang added a comment -

        part of the code was originally posted on nabble, but is not removed:
        www.nabble.com/file/p15221929/fieldrename


          People

          • Assignee:
            Unassigned
            Reporter:
            john.wang@gmail.com John Wang
          • Votes:
            0
            Watchers:
            0
