Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Now that we have pluggable codecs (well, PostingsFormat), an app should use a custom PostingsFormat if it really must change payloads while merging.

      Alternatively, use a FilteredIndexReader to modify anything during addIndexes (eg the facets use case, modifying payloads).

      Since this capability can be handled by existing more-generic functions I don't see why we need to keep PPP around in core. PPP is also fragile because an app generally has no visibility on when a merge commits so it can't know if the payloads it retrieves are pre or post PPP.

      I think merging shouldn't change postings as a side-effect (by default, anyway, since a custom PF can of course override merge and do whatever it wants).

      1. LUCENE-4304.patch
        49 kB
        Robert Muir
      2. LUCENE-4304.patch
        47 kB
        Robert Muir

        Activity

        Hide
        Uwe Schindler added a comment -

        +1 - This class horrified me while refactoring IndexReaders beginning of the year! I already patched it to better fit in here, but FilterAtomicReader or a custom codec is the way to go.

        I think merging shouldn't change postings as a side-effect (by default, anyway, since a custom PF can of course override merge and do whatever it wants).

        If you really want to do this, use a FilterAtomicReader and rewrite payloads while doing IW.addIndexes(FilterAtomicReader...)

        Show
        Uwe Schindler added a comment - +1 - This class horrified me while refactoring IndexReaders beginning of the year! I already patched it to better fit in here, but FilterAtomicReader or a custom codec is the way to go. I think merging shouldn't change postings as a side-effect (by default, anyway, since a custom PF can of course override merge and do whatever it wants). If you really want to do this, use a FilterAtomicReader and rewrite payloads while doing IW.addIndexes(FilterAtomicReader...)
        Hide
        Shai Erera added a comment -

        The facet module comes with an example code for merging a taxonomy index (followed by IW.addIndexes(Dir)). It'd be good if in the process of removing PPP we can still keep the example, only implemented with a FilterAtomicReader. I think FAR is better than Codec because it doesn't require you to open an IW with that Codec set, rather you can just IW.addIndexes(IndexReader).

        Show
        Shai Erera added a comment - The facet module comes with an example code for merging a taxonomy index (followed by IW.addIndexes(Dir)). It'd be good if in the process of removing PPP we can still keep the example, only implemented with a FilterAtomicReader. I think FAR is better than Codec because it doesn't require you to open an IW with that Codec set, rather you can just IW.addIndexes(IndexReader).
        Hide
        Robert Muir added a comment -

        +1 for a FilterReader for the facet module. I was looking at this and I think it would be more efficient and simpler.

        Show
        Robert Muir added a comment - +1 for a FilterReader for the facet module. I was looking at this and I think it would be more efficient and simpler.
        Hide
        Robert Muir added a comment -

        here's a prototype patch: needs docs but all tests pass.

        Show
        Robert Muir added a comment - here's a prototype patch: needs docs but all tests pass.
        Hide
        Michael McCandless added a comment -

        Wow that was fast!

        Patch look great.

        Show
        Michael McCandless added a comment - Wow that was fast! Patch look great.
        Hide
        Uwe Schindler added a comment -

        Hi,

        I reviewed the FiterAtomicReader, looks much better now.

        Minor improvment:

        +    DirectoryReader reader = DirectoryReader.open(srcIndexDir);
        +    ArrayList<AtomicReader> subReaders = new ArrayList<AtomicReader>();
        +    for (AtomicReader sub : reader.getSequentialSubReaders()) {
        +      subReaders.add(new OrdinalMappingAtomicReader(sub, ordinalMap, params));
        +    }
        

        I would use leaves(), although the DirectoryReader subs are atomic, but it would make the whole thing more universal useable (see also MultiPassIndexSplitter and PKIndexSplitter, which are similar). Also the size of leaves() or getSeqReaders() is known before, so you can directly use new AtomicReader[leaves().size()], so no copy/grow needed.

        Show
        Uwe Schindler added a comment - Hi, I reviewed the FiterAtomicReader, looks much better now. Minor improvment: + DirectoryReader reader = DirectoryReader.open(srcIndexDir); + ArrayList<AtomicReader> subReaders = new ArrayList<AtomicReader>(); + for (AtomicReader sub : reader.getSequentialSubReaders()) { + subReaders.add(new OrdinalMappingAtomicReader(sub, ordinalMap, params)); + } I would use leaves(), although the DirectoryReader subs are atomic, but it would make the whole thing more universal useable (see also MultiPassIndexSplitter and PKIndexSplitter, which are similar). Also the size of leaves() or getSeqReaders() is known before, so you can directly use new AtomicReader [leaves().size()] , so no copy/grow needed.
        Hide
        Uwe Schindler added a comment -

        I would use leaves(), although the DirectoryReader subs are atomic

        I say this because I want to make getSeqSubReaders abstract protected in CompositeReader, so less work And yes, your sugar API will come, but a few chars more dont hurt here....

        Show
        Uwe Schindler added a comment - I would use leaves(), although the DirectoryReader subs are atomic I say this because I want to make getSeqSubReaders abstract protected in CompositeReader, so less work And yes, your sugar API will come, but a few chars more dont hurt here....
        Hide
        Robert Muir added a comment -

        yes its my silent protest against getTopLevelContext()

        Show
        Robert Muir added a comment - yes its my silent protest against getTopLevelContext()
        Hide
        Uwe Schindler added a comment -

        Can you fix the predefined size and remove ArrayList and replace by AtomicReader[]?

        Show
        Uwe Schindler added a comment - Can you fix the predefined size and remove ArrayList and replace by AtomicReader[]?
        Hide
        Robert Muir added a comment -

        Updated patch:

        • added javadocs/code example like PPP had
        • pass divisor=-1 to the reader we open in TaxonomyMergeUtils
        • use IR.leaves() in the example/TaxonomyMergeUtils
        Show
        Robert Muir added a comment - Updated patch: added javadocs/code example like PPP had pass divisor=-1 to the reader we open in TaxonomyMergeUtils use IR.leaves() in the example/TaxonomyMergeUtils
        Hide
        Shai Erera added a comment -

        Patch looks good. One minor comment – OrdinalMappingAtomicReader javadocs refer to LuceneTaxonomyWriter, which is now DirectoryTaxonomyWriter (I guess it's a copy-paste error from FacetPPP).

        +1 to commit.

        Show
        Shai Erera added a comment - Patch looks good. One minor comment – OrdinalMappingAtomicReader javadocs refer to LuceneTaxonomyWriter, which is now DirectoryTaxonomyWriter (I guess it's a copy-paste error from FacetPPP). +1 to commit.
        Hide
        Uwe Schindler added a comment -

        +1 to commit. Can we also add a MIGRATE.txt documentation section?

        Show
        Uwe Schindler added a comment - +1 to commit. Can we also add a MIGRATE.txt documentation section?
        Hide
        Robert Muir added a comment -

        I think MIGRATE.txt is overkill here. This was a really expert API directly on IndexWriter (not even in IWConfig)

        Show
        Robert Muir added a comment - I think MIGRATE.txt is overkill here. This was a really expert API directly on IndexWriter (not even in IWConfig)
        Hide
        Uwe Schindler added a comment -

        Closed after release.

        Show
        Uwe Schindler added a comment - Closed after release.

          People

          • Assignee:
            Unassigned
            Reporter:
            Michael McCandless
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development