Lucene - Core
  1. Lucene - Core
  2. LUCENE-2154

Need a clean way for Dir/MultiReader to "merge" the AttributeSources of the sub-readers

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 4.0-ALPHA
    • Fix Version/s: 4.9, 5.0
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      The flex API allows extensibility at the Fields/Terms/Docs/PositionsEnum levels, for a codec to set custom attrs.

      But, it's currently broken for Dir/MultiReader, which must somehow share attrs across all the sub-readers. Somehow we must make a single attr source, and tell each sub-reader's enum to use that instead of creating its own. Hopefully Uwe can work some magic here

      1. LUCENE-2154-javassist.patch
        15 kB
        Uwe Schindler
      2. LUCENE-2154-javassist.patch
        16 kB
        Uwe Schindler
      3. LUCENE-2154-Jakarta-BCEL.patch
        20 kB
        Uwe Schindler
      4. LUCENE-2154-cglib.patch
        14 kB
        Uwe Schindler
      5. LUCENE-2154.patch
        13 kB
        Uwe Schindler
      6. LUCENE-2154.patch
        14 kB
        Uwe Schindler
      7. ASF.LICENSE.NOT.GRANTED--LUCENE-2154-Jakarta-BCEL.patch
        21 kB
        Uwe Schindler

        Activity

        Michael McCandless created issue -
        Hide
        Uwe Schindler added a comment -

        I have an idea, but please don't hurt me, it uses reflection.
        What is needed:
        Like in the TokenStream BW layer, we had a class called TokenWrapper that implemented all interfaces for basic Tokens but was a wrapper around a Token. This Token was exchangeable.
        The idea for this case is:

        • DirReader/MultiReader cannot share thesame AttributeSource at all, because e.g. if th DirReader's enum first tries to call next() on on the first enum, the attributes are overriden. If DirrReader then instead also calls next() on the second sub-enum, the data of the first is overriden and cannot be restaured. The idea is now, to have the same Wrapper like TokenWrapper, where the interface impls behind can be exchanged, so the Dir/MultiReader enum just points its own attribute wrapper to the correct attributeset behind.
        • The problem are custom attributes or e.g. BoostAttribute. Nobody knows what attributes may be generated, so it is impossible to create an wrapper for all.
          But Java has a solution: http://java.sun.com/j2se/1.5.0/docs/api/java/lang/reflect/Proxy.html. This class implements a proxy stub for a set of interfaces (you just pass all interfaces needed) andthe returned object can be added to the local DirReader/MultiReaders enum attsource using addAttribute(). You only have to point to a implementation class that gets the Method reference and calls the corresponding method in the current interface (Like the actual Token in TokenWrapper) of the sub-enum.

        Sounds like ugly magic, but I have to test how fast it is and how it exactly works with AttributeSource. But the proxy instances could be cached like the TokenStream BW compatibility.

        Show
        Uwe Schindler added a comment - I have an idea, but please don't hurt me, it uses reflection. What is needed: Like in the TokenStream BW layer, we had a class called TokenWrapper that implemented all interfaces for basic Tokens but was a wrapper around a Token. This Token was exchangeable. The idea for this case is: DirReader/MultiReader cannot share thesame AttributeSource at all, because e.g. if th DirReader's enum first tries to call next() on on the first enum, the attributes are overriden. If DirrReader then instead also calls next() on the second sub-enum, the data of the first is overriden and cannot be restaured. The idea is now, to have the same Wrapper like TokenWrapper, where the interface impls behind can be exchanged, so the Dir/MultiReader enum just points its own attribute wrapper to the correct attributeset behind. The problem are custom attributes or e.g. BoostAttribute. Nobody knows what attributes may be generated, so it is impossible to create an wrapper for all. But Java has a solution: http://java.sun.com/j2se/1.5.0/docs/api/java/lang/reflect/Proxy.html . This class implements a proxy stub for a set of interfaces (you just pass all interfaces needed) andthe returned object can be added to the local DirReader/MultiReaders enum attsource using addAttribute(). You only have to point to a implementation class that gets the Method reference and calls the corresponding method in the current interface (Like the actual Token in TokenWrapper) of the sub-enum. Sounds like ugly magic, but I have to test how fast it is and how it exactly works with AttributeSource. But the proxy instances could be cached like the TokenStream BW compatibility.
        Hide
        Michael McCandless added a comment -

        Good grief... pulling out the heavy guns here!

        Show
        Michael McCandless added a comment - Good grief... pulling out the heavy guns here!
        Hide
        Uwe Schindler added a comment -

        I tried to implement that with JDK's reflect.Proxy, which would be really cool, but sucess stopped once I reached the point that the attribute interface implementation must subclass AttributeImpl vs. generated proxies subclass reflect.Proxy.
        A fix would be to use cglib, but that would be an external reference. With cglib it would even be possible to autogenerate a fast proxy without reflection at all (e.g. using the Emitter class on the lowest level)!

        The only current solution I see is to enforce all Attributes not only to implement a *Impl, but also provide a class *Proxy, that respects the above code. But this classes would be so stupid (only contain a modifiable delegate of the same attribute type and forward all methods to it). Cglib code that does this is hard stuff, but simply to implement.

        Show
        Uwe Schindler added a comment - I tried to implement that with JDK's reflect.Proxy, which would be really cool, but sucess stopped once I reached the point that the attribute interface implementation must subclass AttributeImpl vs. generated proxies subclass reflect.Proxy. A fix would be to use cglib, but that would be an external reference. With cglib it would even be possible to autogenerate a fast proxy without reflection at all (e.g. using the Emitter class on the lowest level)! The only current solution I see is to enforce all Attributes not only to implement a *Impl, but also provide a class *Proxy, that respects the above code. But this classes would be so stupid (only contain a modifiable delegate of the same attribute type and forward all methods to it). Cglib code that does this is hard stuff, but simply to implement.
        Hide
        Renaud Delbru added a comment -

        Sorry in advance, maybe what I am saying is out of scope due to my partial understanding of the problem.

        I have start to look at the problem, in order to be able to use my own attributes from my own DocsAdnPositionsEnum classes.
        would it not be simpler to create a MultiAttributeSource that is instantiated in the MultiDocsAndPositionsEnum. At creation time, all the AttributeSource of the subreaders (which are available) will be passed in its constructor. This MultiAttributeSource will delegate the getAttribute call to the right DocsAndPositionsEnum$AttributeSource.

        There is not a single AttributeSource shared by all the subreader, but each subreader keeps its own AttributeSource. In this way, attributes are not overridden. The MultiAttributeSource is in fact like a Wrapper.

        One problem is when there is custom attributes, e.g. BoostAttribute. If I understand correctly, if the user tries to access the BoostAttribute, but one of the subreader does not know it, the IllegalArgumentException will be thrown. Under the hood, the MultiAttributeSource can check if the attribute exists on the current subreader, and if not it can rely on a default attribute, or a previously stored attribute (coming from a previous subreader).

        I am not sure if what I am saying is making some sense. It looks to me too simple to cover all the cases. Are there cases I am not aware of ? Could you give me some examples to make me aware of other problems ?

        Show
        Renaud Delbru added a comment - Sorry in advance, maybe what I am saying is out of scope due to my partial understanding of the problem. I have start to look at the problem, in order to be able to use my own attributes from my own DocsAdnPositionsEnum classes. would it not be simpler to create a MultiAttributeSource that is instantiated in the MultiDocsAndPositionsEnum. At creation time, all the AttributeSource of the subreaders (which are available) will be passed in its constructor. This MultiAttributeSource will delegate the getAttribute call to the right DocsAndPositionsEnum$AttributeSource. There is not a single AttributeSource shared by all the subreader, but each subreader keeps its own AttributeSource. In this way, attributes are not overridden. The MultiAttributeSource is in fact like a Wrapper. One problem is when there is custom attributes, e.g. BoostAttribute. If I understand correctly, if the user tries to access the BoostAttribute, but one of the subreader does not know it, the IllegalArgumentException will be thrown. Under the hood, the MultiAttributeSource can check if the attribute exists on the current subreader, and if not it can rely on a default attribute, or a previously stored attribute (coming from a previous subreader). I am not sure if what I am saying is making some sense. It looks to me too simple to cover all the cases. Are there cases I am not aware of ? Could you give me some examples to make me aware of other problems ?
        Hide
        Uwe Schindler added a comment -

        The problem is the following:

        Attributes are not to be retrieved on every call to next(), they are get/added after construction. If you have a consumer of your MultiEnum, it calls attributes().getAttribute exactly one time before start to enumerate tokens/positions/whatever. If your proposed MultiAttributeSource would return the attribute of the first sub-enum, the consumer would stay with this attribute instance forever. If the MultiEnum then changes to another sub-enum, the consumer would not see the new attribute.

        Because of that the right way is not to have a MultiAttributeSource. What you need are proxy attributes. The Attributes itsself must be proxies, delegating the call to the current enum's corresponding attribute. The same was done in Lucene 2.9 to emulate the backwards compatibility for TokenStreams. The proxy was TokenWrapper. These ProxyAttributes would look exactly like this TokenWrapper impl class.

        Show
        Uwe Schindler added a comment - The problem is the following: Attributes are not to be retrieved on every call to next(), they are get/added after construction. If you have a consumer of your MultiEnum, it calls attributes().getAttribute exactly one time before start to enumerate tokens/positions/whatever. If your proposed MultiAttributeSource would return the attribute of the first sub-enum, the consumer would stay with this attribute instance forever. If the MultiEnum then changes to another sub-enum, the consumer would not see the new attribute. Because of that the right way is not to have a MultiAttributeSource. What you need are proxy attributes. The Attributes itsself must be proxies, delegating the call to the current enum's corresponding attribute. The same was done in Lucene 2.9 to emulate the backwards compatibility for TokenStreams. The proxy was TokenWrapper. These ProxyAttributes would look exactly like this TokenWrapper impl class.
        Hide
        Renaud Delbru added a comment -

        I see. The problem is to return to the consumer a unique attribute reference when attributes().getAttribute is called, and then updates the references when iterating the enums in order to propagate the attribute changes to the consumer.

        I am trying to propose a (possible) alternative solution (if I understood the problem correctly), which can avoid reflection, but could potentially need a modification of the Attribute interface.

        If the MultiAttributeSource will create its own set of unique references for each attribute (the list of different attribute classes can be retrieved by calling the getAttributeClassesIterator() method of the AttributeSource for each subreader, we can then create a list of unique references, one reference for each type of attributes), the goal is then to update these references after each enum iteration or sub-enum change (in order to propagate the changes to the consumer).

        Unfortunately, I don't see any interface on the Attribute interface to 'copy' a given attribute. Each AttributeImpl could implement this 'copy method', which copies the state of a given attribute of the same class.
        Then, in the MultiDocsAndPositionsEnum, after each iteration or each sub-enum change, a call to MultiAttributeSource can be made explicitly to update the unique references of the different attributes. This update method will under the hood (1) check if the sub-enum is aware of the attribute class, (2) get the attribute from the sub-enum, and (3) copy the attribute to the unique attribute reference kept by MultiAttributeSource.

        Could this solution possibly work ?

        Show
        Renaud Delbru added a comment - I see. The problem is to return to the consumer a unique attribute reference when attributes().getAttribute is called, and then updates the references when iterating the enums in order to propagate the attribute changes to the consumer. I am trying to propose a (possible) alternative solution (if I understood the problem correctly), which can avoid reflection, but could potentially need a modification of the Attribute interface. If the MultiAttributeSource will create its own set of unique references for each attribute (the list of different attribute classes can be retrieved by calling the getAttributeClassesIterator() method of the AttributeSource for each subreader, we can then create a list of unique references, one reference for each type of attributes), the goal is then to update these references after each enum iteration or sub-enum change (in order to propagate the changes to the consumer). Unfortunately, I don't see any interface on the Attribute interface to 'copy' a given attribute. Each AttributeImpl could implement this 'copy method', which copies the state of a given attribute of the same class. Then, in the MultiDocsAndPositionsEnum, after each iteration or each sub-enum change, a call to MultiAttributeSource can be made explicitly to update the unique references of the different attributes. This update method will under the hood (1) check if the sub-enum is aware of the attribute class, (2) get the attribute from the sub-enum, and (3) copy the attribute to the unique attribute reference kept by MultiAttributeSource. Could this solution possibly work ?
        Hide
        Michael McCandless added a comment -

        What if we require that all segments are the same codec, if you want to use attributes from a Multi*Enum? (I think this limitation is fine... and if it's not, one could still operate per-segment with different attr impls per segment).

        This way, every segment would share the same attr impl for a given attr interface?

        And then couldn't we somehow force each segment to use the same attr impl as the last segment(s)?

        Show
        Michael McCandless added a comment - What if we require that all segments are the same codec, if you want to use attributes from a Multi*Enum? (I think this limitation is fine... and if it's not, one could still operate per-segment with different attr impls per segment). This way, every segment would share the same attr impl for a given attr interface? And then couldn't we somehow force each segment to use the same attr impl as the last segment(s)?
        Hide
        Uwe Schindler added a comment -

        That would work. So the MultiEnum would use its own AttributeSource and passes it downto the sub-enums. For that the ctor of *Enums should allow to pass an AttrubuteSource. I can provide patch.

        Show
        Uwe Schindler added a comment - That would work. So the MultiEnum would use its own AttributeSource and passes it downto the sub-enums. For that the ctor of *Enums should allow to pass an AttrubuteSource. I can provide patch.
        Hide
        Uwe Schindler added a comment -

        Here is a first patch about cglib-generated proxy attributes.

        In IRC we found out yesterday, that the proposed idea to share the attributes accross all Multi*Enums would result in problems as the call to next() on any sub-enum would overwrite the contents of the attributes of the previous sub-enum which would make TermsEnum not working (because e.g. TermsEnum looks forward by calling next() an all sub-enums and choosing the lowest term to return - after calling each enums next() the attributes of the first enums cannot be restored without captureState & co, as overwritten by the next() call to the last enum).

        This patch needs cglib-nodep-2.2.jar put into the lib-folder of the checkout http://sourceforge.net/projects/cglib/files/cglib2/2.2/cglib-nodep-2.2.jar/download.

        It contains a test and that shows how the usage is. The central part is cglib's Enhancer that creates a dynamic class extending ProxyAttributeImpl (which defines the general AttributeImpl methods delegating to the delegate) and implementing the requested Attribute interface using a MethodInterceptor.

        Please note: This uses no reflection (only during in-memory class file creation, which is only run one time on "loading" the proxy class). The proxy implements MethodInterceptor and uses the fast MethodProxy class (which is also generated by cglib for each proxied method, too) and can invoke the delegated method directly (without reflection) on the delegate.

        The test verifies everything works and also compares speed by using a TermAttribute natively and proxied. The speed is lower (which is not caused by reflection, but by the MethodInterceptor creating an array of parameters and boxing/unboxing native parameters into the Object[]), but for the testcase I have seen about only 50% more time needed.

        The generated classes are cached and reused (like DEFAULT_ATTRIBUTE_FACTORY does).

        To get maximum speed and no external libraries, the code implemented by Enhancer can be rewritten natively using the Apache Harmony java.lang.reflect.Proxy implementation source code as basis. The hardest part in generating bytecode is the ConstantPool in class files. But as the proxy methods are simply delegating and no magic like boxing/unboxing is needed, the generated bytecode is rather simple.

        One other use-case for these proxies is AppendingTokenStream, which is not possible since 3.0 without captureState (in old TS API it was possible, because you could reuse the same TokenInstance even over the appended streams). In the new TS api, the appending stream must have a "view" on the attributes of the current consuming sub-stream.

        Show
        Uwe Schindler added a comment - Here is a first patch about cglib-generated proxy attributes. In IRC we found out yesterday, that the proposed idea to share the attributes accross all Multi*Enums would result in problems as the call to next() on any sub-enum would overwrite the contents of the attributes of the previous sub-enum which would make TermsEnum not working (because e.g. TermsEnum looks forward by calling next() an all sub-enums and choosing the lowest term to return - after calling each enums next() the attributes of the first enums cannot be restored without captureState & co, as overwritten by the next() call to the last enum). This patch needs cglib-nodep-2.2.jar put into the lib-folder of the checkout http://sourceforge.net/projects/cglib/files/cglib2/2.2/cglib-nodep-2.2.jar/download . It contains a test and that shows how the usage is. The central part is cglib's Enhancer that creates a dynamic class extending ProxyAttributeImpl (which defines the general AttributeImpl methods delegating to the delegate) and implementing the requested Attribute interface using a MethodInterceptor. Please note: This uses no reflection (only during in-memory class file creation, which is only run one time on "loading" the proxy class). The proxy implements MethodInterceptor and uses the fast MethodProxy class (which is also generated by cglib for each proxied method, too) and can invoke the delegated method directly (without reflection) on the delegate. The test verifies everything works and also compares speed by using a TermAttribute natively and proxied. The speed is lower (which is not caused by reflection, but by the MethodInterceptor creating an array of parameters and boxing/unboxing native parameters into the Object[]), but for the testcase I have seen about only 50% more time needed. The generated classes are cached and reused (like DEFAULT_ATTRIBUTE_FACTORY does). To get maximum speed and no external libraries, the code implemented by Enhancer can be rewritten natively using the Apache Harmony java.lang.reflect.Proxy implementation source code as basis. The hardest part in generating bytecode is the ConstantPool in class files. But as the proxy methods are simply delegating and no magic like boxing/unboxing is needed, the generated bytecode is rather simple. One other use-case for these proxies is AppendingTokenStream, which is not possible since 3.0 without captureState (in old TS API it was possible, because you could reuse the same TokenInstance even over the appended streams). In the new TS api, the appending stream must have a "view" on the attributes of the current consuming sub-stream.
        Uwe Schindler made changes -
        Field Original Value New Value
        Attachment LUCENE-2154.patch [ 12435565 ]
        Hide
        Uwe Schindler added a comment -

        I had some more fun. Made ProxyAttributeSource non-final and added class name policy to also contain the corresponding interface (to make stack traces on errors nicer).

        Here the example output:

            [junit] DEBUG: Created class org.apache.lucene.util.ProxyAttributeSource$ProxyAttributeImpl$$TermAttribute$$EnhancerByCGLIB$$6100bdf9 for attribute org.apache.lucene.analysis.tokenattributes.TermAttribute
            [junit] DEBUG: Created class org.apache.lucene.util.ProxyAttributeSource$ProxyAttributeImpl$$TypeAttribute$$EnhancerByCGLIB$$6f89c3ff for attribute org.apache.lucene.analysis.tokenattributes.TypeAttribute
            [junit] DEBUG: Created class org.apache.lucene.util.ProxyAttributeSource$ProxyAttributeImpl$$FlagsAttribute$$EnhancerByCGLIB$$4668733c for attribute org.apache.lucene.analysis.tokenattributes.FlagsAttribute
            [junit] Time taken using org.apache.lucene.analysis.tokenattributes.TermAttributeImpl:
            [junit]   1476.090658 ms for 10000000 iterations
            [junit] Time taken using org.apache.lucene.util.ProxyAttributeSource$ProxyAttributeImpl$$TermAttribute$$EnhancerByCGLIB$$6100bdf
        9:
            [junit]   1881.295734 ms for 10000000 iterations
        
        Show
        Uwe Schindler added a comment - I had some more fun. Made ProxyAttributeSource non-final and added class name policy to also contain the corresponding interface (to make stack traces on errors nicer). Here the example output: [junit] DEBUG: Created class org.apache.lucene.util.ProxyAttributeSource$ProxyAttributeImpl$$TermAttribute$$EnhancerByCGLIB$$6100bdf9 for attribute org.apache.lucene.analysis.tokenattributes.TermAttribute [junit] DEBUG: Created class org.apache.lucene.util.ProxyAttributeSource$ProxyAttributeImpl$$TypeAttribute$$EnhancerByCGLIB$$6f89c3ff for attribute org.apache.lucene.analysis.tokenattributes.TypeAttribute [junit] DEBUG: Created class org.apache.lucene.util.ProxyAttributeSource$ProxyAttributeImpl$$FlagsAttribute$$EnhancerByCGLIB$$4668733c for attribute org.apache.lucene.analysis.tokenattributes.FlagsAttribute [junit] Time taken using org.apache.lucene.analysis.tokenattributes.TermAttributeImpl: [junit] 1476.090658 ms for 10000000 iterations [junit] Time taken using org.apache.lucene.util.ProxyAttributeSource$ProxyAttributeImpl$$TermAttribute$$EnhancerByCGLIB$$6100bdf 9: [junit] 1881.295734 ms for 10000000 iterations
        Uwe Schindler made changes -
        Attachment LUCENE-2154.patch [ 12435568 ]
        Hide
        Uwe Schindler added a comment -

        Here the last CGLIB patch for reference.

        Now the real cool class created using JAVASSIST http://www.javassist.org/:
        You have to place the latest javassist.jar (Mozilla/LGPL licensed) in the lib/ folder and apply the patch. What it does is the fastest proxy we can think of:
        It creates a subclass of ProxyAttributeImpl that implements all methods of the interface natively in bytecode using JAVASSIST's bytecode generation tools (a subset of the Java language spec).

        The micro-benchmark shows, no difference between proxied and native method - as hotspot removes the extra method call.

        With Javassist it would even be possible to create classes that implement our interfaces around simple fields that are set by get/setters. Just like Eclipse's create get/set around a private field. That would be really cool. Or we could create combining attributes on the fly, Michael Busch would be excited. All *Impl classes we currently have would be almost obsolete (except TermAttributeImpl, which is rather complex). We could also create dynamic State classes for capturing state...

        Nice, but a little bit hackish. Maybe we put this first into contrib and supply a ConcenatingTokenStream as demo impl and also other Solr TokenStreams that are no longer easy with the Attributes without proxies (Robert listed some).

        Show
        Uwe Schindler added a comment - Here the last CGLIB patch for reference. Now the real cool class created using JAVASSIST http://www.javassist.org/ : You have to place the latest javassist.jar (Mozilla/LGPL licensed) in the lib/ folder and apply the patch. What it does is the fastest proxy we can think of: It creates a subclass of ProxyAttributeImpl that implements all methods of the interface natively in bytecode using JAVASSIST's bytecode generation tools (a subset of the Java language spec). The micro-benchmark shows, no difference between proxied and native method - as hotspot removes the extra method call. With Javassist it would even be possible to create classes that implement our interfaces around simple fields that are set by get/setters. Just like Eclipse's create get/set around a private field. That would be really cool. Or we could create combining attributes on the fly, Michael Busch would be excited. All *Impl classes we currently have would be almost obsolete (except TermAttributeImpl, which is rather complex). We could also create dynamic State classes for capturing state... Nice, but a little bit hackish. Maybe we put this first into contrib and supply a ConcenatingTokenStream as demo impl and also other Solr TokenStreams that are no longer easy with the Attributes without proxies (Robert listed some).
        Uwe Schindler made changes -
        Attachment LUCENE-2154-cglib.patch [ 12435713 ]
        Attachment LUCENE-2154-javassist.patch [ 12435714 ]
        Uwe Schindler made changes -
        Attachment LUCENE-2154-javassist.patch [ 12435714 ]
        Hide
        Uwe Schindler added a comment -

        Better patch without classloader problems.

        Show
        Uwe Schindler added a comment - Better patch without classloader problems.
        Uwe Schindler made changes -
        Attachment LUCENE-2154-javassist.patch [ 12435721 ]
        Hide
        Uwe Schindler added a comment -

        More cool, less casts, more speed.

        Show
        Uwe Schindler added a comment - More cool, less casts, more speed.
        Uwe Schindler made changes -
        Attachment LUCENE-2154-javassist.patch [ 12435757 ]
        Hide
        Uwe Schindler added a comment -

        Here the third incarnation of the patch. A little bit more complicated, because directly assemblying bytecode, but much cleaner and without strange parsers. It uses Jakarta BCEL, which is e.g. also used by Apache XALAN XSLTC, and its even included in JDK5 (somehow inofficial in renamed com.sun.internal packages).

        The approach is the same as with JavAssist, but has several advantages:

        • The typing is hard, so all bytecode is generated using real Type classes, retrieved from reflection analysis of the Attribute-Interface to proxy. The method signature is copied.
        • It will not likely break with later VDKs if the class format changes again. The Problem with JavAssist is, that it loads the interface and implementation classes from the bytecode and must analyze it. If we get Java 7 and a new classformat, this will likely break (as e.g. classes loaded from rt.jar may not be analyzed). The BCEL approach does not use loading classes in bytecode, it just uses reflection to generate the proxy method signatures and inserts the byte code to delegate to the delegate.
        • The generated class is not loaded by a hack, but instead a delegating ClassLoader is dynamically created to load the class into the JVM. The classloader delegates all other requests to the proxied Attribute's classloader.

        By the way, XALAN is bundling BCEL, in old version, but under the original package names, which may lead to conflics when also bundling in lucene. I would prefer to import the whole source (in parts) like automaton was put into o.a.l.util. But that only, if we really want to do it like this way. But I still think for some TokenStreams this would be a real speed improvement, so we can also make a contrib package out of it.

        Show
        Uwe Schindler added a comment - Here the third incarnation of the patch. A little bit more complicated, because directly assemblying bytecode, but much cleaner and without strange parsers. It uses Jakarta BCEL, which is e.g. also used by Apache XALAN XSLTC, and its even included in JDK5 (somehow inofficial in renamed com.sun.internal packages). The approach is the same as with JavAssist, but has several advantages: The typing is hard, so all bytecode is generated using real Type classes, retrieved from reflection analysis of the Attribute-Interface to proxy. The method signature is copied. It will not likely break with later VDKs if the class format changes again. The Problem with JavAssist is, that it loads the interface and implementation classes from the bytecode and must analyze it. If we get Java 7 and a new classformat, this will likely break (as e.g. classes loaded from rt.jar may not be analyzed). The BCEL approach does not use loading classes in bytecode, it just uses reflection to generate the proxy method signatures and inserts the byte code to delegate to the delegate. The generated class is not loaded by a hack, but instead a delegating ClassLoader is dynamically created to load the class into the JVM. The classloader delegates all other requests to the proxied Attribute's classloader. By the way, XALAN is bundling BCEL, in old version, but under the original package names, which may lead to conflics when also bundling in lucene. I would prefer to import the whole source (in parts) like automaton was put into o.a.l.util. But that only, if we really want to do it like this way. But I still think for some TokenStreams this would be a real speed improvement, so we can also make a contrib package out of it.
        Uwe Schindler made changes -
        Attachment LUCENE-2154-Jakarta-BCEL.patch [ 12437501 ]
        Uwe Schindler made changes -
        Fix Version/s 3.1 [ 12314025 ]
        Fix Version/s Flex Branch [ 12314439 ]
        Hide
        Uwe Schindler added a comment -

        Slightly improved patch to correctly work with CharTermAttribute (as it defines methods also defined by ProxyAttributeImpl as final, so override failure).

        Show
        Uwe Schindler added a comment - Slightly improved patch to correctly work with CharTermAttribute (as it defines methods also defined by ProxyAttributeImpl as final, so override failure).
        Uwe Schindler made changes -
        Attachment LUCENE-2154-Jakarta-BCEL.patch [ 12441388 ]
        Uwe Schindler made changes -
        Fix Version/s 4.0.0 [ 12314822 ]
        Fix Version/s 3.1 [ 12314025 ]
        Uwe Schindler made changes -
        Fix Version/s 4.0 [ 12314025 ]
        Fix Version/s 3.1 [ 12314822 ]
        Uwe Schindler made changes -
        Affects Version/s 4.0 [ 12314025 ]
        Affects Version/s Flex Branch [ 12314439 ]
        Mark Thomas made changes -
        Workflow jira [ 12484494 ] Default workflow, editable Closed status [ 12563646 ]
        Mark Thomas made changes -
        Workflow Default workflow, editable Closed status [ 12563646 ] jira [ 12584353 ]
        Robert Muir made changes -
        Fix Version/s 4.1 [ 12321140 ]
        Fix Version/s 4.0 [ 12314025 ]
        Mark Miller made changes -
        Fix Version/s 5.0 [ 12321663 ]
        Mark Miller made changes -
        Fix Version/s 4.2 [ 12323899 ]
        Fix Version/s 4.1 [ 12321140 ]
        Robert Muir made changes -
        Fix Version/s 4.3 [ 12324143 ]
        Fix Version/s 5.0 [ 12321663 ]
        Fix Version/s 4.2 [ 12323899 ]
        Uwe Schindler made changes -
        Fix Version/s 4.4 [ 12324323 ]
        Fix Version/s 4.3 [ 12324143 ]
        Hide
        Steve Rowe added a comment -

        Bulk move 4.4 issues to 4.5 and 5.0

        Show
        Steve Rowe added a comment - Bulk move 4.4 issues to 4.5 and 5.0
        Steve Rowe made changes -
        Fix Version/s 5.0 [ 12321663 ]
        Fix Version/s 4.5 [ 12324742 ]
        Fix Version/s 4.4 [ 12324323 ]
        Adrien Grand made changes -
        Fix Version/s 4.6 [ 12324999 ]
        Fix Version/s 5.0 [ 12321663 ]
        Fix Version/s 4.5 [ 12324742 ]
        Simon Willnauer made changes -
        Fix Version/s 4.7 [ 12325572 ]
        Fix Version/s 4.6 [ 12324999 ]
        David Smiley made changes -
        Fix Version/s 4.8 [ 12326269 ]
        Fix Version/s 4.7 [ 12325572 ]
        Hide
        Uwe Schindler added a comment -

        Move issue to Lucene 4.9.

        Show
        Uwe Schindler added a comment - Move issue to Lucene 4.9.
        Uwe Schindler made changes -
        Fix Version/s 4.9 [ 12326730 ]
        Fix Version/s 5.0 [ 12321663 ]
        Fix Version/s 4.8 [ 12326269 ]

          People

          • Assignee:
            Unassigned
            Reporter:
            Michael McCandless
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:

              Development