Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 3.0
    • Fix Version/s: 3.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Replace the use of o.a.l.util.Parameter with Java 5 enums, deprecating Parameter.

      Replace other custom enum patterns with Java 5 enums.

      1. LUCENE-1998_enum.patch
        39 kB
        DM Smith
      2. LUCENE-1998_enum.patch
        38 kB
        Uwe Schindler
      3. LUCENE-1998_enum.patch
        38 kB
        Uwe Schindler
      4. LUCENE-1998_enum.patch
        39 kB
        Uwe Schindler
      5. LUCENE-1998_enum.patch
        42 kB
        Uwe Schindler
      6. LUCENE-1998_enum_BW.patch
        2 kB
        Uwe Schindler
      7. LUCENE-1998_enum_BW.patch
        2 kB
        Uwe Schindler

        Issue Links

          Activity

          Hide
          Uwe Schindler added a comment -

          Committed revision: 828156

          Thanks DM Smith!

          Show
          Uwe Schindler added a comment - Committed revision: 828156 Thanks DM Smith!
          Hide
          Uwe Schindler added a comment -

          Better BW test

          Show
          Uwe Schindler added a comment - Better BW test
          Hide
          Uwe Schindler added a comment -

          I changed the Version enum. All test still pass. I also added a test for the backwards branch that tests, that the transition from Parameter -> enum is binary compatible and supported by Java's linker.

          I will commit soon.

          Show
          Uwe Schindler added a comment - I changed the Version enum. All test still pass. I also added a test for the backwards branch that tests, that the transition from Parameter -> enum is binary compatible and supported by Java's linker. I will commit soon.
          Hide
          DM Smith added a comment -

          changing the order of enum constants is bad, you should always add them at the end

          Is this true?

          I did not know how Java serializes enums so I went looking:
          See: http://java.sun.com/j2se/1.5.0/docs/guide/serialization/relnotes15.html

          Turns out it serializes the text representation of the enum constant and class info. This is just like the Parameter class.

          If I understand it correctly, with this, an enum is resilient to changes in order. New constants can go in any place (for example, we can later add LUCENE_291 before LUCENE_30) and not break serialization compatibility.

          This is especially good for the future as it allows a path for deprecations. (E.g. deprecation of o.a.l.d.Field.Index.COMPRESS)

          So having LUCENE_CURRENT at the end is fine.

          If we wanted it first (or anywhere else) we could have onOrAfter to be:
          public boolean onOrAfter(Version other)

          { return other == LUCENE_CURRENT || compareTo(other) >= 0; }

          If we wanted to expose version numbering info in the future, I'd suggest the following pattern (names are unimportant):
          LUCENE_29 {
          public int getMajor()

          { return 2; }

          public int getMinor()

          { return 9; }

          public int getFix()

          { return 0; }

          }
          because it does not require storage and unlike "2900" does not have positional notation meaning (PIC code), e.g. public int getMajor()

          { return int(2900/1000); }
          Show
          DM Smith added a comment - changing the order of enum constants is bad, you should always add them at the end Is this true? I did not know how Java serializes enums so I went looking: See: http://java.sun.com/j2se/1.5.0/docs/guide/serialization/relnotes15.html Turns out it serializes the text representation of the enum constant and class info. This is just like the Parameter class. If I understand it correctly, with this, an enum is resilient to changes in order. New constants can go in any place (for example, we can later add LUCENE_291 before LUCENE_30) and not break serialization compatibility. This is especially good for the future as it allows a path for deprecations. (E.g. deprecation of o.a.l.d.Field.Index.COMPRESS) So having LUCENE_CURRENT at the end is fine. If we wanted it first (or anywhere else) we could have onOrAfter to be: public boolean onOrAfter(Version other) { return other == LUCENE_CURRENT || compareTo(other) >= 0; } If we wanted to expose version numbering info in the future, I'd suggest the following pattern (names are unimportant): LUCENE_29 { public int getMajor() { return 2; } public int getMinor() { return 9; } public int getFix() { return 0; } } because it does not require storage and unlike "2900" does not have positional notation meaning (PIC code), e.g. public int getMajor() { return int(2900/1000); }
          Hide
          Uwe Schindler added a comment -

          I thought about that, too: I would not do this. Especially because I want to have the 0-version (current) as first element for serialization purposes (changing the order of enum constants is bad, you should always add them at the end).

          Eventually we want to make the accessor to the interver v somehow public (for more specific comaprisons and so on).

          Show
          Uwe Schindler added a comment - I thought about that, too: I would not do this. Especially because I want to have the 0-version (current) as first element for serialization purposes (changing the order of enum constants is bad, you should always add them at the end). Eventually we want to make the accessor to the interver v somehow public (for more specific comaprisons and so on).
          Hide
          DM Smith added a comment -

          I just noticed that enums are comparable. For the enum Version, we could take advantage for this and not store a number for each value. It would be important to maintain order of versions in the file from earliest to latest.

          Should we do this?

          Then the current patch's (comments removed for clarity):

          public enum Version {
          LUCENE_CURRENT (0),
          LUCENE_20 (2000),
          LUCENE_21 (2100),
          LUCENE_22 (2200),
          LUCENE_23 (2300),
          LUCENE_24 (2400),
          LUCENE_29 (2900),
          LUCENE_30 (3000);

          private Version(int v)

          { this.v = v; }

          public boolean onOrAfter(Version other)

          { return v == 0 || v >= other.v; }

          private final int v;
          }

          Would become (the comment on strict ordering is necessary):

          public enum Version {

          // These have to be ordered from the oldest to the newest version
          LUCENE_20,
          LUCENE_21,
          LUCENE_22,
          LUCENE_23,
          LUCENE_24,
          LUCENE_29,
          LUCENE_30,
          // This needs to be last
          LUCENE_CURRENT;

          /** A convienence method merely calling this.compareTo(other) >= 0 */
          public boolean onOrAfter(Version other)

          { return compareTo(other) >= 0; }

          }

          Show
          DM Smith added a comment - I just noticed that enums are comparable. For the enum Version, we could take advantage for this and not store a number for each value. It would be important to maintain order of versions in the file from earliest to latest. Should we do this? Then the current patch's (comments removed for clarity): public enum Version { LUCENE_CURRENT (0), LUCENE_20 (2000), LUCENE_21 (2100), LUCENE_22 (2200), LUCENE_23 (2300), LUCENE_24 (2400), LUCENE_29 (2900), LUCENE_30 (3000); private Version(int v) { this.v = v; } public boolean onOrAfter(Version other) { return v == 0 || v >= other.v; } private final int v; } Would become (the comment on strict ordering is necessary): public enum Version { // These have to be ordered from the oldest to the newest version LUCENE_20, LUCENE_21, LUCENE_22, LUCENE_23, LUCENE_24, LUCENE_29, LUCENE_30, // This needs to be last LUCENE_CURRENT; /** A convienence method merely calling this.compareTo(other) >= 0 */ public boolean onOrAfter(Version other) { return compareTo(other) >= 0; } }
          Hide
          Uwe Schindler added a comment -

          (it's "bq." not ".bq" )

          So it works, but not with switch statements.

          IMHO: Having a switch statement (or cascading if-then-else) over the collection of values is generally indicative of a bad design (or an opportunity for an improved design By adding methods to each enum that return literals, we can eliminate this and at the same time, improve performance.

          You are right, my problem was more for client code of Lucene that may for example have a switch statement on Field.Index (e.g. Solr) to control some further indexing steps. If we rename the constant, the switch statement would not work (it would work in already compiled code), but not if the code is recompiled against the modified version. That was my problem. In 3.0 this will not happen as there are no deprec enum constants, but maybe later. In this case, a CHANGES.txt entry should be added.

          There is another tuning opportunity, which I didn't take. We are marshaling out the flags from the enums into member variables. I'm not sure how efficient the storage of a boolean vs an enum is. If it is a wash, then having an enum value as replacement would be a good thing. It sould clearly document what controls the flag.

          This is currently not possibible because of backwards compatibility, because the fields are protected and not deprecated in 2.9. I think with your change we are fine.

          The only complication would be the set/get for some of the flags. (E.g. AbstractField.setOmitNorms.) What's with that? Are the enum values merely a hint??? Does it make sense to allow omitNorms to be changed after an AbstractField is being used?

          It is perfectly legal to change these constants after creating the field, so the setters must be there.

          Show
          Uwe Schindler added a comment - (it's "bq." not ".bq" ) So it works, but not with switch statements. IMHO: Having a switch statement (or cascading if-then-else) over the collection of values is generally indicative of a bad design (or an opportunity for an improved design By adding methods to each enum that return literals, we can eliminate this and at the same time, improve performance. You are right, my problem was more for client code of Lucene that may for example have a switch statement on Field.Index (e.g. Solr) to control some further indexing steps. If we rename the constant, the switch statement would not work (it would work in already compiled code), but not if the code is recompiled against the modified version. That was my problem. In 3.0 this will not happen as there are no deprec enum constants, but maybe later. In this case, a CHANGES.txt entry should be added. There is another tuning opportunity, which I didn't take. We are marshaling out the flags from the enums into member variables. I'm not sure how efficient the storage of a boolean vs an enum is. If it is a wash, then having an enum value as replacement would be a good thing. It sould clearly document what controls the flag. This is currently not possibible because of backwards compatibility, because the fields are protected and not deprecated in 2.9. I think with your change we are fine. The only complication would be the set/get for some of the flags. (E.g. AbstractField.setOmitNorms.) What's with that? Are the enum values merely a hint??? Does it make sense to allow omitNorms to be changed after an AbstractField is being used? It is perfectly legal to change these constants after creating the field, so the setters must be there.
          Hide
          DM Smith added a comment - - edited

          I only added the license header back in the Version class. It must be there.

          Sorry about wacking the license on Version. It must have been an accident. I know it needs to be there.

          Some fine tuning: You defined package protected abstract methods, but made them public in the enum constant. Changed to all-public. This was also a backwards-break in contrib/queryParser.

          Thanks. Inadvertently, I was following the pattern for an Interface, where scoping does not matter.

          So it works, but not with switch statements.

          IMHO: Having a switch statement (or cascading if-then-else) over the collection of values is generally indicative of a bad design (or an opportunity for an improved design By adding methods to each enum that return literals, we can eliminate this and at the same time, improve performance.

          There is another tuning opportunity, which I didn't take. We are marshaling out the flags from the enums into member variables. I'm not sure how efficient the storage of a boolean vs an enum is. If it is a wash, then having an enum value as replacement would be a good thing. It sould clearly document what controls the flag.

          The only complication would be the set/get for some of the flags. (E.g. AbstractField.setOmitNorms.) What's with that? Are the enum values merely a hint??? Does it make sense to allow omitNorms to be changed after an AbstractField is being used?

          Show
          DM Smith added a comment - - edited I only added the license header back in the Version class. It must be there. Sorry about wacking the license on Version. It must have been an accident. I know it needs to be there. Some fine tuning: You defined package protected abstract methods, but made them public in the enum constant. Changed to all-public. This was also a backwards-break in contrib/queryParser. Thanks. Inadvertently, I was following the pattern for an Interface, where scoping does not matter. So it works, but not with switch statements. IMHO: Having a switch statement (or cascading if-then-else) over the collection of values is generally indicative of a bad design (or an opportunity for an improved design By adding methods to each enum that return literals, we can eliminate this and at the same time, improve performance. There is another tuning opportunity, which I didn't take. We are marshaling out the flags from the enums into member variables. I'm not sure how efficient the storage of a boolean vs an enum is. If it is a wash, then having an enum value as replacement would be a good thing. It sould clearly document what controls the flag. The only complication would be the set/get for some of the flags. (E.g. AbstractField.setOmitNorms.) What's with that? Are the enum values merely a hint??? Does it make sense to allow omitNorms to be changed after an AbstractField is being used?
          Hide
          Uwe Schindler added a comment -

          Updated patch (merged with StandardAnalyzer version constants). Also added Lucene version 3.0 for completeness to enable users to build apps and do not need to use the CURRENT constant.

          Show
          Uwe Schindler added a comment - Updated patch (merged with StandardAnalyzer version constants). Also added Lucene version 3.0 for completeness to enable users to build apps and do not need to use the CURRENT constant.
          Hide
          Uwe Schindler added a comment -

          Some samll problem that may appear in future: We had renamed some enum constants in 2.9 (TOKENIZED -> ANALYZED). No problems now, because deprec constants removed.

          If we want to do the same in future, we can do it the same way, but need to do a hack (because it is not officially supprted by Java 5):
          http://forums.sun.com/thread.jspa?threadID=5137742

          So it works, but not with switch statements. Just as a comment. But in my opinion, renaming enum constants is a bad thing...

          Show
          Uwe Schindler added a comment - Some samll problem that may appear in future: We had renamed some enum constants in 2.9 (TOKENIZED -> ANALYZED). No problems now, because deprec constants removed. If we want to do the same in future, we can do it the same way, but need to do a hack (because it is not officially supprted by Java 5): http://forums.sun.com/thread.jspa?threadID=5137742 So it works, but not with switch statements. Just as a comment. But in my opinion, renaming enum constants is a bad thing...
          Hide
          Uwe Schindler added a comment -

          Some fine tuning: You defined package protected abstract methods, but made them public in the enum constant. Changed to all-public. This was also a backwards-break in contrib/queryParser.

          I think this is ready to commit.

          Show
          Uwe Schindler added a comment - Some fine tuning: You defined package protected abstract methods, but made them public in the enum constant. Changed to all-public. This was also a backwards-break in contrib/queryParser. I think this is ready to commit.
          Hide
          Uwe Schindler added a comment -

          Patch with license header restored.

          Show
          Uwe Schindler added a comment - Patch with license header restored.
          Hide
          Uwe Schindler added a comment -

          I tested it here, we have no backwards problem (at least with "normal" usage). The dynamic linker of Java when running old Java 1.4 code against the new enum classes has no problem with the replaced superclass: Old code compiled against Field.Store.XXX against lucene-core-2.9.jar with superclass Parameter works perfectly with the new lucene-core-3.0.jar. This works because we only use the parameter class as a type safe enumeration an did not call any methods (only maybe toString()) of it. So the linker has no problem.

          I would simply apply this ptach to trunk. I would also remove the Parameter class completely, as it breaks no code (only if somebody has used that class for own enums). Maybe we should deprecate Parameter in 2.9.1 and say that it will be removed in 3.0 as this version uses Java5's enum. But it also does not hurt if we keep it and mark it deprecated as in the patch.

          To your patch: I only added the license header back in the Version class. It must be there.

          Show
          Uwe Schindler added a comment - I tested it here, we have no backwards problem (at least with "normal" usage). The dynamic linker of Java when running old Java 1.4 code against the new enum classes has no problem with the replaced superclass: Old code compiled against Field.Store.XXX against lucene-core-2.9.jar with superclass Parameter works perfectly with the new lucene-core-3.0.jar. This works because we only use the parameter class as a type safe enumeration an did not call any methods (only maybe toString()) of it. So the linker has no problem. I would simply apply this ptach to trunk. I would also remove the Parameter class completely, as it breaks no code (only if somebody has used that class for own enums). Maybe we should deprecate Parameter in 2.9.1 and say that it will be removed in 3.0 as this version uses Java5's enum. But it also does not hurt if we keep it and mark it deprecated as in the patch. To your patch: I only added the license header back in the Version class. It must be there.
          Hide
          DM Smith added a comment -

          This issue and patch were part of LUCENE-1257, but may have backward compatibility issues. (I'll remove the patch from there.)

          Show
          DM Smith added a comment - This issue and patch were part of LUCENE-1257 , but may have backward compatibility issues. (I'll remove the patch from there.)

            People

            • Assignee:
              Uwe Schindler
              Reporter:
              DM Smith
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development