Details

    • Lucene Fields:
      New

      Description

      This came up from dicussions on IRC. I'm summarizing here...

      Today when you make a Field to add to a document you can set things
      index or not, stored or not, analyzed or not, details like omitTfAP,
      omitNorms, index term vectors (separately controlling
      offsets/positions), etc.

      I think we should factor these out into a new class (FieldType?).
      Then you could re-use this FieldType instance across multiple fields.

      The Field instance would still hold the actual value.

      We could then do per-field analyzers by adding a setAnalyzer on the
      FieldType, instead of the separate PerFieldAnalzyerWrapper (likewise
      for per-field codecs (with flex), where we now have
      PerFieldCodecWrapper).

      This would NOT be a schema! It's just refactoring what we already
      specify today. EG it's not serialized into the index.

      This has been discussed before, and I know Michael Busch opened a more
      ambitious (I think?) issue. I think this is a good first baby step. We could
      consider a hierarchy of FIeldType (NumericFieldType, etc.) but maybe hold
      off on that for starters...

      1. LUCENE-2308.patch
        11 kB
        Chris Male
      2. LUCENE-2308.patch
        21 kB
        Nikola Tankovic
      3. LUCENE-2308-2.patch
        59 kB
        Nikola Tankovic
      4. LUCENE-2308-3.patch
        88 kB
        Nikola Tankovic
      5. LUCENE-2308-4.patch
        52 kB
        Nikola Tankovic
      6. LUCENE-2308-5.patch
        85 kB
        Nikola Tankovic
      7. LUCENE-2308-6.patch
        229 kB
        Nikola Tankovic
      8. LUCENE-2308-7.patch
        584 kB
        Nikola Tankovic
      9. LUCENE-2308-8.patch
        579 kB
        Nikola Tankovic
      10. LUCENE-2308-9.patch
        574 kB
        Nikola Tankovic
      11. LUCENE-2308-ltc.patch
        4 kB
        Michael McCandless
      12. LUCENE-2308-10.patch
        64 kB
        Nikola Tankovic
      13. LUCENE-2308-11.patch
        76 kB
        Nikola Tankovic
      14. LUCENE-2308-12.patch
        1 kB
        Nikola Tankovic
      15. LUCENE-2308-13.patch
        30 kB
        Nikola Tankovic
      16. LUCENE-2308-14.patch
        4 kB
        Nikola Tankovic
      17. LUCENE-2308-15.patch
        116 kB
        Nikola Tankovic
      18. LUCENE-2308-16.patch
        163 kB
        Nikola Tankovic
      19. LUCENE-2308-17.patch
        167 kB
        Nikola Tankovic
      20. LUCENE-2308-18.patch
        134 kB
        Nikola Tankovic
      21. LUCENE-2308-19.patch
        222 kB
        Nikola Tankovic
      22. LUCENE-2308-20.patch
        554 kB
        Nikola Tankovic
      23. LUCENE-2308-21.patch
        66 kB
        Nikola Tankovic
      24. LUCENE-2308.branchdiffs
        1.20 MB
        Michael McCandless
      25. LUCENE-2308.branchdiffs.moved
        1.02 MB
        Michael McCandless
      26. LUCENE-2308-merge-1.patch
        85 kB
        Nikola Tankovic
      27. LUCENE-2308-merge-2.patch
        152 kB
        Nikola Tankovic
      28. LUCENE-2308-merge-3.patch
        29 kB
        Nikola Tankovic
      29. LUCENE-2308.patch
        44 kB
        Michael McCandless
      30. LUCENE-2308.patch
        90 kB
        Michael McCandless
      31. LUCENE-2308-branch.patch
        1.05 MB
        Michael McCandless
      32. LUCENE-2308-final.patch
        1.07 MB
        Michael McCandless
      33. LUCENE-2308.patch
        15 kB
        Michael McCandless
      34. LUCENE-2308-FT-interface.patch
        205 kB
        Chris Male
      35. LUCENE-2308-FT-interface.patch
        204 kB
        Chris Male
      36. LUCENE-2308-FT-interface.patch
        59 kB
        Chris Male
      37. LUCENE-2308-FT-interface.patch
        60 kB
        Chris Male

        Issue Links

          Activity

          Hide
          Chris Male added a comment -

          Hi Mike,

          +1 to this idea.

          Do you envisage FieldType instances being immutable or would you be able to change the Analyzer on a FieldType? If they are mutable, would you see FieldType instances being shared across multiple Fields? Or would each Field have its own FieldType instance?

          Show
          Chris Male added a comment - Hi Mike, +1 to this idea. Do you envisage FieldType instances being immutable or would you be able to change the Analyzer on a FieldType? If they are mutable, would you see FieldType instances being shared across multiple Fields? Or would each Field have its own FieldType instance?
          Hide
          Michael McCandless added a comment -

          I think immutable & shareable across Field instances for sure and presumably also across different fields?

          And maybe we should have some hierarchy, eg analyzed or not.

          I think it's important that we contain this to the baby steps (eg not ambitiously make a huge type hierarchy) – it really is just pulling out the "type-like" configuration from Field, leaving just the actual value of the field on Field.

          Show
          Michael McCandless added a comment - I think immutable & shareable across Field instances for sure and presumably also across different fields? And maybe we should have some hierarchy, eg analyzed or not. I think it's important that we contain this to the baby steps (eg not ambitiously make a huge type hierarchy) – it really is just pulling out the "type-like" configuration from Field, leaving just the actual value of the field on Field.
          Hide
          Chris Male added a comment -

          Yeah I agree with the immutability and shareability.

          I'll give this a shot, with taking the babiest of baby steps.

          Show
          Chris Male added a comment - Yeah I agree with the immutability and shareability. I'll give this a shot, with taking the babiest of baby steps.
          Hide
          Robert Muir added a comment -

          details like omitTfAP, omitNorms

          personal pet peeve, i wonder if we could consider improving on 'omit' here,
          I think things like omit(false), disable(false) are a little awkward.

          Show
          Robert Muir added a comment - details like omitTfAP, omitNorms personal pet peeve, i wonder if we could consider improving on 'omit' here, I think things like omit(false), disable(false) are a little awkward.
          Hide
          Chris Male added a comment -

          So you are thinking more along the lines indexNorms(true|false)?

          Show
          Chris Male added a comment - So you are thinking more along the lines indexNorms(true|false)?
          Hide
          Robert Muir added a comment -

          So you are thinking more along the lines indexNorms(true|false)?

          or whatever you come up with, that doesn't create double-negatives!
          but yeah, i think something like that is a little easier... no big deal
          just figured I would bring it up if this stuff was getting refactored anyway

          Show
          Robert Muir added a comment - So you are thinking more along the lines indexNorms(true|false)? or whatever you come up with, that doesn't create double-negatives! but yeah, i think something like that is a little easier... no big deal just figured I would bring it up if this stuff was getting refactored anyway
          Hide
          Chris Male added a comment -

          I agree entirely. This is definitely the moment to remove any ambiguity or confusion in this API. I'll make sure to incorporate this idea.

          Show
          Chris Male added a comment - I agree entirely. This is definitely the moment to remove any ambiguity or confusion in this API. I'll make sure to incorporate this idea.
          Hide
          Marvin Humphrey added a comment -

          I think we might consider matchOnly() instead of omitNorms(). If a field is
          "match only", we don't need boost bytes a.k.a. "norms" because they are only
          used as a scoring multiplier.

          Haven't got a good synonym for "omitTFAP", but I'd sure like one.

          Show
          Marvin Humphrey added a comment - I think we might consider matchOnly() instead of omitNorms(). If a field is "match only", we don't need boost bytes a.k.a. "norms" because they are only used as a scoring multiplier. Haven't got a good synonym for "omitTFAP", but I'd sure like one.
          Hide
          Shai Erera added a comment -

          How about enable(TYPE/FEATURE) and corresponding disable? So Type/Feature will have NORMS, TF, POSITIONS and calls would look like:
          f.enable(Type.NORMS), f.disable(Type.TF)?

          Show
          Shai Erera added a comment - How about enable(TYPE/FEATURE) and corresponding disable? So Type/Feature will have NORMS, TF, POSITIONS and calls would look like: f.enable(Type.NORMS), f.disable(Type.TF)?
          Hide
          Robert Muir added a comment -

          Just also to mention (probably too much for this one issue)!

          I think it would be nice of OmitTF was separately selectable
          from OmitPositions, as Shai implied. We would have to
          actually implement this though I think!

          Show
          Robert Muir added a comment - Just also to mention (probably too much for this one issue)! I think it would be nice of OmitTF was separately selectable from OmitPositions, as Shai implied. We would have to actually implement this though I think!
          Hide
          Marvin Humphrey added a comment -

          If you disable term freq, you also have to disable positions. The "freq"
          tells you how many positions there are.

          I think it's asking an awful lot of our users to require that they understand
          all the implications of posting format modifications when committers
          have difficulty mastering all the subtleties.

          Show
          Marvin Humphrey added a comment - If you disable term freq, you also have to disable positions. The "freq" tells you how many positions there are. I think it's asking an awful lot of our users to require that they understand all the implications of posting format modifications when committers have difficulty mastering all the subtleties.
          Hide
          Robert Muir added a comment -

          If you disable term freq, you also have to disable positions. The "freq"
          tells you how many positions there are.

          Marvin: as stated, we would have to actually implement this.
          There's an issue open for it too: LUCENE-2048.
          I was just discussing this with someone the other day.

          I think it's asking an awful lot of our users to require that they understand
          all the implications of posting format modifications when committers
          have difficulty mastering all the subtleties.

          I don't know what I did to piss you off, but I just thought it would be nice
          for completeness, to mention that this feature is still open and its
          something we should think about.

          Show
          Robert Muir added a comment - If you disable term freq, you also have to disable positions. The "freq" tells you how many positions there are. Marvin: as stated, we would have to actually implement this. There's an issue open for it too: LUCENE-2048 . I was just discussing this with someone the other day. I think it's asking an awful lot of our users to require that they understand all the implications of posting format modifications when committers have difficulty mastering all the subtleties. I don't know what I did to piss you off, but I just thought it would be nice for completeness, to mention that this feature is still open and its something we should think about.
          Hide
          Marvin Humphrey added a comment -

          I'm simply suggesting that the proposed API is too hard to understand.

          Most users know whether their fields can be "match-only" but have no idea what
          TFAP is. And even advanced users will have difficulty understanding all the
          implications for matching and scoring when they selectively disable portions
          of the posting format.

          I'm not a fan of omitTFAP, omitTF, omitNorms, omitPositions, or omit(flags).
          Something that ordinary users can grok would be used more often and more
          effectively.

          Show
          Marvin Humphrey added a comment - I'm simply suggesting that the proposed API is too hard to understand. Most users know whether their fields can be "match-only" but have no idea what TFAP is. And even advanced users will have difficulty understanding all the implications for matching and scoring when they selectively disable portions of the posting format. I'm not a fan of omitTFAP, omitTF, omitNorms, omitPositions, or omit(flags). Something that ordinary users can grok would be used more often and more effectively.
          Hide
          Chris Male added a comment -

          What I covered with Mike earlier was whether FieldType methods would be immutable or not.

          If they are, which seems a good idea, then everything will be enabled/disabled in the construction of the FieldType so we would only need to support property getter methods.

          Show
          Chris Male added a comment - What I covered with Mike earlier was whether FieldType methods would be immutable or not. If they are, which seems a good idea, then everything will be enabled/disabled in the construction of the FieldType so we would only need to support property getter methods.
          Hide
          Michael McCandless added a comment -

          Hmm one challenge with making FieldType immutable is.... we don't want
          a zillion ctors over time. Also creating a FieldType with args like
          new FieldType(true, false, false) isn't really readable.

          It would be nice if we could do something similar to IndexWriterConfig
          (LUCENE-2294), where you use incremental ctor/setters to set up the
          configuration but then once it's used ("bound" to a Field), it's
          immutable.

          I'm torn on naming: yes, search-oriented names like "matchOnly" is
          tempting, but then we really should tease apart termFreq and positions
          (they are stuck together now with omitTFAP). And the two are not
          fully independent as Marvin noted – so maybe we use a cryptic enum
          (DOCS, DOCS_TERM_FREQ, DOCS_TERM_FREQ_POSITIONS)? If we can only find
          better names...

          I'm not sure we can/should find better index-time names. What is
          stored in the index is relatively independent from how/whether
          searches make use of it. EG if you store termFreq (but not positions)
          you can still do match only searching, or, you can do full scoring of
          the query. You can't use positional queries.

          Show
          Michael McCandless added a comment - Hmm one challenge with making FieldType immutable is.... we don't want a zillion ctors over time. Also creating a FieldType with args like new FieldType(true, false, false) isn't really readable. It would be nice if we could do something similar to IndexWriterConfig ( LUCENE-2294 ), where you use incremental ctor/setters to set up the configuration but then once it's used ("bound" to a Field), it's immutable. I'm torn on naming: yes, search-oriented names like "matchOnly" is tempting, but then we really should tease apart termFreq and positions (they are stuck together now with omitTFAP). And the two are not fully independent as Marvin noted – so maybe we use a cryptic enum (DOCS, DOCS_TERM_FREQ, DOCS_TERM_FREQ_POSITIONS)? If we can only find better names... I'm not sure we can/should find better index-time names. What is stored in the index is relatively independent from how/whether searches make use of it. EG if you store termFreq (but not positions) you can still do match only searching, or, you can do full scoring of the query. You can't use positional queries.
          Hide
          Marvin Humphrey added a comment -

          > Also creating a FieldType with args like
          > new FieldType(true, false, false) isn't really readable.

          Agreed Another option would be a "flags" integer and bitwise constants:

          FieldType type = new FieldType(analyzer, FieldType.INDEXED | FieldType.STORED);
          

          > It would be nice if we could do something similar to IndexWriterConfig
          > (LUCENE-2294), where you use incremental ctor/setters to set up the
          > configuration but then once it's used ("bound" to a Field), it's
          > immutable.

          I bet that'll be more popular than flags, but I thought it was worth
          bringing it up anyway.

          Show
          Marvin Humphrey added a comment - > Also creating a FieldType with args like > new FieldType(true, false, false) isn't really readable. Agreed Another option would be a "flags" integer and bitwise constants: FieldType type = new FieldType(analyzer, FieldType.INDEXED | FieldType.STORED); > It would be nice if we could do something similar to IndexWriterConfig > ( LUCENE-2294 ), where you use incremental ctor/setters to set up the > configuration but then once it's used ("bound" to a Field), it's > immutable. I bet that'll be more popular than flags, but I thought it was worth bringing it up anyway.
          Hide
          Earwin Burrfoot added a comment -

          I'm strongly against names like 'matchOnly'. They are perfectly fine in some 'schema' layer over Lucene, but here, in lowlevel guts, I'd prefer names that clearly state what the hell do they do, without forcing me to consult javadocs/code.

          Show
          Earwin Burrfoot added a comment - I'm strongly against names like 'matchOnly'. They are perfectly fine in some 'schema' layer over Lucene, but here, in lowlevel guts, I'd prefer names that clearly state what the hell do they do, without forcing me to consult javadocs/code.
          Hide
          Yonik Seeley added a comment -

          For the non-expert user, it's just a label and won't have much meaning regardless of what it's called, and they will need to consult the docs. Of course, if one starts to dig deeper, "norms" actually does have a physical meaning in the index, so preferring a label with "norms" in it seems completely reasonable.

          There's also history to consider - when you change the name of something, you cut the link to the past in search engines, and in the memories of many developers.

          As it relates to Solr - I don't care so much since it makes sense for the Solr schema to isolate these changes and stick with "omitNorms" regardless.

          Show
          Yonik Seeley added a comment - For the non-expert user, it's just a label and won't have much meaning regardless of what it's called, and they will need to consult the docs. Of course, if one starts to dig deeper, "norms" actually does have a physical meaning in the index, so preferring a label with "norms" in it seems completely reasonable. There's also history to consider - when you change the name of something, you cut the link to the past in search engines, and in the memories of many developers. As it relates to Solr - I don't care so much since it makes sense for the Solr schema to isolate these changes and stick with "omitNorms" regardless.
          Hide
          Chris Male added a comment -

          It would be nice if we could do something similar to IndexWriterConfig
          (LUCENE-2294), where you use incremental ctor/setters to set up the
          configuration but then once it's used ("bound" to a Field), it's
          immutable.

          Yeah we could use something like a FieldTypeBuilder which could provide a fluid interface for specifying each property, which then get built into an immutable FieldType at the end.

          Show
          Chris Male added a comment - It would be nice if we could do something similar to IndexWriterConfig ( LUCENE-2294 ), where you use incremental ctor/setters to set up the configuration but then once it's used ("bound" to a Field), it's immutable. Yeah we could use something like a FieldTypeBuilder which could provide a fluid interface for specifying each property, which then get built into an immutable FieldType at the end.
          Hide
          Yonik Seeley added a comment -

          I'm not sure if strict immutability is necessary - there's everything in between too.
          One can simply say that all changes should be made before first use, and after that point it's undefined.

          Unrelated question: I assume that this would retain the same flexibility as we have today... the ability to change FieldType for field "foo" from one document to the next?

          Show
          Yonik Seeley added a comment - I'm not sure if strict immutability is necessary - there's everything in between too. One can simply say that all changes should be made before first use, and after that point it's undefined. Unrelated question: I assume that this would retain the same flexibility as we have today... the ability to change FieldType for field "foo" from one document to the next?
          Hide
          Chris Male added a comment - - edited

          I'm not sure if strict immutability is necessary - there's everything in between too.
          One can simply say that all changes should be made before first use, and after that point it's undefined.

          I'm really unsure about this if people are going to be using a FieldType instance with multiple Fields. Perhaps this really is just an edge case though.

          Unrelated question: I assume that this would retain the same flexibility as we have today... the ability to change FieldType for field "foo" from one document to the next?

          Are you wanting to be able to reuse the same Field instance in both documents while defining separate FieldTypes? Or is creating new Field instances okay?

          Show
          Chris Male added a comment - - edited I'm not sure if strict immutability is necessary - there's everything in between too. One can simply say that all changes should be made before first use, and after that point it's undefined. I'm really unsure about this if people are going to be using a FieldType instance with multiple Fields. Perhaps this really is just an edge case though. Unrelated question: I assume that this would retain the same flexibility as we have today... the ability to change FieldType for field "foo" from one document to the next? Are you wanting to be able to reuse the same Field instance in both documents while defining separate FieldTypes? Or is creating new Field instances okay?
          Hide
          Yonik Seeley added a comment -

          I'm really unsure about this if people are going to be using a FieldType instance with multiple Fields.

          I will, if I can (provided the FieldType does not contain the field name). That shouldn't have anything to do with immutability though.

          Are you wanting to be able to reuse the same Field instance in both documents while defining separate FieldTypes? Or is creating new Field instances okay?

          new Field instances should be fine - it's not really my use case anyway. But we're designing for the 1000's of use cases that are out there and we should be careful about adding new constraints.

          Show
          Yonik Seeley added a comment - I'm really unsure about this if people are going to be using a FieldType instance with multiple Fields. I will, if I can (provided the FieldType does not contain the field name). That shouldn't have anything to do with immutability though. Are you wanting to be able to reuse the same Field instance in both documents while defining separate FieldTypes? Or is creating new Field instances okay? new Field instances should be fine - it's not really my use case anyway. But we're designing for the 1000's of use cases that are out there and we should be careful about adding new constraints.
          Hide
          Chris Male added a comment -

          I will, if I can (provided the FieldType does not contain the field name). That shouldn't have anything to do with immutability though.

          Yeah the field name will stay inside the Field. To me the reuse issue relates immutability in that a change to a property in one FieldType after construction means the change effects all the Fields that use that type.

          But as you say, if we document that its best to set everything at instantiation and that whatever happens after that is undefined, then I imagine it'll be fine.

          new Field instances should be fine - it's not really my use case anyway. But we're designing for the 1000's of use cases that are out there and we should be careful about adding new constraints.

          Yeah I appreciate that this API will be used in lots of different ways. Baby steps as Mike said But to answer your question, yes the flexibility will remain.

          Show
          Chris Male added a comment - I will, if I can (provided the FieldType does not contain the field name). That shouldn't have anything to do with immutability though. Yeah the field name will stay inside the Field. To me the reuse issue relates immutability in that a change to a property in one FieldType after construction means the change effects all the Fields that use that type. But as you say, if we document that its best to set everything at instantiation and that whatever happens after that is undefined, then I imagine it'll be fine. new Field instances should be fine - it's not really my use case anyway. But we're designing for the 1000's of use cases that are out there and we should be careful about adding new constraints. Yeah I appreciate that this API will be used in lots of different ways. Baby steps as Mike said But to answer your question, yes the flexibility will remain.
          Hide
          Yonik Seeley added a comment -

          Of course... given that Fieldable is an interface, one could create an implementation that just delegated all the calls like omitNorms to a shared instance, except for the value part. Add a getAnalyzer() method to Fieldable, and it's the same thing in the end?

          Show
          Yonik Seeley added a comment - Of course... given that Fieldable is an interface, one could create an implementation that just delegated all the calls like omitNorms to a shared instance, except for the value part. Add a getAnalyzer() method to Fieldable, and it's the same thing in the end?
          Hide
          David Smiley added a comment -

          I'm surprised to barely even see a mention to Solr here which already, of course obviously, already has a FieldType. Might it be ported?

          Show
          David Smiley added a comment - I'm surprised to barely even see a mention to Solr here which already, of course obviously, already has a FieldType. Might it be ported?
          Hide
          Simon Willnauer added a comment -

          Brief Summary for GSoC Students:

          FieldType aims on the one hand to separate field properties from the
          actual value and on the other make Field's extensibility easier. Both
          seem equally important while far from easy to achieve. Fieldable and
          Field are a core API and changes to it need to well thought. Further
          this issue can easily cause drastic performance degradation if not
          done right. Consider this as a massive change since fields are used
          almost all over lucene and solr.

          Show
          Simon Willnauer added a comment - Brief Summary for GSoC Students: FieldType aims on the one hand to separate field properties from the actual value and on the other make Field's extensibility easier. Both seem equally important while far from easy to achieve. Fieldable and Field are a core API and changes to it need to well thought. Further this issue can easily cause drastic performance degradation if not done right. Consider this as a massive change since fields are used almost all over lucene and solr.
          Hide
          Simon Willnauer added a comment -

          I'm surprised to barely even see a mention to Solr here which already, of course obviously, already has a FieldType. Might it be ported?

          Moving stuff from Solr to Lucene involves lots of politics. It is way easier to let Solr adopt eventually than fight your way through the politics (this is my opinion though.)

          Show
          Simon Willnauer added a comment - I'm surprised to barely even see a mention to Solr here which already, of course obviously, already has a FieldType. Might it be ported? Moving stuff from Solr to Lucene involves lots of politics. It is way easier to let Solr adopt eventually than fight your way through the politics (this is my opinion though.)
          Hide
          Robert Muir added a comment -

          Moving stuff from Solr to Lucene involves lots of politics. It is way easier to let Solr adopt eventually than fight your way through the politics (this is my opinion though.)

          Then why do we still have merged codebases?
          If this is the way things are, then we should un-merge the two projects.

          because as a lucene developer, i spend a lot of time trying to do my part to fix various things in Solr... if its a one-way-street then we need to un-merge.

          Show
          Robert Muir added a comment - Moving stuff from Solr to Lucene involves lots of politics. It is way easier to let Solr adopt eventually than fight your way through the politics (this is my opinion though.) Then why do we still have merged codebases? If this is the way things are, then we should un-merge the two projects. because as a lucene developer, i spend a lot of time trying to do my part to fix various things in Solr... if its a one-way-street then we need to un-merge.
          Hide
          Chris Male added a comment -

          I'm surprised to barely even see a mention to Solr here which already, of course obviously, already has a FieldType. Might it be ported?

          I think there is a lot of overlap but Solr's FieldTypes also integrate with its schema through SchemaField so maybe its an option to port the overlap and then let Solr extend whatever is created, to provide its schema integration/Solr specific functions?

          Show
          Chris Male added a comment - I'm surprised to barely even see a mention to Solr here which already, of course obviously, already has a FieldType. Might it be ported? I think there is a lot of overlap but Solr's FieldTypes also integrate with its schema through SchemaField so maybe its an option to port the overlap and then let Solr extend whatever is created, to provide its schema integration/Solr specific functions?
          Hide
          Yonik Seeley added a comment -

          I think there is a lot of overlap but Solr's FieldTypes also integrate with its schema through SchemaField so maybe its an option to port the overlap and then let Solr extend whatever is created, to provide its schema integration/Solr specific functions?

          Yeah, that seems reasonable.

          Show
          Yonik Seeley added a comment - I think there is a lot of overlap but Solr's FieldTypes also integrate with its schema through SchemaField so maybe its an option to port the overlap and then let Solr extend whatever is created, to provide its schema integration/Solr specific functions? Yeah, that seems reasonable.
          Hide
          Simon Willnauer added a comment -

          I think there is a lot of overlap but Solr's FieldTypes also integrate with its schema through SchemaField so maybe its an option to port the overlap and then let Solr extend whatever is created, to provide its schema integration/Solr specific functions?

          +1

          Show
          Simon Willnauer added a comment - I think there is a lot of overlap but Solr's FieldTypes also integrate with its schema through SchemaField so maybe its an option to port the overlap and then let Solr extend whatever is created, to provide its schema integration/Solr specific functions? +1
          Hide
          Chris Male added a comment -

          Throwing my ideas into the ring. Currently just the implementation of FieldType (I'm waiting on LUCENE-2310 before attacking Field). I've tried to incorporate all the various ideas suggested here.

          • FieldType has two constructors, modeled along the same lines as what Marvin suggested.
          • Introduces a simplified Attribute concept (FieldTypeAttribute) that allows context specific extensions to FieldType to be added. An example of this would be a SpatialAttribute that provides more information about the spatial information stored in a certain field. AttributeSource and Attribute were not used since they really seem designed around the AttributeFactory idea, which seems unnecessary here, and that the content of Attributes could change often, rather than being readonly as I would prefer them here.
          • FieldTypeBuilder provides a fluid interface for creating FieldTypes, much like IndexWriterConfig.
          • FieldType is readonly.
          Show
          Chris Male added a comment - Throwing my ideas into the ring. Currently just the implementation of FieldType (I'm waiting on LUCENE-2310 before attacking Field). I've tried to incorporate all the various ideas suggested here. FieldType has two constructors, modeled along the same lines as what Marvin suggested. Introduces a simplified Attribute concept (FieldTypeAttribute) that allows context specific extensions to FieldType to be added. An example of this would be a SpatialAttribute that provides more information about the spatial information stored in a certain field. AttributeSource and Attribute were not used since they really seem designed around the AttributeFactory idea, which seems unnecessary here, and that the content of Attributes could change often, rather than being readonly as I would prefer them here. FieldTypeBuilder provides a fluid interface for creating FieldTypes, much like IndexWriterConfig. FieldType is readonly.
          Hide
          Nikola Tankovic added a comment -

          Hi folks,

          I wrote an GSoC proposal for this issue, but I am missing a mentor for this issue. Any volonters?

          Show
          Nikola Tankovic added a comment - Hi folks, I wrote an GSoC proposal for this issue, but I am missing a mentor for this issue. Any volonters?
          Hide
          Nikola Tankovic added a comment -

          I submitted first draft of my proposal (LUCENE-2308 Separately specify a field's type), hope you can see it and give me some further pointers if needed.
          Thank you!

          Show
          Nikola Tankovic added a comment - I submitted first draft of my proposal ( LUCENE-2308 Separately specify a field's type), hope you can see it and give me some further pointers if needed. Thank you!
          Hide
          Simon Willnauer added a comment -

          I wrote an GSoC proposal for this issue, but I am missing a mentor for this issue. Any volonters?

          don't worry we will find somebody to mentor!

          I submitted first draft of my proposal (LUCENE-2308 Separately specify a field's type), hope you can see it and give me some further pointers if needed.

          yep I can see it - looks good so far.

          Show
          Simon Willnauer added a comment - I wrote an GSoC proposal for this issue, but I am missing a mentor for this issue. Any volonters? don't worry we will find somebody to mentor! I submitted first draft of my proposal ( LUCENE-2308 Separately specify a field's type), hope you can see it and give me some further pointers if needed. yep I can see it - looks good so far.
          Hide
          Michael McCandless added a comment -

          Hi Nikola, I'd be happy to mentor for this issue! Your proposal looks great.

          Show
          Michael McCandless added a comment - Hi Nikola, I'd be happy to mentor for this issue! Your proposal looks great.
          Hide
          Nikola Tankovic added a comment -

          Thank you Michael! I'll make some further small adjustments and add you as my mentor

          Show
          Nikola Tankovic added a comment - Thank you Michael! I'll make some further small adjustments and add you as my mentor
          Hide
          Simon Willnauer added a comment -

          Hey Nikola,

          any progress on this issue already? I'd love to see some patches even a starter would be great.

          simon

          Show
          Simon Willnauer added a comment - Hey Nikola, any progress on this issue already? I'd love to see some patches even a starter would be great. simon
          Hide
          Nikola Tankovic added a comment -

          Yes, I'm continuing along the path Chris started with his first patch, mine is coming soon.

          Nikola

          Show
          Nikola Tankovic added a comment - Yes, I'm continuing along the path Chris started with his first patch, mine is coming soon. Nikola
          Hide
          Nikola Tankovic added a comment -

          Sorry guys to keep you waiting, I had some slight problems and tight schedule. I will now continue with regular patching.

          Proposed in patch:

          • Remove from Fieldable everything that has to do with FieldType
          • Introduce CoreField (could be just Field) instead of AbstractField with included field type

          I will continue with extending CoreField with StandardField (which is now Field) and NumericField (refactored) if this proves to be OK.

          Show
          Nikola Tankovic added a comment - Sorry guys to keep you waiting, I had some slight problems and tight schedule. I will now continue with regular patching. Proposed in patch: Remove from Fieldable everything that has to do with FieldType Introduce CoreField (could be just Field) instead of AbstractField with included field type I will continue with extending CoreField with StandardField (which is now Field) and NumericField (refactored) if this proves to be OK.
          Hide
          Michael McCandless added a comment -

          I've opened LUCENE-3177 (and linked to this issue), to strongly
          decouple what indexer needs when indexing documents/fields from what
          we do in this issue. Ie, LUCENE-3177 gives us more freedom here, I
          think, to create specific concrete FieldType hierarhy for creating
          documents.

          Show
          Michael McCandless added a comment - I've opened LUCENE-3177 (and linked to this issue), to strongly decouple what indexer needs when indexing documents/fields from what we do in this issue. Ie, LUCENE-3177 gives us more freedom here, I think, to create specific concrete FieldType hierarhy for creating documents.
          Hide
          Michael McCandless added a comment -

          Thanks for the patch Nikola!

          Note: when you submit patches that you intend to donate to Apache, you
          should remember to check the box that says "Grant license to ASF...",
          as long as you are the sole creator of that patch (and thus have the
          right to grant this patch the ASF). Patches that incorporate someone
          elses source code are more interesting because we have to ensure the
          license is compatible with Apache's, update our LICENSE/NOTICE, etc.

          Stepping back here... I think we should think a bit about the target
          end goal here and then work out the baby steps to get there?

          I think ideally once we are done here, it should be incredibly simple
          to create a document, something like this:

          Document d = new Document();
          d.add(new TextField(title));
          d.add(new StringField(id));
          d.add(new BinaryField(bytes));
          d.add(new NumericField(price));
          

          These classes each use a default FieldType under the hood:

          • TextField indexes, tokenizes, with norms and TFAP
          • StringField indexes untokenized and no norms, no TFAP (maybe)
          • BinaryField only stores the byte[]
          • NumericField does what it does today

          If an app wants to tweak the type, it can do so, something like this:

          FieldType titleFieldType = new FieldType(Textfield.DEFAULT_TYPE);
          titleFieldType.setOmitNorms(true);
          titleFieldType.setOmitTFAP(true);
          d.add(new Field(titleFieldType, title));
          

          Ie, the default *Field classes are sugar for binding to the common
          default type, but you can easily go and customize the type if you want
          to.

          Does that sound "roughly" like the goal here....?

          Show
          Michael McCandless added a comment - Thanks for the patch Nikola! Note: when you submit patches that you intend to donate to Apache, you should remember to check the box that says "Grant license to ASF...", as long as you are the sole creator of that patch (and thus have the right to grant this patch the ASF). Patches that incorporate someone elses source code are more interesting because we have to ensure the license is compatible with Apache's, update our LICENSE/NOTICE, etc. Stepping back here... I think we should think a bit about the target end goal here and then work out the baby steps to get there? I think ideally once we are done here, it should be incredibly simple to create a document, something like this: Document d = new Document(); d.add( new TextField(title)); d.add( new StringField(id)); d.add( new BinaryField(bytes)); d.add( new NumericField(price)); These classes each use a default FieldType under the hood: TextField indexes, tokenizes, with norms and TFAP StringField indexes untokenized and no norms, no TFAP (maybe) BinaryField only stores the byte[] NumericField does what it does today If an app wants to tweak the type, it can do so, something like this: FieldType titleFieldType = new FieldType(Textfield.DEFAULT_TYPE); titleFieldType.setOmitNorms( true ); titleFieldType.setOmitTFAP( true ); d.add( new Field(titleFieldType, title)); Ie, the default *Field classes are sugar for binding to the common default type, but you can easily go and customize the type if you want to. Does that sound "roughly" like the goal here....?
          Hide
          Nikola Tankovic added a comment -

          Mike, that's exactly what I needed, quick API goal summary. This looks very clean and nice to me. Next patch will continue in that direction.

          Basically, like I said, we should remove AbstractField and keep only Field (with Fieldable interface). Then extend Field with Text,String,Binary and Numeric.

          Show
          Nikola Tankovic added a comment - Mike, that's exactly what I needed, quick API goal summary. This looks very clean and nice to me. Next patch will continue in that direction. Basically, like I said, we should remove AbstractField and keep only Field (with Fieldable interface). Then extend Field with Text,String,Binary and Numeric.
          Hide
          Michael McCandless added a comment -

          I think we can create only Field? Ie, no Fieldable interface nor AbstractField?

          I think IndexableField (LUCENE-3177) is the only interface we need?

          Show
          Michael McCandless added a comment - I think we can create only Field? Ie, no Fieldable interface nor AbstractField? I think IndexableField ( LUCENE-3177 ) is the only interface we need?
          Hide
          Nikola Tankovic added a comment -

          Yes, IndexableField looks sufficient.

          Show
          Nikola Tankovic added a comment - Yes, IndexableField looks sufficient.
          Hide
          Michael McCandless added a comment -

          I created a new branch, where we can iterate on these interlinked issues:

            https://svn.apache.org/repos/asf/lucene/dev/branches/fieldtype
          

          And I committed the initial patch from LUCENE-3177, decoupling indexer from the doc/field impl.

          Show
          Michael McCandless added a comment - I created a new branch, where we can iterate on these interlinked issues: https://svn.apache.org/repos/asf/lucene/dev/branches/fieldtype And I committed the initial patch from LUCENE-3177 , decoupling indexer from the doc/field impl.
          Hide
          Nikola Tankovic added a comment - - edited

          New patch: a baby step towards dividing AbstractField and Field towards Field, TextField, StringField, NumericField and BinaryField with default FieldType's.

          Show
          Nikola Tankovic added a comment - - edited New patch: a baby step towards dividing AbstractField and Field towards Field, TextField, StringField, NumericField and BinaryField with default FieldType's.
          Hide
          Michael McCandless added a comment -

          Patch looks good, thanks Nikola!

          When you make the patch, can you run "svn diff" from the top-level
          dir? Ie, so that file paths look
          lucene/src/java/org/apache/lucene/document/Field.java

          A couple minor code-formatting things:

          • Please add { } around one-line ifs, eg in FieldType.toString
          • import lines go after the copyright (FieldType.java)
          • If possible please try to avoid adding "noise" to the patch, for
            example re-formatting javadocs (eg NumericField.java). It's fine
            to clean things up (add missing {}'s to existing code) as you go,
            but if it's simply a reformat that just adds noise which makes it
            harder to see real changes.

          Other stuff:

          • The DEFAULT_TYPE for each field can be final right?
          • For FieldType, can we use direct members of the class, instead of
            the EnumSet? (Ie, boolean indexed, boolean stored, etc.).

          The patch causes compilation errors when I run "ant compile-core", but
          that's expected right?

          I think our immediate goal here should be to get a compilable patch
          with tests passing, ie the "dirt path". Then we can go back and
          iterate.

          But, because so many tests rely on the current Document/Field API... I
          think in order to stage this we should make a totally new package,
          call it document2 for now, and create all these new classes inside
          there. Then, one by one we can cutover tests to use document2/*,
          starting with TestDemo. Eventually, once everything is cutover, we
          can remove document and rename document2 to document.

          Show
          Michael McCandless added a comment - Patch looks good, thanks Nikola! When you make the patch, can you run "svn diff" from the top-level dir? Ie, so that file paths look lucene/src/java/org/apache/lucene/document/Field.java A couple minor code-formatting things: Please add { } around one-line ifs, eg in FieldType.toString import lines go after the copyright (FieldType.java) If possible please try to avoid adding "noise" to the patch, for example re-formatting javadocs (eg NumericField.java). It's fine to clean things up (add missing {}'s to existing code) as you go, but if it's simply a reformat that just adds noise which makes it harder to see real changes. Other stuff: The DEFAULT_TYPE for each field can be final right? For FieldType, can we use direct members of the class, instead of the EnumSet? (Ie, boolean indexed, boolean stored, etc.). The patch causes compilation errors when I run "ant compile-core", but that's expected right? I think our immediate goal here should be to get a compilable patch with tests passing, ie the "dirt path". Then we can go back and iterate. But, because so many tests rely on the current Document/Field API... I think in order to stage this we should make a totally new package, call it document2 for now, and create all these new classes inside there. Then, one by one we can cutover tests to use document2/*, starting with TestDemo. Eventually, once everything is cutover, we can remove document and rename document2 to document.
          Hide
          Nikola Tankovic added a comment -

          Thanks Mike,

          everything sound good, I'll correct suggested things, then start with document2 package!

          Show
          Nikola Tankovic added a comment - Thanks Mike, everything sound good, I'll correct suggested things, then start with document2 package!
          Hide
          Nikola Tankovic added a comment -

          Patch: copied oal.doc to oal.doc2 with new FieldType changes, modified TestDocument Unit Test, but have some failures. I think it's because reference to oal.doc should change to oal.doc2 but I don't know exactly where so I need some help. I changed imports at IndexSearcher and IndexReader only.

          Show
          Nikola Tankovic added a comment - Patch: copied oal.doc to oal.doc2 with new FieldType changes, modified TestDocument Unit Test, but have some failures. I think it's because reference to oal.doc should change to oal.doc2 but I don't know exactly where so I need some help. I changed imports at IndexSearcher and IndexReader only.
          Hide
          Nikola Tankovic added a comment -

          Patch No.4: Passing TestDemo Unit test

          Show
          Nikola Tankovic added a comment - Patch No.4: Passing TestDemo Unit test
          Hide
          Michael McCandless added a comment -

          Great – everything compiles and all core tests pass ("ant test-core")
          with this patch, including TestDemo, which is the only test cutover so
          far to the new Field/Document API!

          Feedback:

          • For the new Document (oal.document2.Document), I think we should
            remove setBoost and require instead that apps should do this
            multiplication themselves into the field's boost.
          • Can you remove all methods from the new Document class, and add
            back only what we need for the tests, as we cutover? (Ie, vs
            copying everything from Document). We'll need to later iterate on
            this API, to fix problems w/ existing Document class. EG I don't
            like that lookup-by-name is secretly O(N) cost, that multi-valued
            fields are "awkward", and I'm not sure we should have "sugar"
            methods like String getField(String name)).
          • The TestDemo cutover already exposes a danger w/ the new API: it's
            modifying TextField.DEFAULT_TYPE. I think somehow we need to make
            these default types read-only? EG maybe each has a "frozen" bit,
            and we through IllegalStateExc if you try to change anything once
            it's frozen?

          I think the next step is to cutover more and more tests...

          Show
          Michael McCandless added a comment - Great – everything compiles and all core tests pass ("ant test-core") with this patch, including TestDemo, which is the only test cutover so far to the new Field/Document API! Feedback: For the new Document (oal.document2.Document), I think we should remove setBoost and require instead that apps should do this multiplication themselves into the field's boost. Can you remove all methods from the new Document class, and add back only what we need for the tests, as we cutover? (Ie, vs copying everything from Document). We'll need to later iterate on this API, to fix problems w/ existing Document class. EG I don't like that lookup-by-name is secretly O(N) cost, that multi-valued fields are "awkward", and I'm not sure we should have "sugar" methods like String getField(String name)). The TestDemo cutover already exposes a danger w/ the new API: it's modifying TextField.DEFAULT_TYPE. I think somehow we need to make these default types read-only? EG maybe each has a "frozen" bit, and we through IllegalStateExc if you try to change anything once it's frozen? I think the next step is to cutover more and more tests...
          Hide
          Nikola Tankovic added a comment -

          Some test are cutover, more to come...
          This fifth patch is to monitor progress, and see if something is wrong, or could be better.

          Cutover InstantiatedDocument along the way also

          Show
          Nikola Tankovic added a comment - Some test are cutover, more to come... This fifth patch is to monitor progress, and see if something is wrong, or could be better. Cutover InstantiatedDocument along the way also
          Hide
          Michael McCandless added a comment -

          Patch looks good Nikola – I'll commit it to the branch!

          I removed the 2 nocommits from oal.document2.Document – I think they were leftover from copying from Document.

          Show
          Michael McCandless added a comment - Patch looks good Nikola – I'll commit it to the branch! I removed the 2 nocommits from oal.document2.Document – I think they were leftover from copying from Document.
          Hide
          Nikola Tankovic added a comment -

          Minor fixes and more tests cutover.

          Show
          Nikola Tankovic added a comment - Minor fixes and more tests cutover.
          Hide
          Michael McCandless added a comment -

          I'm seeing compilation errors with the last patch, eg:

              [javac] /lucene/fieldtype/lucene/src/test/org/apache/lucene/index/TestSegmentMerger.java:53: setupDoc(org.apache.lucene.document2.Document) in org.apache.lucene.index.DocHelper cannot be applied to (org.apache.lucene.document.Document)
              [javac]     DocHelper.setupDoc(doc1);
              [javac]              ^
          

          Otherwise patch looks good!

          Show
          Michael McCandless added a comment - I'm seeing compilation errors with the last patch, eg: [javac] /lucene/fieldtype/lucene/src/test/org/apache/lucene/index/TestSegmentMerger.java:53: setupDoc(org.apache.lucene.document2.Document) in org.apache.lucene.index.DocHelper cannot be applied to (org.apache.lucene.document.Document) [javac] DocHelper.setupDoc(doc1); [javac] ^ Otherwise patch looks good!
          Hide
          Nikola Tankovic added a comment -

          Most of tests are now cutover, except those that require refactoring of IndexReader and Solr.

          Show
          Nikola Tankovic added a comment - Most of tests are now cutover, except those that require refactoring of IndexReader and Solr.
          Hide
          Nikola Tankovic added a comment -

          Some mistakes corrected in patch 8, so hopefully it is ready for commit.

          Show
          Nikola Tankovic added a comment - Some mistakes corrected in patch 8, so hopefully it is ready for commit.
          Hide
          Michael McCandless added a comment -

          Hmm I'm still hitting compilation errors here, eg:

              [javac] /lucene/fieldtype/modules/analysis/common/src/test/org/apache/lucene/analysis/shingle/ShingleFilterTest.java:59: cannot find symbol
              [javac] symbol  : variable TYPE_UNSTORED
              [javac] location: class org.apache.lucene.analysis.tokenattributes.TypeAttributeImpl
              [javac]         typeAtt.setType(TypeAttributeImpl.TYPE_UNSTORED);
          

          I think this is just another accidental replace of DEFAULT_TYPE... can you run a top-level "ant test" to make sure the patch compiles and passes tests? Also, the javadoc changes to Token.java should be reverted too.

          Show
          Michael McCandless added a comment - Hmm I'm still hitting compilation errors here, eg: [javac] /lucene/fieldtype/modules/analysis/common/src/test/org/apache/lucene/analysis/shingle/ShingleFilterTest.java:59: cannot find symbol [javac] symbol : variable TYPE_UNSTORED [javac] location: class org.apache.lucene.analysis.tokenattributes.TypeAttributeImpl [javac] typeAtt.setType(TypeAttributeImpl.TYPE_UNSTORED); I think this is just another accidental replace of DEFAULT_TYPE... can you run a top-level "ant test" to make sure the patch compiles and passes tests? Also, the javadoc changes to Token.java should be reverted too.
          Hide
          Nikola Tankovic added a comment -

          Patch 9: few more compilation errors fixed.

          Show
          Nikola Tankovic added a comment - Patch 9: few more compilation errors fixed.
          Hide
          Michael McCandless added a comment -

          Great! Tests pass and it looks like the accidental changes are gone. I'll commit to the branch...

          Show
          Michael McCandless added a comment - Great! Tests pass and it looks like the accidental changes are gone. I'll commit to the branch...
          Hide
          Michael McCandless added a comment -

          Small patch to fix LTC.newField to again randomly add in term vectors when they are disabled.

          Show
          Michael McCandless added a comment - Small patch to fix LTC.newField to again randomly add in term vectors when they are disabled.
          Hide
          Nikola Tankovic added a comment -

          Solr cutover to FieldType. Having repeated similar errors in tests. Trying to debug. Help is appriciated

          Show
          Nikola Tankovic added a comment - Solr cutover to FieldType. Having repeated similar errors in tests. Trying to debug. Help is appriciated
          Hide
          Michael McCandless added a comment -

          Nikola tracked this down – it's because we're not reading numeric field back properly from stored fields.

          Show
          Michael McCandless added a comment - Nikola tracked this down – it's because we're not reading numeric field back properly from stored fields.
          Hide
          Nikola Tankovic added a comment -

          Solr cutover, tests ok!

          Show
          Nikola Tankovic added a comment - Solr cutover, tests ok!
          Hide
          Michael McCandless added a comment -

          Patch looks good and tests pass!

          What remains to be cutover (index time)?

          I'll commit this patch to the branch...

          Show
          Michael McCandless added a comment - Patch looks good and tests pass! What remains to be cutover (index time)? I'll commit this patch to the branch...
          Hide
          Nikola Tankovic added a comment -

          Patch 12: Small cutover that's left

          Show
          Nikola Tankovic added a comment - Patch 12: Small cutover that's left
          Hide
          Nikola Tankovic added a comment -

          Patch 13: FieldsReader cutover /w tests

          Show
          Nikola Tankovic added a comment - Patch 13: FieldsReader cutover /w tests
          Hide
          Nikola Tankovic added a comment - - edited

          14th patch fixes lucene javadoc errors.

          Show
          Nikola Tankovic added a comment - - edited 14th patch fixes lucene javadoc errors.
          Hide
          Michael McCandless added a comment -

          javadocs patch looks good – all the Lucene warnings are fixed. I'll commit to the branch.

          Show
          Michael McCandless added a comment - javadocs patch looks good – all the Lucene warnings are fixed. I'll commit to the branch.
          Hide
          Nikola Tankovic added a comment -

          Patch 15 (untested): IR.document2 implemantation with Document2StoredFieldVisitor. More test cutover.

          Show
          Nikola Tankovic added a comment - Patch 15 (untested): IR.document2 implemantation with Document2StoredFieldVisitor. More test cutover.
          Hide
          Michael McCandless added a comment -

          This seems to fail w/ the patch: ant test -Dtestcase=BasicFunctionalityTest -Dtestmethod=testLazyField -Dtests.seed=-6444563330259805074:273924987296571588 but so far other failures!

          Show
          Michael McCandless added a comment - This seems to fail w/ the patch: ant test -Dtestcase=BasicFunctionalityTest -Dtestmethod=testLazyField -Dtests.seed=-6444563330259805074:273924987296571588 but so far other failures!
          Hide
          Michael McCandless added a comment -

          Patch looks good, except, you should not have to add {{IR.document2(int
          docID, StoredFieldVisitor visitor)}}, right? Ie, doesn't the existing
          document(int, StoredFieldVisitor) suffice? The StoredFieldVisitor is
          completely agnostic as to whether you're creating a new vs old
          Document.

          Show
          Michael McCandless added a comment - Patch looks good, except, you should not have to add {{IR.document2(int docID, StoredFieldVisitor visitor)}}, right? Ie, doesn't the existing document(int, StoredFieldVisitor) suffice? The StoredFieldVisitor is completely agnostic as to whether you're creating a new vs old Document.
          Hide
          Nikola Tankovic added a comment -

          Hmmm, I think I added it only 'cause of precaution, I think it is not needed.

          Show
          Nikola Tankovic added a comment - Hmmm, I think I added it only 'cause of precaution, I think it is not needed.
          Hide
          Nikola Tankovic added a comment -

          Patch 16: Solr new StoredFieldVisitor cutover + more tests. Please validate patch, as I have some strange errors in Solr. Ty!

          Show
          Nikola Tankovic added a comment - Patch 16: Solr new StoredFieldVisitor cutover + more tests. Please validate patch, as I have some strange errors in Solr. Ty!
          Hide
          Michael McCandless added a comment -

          Patch looks good – nice you just had to add IR.document2 and not touch the subclasses (since you're just using a different visitor).

          But, I see this failure: ant test -Dtestcase=HighlighterTest -Dtestmethod=testAlternateSummary -Dtests.seed=3152255204045715539:5957936590584148410

          Show
          Michael McCandless added a comment - Patch looks good – nice you just had to add IR.document2 and not touch the subclasses (since you're just using a different visitor). But, I see this failure: ant test -Dtestcase=HighlighterTest -Dtestmethod=testAlternateSummary -Dtests.seed=3152255204045715539:5957936590584148410
          Hide
          Nikola Tankovic added a comment -

          P.17 (non-tested): fixed suggested issues

          Show
          Nikola Tankovic added a comment - P.17 (non-tested): fixed suggested issues
          Hide
          Michael McCandless added a comment -

          Looks great Nikola! All tests pass. I'll commit this to the branch...

          Show
          Michael McCandless added a comment - Looks great Nikola! All tests pass. I'll commit this to the branch...
          Hide
          Nikola Tankovic added a comment -

          Getting some errors, maybe someone can help or check.

          Show
          Nikola Tankovic added a comment - Getting some errors, maybe someone can help or check.
          Hide
          Nikola Tankovic added a comment -

          Removed old oal.Document, except in documentation. Tests pass!

          Show
          Nikola Tankovic added a comment - Removed old oal.Document, except in documentation. Tests pass!
          Hide
          Michael McCandless added a comment -

          Patch looks good Nikola; I'll commit to the branch!

          I think the next step is to remove the oal.document package and any related classes (eg, DocumentStoredFieldVisitor), and then do a massive rename of doc/document2 back to doc/document?

          Show
          Michael McCandless added a comment - Patch looks good Nikola; I'll commit to the branch! I think the next step is to remove the oal.document package and any related classes (eg, DocumentStoredFieldVisitor), and then do a massive rename of doc/document2 back to doc/document?
          Hide
          Nikola Tankovic added a comment -

          Yes, exactly! That is my next step, shouldn't take long.

          Show
          Nikola Tankovic added a comment - Yes, exactly! That is my next step, shouldn't take long.
          Hide
          Nikola Tankovic added a comment -

          Removed old oal.document and replaced with new one completely. Now I need to fix javadoc errors.

          Show
          Nikola Tankovic added a comment - Removed old oal.document and replaced with new one completely. Now I need to fix javadoc errors.
          Hide
          Michael McCandless added a comment -

          Patch looks great... such a big milestone, to finally cutover entirely to the new API: awesome! I'll commit shortly to the branch...

          Show
          Michael McCandless added a comment - Patch looks great... such a big milestone, to finally cutover entirely to the new API: awesome! I'll commit shortly to the branch...
          Hide
          Nikola Tankovic added a comment -

          YES! Indeed a big milestone, finally So now merging back?

          Show
          Nikola Tankovic added a comment - YES! Indeed a big milestone, finally So now merging back?
          Hide
          Nikola Tankovic added a comment -

          21th patch Fixed Javadocs errors.

          Show
          Nikola Tankovic added a comment - 21th patch Fixed Javadocs errors.
          Hide
          Michael McCandless added a comment -

          Thanks Nikola.

          I'll have a look at this patch once I'm back online (on vacation this week).

          After that I'll work on merging this back to trunk! If anyone wants to review the new FieldType API before we merge back, now is a good time...

          Show
          Michael McCandless added a comment - Thanks Nikola. I'll have a look at this patch once I'm back online (on vacation this week). After that I'll work on merging this back to trunk! If anyone wants to review the new FieldType API before we merge back, now is a good time...
          Hide
          Michael McCandless added a comment -

          Patch looks good – I'll commit to the branch, and then work [slowly] towards making a committable (to trunk) final patch!

          Show
          Michael McCandless added a comment - Patch looks good – I'll commit to the branch, and then work [slowly] towards making a committable (to trunk) final patch!
          Hide
          Michael McCandless added a comment -

          This merge is going to be "challenging" so I'm going to attach several patches as I iterate towards making a real patch against current trunk.

          This first patch is the diff vs the trunk as of when I cut the branch (rev 1134018), ie it contains all changes committed to the branch. I generated this w/ dev-tools/scripts/diffSources.py, ie it should be applyable if you checkout that base rev.

          Next up I'll tweak the patch to account for moved files since we branched.

          Show
          Michael McCandless added a comment - This merge is going to be "challenging" so I'm going to attach several patches as I iterate towards making a real patch against current trunk. This first patch is the diff vs the trunk as of when I cut the branch (rev 1134018), ie it contains all changes committed to the branch. I generated this w/ dev-tools/scripts/diffSources.py, ie it should be applyable if you checkout that base rev. Next up I'll tweak the patch to account for moved files since we branched.
          Hide
          Michael McCandless added a comment -

          Next patch file, this time with the filenames in the patch moved to match svn moves that happened in trunk after the branch was cut.

          Show
          Michael McCandless added a comment - Next patch file, this time with the filenames in the patch moved to match svn moves that happened in trunk after the branch was cut.
          Hide
          Robert Muir added a comment - - edited

          I committed mike's patch as https://svn.apache.org/repos/asf/lucene/dev/branches/fieldtype_conflicted

          Conflicts remain: you can see them with:

          find . | xargs grep -l "<<<<<<<" | grep -v tmp
          

          Additionally there is a lot of new code since the branch was created (e.g. faceting module) that likely needs fixing too.

          Show
          Robert Muir added a comment - - edited I committed mike's patch as https://svn.apache.org/repos/asf/lucene/dev/branches/fieldtype_conflicted Conflicts remain: you can see them with: find . | xargs grep -l "<<<<<<<" | grep -v tmp Additionally there is a lot of new code since the branch was created (e.g. faceting module) that likely needs fixing too.
          Hide
          Michael McCandless added a comment -

          Awesome, thanks Robert!

          Nikola, can you switch to the fieldtype_conflicted branch and resolve the conflicts / fix the newly added code? If we do this quickly then hopefully trunk won't change much in the meantime

          Show
          Michael McCandless added a comment - Awesome, thanks Robert! Nikola, can you switch to the fieldtype_conflicted branch and resolve the conflicts / fix the newly added code? If we do this quickly then hopefully trunk won't change much in the meantime
          Hide
          Nikola Tankovic added a comment -

          Nice,

          I'll do my best to be as quick as I can, but probably we'll need one more quick iteration

          Show
          Nikola Tankovic added a comment - Nice, I'll do my best to be as quick as I can, but probably we'll need one more quick iteration
          Hide
          Nikola Tankovic added a comment -

          First merge patch, fixes almost every lucene/src/test. Mike can you merge stuff regarding FieldVisitor? And implement visit method in new IndexReader extended classes?

          Show
          Nikola Tankovic added a comment - First merge patch, fixes almost every lucene/src/test. Mike can you merge stuff regarding FieldVisitor? And implement visit method in new IndexReader extended classes?
          Hide
          Michael McCandless added a comment -

          Patch looks good – I'll commit, and I'll work on the FieldVisitor conflicts!

          Show
          Michael McCandless added a comment - Patch looks good – I'll commit, and I'll work on the FieldVisitor conflicts!
          Hide
          Michael McCandless added a comment -

          OK I resolved the cases of FieldSelector/Visitor that I could find – Nikola were there other cases I should look at? (I was confused by "And implement visit method in new IndexReader extended classes?").

          Show
          Michael McCandless added a comment - OK I resolved the cases of FieldSelector/Visitor that I could find – Nikola were there other cases I should look at? (I was confused by "And implement visit method in new IndexReader extended classes?").
          Hide
          Nikola Tankovic added a comment -

          Great Mike, you did it all!

          Show
          Nikola Tankovic added a comment - Great Mike, you did it all!
          Hide
          Nikola Tankovic added a comment -

          Further merging. Test pass, except for few wierd fails in Solr, please check.

          Next stop: fixing docs.

          Show
          Nikola Tankovic added a comment - Further merging. Test pass, except for few wierd fails in Solr, please check. Next stop: fixing docs.
          Hide
          Michael McCandless added a comment -

          Great progress! I'll commit shortly!

          But, strangely, I see TestStressNRT failing: spooky!

          Show
          Michael McCandless added a comment - Great progress! I'll commit shortly! But, strangely, I see TestStressNRT failing: spooky!
          Hide
          Nikola Tankovic added a comment -

          Javadocs fixed.

          Show
          Nikola Tankovic added a comment - Javadocs fixed.
          Hide
          Michael McCandless added a comment -

          Looks good Nikola – I'll commit to the branch. I'll also svn rm AbstractField.java.

          Show
          Michael McCandless added a comment - Looks good Nikola – I'll commit to the branch. I'll also svn rm AbstractField.java.
          Hide
          Michael McCandless added a comment -

          Patch, fixing all nocommits, adding direct test case for IndexableField, and removing dead code / making some names consistent.

          I still see failures in TestStressNRT, but besides that I think we are close to landing!!

          Show
          Michael McCandless added a comment - Patch, fixing all nocommits, adding direct test case for IndexableField, and removing dead code / making some names consistent. I still see failures in TestStressNRT, but besides that I think we are close to landing!!
          Hide
          Nikola Tankovic added a comment -

          That's strange 'cause on my machine all tests pass, or maybe this test
          skipped for some reason?

          Nikola

          Show
          Nikola Tankovic added a comment - That's strange 'cause on my machine all tests pass, or maybe this test skipped for some reason? Nikola
          Hide
          Michael McCandless added a comment -

          Well, the test is heavily threaded so it could be it only fails if you have enough cores?

          The failure is also intermittent, though often I see it fail after just one iteration...

          Show
          Michael McCandless added a comment - Well, the test is heavily threaded so it could be it only fails if you have enough cores? The failure is also intermittent, though often I see it fail after just one iteration...
          Hide
          Michael McCandless added a comment -

          Another merge patch – fixed a few more conflicts, some places where we lost the doc-level boost, simplified the Field/Document API some more, added some missing @Overrides, etc.

          Show
          Michael McCandless added a comment - Another merge patch – fixed a few more conflicts, some places where we lost the doc-level boost, simplified the Field/Document API some more, added some missing @Overrides, etc.
          Hide
          Michael McCandless added a comment -

          This is the svn diff vs the base of when we cut this branch, ie, has all changes in the branch, for reviewing.

          I think it's close to committable now!

          Show
          Michael McCandless added a comment - This is the svn diff vs the base of when we cut this branch, ie, has all changes in the branch, for reviewing. I think it's close to committable now!
          Hide
          Michael McCandless added a comment -

          Final [applyable] patch! I think it's ready to commit and I hope
          trunk doesn't up and change too much I plan to commit in a day or
          so...

          Show
          Michael McCandless added a comment - Final [applyable] patch! I think it's ready to commit and I hope trunk doesn't up and change too much I plan to commit in a day or so...
          Hide
          Nikola Tankovic added a comment -

          Congratulations, I think we did it

          Nikola

          Show
          Nikola Tankovic added a comment - Congratulations, I think we did it Nikola
          Hide
          Michael McCandless added a comment -

          Thank you for all this hard work Nikola!

          Show
          Michael McCandless added a comment - Thank you for all this hard work Nikola!
          Hide
          Nikola Tankovic added a comment -

          Well this GSOC was definitively a pleasure for me thank you all,
          but especially you Mike for your epic 24/7 support!

          Nikola

          Show
          Nikola Tankovic added a comment - Well this GSOC was definitively a pleasure for me thank you all, but especially you Mike for your epic 24/7 support! Nikola
          Hide
          Michael McCandless added a comment -

          OK I committed the final patch! I'll leave this open for a while in case we have small things to fix...

          Show
          Michael McCandless added a comment - OK I committed the final patch! I'll leave this open for a while in case we have small things to fix...
          Hide
          Chris Male added a comment -

          Yay! Great work Nikola, Mike.

          Show
          Chris Male added a comment - Yay! Great work Nikola, Mike.
          Hide
          Robert Muir added a comment -

          Can we add some stuff to MIGRATE.txt for this issue?

          Show
          Robert Muir added a comment - Can we add some stuff to MIGRATE.txt for this issue?
          Hide
          Michael McCandless added a comment -

          Can we add some stuff to MIGRATE.txt for this issue?

          Woops, yes, I will!

          Show
          Michael McCandless added a comment - Can we add some stuff to MIGRATE.txt for this issue? Woops, yes, I will!
          Hide
          Michael McCandless added a comment -

          Patch, fixing a few small things: added missing ft.freeze() calls; added MIGRATE.txt entry; added jdocs for the XXXField classes; removing *Lazy from FieldType.

          Show
          Michael McCandless added a comment - Patch, fixing a few small things: added missing ft.freeze() calls; added MIGRATE.txt entry; added jdocs for the XXXField classes; removing *Lazy from FieldType.
          Hide
          Robert Muir added a comment -

          Thanks Mike! This is really helpful!

          While reviewing this/merging the flexscoring branch, I had a few ideas of improvements:

          • I think FT should be immutable? Personally, I don't think we should enforce "patterns" like Freezable or Builder at this low-level, instead I think FieldType should be a simple immutable class with a single ctor that takes the minimal stuff that we (core lucene) need. It can still be concrete, but then you have to specify everything. Then, things like TextField/StringField are sugar APIs for common configurations. I don't like the idea of mutable FieldTypes that are reused across different fields because I am concerned that somehow the 'wrong configuration' will be applied accidentally.
          • Along these lines, we can then remove the "copy constructor", which also seems unnatural to java users, since FieldType would then be immutable there is no reason to ever copy it.
          • I think BinaryField should be able to index as binary? This is a new feature in Lucene 4 but its unfortunately really hard to do: there are a few approaches, but they are all difficult: custom tokenstream/AttributeImpls/AttributeFactory etc.
          • In the future, a BinaryField like this could be a base impl for CollationField: due to historical reasons we expose this capability as an Analyzer but I think this isn't great: its really an implementation detail. For example in Solr, its a real FieldType that uses an analyzer behind the scenes. So in this sense I think its more consistent with NumericField.
          Show
          Robert Muir added a comment - Thanks Mike! This is really helpful! While reviewing this/merging the flexscoring branch, I had a few ideas of improvements: I think FT should be immutable? Personally, I don't think we should enforce "patterns" like Freezable or Builder at this low-level, instead I think FieldType should be a simple immutable class with a single ctor that takes the minimal stuff that we (core lucene) need. It can still be concrete, but then you have to specify everything. Then, things like TextField/StringField are sugar APIs for common configurations. I don't like the idea of mutable FieldTypes that are reused across different fields because I am concerned that somehow the 'wrong configuration' will be applied accidentally. Along these lines, we can then remove the "copy constructor", which also seems unnatural to java users, since FieldType would then be immutable there is no reason to ever copy it. I think BinaryField should be able to index as binary? This is a new feature in Lucene 4 but its unfortunately really hard to do: there are a few approaches, but they are all difficult: custom tokenstream/AttributeImpls/AttributeFactory etc. In the future, a BinaryField like this could be a base impl for CollationField: due to historical reasons we expose this capability as an Analyzer but I think this isn't great: its really an implementation detail. For example in Solr, its a real FieldType that uses an analyzer behind the scenes. So in this sense I think its more consistent with NumericField.
          Hide
          Michael McCandless added a comment -

          I think FT should be immutable?

          I don't like the idea of mutable FieldTypes that are reused across different fields because I am concerned that somehow the 'wrong configuration' will be applied accidentally.

          This is why we have FT.freeze, and why an FT is frozen as soon as it's
          used in a Field. But I agree it'd be even better if we had true
          immutability (all fields in FT are final).

          I think FieldType should be a simple immutable class with a single ctor that takes the minimal stuff that we (core lucene) need.

          It can still be concrete, but then you have to specify everything. Then, things like TextField/StringField are sugar APIs for common configurations.

          This is a neat idea!

          Another plus is this is a single place where we can check consistency
          of the settings (eg you cannot enable term vectors if indexed is
          false).

          So this would mean we'd have alternate ctors to the sugar classes for
          the common cases, like maybe:

             new StringField(name, value)
             new StoredStringField(name, value)
          

          StringField would always omitNorms, not tokenize, index DOCS_ONLY.

          For TextField maybe:

             new TextField(name, value)
             new TextField(name, value, omitNorms)
             new TextField(name, value, omitNorms, indexTVPos, indexTVOffsets)
             new StoredTextField(name, value)
             new StoredTextField(name, value, omitNorms)
             new StoredTextField(name, value, omitNorms, indexTVPos, indexTVOffsets)
          

          Expert usage would always have the out of invoking FT directly with
          all options. Even more expert usage can bypass the "userspace"
          FieldType/Field/Document entirely and code directly to IndexableField
          instead.

          I think BinaryField should be able to index as binary?

          I agree! Not sure on the details of how we'd do that though... Today
          this field is only stored byte[].

          Show
          Michael McCandless added a comment - I think FT should be immutable? I don't like the idea of mutable FieldTypes that are reused across different fields because I am concerned that somehow the 'wrong configuration' will be applied accidentally. This is why we have FT.freeze, and why an FT is frozen as soon as it's used in a Field. But I agree it'd be even better if we had true immutability (all fields in FT are final). I think FieldType should be a simple immutable class with a single ctor that takes the minimal stuff that we (core lucene) need. It can still be concrete, but then you have to specify everything. Then, things like TextField/StringField are sugar APIs for common configurations. This is a neat idea! Another plus is this is a single place where we can check consistency of the settings (eg you cannot enable term vectors if indexed is false). So this would mean we'd have alternate ctors to the sugar classes for the common cases, like maybe: new StringField(name, value) new StoredStringField(name, value) StringField would always omitNorms, not tokenize, index DOCS_ONLY. For TextField maybe: new TextField(name, value) new TextField(name, value, omitNorms) new TextField(name, value, omitNorms, indexTVPos, indexTVOffsets) new StoredTextField(name, value) new StoredTextField(name, value, omitNorms) new StoredTextField(name, value, omitNorms, indexTVPos, indexTVOffsets) Expert usage would always have the out of invoking FT directly with all options. Even more expert usage can bypass the "userspace" FieldType/Field/Document entirely and code directly to IndexableField instead. I think BinaryField should be able to index as binary? I agree! Not sure on the details of how we'd do that though... Today this field is only stored byte[].
          Hide
          Chris Male added a comment -

          Much of what has been suggested here I'm looking at incorporating in LUCENE-3312 but perhaps its best to do it in small steps.

          What I want to do is as follows:

          • Change FieldType to an interface inside index.* and use it for the source of properties about an IndexableField. It will be simple and immutable and won't enforce any creation techniques.
          • Add a builder for FieldType to document.* which will create FieldType instances.
          • Add the syntactic sugar ctors suggested above which would use the builder to instantiate the FieldTypes they need.
          Show
          Chris Male added a comment - Much of what has been suggested here I'm looking at incorporating in LUCENE-3312 but perhaps its best to do it in small steps. What I want to do is as follows: Change FieldType to an interface inside index.* and use it for the source of properties about an IndexableField. It will be simple and immutable and won't enforce any creation techniques. Add a builder for FieldType to document.* which will create FieldType instances. Add the syntactic sugar ctors suggested above which would use the builder to instantiate the FieldTypes they need.
          Hide
          Robert Muir added a comment -

          Add a builder for FieldType to document.* which will create FieldType instances.

          Can we avoid the builder API? I think we shouldnt invite accidental creation of lots of FieldType instances during indexing... why not just a single ctor in fieldtype that takes all the parameters the base class cares about? then it serves double-duty as the 'expert' fieldtype anyway, subclasses like TextField are just the sugar.

          If someone wants to implement their subclass with a builder, they could still do this, but I don't think we should force the builder API on people with such a user-facing API: I think we should just keep it a very simple immutable class with one ctor.

          Given the choice between builder and freezable though, I'll take freezable any day... but as I said before I think we should keep this even simpler.

          Show
          Robert Muir added a comment - Add a builder for FieldType to document.* which will create FieldType instances. Can we avoid the builder API? I think we shouldnt invite accidental creation of lots of FieldType instances during indexing... why not just a single ctor in fieldtype that takes all the parameters the base class cares about? then it serves double-duty as the 'expert' fieldtype anyway, subclasses like TextField are just the sugar. If someone wants to implement their subclass with a builder, they could still do this, but I don't think we should force the builder API on people with such a user-facing API: I think we should just keep it a very simple immutable class with one ctor. Given the choice between builder and freezable though, I'll take freezable any day... but as I said before I think we should keep this even simpler.
          Hide
          Chris Male added a comment -

          I'm definitely -1 for a constructor for all the properties. We may only have a few properties today but it's not going to stay that way. One of the benefits I see of FieldType is that we can extend it to have a greater range of properties that allow more customized handling of fields. If we put everything into a constructor then it'll grow out of control, prevent us from adding more properties and we'll end up having another project just to break it up more.

          With an interface, we're not forcing anything on anyone. Users can create FieldTypes however they like.

          Show
          Chris Male added a comment - I'm definitely -1 for a constructor for all the properties. We may only have a few properties today but it's not going to stay that way. One of the benefits I see of FieldType is that we can extend it to have a greater range of properties that allow more customized handling of fields. If we put everything into a constructor then it'll grow out of control, prevent us from adding more properties and we'll end up having another project just to break it up more. With an interface, we're not forcing anything on anyone. Users can create FieldTypes however they like.
          Hide
          Robert Muir added a comment -

          I'm definitely -1 for a constructor for all the properties. We may only have a few properties today but it's not going to stay that way

          What is the problem? I am talking about the core "FieldType" that is the main stuff we need: e.g. indexed/stored/etc

          Subclasses can do whatever they want (builders/freezable/i dont care), but we should make a simple extendable immutable core class for the limited
          set of properties that really need to be in the fieldtype subclass.

          Show
          Robert Muir added a comment - I'm definitely -1 for a constructor for all the properties. We may only have a few properties today but it's not going to stay that way What is the problem? I am talking about the core "FieldType" that is the main stuff we need: e.g. indexed/stored/etc Subclasses can do whatever they want (builders/freezable/i dont care), but we should make a simple extendable immutable core class for the limited set of properties that really need to be in the fieldtype subclass.
          Hide
          Chris Male added a comment -

          The problem is that it locks us in to this limited set of 'core' properties. They will grow, I can assure you.

          Show
          Chris Male added a comment - The problem is that it locks us in to this limited set of 'core' properties. They will grow, I can assure you.
          Hide
          Robert Muir added a comment -

          it doesn't lock us into anything. we deprecate the old ctor and make a new one with the correct default.

          This is easy!

          Besides, you arent assuring me? what new things really need to be added that apply to all field types across the board?!

          Show
          Robert Muir added a comment - it doesn't lock us into anything. we deprecate the old ctor and make a new one with the correct default. This is easy! Besides, you arent assuring me? what new things really need to be added that apply to all field types across the board?!
          Hide
          Chris Male added a comment -

          Alright, I'll wait to see your patch using the ctor approach.

          Show
          Chris Male added a comment - Alright, I'll wait to see your patch using the ctor approach.
          Hide
          Robert Muir added a comment -

          I'm not really motivated to work on a patch with a solution you already said you are -1 on?

          Again, I'm totally against any builder here, I'm ok with freezable that we have now, but i think a plain immutable java object would be a lot simpler and clear... at the end of the day I'm ok with compromising with what we have already, I just brought this immutability idea up as a suggestion.

          Show
          Robert Muir added a comment - I'm not really motivated to work on a patch with a solution you already said you are -1 on? Again, I'm totally against any builder here, I'm ok with freezable that we have now, but i think a plain immutable java object would be a lot simpler and clear... at the end of the day I'm ok with compromising with what we have already, I just brought this immutability idea up as a suggestion.
          Hide
          Chris Male added a comment -

          Then we're at a bit of an impasse because you're totally against what I suggested as well. I'm suggesting you cut a patch and we can go over it, I might see how it can be simpler and clearer.

          Show
          Chris Male added a comment - Then we're at a bit of an impasse because you're totally against what I suggested as well. I'm suggesting you cut a patch and we can go over it, I might see how it can be simpler and clearer.
          Hide
          Chris Male added a comment -

          What about a compromise and have both the plain Java object and a builder? That way if someone wants the simple clear way you describe, they have the ctor they can go to. If they want what I feel would be more readable code, they can use the builder.

          Show
          Chris Male added a comment - What about a compromise and have both the plain Java object and a builder? That way if someone wants the simple clear way you describe, they have the ctor they can go to. If they want what I feel would be more readable code, they can use the builder.
          Hide
          Simon Willnauer added a comment -

          Hey guys, why don't we put plain old immutable java objects with a single ctor into core and add a builder API into modules / sandbox? This keeps things simple in core and if users want to use it they can grab it out of a module?

          Can we avoid the builder API? I think we shouldnt invite accidental creation of lots of FieldType instances during indexing... why not just a single ctor in fieldtype that takes all the parameters the base class cares about? then it serves double-duty as the 'expert' fieldtype anyway, subclasses like TextField are just the sugar.

          so I haven't seen a single technical argument against a builder here. I personally think that a builder has many advantages:

          • simple to add new fields, doesn't need deprecation if you add another field to a type
          • simple to use since lots of people are use to chaining
          • provides immutability by design
          • represents a small but clear DSL to build a field type. you could do things like providing setters for TV only if you chain it with a call to indexed() like:
             builder.indexed().storeTV(); 

            which would not be visible otherwise.

          • a ctor call will require many parameters that you don't want to set, but you're forced to pass a value for them anyway
          • since most of the parameters are booleans long sequences of identically typed parameters can cause subtle bugs. If the user accidentally reverses two such parameters, the compiler won't complain, but the program will misbehave at runtime. That sucks! especially if you spend hours of indexing and realize that your TV has not been stored because you missed to set indexed = true
          • builder code is easy to write and, more importantly, to read.
          • a builder simulates named optional parameters like in python and other languages which java is lacking.

          I think the Builder pattern is a good choice when designing classes whose constructors would have more than a handful of parameters, especially if most of those parameters are optional. Client code is much easier to read and write with builders than with the traditional telescoping constructor pattern.

          Show
          Simon Willnauer added a comment - Hey guys, why don't we put plain old immutable java objects with a single ctor into core and add a builder API into modules / sandbox? This keeps things simple in core and if users want to use it they can grab it out of a module? Can we avoid the builder API? I think we shouldnt invite accidental creation of lots of FieldType instances during indexing... why not just a single ctor in fieldtype that takes all the parameters the base class cares about? then it serves double-duty as the 'expert' fieldtype anyway, subclasses like TextField are just the sugar. so I haven't seen a single technical argument against a builder here. I personally think that a builder has many advantages: simple to add new fields, doesn't need deprecation if you add another field to a type simple to use since lots of people are use to chaining provides immutability by design represents a small but clear DSL to build a field type. you could do things like providing setters for TV only if you chain it with a call to indexed() like: builder.indexed().storeTV(); which would not be visible otherwise. a ctor call will require many parameters that you don't want to set, but you're forced to pass a value for them anyway since most of the parameters are booleans long sequences of identically typed parameters can cause subtle bugs. If the user accidentally reverses two such parameters, the compiler won't complain, but the program will misbehave at runtime. That sucks! especially if you spend hours of indexing and realize that your TV has not been stored because you missed to set indexed = true builder code is easy to write and, more importantly, to read. a builder simulates named optional parameters like in python and other languages which java is lacking. I think the Builder pattern is a good choice when designing classes whose constructors would have more than a handful of parameters, especially if most of those parameters are optional. Client code is much easier to read and write with builders than with the traditional telescoping constructor pattern.
          Hide
          Chris Male added a comment -

          +1

          I couldn't have put it better myself.

          Show
          Chris Male added a comment - +1 I couldn't have put it better myself.
          Hide
          Uwe Schindler added a comment -

          +1

          I agree, too! I am personally in favour of builder patterns when parameters get beyond 3 or 4, especially if they are simply booleans.

          Show
          Uwe Schindler added a comment - +1 I agree, too! I am personally in favour of builder patterns when parameters get beyond 3 or 4, especially if they are simply booleans.
          Hide
          Uwe Schindler added a comment -

          Somehow related, but for the same reasons (too many booleans in ctor), WordDelimiterFilter would also be a candidate for a WordDelimiterFilterBuilder.

          Show
          Uwe Schindler added a comment - Somehow related, but for the same reasons (too many booleans in ctor), WordDelimiterFilter would also be a candidate for a WordDelimiterFilterBuilder.
          Hide
          Robert Muir added a comment -

          so I haven't seen a single technical argument against a builder here. I personally think that a builder has many advantages:

          I gave one already, it creates too many objects. It also adds complexity to the API.

          Just because a constructor has a couple parameters does NOT mean a builder fits. In situations like this one, its a bad choice.

          Show
          Robert Muir added a comment - so I haven't seen a single technical argument against a builder here. I personally think that a builder has many advantages: I gave one already, it creates too many objects. It also adds complexity to the API. Just because a constructor has a couple parameters does NOT mean a builder fits. In situations like this one, its a bad choice.
          Hide
          Chris Male added a comment -

          How does it create too many objects?

          Show
          Chris Male added a comment - How does it create too many objects?
          Hide
          Uwe Schindler added a comment -

          How does it create too many objects?

          Thats implementation internal. If you want final unmodifiable objects, every builder call will produce a new one in its return parameter (see ScorerContext).

          In general the builder pattern can also change existing objects, like StringBuilder does.

          Show
          Uwe Schindler added a comment - How does it create too many objects? Thats implementation internal. If you want final unmodifiable objects, every builder call will produce a new one in its return parameter (see ScorerContext). In general the builder pattern can also change existing objects, like StringBuilder does.
          Hide
          Robert Muir added a comment -

          we have to realize, most people indexing with lucene do it like this:

          while(...) {
            Document doc = new Document(...);
            Field field1 = new Field(...);
            Field field2 = new Field(...);
          }
          

          So for MOST people FT is increasing the number of objects being created per-document (most people will create a new one for every field). I think we should keep that at a minimum.

          Adding a builder on top, will at minimum require an additional object for the builder itself AND:

          • creation of a new intermediate throw-away FieldType with each .set() OR
          • creation of an additional mutable object used internally by the builder which will require keeping in sync with the immutable form.
          Show
          Robert Muir added a comment - we have to realize, most people indexing with lucene do it like this: while(...) { Document doc = new Document(...); Field field1 = new Field(...); Field field2 = new Field(...); } So for MOST people FT is increasing the number of objects being created per-document (most people will create a new one for every field). I think we should keep that at a minimum. Adding a builder on top, will at minimum require an additional object for the builder itself AND : creation of a new intermediate throw-away FieldType with each .set() OR creation of an additional mutable object used internally by the builder which will require keeping in sync with the immutable form.
          Hide
          Robert Muir added a comment - - edited

          In general the builder pattern can also change existing objects, like StringBuilder does

          And thats another bug in the builder anti-pattern, if you want to have a resulting immutable form, thats going to require either an object-creation orgy or a massive code duplication so that it can store an internal mutable form.

          Edit: got my antipatterns confused: visitor -> builder

          Show
          Robert Muir added a comment - - edited In general the builder pattern can also change existing objects, like StringBuilder does And thats another bug in the builder anti-pattern, if you want to have a resulting immutable form, thats going to require either an object-creation orgy or a massive code duplication so that it can store an internal mutable form. Edit: got my antipatterns confused: visitor -> builder
          Hide
          Simon Willnauer added a comment -

          just to get this straight, FT is created only once and then shared this is its point. So the builder would be used only to set stuff up. Even if you create this each time all the object created are super short living plus modern VM allocate this on the stack I doubt that this is an issue anywhere.

          And thats another bug in the builder anti-pattern, if you want to have a resulting immutable form, thats going to require either an object-creation orgy or a massive code duplication so that it can store an internal mutable form.

          that is simply not true, the builder doesn't create the actual FT until you call BUilder.build() until there you are mutable which is the point of the builder. So where are those objects are created again?

          Show
          Simon Willnauer added a comment - just to get this straight, FT is created only once and then shared this is its point. So the builder would be used only to set stuff up. Even if you create this each time all the object created are super short living plus modern VM allocate this on the stack I doubt that this is an issue anywhere. And thats another bug in the builder anti-pattern, if you want to have a resulting immutable form, thats going to require either an object-creation orgy or a massive code duplication so that it can store an internal mutable form. that is simply not true, the builder doesn't create the actual FT until you call BUilder.build() until there you are mutable which is the point of the builder. So where are those objects are created again?
          Hide
          Robert Muir added a comment -

          just to get this straight, FT is created only once and then shared this is its point.

          Second time this morning you didn't even read what I said.

          scroll up, as i said before most people indexing with lucene don't reuse document/field objects.

          Show
          Robert Muir added a comment - just to get this straight, FT is created only once and then shared this is its point. Second time this morning you didn't even read what I said. scroll up, as i said before most people indexing with lucene don't reuse document/field objects.
          Hide
          Uwe Schindler added a comment - - edited

          Thats not a bug if you implement it correctly, in general a Builder/Factory combination is the solution. The Builder itsself is mutable. The last call on the builder is something like .createField() that no longer returns a Builder instance, but instead the immutable field. Thats all. All calls before use the same builder instance and always returning itsself. See e.g. Google Collections, they have their builders implemented exactly that way.

          Field field1 = new FieldBuilder(name).setFoo().setBar().createField(value);
          

          And for indexing you can even reuse the builder to produce many identical fields:

          FieldBuilder builder = new FieldBuilder(name).setFoo().setBar();
          for (documents) {
            Field field1 = builder.createField(value);
          }
          
          Show
          Uwe Schindler added a comment - - edited Thats not a bug if you implement it correctly, in general a Builder/Factory combination is the solution. The Builder itsself is mutable. The last call on the builder is something like .createField() that no longer returns a Builder instance, but instead the immutable field. Thats all. All calls before use the same builder instance and always returning itsself. See e.g. Google Collections, they have their builders implemented exactly that way. Field field1 = new FieldBuilder(name).setFoo().setBar().createField(value); And for indexing you can even reuse the builder to produce many identical fields: FieldBuilder builder = new FieldBuilder(name).setFoo().setBar(); for (documents) { Field field1 = builder.createField(value); }
          Hide
          Robert Muir added a comment -

          The Builder itsself is mutable.

          Which, as i said before, is a massive code duplication/hassle for maintenance because then when want to add an additional property to fieldtype, we have to add it to the builder, too.

          Again, look at the loop i posted above, which is the most common indexing loop. I just dont think we should introduce any more objects!

          IF we want to enforce that you must reuse field objects (and have a completely different solution for multivalued fields), then I might drop my complaints about builder, because then in that case it would avoid the pitfalls.

          Show
          Robert Muir added a comment - The Builder itsself is mutable. Which, as i said before, is a massive code duplication/hassle for maintenance because then when want to add an additional property to fieldtype, we have to add it to the builder, too. Again, look at the loop i posted above, which is the most common indexing loop. I just dont think we should introduce any more objects! IF we want to enforce that you must reuse field objects (and have a completely different solution for multivalued fields), then I might drop my complaints about builder, because then in that case it would avoid the pitfalls.
          Hide
          Simon Willnauer added a comment -

          Second time this morning you didn't even read what I said.

          I did but apparently we talk about different things? the entire purpose of FT is that you don't have to create it multiple times so folks can create Field each time but they should reuse FT, no? I personally talk about createing FT using a builder but what uwe says is we can also do that for field though.

          Again how do you create way more object when you use a builder than when you use the ctor?

          Show
          Simon Willnauer added a comment - Second time this morning you didn't even read what I said. I did but apparently we talk about different things? the entire purpose of FT is that you don't have to create it multiple times so folks can create Field each time but they should reuse FT, no? I personally talk about createing FT using a builder but what uwe says is we can also do that for field though. Again how do you create way more object when you use a builder than when you use the ctor?
          Hide
          Chris Male added a comment -

          I'm confused about what the reuse of Field objects has to do with this? That seems a corollary issue. Aren't we talking about reducing the cost of creating FieldType instances? Which as Simon said, are then shared.

          Show
          Chris Male added a comment - I'm confused about what the reuse of Field objects has to do with this? That seems a corollary issue. Aren't we talking about reducing the cost of creating FieldType instances? Which as Simon said, are then shared.
          Hide
          Robert Muir added a comment -

          Its shared only if the person reuses them explicitly, but if they arent reusing fields (like most people don't do), then they arent likely to reuse fieldtypes either.

          In general, I think we shouldnt create so many objects or add so much complexity to the indexing loop.

          Personally I just dont think in practice people are going to set things up so that they actually reuse fieldtype.

          Show
          Robert Muir added a comment - Its shared only if the person reuses them explicitly, but if they arent reusing fields (like most people don't do), then they arent likely to reuse fieldtypes either. In general, I think we shouldnt create so many objects or add so much complexity to the indexing loop. Personally I just dont think in practice people are going to set things up so that they actually reuse fieldtype.
          Hide
          Michael McCandless added a comment -

          Change FieldType to an interface inside index.* and use it for the source of properties about an IndexableField.

          +1, I think we should have an oal.index.FieldType interface, that
          exposes (get-only) methods. Ie, we'd just move the getters out of
          IndexableField into this new FT interface (likewise for
          StorableField).

          This interface should be marked as experimental, ie, we are free to
          change it.

          Add a builder for FieldType to document.* which will create FieldType instances.

          I don't think we should use a builder API here; I think either
          big-ctor-takes-all-settings and so all fields are final, or what we
          have today (.freeze()) is better.

          There are two things I don't like about the builder pattern: setter
          chaining and the object overhead of hard immutability.

          On setter chaining:

          • It's two ways to do the same thing (chaining or not); generally an
            API (and a PL) should offer one (obvious) way to do things.
            Suddenly we'll see tutorials and articles etc. online, some with
            chaining, some without, and some mixed.
          • Code is less readable w/ chaining: it makes it easy to sneak in
            multiple statements per line, embed them into other statements,
            etc., vs unchained where you always have one statement per line
          • I don't like .indexed() as a name; I prefer .setIndexed() so it's
            clear you setting something about the object.
          • In encourages inefficient code, because it's easy to inline new
            X().this().that() when in fact the app really should create &
            reuse FieldType up front. This is trappy – the app doesn't
            realize they're creating N+1 objects.

          I also don't like the hard immutability (every field is final so every
          setter returns a new object) since this will mean the typical use is
          creating tons of objects per field per doc. Yes we can have a mutable
          builder with a .build() in the end but that's making the API even more
          cumbersome.

          In contrast, the "soft" immutability we have now (freeze) is very
          effective, and creates no additional objects: it will prevent you from
          altering a FT instance once any Field uses it. Really the
          immutability is a minor detail of the implementation here; we only
          need it to prevent this trap.

          Generally we should try to keep Lucene's core APIs as
          plain/simple/straightforward as possible. Someone can always later
          layer on a builder API on top of the simpler setter+freeze or
          all-properties-to-ctor API, but, not vice/versa (efficiently anyway).

          Show
          Michael McCandless added a comment - Change FieldType to an interface inside index.* and use it for the source of properties about an IndexableField. +1, I think we should have an oal.index.FieldType interface, that exposes (get-only) methods. Ie, we'd just move the getters out of IndexableField into this new FT interface (likewise for StorableField). This interface should be marked as experimental, ie, we are free to change it. Add a builder for FieldType to document.* which will create FieldType instances. I don't think we should use a builder API here; I think either big-ctor-takes-all-settings and so all fields are final, or what we have today (.freeze()) is better. There are two things I don't like about the builder pattern: setter chaining and the object overhead of hard immutability. On setter chaining: It's two ways to do the same thing (chaining or not); generally an API (and a PL) should offer one (obvious) way to do things. Suddenly we'll see tutorials and articles etc. online, some with chaining, some without, and some mixed. Code is less readable w/ chaining: it makes it easy to sneak in multiple statements per line, embed them into other statements, etc., vs unchained where you always have one statement per line I don't like .indexed() as a name; I prefer .setIndexed() so it's clear you setting something about the object. In encourages inefficient code, because it's easy to inline new X().this().that() when in fact the app really should create & reuse FieldType up front. This is trappy – the app doesn't realize they're creating N+1 objects. I also don't like the hard immutability (every field is final so every setter returns a new object) since this will mean the typical use is creating tons of objects per field per doc. Yes we can have a mutable builder with a .build() in the end but that's making the API even more cumbersome. In contrast, the "soft" immutability we have now (freeze) is very effective, and creates no additional objects: it will prevent you from altering a FT instance once any Field uses it. Really the immutability is a minor detail of the implementation here; we only need it to prevent this trap. Generally we should try to keep Lucene's core APIs as plain/simple/straightforward as possible. Someone can always later layer on a builder API on top of the simpler setter+freeze or all-properties-to-ctor API, but, not vice/versa (efficiently anyway).
          Hide
          Chris Male added a comment - - edited

          Didn't Simon suggest we add the big-ctor version to core?

          why don't we put plain old immutable java objects with a single ctor into core and add a builder API into modules / sandbox?

          So yes, Lucene's core can stay lean and mean, but we can have the builder in userland / module / sandbox

          Show
          Chris Male added a comment - - edited Didn't Simon suggest we add the big-ctor version to core? why don't we put plain old immutable java objects with a single ctor into core and add a builder API into modules / sandbox? So yes, Lucene's core can stay lean and mean, but we can have the builder in userland / module / sandbox
          Hide
          Uwe Schindler added a comment -

          I am on opposite side:

          In general the constructor of the immutable class is hidden (package-private or private depending on class hierarchy). So nobody can use it. The only API the user sees is the builder pattern. By that we only have one API and one usage type.

          Builder patterns can be formatted very nice and it does not matter if people do:

          Field.Builder b = new Field.Builder();
          b.setFoo();
          b.setBar();
          Field f = b.build();
          

          versus

          Field f = new Field.Builder()
           .setFoo()
           .setBar()
           .build();
          

          The last chaining one is even more readable, and that is why I prefer builders. A so called "telescoping constructor" is the antipattern because its completely unreadable, as Java lacks of named parameters [the best example is WordDelimiterFilter, that one is horrible - a typical candidate for WordDelimiterFilter.Builder subclass). The chaining code is for stack based machines like the JVM and the x86 processors also more natural than the first one. The return value of the previous call resides already on the stack after the method returns, but instead of popping it and pushing again, it can stay there and you simply add the parameters of the next method call. This leads to also very elegant bytecode, for which hotspot has optimizations

          About code duplication: You can in the hidden ctor of the immutable class make a clone of the builder and keep it somewhere private final inside the instance. This one then holds the unmodifiable instance state.

          About number of objects (yes, we have the builder object and possibly a clone to it as suggested before and finally the immutable object): The number of objects is really nonsense here as all of those will be created in the Eden space and disappear as soon as the loop/method exits. You can try autoboxing with a recent JavaVM - you would in most cases see no slowdown caused by autoboxing. These are problems from pre-2000 when we had Java 1.1.

          Uwe

          Show
          Uwe Schindler added a comment - I am on opposite side: In general the constructor of the immutable class is hidden (package-private or private depending on class hierarchy). So nobody can use it. The only API the user sees is the builder pattern. By that we only have one API and one usage type. Builder patterns can be formatted very nice and it does not matter if people do: Field.Builder b = new Field.Builder(); b.setFoo(); b.setBar(); Field f = b.build(); versus Field f = new Field.Builder() .setFoo() .setBar() .build(); The last chaining one is even more readable, and that is why I prefer builders. A so called "telescoping constructor" is the antipattern because its completely unreadable, as Java lacks of named parameters [the best example is WordDelimiterFilter, that one is horrible - a typical candidate for WordDelimiterFilter.Builder subclass). The chaining code is for stack based machines like the JVM and the x86 processors also more natural than the first one. The return value of the previous call resides already on the stack after the method returns, but instead of popping it and pushing again, it can stay there and you simply add the parameters of the next method call. This leads to also very elegant bytecode, for which hotspot has optimizations About code duplication: You can in the hidden ctor of the immutable class make a clone of the builder and keep it somewhere private final inside the instance. This one then holds the unmodifiable instance state. About number of objects (yes, we have the builder object and possibly a clone to it as suggested before and finally the immutable object): The number of objects is really nonsense here as all of those will be created in the Eden space and disappear as soon as the loop/method exits. You can try autoboxing with a recent JavaVM - you would in most cases see no slowdown caused by autoboxing. These are problems from pre-2000 when we had Java 1.1. Uwe
          Hide
          Simon Willnauer added a comment -

          awesome writeup uwe! thank you!

          Show
          Simon Willnauer added a comment - awesome writeup uwe! thank you!
          Hide
          Robert Muir added a comment -

          In general the constructor of the immutable class is hidden (package-private or private depending on class hierarchy). So nobody can use it. The only API the user sees is the builder pattern.

          I am strongly against this: there is no reason to do this. We should instead expose the constructor of the immutable class so that people who want builders can use them, but i don't want builders, i shouldnt have to. there is no reason for this.

          Show
          Robert Muir added a comment - In general the constructor of the immutable class is hidden (package-private or private depending on class hierarchy). So nobody can use it. The only API the user sees is the builder pattern. I am strongly against this: there is no reason to do this. We should instead expose the constructor of the immutable class so that people who want builders can use them, but i don't want builders, i shouldnt have to. there is no reason for this.
          Hide
          Uwe Schindler added a comment - - edited

          If we release the code with the builder pattern then there is only one possibility and one example code in the class description. If somebody does not like the builder pattern, who cares? If there is nothing else, you have to use it. PERIOD.

          About code duplication: You can in the hidden ctor of the immutable class make a clone of the builder and keep it somewhere private final inside the instance. This one then holds the unmodifiable instance state.

          I already explained: The code duplication comes from the two ways to do it.

          Of course for lovers of telescopic unreadbale methods we can still add some conventional factory methods, taking tons of parameters, but internally use the builder. The user would not see the builder.

          Show
          Uwe Schindler added a comment - - edited If we release the code with the builder pattern then there is only one possibility and one example code in the class description. If somebody does not like the builder pattern, who cares? If there is nothing else, you have to use it. PERIOD. About code duplication: You can in the hidden ctor of the immutable class make a clone of the builder and keep it somewhere private final inside the instance. This one then holds the unmodifiable instance state. I already explained: The code duplication comes from the two ways to do it. Of course for lovers of telescopic unreadbale methods we can still add some conventional factory methods, taking tons of parameters, but internally use the builder. The user would not see the builder.
          Hide
          Robert Muir added a comment -

          I care, thats why i am -1 to the builder pattern. The pro-builders on this issue just silently argue that my concerns don't matter.

          Mike gave his opinion on it too.

          Stating that our concerns are meaningless is not the way to create consensus towards a good solution here.

          Show
          Robert Muir added a comment - I care, thats why i am -1 to the builder pattern. The pro-builders on this issue just silently argue that my concerns don't matter. Mike gave his opinion on it too. Stating that our concerns are meaningless is not the way to create consensus towards a good solution here.
          Hide
          Chris Male added a comment -

          The pro-builders on this issue just silently argue that my concerns don't matter.

          I resent that. I've actively tried to understand your concerns and reach a compromise and consensus.

          Show
          Chris Male added a comment - The pro-builders on this issue just silently argue that my concerns don't matter. I resent that. I've actively tried to understand your concerns and reach a compromise and consensus.
          Hide
          Robert Muir added a comment -

          So yes, Lucene's core can stay lean and mean, but we can have the builder in userland / module / sandbox

          Chris, personally I think this is a reasonable solution, but my arguments are instead against the other ridiculous statements on the issue implying that my concerns do not matter.

          The original idea for using a simple java object was just this, so that people can do whatever they want (builders, whatever). But there is no reason to enforce any specific anti-pattern here, when we can just leave that to the application.

          Show
          Robert Muir added a comment - So yes, Lucene's core can stay lean and mean, but we can have the builder in userland / module / sandbox Chris, personally I think this is a reasonable solution, but my arguments are instead against the other ridiculous statements on the issue implying that my concerns do not matter. The original idea for using a simple java object was just this, so that people can do whatever they want (builders, whatever). But there is no reason to enforce any specific anti-pattern here, when we can just leave that to the application.
          Hide
          Yonik Seeley added a comment -

          I think we should have an oal.index.FieldType interface, that exposes (get-only) methods.

          +1

          I also don't see a lot of value in jumping through too many hoops trying to enforce immutability (vs just making it easy for people to avoid common mistakes).

          Show
          Yonik Seeley added a comment - I think we should have an oal.index.FieldType interface, that exposes (get-only) methods. +1 I also don't see a lot of value in jumping through too many hoops trying to enforce immutability (vs just making it easy for people to avoid common mistakes).
          Hide
          Chris Male added a comment -

          Alright, so can we move towards a consensus on a solution?

          So far I see people are okay with:

          • Moving FieldType over to an interface which exposes get only methods
          • Creating the core implementation which uses a ctor with final fields
          • Builder API can be created and placed in a yet to be determined place.

          Sweet?

          Show
          Chris Male added a comment - Alright, so can we move towards a consensus on a solution? So far I see people are okay with: Moving FieldType over to an interface which exposes get only methods Creating the core implementation which uses a ctor with final fields Builder API can be created and placed in a yet to be determined place. Sweet?
          Hide
          Chris Male added a comment -

          Err Yonik pointed out that we still have the option of continuing to use the freezable 'soft' immutability. I didn't mean to ignore it.

          Show
          Chris Male added a comment - Err Yonik pointed out that we still have the option of continuing to use the freezable 'soft' immutability. I didn't mean to ignore it.
          Hide
          Uwe Schindler added a comment -

          I disagree, because that would again create different usage patterns and more questions on the user.

          If we only have one way to do it (I favour the builder pattern) with a code example (like NumericRangeQuery does in its javadocs) this is all obvious to users.

          I think telescopic ctors/methods are an antipattern because of readability and I think also Robert will agree with me that e.g. WordDelimiterFilter is unuseable.

          Show
          Uwe Schindler added a comment - I disagree, because that would again create different usage patterns and more questions on the user. If we only have one way to do it (I favour the builder pattern) with a code example (like NumericRangeQuery does in its javadocs) this is all obvious to users. I think telescopic ctors/methods are an antipattern because of readability and I think also Robert will agree with me that e.g. WordDelimiterFilter is unuseable.
          Hide
          Robert Muir added a comment -

          But if fieldtype is an interface with get-only methods, then someone could make a Freezable implementation right?
          Maybe the interface is good, because I dislike 'forcing' freezable too, just not as much as a dislike builder.

          so, i think the interface sounds good, and would still personally prefer if our 'default' core implementation did not use freezable, and used the simpler ctor instead.

          also I think we should be gearing the API so that most people can use the simpler fieldtypes (StringField/TextField etc) for 90% of lucene uses instead: I think we want using FieldType directly to be more expert usage (e.g. i should be able to do a typical body+title+metadata fields with these StringField/TextField etc and never deal with this stuff).

          Show
          Robert Muir added a comment - But if fieldtype is an interface with get-only methods, then someone could make a Freezable implementation right? Maybe the interface is good, because I dislike 'forcing' freezable too, just not as much as a dislike builder. so, i think the interface sounds good, and would still personally prefer if our 'default' core implementation did not use freezable, and used the simpler ctor instead. also I think we should be gearing the API so that most people can use the simpler fieldtypes (StringField/TextField etc) for 90% of lucene uses instead: I think we want using FieldType directly to be more expert usage (e.g. i should be able to do a typical body+title+metadata fields with these StringField/TextField etc and never deal with this stuff).
          Hide
          Chris Male added a comment -

          I don't see anything wrong with providing options.

          Show
          Chris Male added a comment - I don't see anything wrong with providing options.
          Hide
          Chris Male added a comment -

          Okay so we seem to have consensus on moving to a get-only interface.

          The question just remains how to implement the 'default' core implementation.

          Show
          Chris Male added a comment - Okay so we seem to have consensus on moving to a get-only interface. The question just remains how to implement the 'default' core implementation.
          Hide
          Uwe Schindler added a comment -

          Freezeable is an antipattern and produces messy code on the implementation side, just because someone still stays in the 1990s when Java was not able to handle lots of small short-living objects. That's since almost the beginning of this century no issue anymore.

          Show
          Uwe Schindler added a comment - Freezeable is an antipattern and produces messy code on the implementation side, just because someone still stays in the 1990s when Java was not able to handle lots of small short-living objects. That's since almost the beginning of this century no issue anymore.
          Hide
          Uwe Schindler added a comment -

          In my opinion, we should vote for the following solutions:

          [1] Old-style telescopic ctors on a immutable FieldType
          [2] FieldType.Builder pattern with hidden FieldType Ctor and optionally static FieldType factory methods that produce commonly used types/maybe even telescopic (thise factories use builder internally / have a set of preconfigured builders available). The private ctor tkaes the Builder instance and clones it to keep state final (like IndexWriter)
          [3] Modifiable FieldType with a freeze() method and iffecient code because of stupid checks on every method - this is somehow a builder, the difference is only that the builder and final instance are same class.
          [4] Readonly interface with all three implementations

          Here my +1 for a easy to use Builder-only [2] implementation and nothing else. This has no additional object creation except the builder and an internal clone, but those are shortliving.

          Show
          Uwe Schindler added a comment - In my opinion, we should vote for the following solutions: [1] Old-style telescopic ctors on a immutable FieldType [2] FieldType.Builder pattern with hidden FieldType Ctor and optionally static FieldType factory methods that produce commonly used types/maybe even telescopic (thise factories use builder internally / have a set of preconfigured builders available). The private ctor tkaes the Builder instance and clones it to keep state final (like IndexWriter) [3] Modifiable FieldType with a freeze() method and iffecient code because of stupid checks on every method - this is somehow a builder, the difference is only that the builder and final instance are same class. [4] Readonly interface with all three implementations Here my +1 for a easy to use Builder-only [2] implementation and nothing else. This has no additional object creation except the builder and an internal clone, but those are shortliving.
          Hide
          Robert Muir added a comment -

          Okay so we seem to have consensus on moving to a get-only interface.

          I'm not sure: we should see what Uwe thinks. It seems he might be against the idea that there are multiple ways to do this (I think its a valid concern, i just disagree with him though). I think the ideal situation is where StringField/TextField cover the majority of use cases and doing anything with FT is expert, e.g. intended for apps like Solr to implement the interface and probably not even use our 'default' FieldType implementation. I think the default impl is just for someone that wants something thats not out-of-box, e.g. tokenized TextField that omits TF.

          Show
          Robert Muir added a comment - Okay so we seem to have consensus on moving to a get-only interface. I'm not sure: we should see what Uwe thinks. It seems he might be against the idea that there are multiple ways to do this (I think its a valid concern, i just disagree with him though). I think the ideal situation is where StringField/TextField cover the majority of use cases and doing anything with FT is expert, e.g. intended for apps like Solr to implement the interface and probably not even use our 'default' FieldType implementation. I think the default impl is just for someone that wants something thats not out-of-box, e.g. tokenized TextField that omits TF.
          Hide
          Chris Male added a comment -

          I think the ideal situation is where StringField/TextField cover the majority of use cases and doing anything with FT is expert, e.g. intended for apps like Solr to implement the interface and probably not even use our 'default' FieldType implementation.

          Separate to the implementation issue, I don't think I've fully grasped what you want StringField/TextField to be? Do you seem them as having pre-defined FieldType setups?

          Show
          Chris Male added a comment - I think the ideal situation is where StringField/TextField cover the majority of use cases and doing anything with FT is expert, e.g. intended for apps like Solr to implement the interface and probably not even use our 'default' FieldType implementation. Separate to the implementation issue, I don't think I've fully grasped what you want StringField/TextField to be? Do you seem them as having pre-defined FieldType setups?
          Hide
          Robert Muir added a comment -

          well, looking at them, thats what they are now already? or am i totally confused?

          Show
          Robert Muir added a comment - well, looking at them, thats what they are now already? or am i totally confused?
          Hide
          Chris Male added a comment -

          Ah sorry, when I last looked at them some had constructors which accepted FieldTypes. Now those have been removed so yes, thats what they are now.

          Show
          Chris Male added a comment - Ah sorry, when I last looked at them some had constructors which accepted FieldTypes. Now those have been removed so yes, thats what they are now.
          Hide
          Michael McCandless added a comment -

          Builder patterns can be formatted very nice:

          Field f = new Field.Builder()
           .setFoo()
           .setBar()
           .build();
          

          This is nice in theory but in practice I often see massive compound
          hard-to-read lines like this:

            IndexWriter writer = new IndexWriter(dir, newIndexWriterConfig( TEST_VERSION_CURRENT, new MockAnalyzer(random)).setMaxBufferedDocs(2).setMergePolicy(newLogMergePolicy()));
          

          I don't like that the chained setters make such code possible: it's
          unreadable.

          Show
          Michael McCandless added a comment - Builder patterns can be formatted very nice: Field f = new Field.Builder() .setFoo() .setBar() .build(); This is nice in theory but in practice I often see massive compound hard-to-read lines like this: IndexWriter writer = new IndexWriter(dir, newIndexWriterConfig( TEST_VERSION_CURRENT, new MockAnalyzer(random)).setMaxBufferedDocs(2).setMergePolicy(newLogMergePolicy())); I don't like that the chained setters make such code possible: it's unreadable.
          Hide
          Uwe Schindler added a comment - - edited

          This is still more readable than

          new IndexWriter(dir, TEST_VERSION_CURRENT, new MockAnalyzer(random), 2, newLogMergePolicy());
          

          What does 2 mean? Yes its more verbose, but withy any recent UI, the syntax highlighting makes even a one-line chain easy readable.

          Here some quotes from Jushua Bloch (who is also the founder of Java Collections framework) about "his" pattern: http://www.goodreads.com/author/quotes/60805.Joshua_Bloch

          Show
          Uwe Schindler added a comment - - edited This is still more readable than new IndexWriter(dir, TEST_VERSION_CURRENT, new MockAnalyzer(random), 2, newLogMergePolicy()); What does 2 mean? Yes its more verbose, but withy any recent UI, the syntax highlighting makes even a one-line chain easy readable. Here some quotes from Jushua Bloch (who is also the founder of Java Collections framework) about "his" pattern: http://www.goodreads.com/author/quotes/60805.Joshua_Bloch
          Hide
          Uwe Schindler added a comment -

          I don't like that the chained setters make such code possible: it's unreadable.

          Even as one-line its much better readable than anything else. Did you try to create a WordDelimiterFilter using it 15 argument ctor? Two minutes later you dont know anymore whet the 3rd boolean is about.

          The chained calls can be read left to right and you can do that very fast. The syntax shown above is just extra sugar, but the one line variant is perfectly readable. OK, not for people still using two whitespaces after the end-of-sentence-period (http://en.wikipedia.org/wiki/Sentence_spacing)

          Show
          Uwe Schindler added a comment - I don't like that the chained setters make such code possible: it's unreadable. Even as one-line its much better readable than anything else. Did you try to create a WordDelimiterFilter using it 15 argument ctor? Two minutes later you dont know anymore whet the 3rd boolean is about. The chained calls can be read left to right and you can do that very fast. The syntax shown above is just extra sugar, but the one line variant is perfectly readable. OK, not for people still using two whitespaces after the end-of-sentence-period ( http://en.wikipedia.org/wiki/Sentence_spacing )
          Hide
          Chris Male added a comment -

          So I want to get some actions out of this:

          Uwe, lets open an issue to look at improving WordDelimiterFilter, yeah? I've seen that floating around in tests: new WordDelimiter(1, 1, 0, 1, 1, 0, 0..), yeah its tough to read.

          However whether or not people agree with you on Builders and chain calls, at this stage there just isn't the support to make Builders mandatory. Yes we should create one and I'll look to you for help on that. But as a first step forward lets move FieldType over to being a get-only interface. That will leave the freezable API in there and we can then consider the next step forward. But again, I don't really see consensus on the Builder-only approach. Rather I see a lot of support for having a single ctor implementation and a builder using that.

          Where necessary I think we should strive to document anything around object creation that might catch someone whose blindingly migrating, off guard.

          I'm going to move forward with creating a patch for this first step tomorrow.

          Show
          Chris Male added a comment - So I want to get some actions out of this: Uwe, lets open an issue to look at improving WordDelimiterFilter, yeah? I've seen that floating around in tests: new WordDelimiter(1, 1, 0, 1, 1, 0, 0..), yeah its tough to read. However whether or not people agree with you on Builders and chain calls, at this stage there just isn't the support to make Builders mandatory. Yes we should create one and I'll look to you for help on that. But as a first step forward lets move FieldType over to being a get-only interface. That will leave the freezable API in there and we can then consider the next step forward. But again, I don't really see consensus on the Builder-only approach. Rather I see a lot of support for having a single ctor implementation and a builder using that. Where necessary I think we should strive to document anything around object creation that might catch someone whose blindingly migrating, off guard. I'm going to move forward with creating a patch for this first step tomorrow.
          Hide
          Robert Muir added a comment -

          Uwe, lets open an issue to look at improving WordDelimiterFilter, yeah? I've seen that floating around in tests: new WordDelimiter(1, 1, 0, 1, 1, 0, 0..), yeah its tough to read.

          I agree we should open an issue to improve WDF, these int parameters are actually all boolean flags and we could just pass 'int flags' instead.

          this way you could do new WordDelimiterFilter(GENERATE_WORD_PARTS | CATENATE_ALL | SPLIT_ON_CASE_CHANGE), much more readable.

          Show
          Robert Muir added a comment - Uwe, lets open an issue to look at improving WordDelimiterFilter, yeah? I've seen that floating around in tests: new WordDelimiter(1, 1, 0, 1, 1, 0, 0..), yeah its tough to read. I agree we should open an issue to improve WDF, these int parameters are actually all boolean flags and we could just pass 'int flags' instead. this way you could do new WordDelimiterFilter(GENERATE_WORD_PARTS | CATENATE_ALL | SPLIT_ON_CASE_CHANGE), much more readable.
          Hide
          Michael McCandless added a comment -

          Even as one-line its much better readable than anything else. Did you try to create a WordDelimiterFilter using it 15 argument ctor? Two minutes later you dont know anymore whet the 3rd boolean is about.

          In fact this is what I like about .freeze(): you invoke simple
          setters, one per line (usually), one object.

          The only reason we want immutability here is to prevent the trap of
          changing the FT after binding it to a Field. And freeze accomplishes
          this well.

          I agree massive single ctor isn't great; but maybe w/ EnumSet or int
          flags for the boolean properties it's OK. Or maybe we go back to
          Field.Index.X, Field.Store.Y, etc. Or stick with .freeze.

          Builder API can still be built out (eg in contrib or modules or google
          code or somewhere) on top; I just don't think it should be in Lucene's core.
          In general Lucene's core should keep things as straightforward as
          possible.

          Show
          Michael McCandless added a comment - Even as one-line its much better readable than anything else. Did you try to create a WordDelimiterFilter using it 15 argument ctor? Two minutes later you dont know anymore whet the 3rd boolean is about. In fact this is what I like about .freeze(): you invoke simple setters, one per line (usually), one object. The only reason we want immutability here is to prevent the trap of changing the FT after binding it to a Field. And freeze accomplishes this well. I agree massive single ctor isn't great; but maybe w/ EnumSet or int flags for the boolean properties it's OK. Or maybe we go back to Field.Index.X, Field.Store.Y, etc. Or stick with .freeze. Builder API can still be built out (eg in contrib or modules or google code or somewhere) on top; I just don't think it should be in Lucene's core. In general Lucene's core should keep things as straightforward as possible.
          Hide
          Chris Male added a comment -

          LUCENE-3410 for WDF improvements.

          Show
          Chris Male added a comment - LUCENE-3410 for WDF improvements.
          Hide
          Uwe Schindler added a comment -

          The only reason we want immutability here is to prevent the trap of changing the FT after binding it to a Field. And freeze accomplishes this well.

          Where is the difference to builder?

          You can also call builders one per line if you like it. I like builders especially for their readability: You can read the line and break it at any place just like a sentence. This is why the method names should look like sentence components and not setXXX() like (ideally).

          Freeze is an antipattern as you use one object for changing and then for freezing, leading to if-checks everywhere. If you make return freeze() an new immutable object, it is builder. Just without the possibility to chain.

          I dislike freeze, but if you want to do this, please add chaining, it costs you nothing as implementor.

          Show
          Uwe Schindler added a comment - The only reason we want immutability here is to prevent the trap of changing the FT after binding it to a Field. And freeze accomplishes this well. Where is the difference to builder? You can also call builders one per line if you like it. I like builders especially for their readability: You can read the line and break it at any place just like a sentence. This is why the method names should look like sentence components and not setXXX() like (ideally). Freeze is an antipattern as you use one object for changing and then for freezing, leading to if-checks everywhere. If you make return freeze() an new immutable object, it is builder. Just without the possibility to chain. I dislike freeze, but if you want to do this, please add chaining, it costs you nothing as implementor.
          Hide
          Yonik Seeley added a comment -

          Since it seems like there is no agreement on enforcing immutability, perhaps we shouldn't.
          We don't do it in a lot of other places, for example all of our query classes (and I don't think we should start).

          Rethinking the "interface" a bit... even that seems like a little overkill (and perhaps just a by-product of no one agreeing on the concrete implementation?) After all, if this is to just be a holder for parameters (like indexed, stored, etc) then allowing one to subclass doesn't add any power or even make much sense (they aren't going to change the "behavior" of anything, right?) The other normal use cases for interfaces wouldn't seem to apply to this situation either.

          Show
          Yonik Seeley added a comment - Since it seems like there is no agreement on enforcing immutability, perhaps we shouldn't. We don't do it in a lot of other places, for example all of our query classes (and I don't think we should start). Rethinking the "interface" a bit... even that seems like a little overkill (and perhaps just a by-product of no one agreeing on the concrete implementation?) After all, if this is to just be a holder for parameters (like indexed, stored, etc) then allowing one to subclass doesn't add any power or even make much sense (they aren't going to change the "behavior" of anything, right?) The other normal use cases for interfaces wouldn't seem to apply to this situation either.
          Hide
          Robert Muir added a comment -

          I agree massive single ctor isn't great; but maybe w/ EnumSet or int
          flags for the boolean properties it's OK.

          Maybe FieldType should really just be an 'int' (e.g. we dont have a class or anything) ?

          Show
          Robert Muir added a comment - I agree massive single ctor isn't great; but maybe w/ EnumSet or int flags for the boolean properties it's OK. Maybe FieldType should really just be an 'int' (e.g. we dont have a class or anything) ?
          Hide
          Uwe Schindler added a comment -

          However whether or not people agree with you on Builders and chain calls, at this stage there just isn't the support to make Builders mandatory. Yes we should create one and I'll look to you for help on that. But as a first step forward lets move FieldType over to being a get-only interface. That will leave the freezable API in there and we can then consider the next step forward. But again, I don't really see consensus on the Builder-only approach. Rather I see a lot of support for having a single ctor implementation and a builder using that.

          I would like to have an on-list vote, please. Thanks.

          Show
          Uwe Schindler added a comment - However whether or not people agree with you on Builders and chain calls, at this stage there just isn't the support to make Builders mandatory. Yes we should create one and I'll look to you for help on that. But as a first step forward lets move FieldType over to being a get-only interface. That will leave the freezable API in there and we can then consider the next step forward. But again, I don't really see consensus on the Builder-only approach. Rather I see a lot of support for having a single ctor implementation and a builder using that. I would like to have an on-list vote, please. Thanks.
          Hide
          Chris Male added a comment -

          Just when I thought we had some consensus...

          After all, if this is to just be a holder for parameters (like indexed, stored, etc) then allowing one to subclass doesn't add any power or even make much sense (they aren't going to change the "behavior" of anything, right?) The other normal use cases for interfaces wouldn't seem to apply to this situation either.

          I don't see the interface just holding a series of boolean properties. I've long favored moving Analyzer there (lets not fight that fight here) and information about numeric datatypes and docvalues types probably belong there too. I've also explored using Attributes to provide context specific extensions (more datatypes ideas, uniqueness, spatial info) that allow customized indexing of Fields.

          Equally, we have moved towards liberating the core indexer code from implementations of these concepts, starting with IndexableField and moving onto StorableField (if I ever get there). I don't see it as overkill for the indexer to rely on the interface for FieldType too.

          Show
          Chris Male added a comment - Just when I thought we had some consensus... After all, if this is to just be a holder for parameters (like indexed, stored, etc) then allowing one to subclass doesn't add any power or even make much sense (they aren't going to change the "behavior" of anything, right?) The other normal use cases for interfaces wouldn't seem to apply to this situation either. I don't see the interface just holding a series of boolean properties. I've long favored moving Analyzer there (lets not fight that fight here) and information about numeric datatypes and docvalues types probably belong there too. I've also explored using Attributes to provide context specific extensions (more datatypes ideas, uniqueness, spatial info) that allow customized indexing of Fields. Equally, we have moved towards liberating the core indexer code from implementations of these concepts, starting with IndexableField and moving onto StorableField (if I ever get there). I don't see it as overkill for the indexer to rely on the interface for FieldType too.
          Hide
          Chris Male added a comment -

          I would like to have an on-list vote, please. Thanks.

          Can you / someone else call one? I'm exhausted by the idea of summarizing this discussion.

          Show
          Chris Male added a comment - I would like to have an on-list vote, please. Thanks. Can you / someone else call one? I'm exhausted by the idea of summarizing this discussion.
          Hide
          Mark Miller added a comment -

          We should probably really let the dust settle after such a large, rapid discussion before falling to any official vote - I think that should be a last resort, and not something that we quickly drop to. To me, calling a vote means we have failed completely at building a consensus and we need to solve this the worst way possible - with one side winning and another losing. Sure, that might end up being where things go on some issues, but lets not get there within hours of a large hash out...

          Show
          Mark Miller added a comment - We should probably really let the dust settle after such a large, rapid discussion before falling to any official vote - I think that should be a last resort, and not something that we quickly drop to. To me, calling a vote means we have failed completely at building a consensus and we need to solve this the worst way possible - with one side winning and another losing. Sure, that might end up being where things go on some issues, but lets not get there within hours of a large hash out...
          Hide
          Chris Male added a comment -

          Patch which ports FieldType over to being an interface in index.* and changes document.FieldType to being document.CoreFieldType.

          Command:

          svn copy lucene/src/java/org/apache/lucene/document/FieldType.java lucene/src/java/org/apache/lucene/index/FieldType.java
          svn mv lucene/src/java/org/apache/lucene/document/FieldType.java lucene/src/java/org/apache/lucene/document/CoreFieldType.java
          

          This is just for consideration and I'm not sidestepping anyones opinions or concerns. I just want to have a concrete first step which people can discuss rather than abstract ideas.

          Show
          Chris Male added a comment - Patch which ports FieldType over to being an interface in index.* and changes document.FieldType to being document.CoreFieldType. Command: svn copy lucene/src/java/org/apache/lucene/document/FieldType.java lucene/src/java/org/apache/lucene/index/FieldType.java svn mv lucene/src/java/org/apache/lucene/document/FieldType.java lucene/src/java/org/apache/lucene/document/CoreFieldType.java This is just for consideration and I'm not sidestepping anyones opinions or concerns. I just want to have a concrete first step which people can discuss rather than abstract ideas.
          Hide
          Yonik Seeley added a comment -

          Regarding the latest patch:

          -  public Field(String name, FieldType type) {
          +  public Field(String name, CoreFieldType type) {
          

          Seems like the FieldType interface should be used for parameters and return types?

          Show
          Yonik Seeley added a comment - Regarding the latest patch: - public Field( String name, FieldType type) { + public Field( String name, CoreFieldType type) { Seems like the FieldType interface should be used for parameters and return types?
          Hide
          Chris Male added a comment -

          I would like it to be but Field ensures any FieldType passed in is frozen by calling freeze() which is a CoreFieldType notion. This is sort of messy and is a concern I have with the freezable state idea. If we removed this and let Field assume state was immutable/frozen/whatever then we could use the interface yes.

          Show
          Chris Male added a comment - I would like it to be but Field ensures any FieldType passed in is frozen by calling freeze() which is a CoreFieldType notion. This is sort of messy and is a concern I have with the freezable state idea. If we removed this and let Field assume state was immutable/frozen/whatever then we could use the interface yes.
          Hide
          Chris Male added a comment -

          Anyone else have any thoughts? Any objections to committing this patch as a first step?

          Show
          Chris Male added a comment - Anyone else have any thoughts? Any objections to committing this patch as a first step?
          Hide
          Yonik Seeley added a comment -

          Instead of introducing a dependency on CoreFieldType in many places (only to have to change it back later when some sort of consensus is finally reached), it would seem much cleaner to either

          • remove freeze() until we decide on the right approach
          • move freeze() to the FieldType interface temporarily (and remove it later if the approach changes)

          The other changes in the patch look fine.

          Show
          Yonik Seeley added a comment - Instead of introducing a dependency on CoreFieldType in many places (only to have to change it back later when some sort of consensus is finally reached), it would seem much cleaner to either remove freeze() until we decide on the right approach move freeze() to the FieldType interface temporarily (and remove it later if the approach changes) The other changes in the patch look fine.
          Hide
          Chris Male added a comment -

          Patch updated following Yonik's advice. I'd removed the freeze() calls from Field so that it can now accept a FieldType instance. If freezing is important, its up to the created of the CoreFieldType.

          Show
          Chris Male added a comment - Patch updated following Yonik's advice. I'd removed the freeze() calls from Field so that it can now accept a FieldType instance. If freezing is important, its up to the created of the CoreFieldType.
          Hide
          Michael McCandless added a comment -

          I guess it's OK to remove the auto-freeze from Field... it's sort of sneaky to do that.

          But, this means we've opened up the trap (where users change a FT after using it on a field, thinking/assuming somehow that the Field took a copy). Chris can you fix the jdocs on Field ctors to make it clear that the Field instances holds a ref to the provided FT and so any changes later made to that FT will affect the Field instance?

          Show
          Michael McCandless added a comment - I guess it's OK to remove the auto-freeze from Field... it's sort of sneaky to do that. But, this means we've opened up the trap (where users change a FT after using it on a field, thinking/assuming somehow that the Field took a copy). Chris can you fix the jdocs on Field ctors to make it clear that the Field instances holds a ref to the provided FT and so any changes later made to that FT will affect the Field instance?
          Hide
          Michael McCandless added a comment -

          Should we also move numeric(), numericDataType() and maybe
          docValuesType() into oal.index.FieldType? (We can do this as a
          speparate issue though).

          I also like Marvin's/Robert's suggestion of using int flags for all
          these booleans (also a separate issue!).

          We lost the jdocs on each of the boolean methods (indexed(), stored(),
          etc.).

          Maybe name oal.index's FT to IndexableFieldType? And then drop Core from
          oal.document's? Ie, oal.document.FieldType and
          oal.index.IndexableFieldType? (Aren't we going to shortly need
          oal.index.StorableFieldType?).

          Also fix the jdocs for CoreFT.freeze – it still claims Field will
          auto-freeze.

          Show
          Michael McCandless added a comment - Should we also move numeric(), numericDataType() and maybe docValuesType() into oal.index.FieldType? (We can do this as a speparate issue though). I also like Marvin's/Robert's suggestion of using int flags for all these booleans (also a separate issue!). We lost the jdocs on each of the boolean methods (indexed(), stored(), etc.). Maybe name oal.index's FT to IndexableFieldType? And then drop Core from oal.document's? Ie, oal.document.FieldType and oal.index.IndexableFieldType? (Aren't we going to shortly need oal.index.StorableFieldType?). Also fix the jdocs for CoreFT.freeze – it still claims Field will auto-freeze.
          Hide
          Chris Male added a comment -

          I will make the appropriate javadoc changes right now.

          Should we also move numeric(), numericDataType() and maybe
          docValuesType() into oal.index.FieldType? (We can do this as a
          speparate issue though).

          Yup.

          I also like Marvin's/Robert's suggestion of using int flags for all
          these booleans (also a separate issue!).

          I like them too. Lets do that.

          Maybe name oal.index's FT to IndexableFieldType? And then drop Core from
          oal.document's? Ie, oal.document.FieldType and
          oal.index.IndexableFieldType? (Aren't we going to shortly need
          oal.index.StorableFieldType?).

          Good idea. Its going to reduce this patch size considerably.

          Show
          Chris Male added a comment - I will make the appropriate javadoc changes right now. Should we also move numeric(), numericDataType() and maybe docValuesType() into oal.index.FieldType? (We can do this as a speparate issue though). Yup. I also like Marvin's/Robert's suggestion of using int flags for all these booleans (also a separate issue!). I like them too. Lets do that. Maybe name oal.index's FT to IndexableFieldType? And then drop Core from oal.document's? Ie, oal.document.FieldType and oal.index.IndexableFieldType? (Aren't we going to shortly need oal.index.StorableFieldType?). Good idea. Its going to reduce this patch size considerably.
          Hide
          Chris Male added a comment - - edited

          New patch based on the feedback from Mike.

          • Field now includes a class level jdoc saying its recommended no changes are made to FieldTypes after a Field is created.
          • FieldType is now IndexableFieldType and CoreFieldType has gone back to FieldType.
          • FieldType.freeze() no longer mentions auto-freezing, however it does recommend freeze() is called once properties have been set.

          We're all green so I'm looking to commit this shortly and spin off the remaining changes.

          Show
          Chris Male added a comment - - edited New patch based on the feedback from Mike. Field now includes a class level jdoc saying its recommended no changes are made to FieldTypes after a Field is created. FieldType is now IndexableFieldType and CoreFieldType has gone back to FieldType. FieldType.freeze() no longer mentions auto-freezing, however it does recommend freeze() is called once properties have been set. We're all green so I'm looking to commit this shortly and spin off the remaining changes.
          Hide
          Uwe Schindler added a comment -

          I am not green but gave up due to vacation. I am still against freeze, but my complaints are ignored.

          Show
          Uwe Schindler added a comment - I am not green but gave up due to vacation. I am still against freeze, but my complaints are ignored.
          Hide
          Chris Male added a comment -

          Far from it Uwe, your complaints are being actively taken into consideration and we have every intention to open a new issue to move away from freeze (see Mike's comments). I'm just wanting to take one step at a time.

          Show
          Chris Male added a comment - Far from it Uwe, your complaints are being actively taken into consideration and we have every intention to open a new issue to move away from freeze (see Mike's comments). I'm just wanting to take one step at a time.
          Hide
          Uwe Schindler added a comment -

          I am not green but gave up due to vacation. I am still against freeze, but my complaints are ignored.

          This sentence is too funny I don't agree and I am not happy with the whole stuff. As Simon seems to be silent, so there is nothing I can do anymore with my limited time. I still favour the builder approach, and this API looks like the old one coming back...

          Show
          Uwe Schindler added a comment - I am not green but gave up due to vacation. I am still against freeze, but my complaints are ignored. This sentence is too funny I don't agree and I am not happy with the whole stuff. As Simon seems to be silent, so there is nothing I can do anymore with my limited time. I still favour the builder approach, and this API looks like the old one coming back...
          Hide
          Chris Male added a comment -

          Final patch before committing. This includes a change to MIGRATE.txt.

          I notice we don't have a CHANGES.txt entry anywhere, so I'll add that upon committing

          Show
          Chris Male added a comment - Final patch before committing. This includes a change to MIGRATE.txt. I notice we don't have a CHANGES.txt entry anywhere, so I'll add that upon committing
          Hide
          Chris Male added a comment -

          Committed revision 1167668.

          Show
          Chris Male added a comment - Committed revision 1167668.
          Hide
          Yonik Seeley added a comment -

          We're coming up on 4.0, and it doesn't seem like there ever was a consensus here wrt immutability.
          I'm also still in favor of removing freeze.

          Show
          Yonik Seeley added a comment - We're coming up on 4.0, and it doesn't seem like there ever was a consensus here wrt immutability. I'm also still in favor of removing freeze.
          Hide
          Simon Willnauer added a comment -

          can we close this issue? seems like except of yoniks last comment everything else has been resolved?

          Show
          Simon Willnauer added a comment - can we close this issue? seems like except of yoniks last comment everything else has been resolved?
          Hide
          Hoss Man added a comment -

          bulk cleanup of 4.0-ALPHA / 4.0 Jira versioning. all bulk edited issues have hoss20120711-bulk-40-change in a comment

          Show
          Hoss Man added a comment - bulk cleanup of 4.0-ALPHA / 4.0 Jira versioning. all bulk edited issues have hoss20120711-bulk-40-change in a comment
          Hide
          Robert Muir added a comment -

          rmuir20120906-bulk-40-change

          Show
          Robert Muir added a comment - rmuir20120906-bulk-40-change
          Hide
          Commit Tag Bot added a comment -

          [branch_4x commit] Michael McCandless
          http://svn.apache.org/viewvc?view=revision&revision=1389535

          LUCENE-2308: add MIGRATE.txt entry about Document.setBoost

          Show
          Commit Tag Bot added a comment - [branch_4x commit] Michael McCandless http://svn.apache.org/viewvc?view=revision&revision=1389535 LUCENE-2308 : add MIGRATE.txt entry about Document.setBoost
          Hide
          Uwe Schindler added a comment -

          Closed after release.

          Show
          Uwe Schindler added a comment - Closed after release.

            People

            • Assignee:
              Michael McCandless
              Reporter:
              Michael McCandless
            • Votes:
              2 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development