Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 4.0-ALPHA
    • Fix Version/s: 4.0-ALPHA
    • Component/s: None
    • Labels:
      None

      Description

      We should use PerFieldCodecWrapper to allow users to select the codec per-field.

      1. SOLR-1942.patch
        30 kB
        Robert Muir
      2. SOLR-1942.patch
        30 kB
        Robert Muir
      3. SOLR-1942.patch
        28 kB
        Robert Muir
      4. SOLR-1942.patch
        28 kB
        Robert Muir
      5. SOLR-1942.patch
        62 kB
        Robert Muir
      6. SOLR-1942.patch
        65 kB
        Simon Willnauer
      7. SOLR-1942.patch
        71 kB
        Simon Willnauer
      8. SOLR-1942.patch
        77 kB
        Simon Willnauer
      9. SOLR-1942.patch
        65 kB
        Simon Willnauer
      10. SOLR-1942.patch
        66 kB
        Simon Willnauer
      11. SOLR-1942.patch
        66 kB
        Simon Willnauer
      12. SOLR-1942.patch
        24 kB
        Simon Willnauer

        Issue Links

          Activity

          Hide
          Hoss Man added a comment -

          something to consider is wether we just want to make it configurable per field (ie: a codec="foo" attribute for the <field /> declarations) or if we want to use some general rules as well – ie: the uniqueKeyField should probably default to the PulseCodec (or something similar) ... likewise i'm guessing there are probably some defaults that would make sense per field type.

          Show
          Hoss Man added a comment - something to consider is wether we just want to make it configurable per field (ie: a codec="foo" attribute for the <field /> declarations) or if we want to use some general rules as well – ie: the uniqueKeyField should probably default to the PulseCodec (or something similar) ... likewise i'm guessing there are probably some defaults that would make sense per field type.
          Hide
          Simon Willnauer added a comment -

          I started working on this today and attach my current state. There are still some edges and there is a testcase (SpellCheckComponent) that is still failing though.

          Anyway its good enough for the first round of feedback which would be great since I am not a solr expert

          thanks

          Show
          Simon Willnauer added a comment - I started working on this today and attach my current state. There are still some edges and there is a testcase (SpellCheckComponent) that is still failing though. Anyway its good enough for the first round of feedback which would be great since I am not a solr expert thanks
          Hide
          Simon Willnauer added a comment -

          The current patch depends on the fixes in this issue I found during the work on this one.

          Show
          Simon Willnauer added a comment - The current patch depends on the fixes in this issue I found during the work on this one.
          Hide
          Simon Willnauer added a comment -

          the previous patch got somehow corrupted

          Show
          Simon Willnauer added a comment - the previous patch got somehow corrupted
          Hide
          Simon Willnauer added a comment -

          Updated the last version to support custom CodecProvider as well as custom Codecs via plugin. Currently this patch only supports specifying codecs for FieldTypes like this:

            <fieldType name="string_pulsing" class="solr.StrField" codec="Pulsing"/>
            <fieldType name="string_simpletext" class="solr.StrField" codec="SimpleText"/>
            <fieldType name="string_standard" class="solr.StrField" codec="Standard"/>
          

          and in solrconfig.xml the provider and its codecs can be configured like this: (while the class argument on codecProvider is optional)

               ....
              </deletionPolicy>
              <codecProvider class="org.apache.solr.core.MockCodecProvider" defaultCodec="Pulsing">
              	<codec class="org.apache.lucene.index.codecs.simpletext.SimpleTextCodec"/>
              	<codec class="org.apache.lucene.index.codecs.preflex.PreFlexCodec"/>
            	</codecProvider>
            </mainIndex>
          
             ....
          
          Show
          Simon Willnauer added a comment - Updated the last version to support custom CodecProvider as well as custom Codecs via plugin. Currently this patch only supports specifying codecs for FieldTypes like this: <fieldType name= "string_pulsing" class= "solr.StrField" codec= "Pulsing" /> <fieldType name= "string_simpletext" class= "solr.StrField" codec= "SimpleText" /> <fieldType name= "string_standard" class= "solr.StrField" codec= "Standard" /> and in solrconfig.xml the provider and its codecs can be configured like this: (while the class argument on codecProvider is optional) .... </deletionPolicy> <codecProvider class= "org.apache.solr.core.MockCodecProvider" defaultCodec= "Pulsing" > <codec class= "org.apache.lucene.index.codecs.simpletext.SimpleTextCodec" /> <codec class= "org.apache.lucene.index.codecs.preflex.PreFlexCodec" /> </codecProvider> </mainIndex> ....
          Hide
          Simon Willnauer added a comment -

          here is an updated patch - I think we are close here so a review would be helpful!

          Anyone, Yonik / Hoss?

          Show
          Simon Willnauer added a comment - here is an updated patch - I think we are close here so a review would be helpful! Anyone, Yonik / Hoss?
          Hide
          Yonik Seeley added a comment -

          It doesn't look like this would work with dynamic fields?

          I think we need a SolrCodecProvider that extends CodecProvider and just overrides
          getDefaultCodec() to use the schema to look up the codec instead of the internal map?
          Lucene never calls setDefaultCodec itself, right? If so, we can just ignore that method.

          Show
          Yonik Seeley added a comment - It doesn't look like this would work with dynamic fields? I think we need a SolrCodecProvider that extends CodecProvider and just overrides getDefaultCodec() to use the schema to look up the codec instead of the internal map? Lucene never calls setDefaultCodec itself, right? If so, we can just ignore that method.
          Hide
          Simon Willnauer added a comment -

          It doesn't look like this would work with dynamic fields?

          good point I didn't have them on my plate - good that you pointed that out!
          Will upload a new patch

          Show
          Simon Willnauer added a comment - It doesn't look like this would work with dynamic fields? good point I didn't have them on my plate - good that you pointed that out! Will upload a new patch
          Hide
          Simon Willnauer added a comment -

          here is a new patch that adds dynamic field support and removes the static setDefaultCodec() method completely. This method is obsolete since we have setDefaultFieldCodec now on each provider.

          simon

          Show
          Simon Willnauer added a comment - here is a new patch that adds dynamic field support and removes the static setDefaultCodec() method completely. This method is obsolete since we have setDefaultFieldCodec now on each provider. simon
          Hide
          Simon Willnauer added a comment -

          I think we are pretty close here and I would like to move forward on this issue. If nobody object I will commit in a view days.

          simon

          Show
          Simon Willnauer added a comment - I think we are pretty close here and I would like to move forward on this issue. If nobody object I will commit in a view days. simon
          Hide
          Simon Willnauer added a comment -

          updated patch to trunk

          Show
          Simon Willnauer added a comment - updated patch to trunk
          Hide
          Simon Willnauer added a comment -

          updated to trunk - if somebody has time a review would be appreciated.....

          Show
          Simon Willnauer added a comment - updated to trunk - if somebody has time a review would be appreciated.....
          Hide
          Simon Willnauer added a comment -

          updated to trunk - if somebody has time a review would be appreciated.....

          not enough interest in Solr land to review this I suppose... moving out...

          Show
          Simon Willnauer added a comment - updated to trunk - if somebody has time a review would be appreciated..... not enough interest in Solr land to review this I suppose... moving out...
          Hide
          Grant Ingersoll added a comment -

          Hey Simon,

          Any progress on this? Seems like a useful feature. I haven't had time to review, but if you feel it's ready, I say go for it.

          Show
          Grant Ingersoll added a comment - Hey Simon, Any progress on this? Seems like a useful feature. I haven't had time to review, but if you feel it's ready, I say go for it.
          Hide
          Grant Ingersoll added a comment -

          FWIW, I like the syntax of the schema and solrconfig.xml configuration that you showed in the example here. Would be good to add to the wiki once you commit it.

          Show
          Grant Ingersoll added a comment - FWIW, I like the syntax of the schema and solrconfig.xml configuration that you showed in the example here. Would be good to add to the wiki once you commit it.
          Hide
          Yonik Seeley added a comment -

          Finally got around to reviewing this again - looks good to me!
          There are some minor things like cutting down the test solrconfig so it's more minimalistic, but that can always be done after committing.

          Show
          Yonik Seeley added a comment - Finally got around to reviewing this again - looks good to me! There are some minor things like cutting down the test solrconfig so it's more minimalistic, but that can always be done after committing.
          Hide
          Robert Muir added a comment -

          any update on this? Would be nice to be able to hook in codecproviders and codecs this way.

          Show
          Robert Muir added a comment - any update on this? Would be nice to be able to hook in codecproviders and codecs this way.
          Hide
          Grant Ingersoll added a comment -

          I thought I would have time last week, but that turned out to not be the case. If you have time, Robert, feel free, otherwise I might be able to get to it later in the week (pending conf. prep). From the sounds of it, it likely just needs to be updated to trunk and then it should be ready to go (we should also doc it on the wiki)

          Show
          Grant Ingersoll added a comment - I thought I would have time last week, but that turned out to not be the case. If you have time, Robert, feel free, otherwise I might be able to get to it later in the week (pending conf. prep). From the sounds of it, it likely just needs to be updated to trunk and then it should be ready to go (we should also doc it on the wiki)
          Hide
          Robert Muir added a comment -

          ok thanks Grant. I'll take a look thru the patch some today and post back what I think.

          Show
          Robert Muir added a comment - ok thanks Grant. I'll take a look thru the patch some today and post back what I think.
          Hide
          Robert Muir added a comment -

          patch synced up to trunk

          Show
          Robert Muir added a comment - patch synced up to trunk
          Hide
          Robert Muir added a comment -

          some minor tweaks: delegate the listAll() in the schemacodecprovider, and minimize the test config

          Show
          Robert Muir added a comment - some minor tweaks: delegate the listAll() in the schemacodecprovider, and minimize the test config
          Hide
          Simon Willnauer added a comment -

          some minor comments:

          • s/nothit/nothing in // make sure we use the default if nothit is configured
          • add javadoc to CodecProvider#hasFieldCodec(String)
          • SchemaCodecProvider should maybe add its name in toString() and not just delegate
          • Maybe we should note in the CHANGES.TXT that IndexReaderFactory now has a CodecProvider that should be passed to IR#open()

          otherwise it looks good though!

          Show
          Simon Willnauer added a comment - some minor comments: s/nothit/nothing in // make sure we use the default if nothit is configured add javadoc to CodecProvider#hasFieldCodec(String) SchemaCodecProvider should maybe add its name in toString() and not just delegate Maybe we should note in the CHANGES.TXT that IndexReaderFactory now has a CodecProvider that should be passed to IR#open() otherwise it looks good though!
          Hide
          Robert Muir added a comment -

          Hi Simon,

          after reviewing the patch I have some concerns about CodecProvider. I think its a little bit confusing how the CodecProvider/CoreCodecProvider hierarchy works today, and a bit dangerous how we delegate over this class.

          For example, if we add a new method to CodecProvider, we need to be sure we add the 'delegation' here every time or stuff will start acting strange.

          For this reason, I wonder if CodecProvider should be an interface: the simple implementation we have in lucene is a hashmap, but Solr uses fieldType lookup. This would parallel how SimilarityProvider works.

          If we want to do this, I think we should open a separate issue... in fact I'm not even sure it should block this issue since in my opinion its a shame you cannot manipulate codecs in Solr right now... but I just wanted to bring it up here.

          Show
          Robert Muir added a comment - Hi Simon, after reviewing the patch I have some concerns about CodecProvider. I think its a little bit confusing how the CodecProvider/CoreCodecProvider hierarchy works today, and a bit dangerous how we delegate over this class. For example, if we add a new method to CodecProvider, we need to be sure we add the 'delegation' here every time or stuff will start acting strange. For this reason, I wonder if CodecProvider should be an interface: the simple implementation we have in lucene is a hashmap, but Solr uses fieldType lookup. This would parallel how SimilarityProvider works. If we want to do this, I think we should open a separate issue... in fact I'm not even sure it should block this issue since in my opinion its a shame you cannot manipulate codecs in Solr right now... but I just wanted to bring it up here.
          Hide
          Michael McCandless added a comment -

          I agree the CodecProvider/CoreCodecProvider is a scary potential delegation trap... Robert can you open a new issue? I agree it should not block this one.

          Show
          Michael McCandless added a comment - I agree the CodecProvider/CoreCodecProvider is a scary potential delegation trap... Robert can you open a new issue? I agree it should not block this one.
          Hide
          Robert Muir added a comment -

          OK I opened LUCENE-3124 for this

          Show
          Robert Muir added a comment - OK I opened LUCENE-3124 for this
          Hide
          Simon Willnauer added a comment -

          OK I opened LUCENE-3124 for this

          +1 thanks! good point!

          Show
          Simon Willnauer added a comment - OK I opened LUCENE-3124 for this +1 thanks! good point!
          Hide
          Robert Muir added a comment -

          Updated patch with Simon's previous suggestions.

          A few more things I saw that I'm not sure I like:

          • the CodecProvider syntax in the test config is cool, but i'm not sure this should be done in SolrCore? I think if you want to have a CP that loads up codecs by classname like this, it should be done in a CodecProviderFactory (you know parsing arguments however it wants).
          • I think its confusing how the SchemaCodecProvider answers to codec requests in 3 ways, 1. from the 'delegate' in SolrConfig, 2. from the schema, and 3. from the default codecProvider. I think if you try to use this, its easy to get yourself in a situation where solrconfig conflicts with the schema. I also don't think we need to bother with the 'defaultCP', in other words if you specify a custom codec provider, this is the only one that is used.
          Show
          Robert Muir added a comment - Updated patch with Simon's previous suggestions. A few more things I saw that I'm not sure I like: the CodecProvider syntax in the test config is cool, but i'm not sure this should be done in SolrCore? I think if you want to have a CP that loads up codecs by classname like this, it should be done in a CodecProviderFactory (you know parsing arguments however it wants). I think its confusing how the SchemaCodecProvider answers to codec requests in 3 ways, 1. from the 'delegate' in SolrConfig, 2. from the schema, and 3. from the default codecProvider. I think if you try to use this, its easy to get yourself in a situation where solrconfig conflicts with the schema. I also don't think we need to bother with the 'defaultCP', in other words if you specify a custom codec provider, this is the only one that is used.
          Hide
          Robert Muir added a comment -

          updated patch: I moved the parsing out of SolrCore, instead codecproviders just get a namedList so they can parse whatever they want (e.g. they might want more than just a list of classnames, but also codec params, ...)

          additionally i removed the interaction with the default codec provider... i think if you specify a codecprovider thats the only one that should be used directly, instead of a "union" with CodecProvider.getDefault()

          Show
          Robert Muir added a comment - updated patch: I moved the parsing out of SolrCore, instead codecproviders just get a namedList so they can parse whatever they want (e.g. they might want more than just a list of classnames, but also codec params, ...) additionally i removed the interaction with the default codec provider... i think if you specify a codecprovider thats the only one that should be used directly, instead of a "union" with CodecProvider.getDefault()
          Hide
          Simon Willnauer added a comment -

          great work robert

          here are some comments:

          • SchemaCodecProvider#listAll should also sync on the delegate to be consistent
          • CodecProviderFactory should get a jdoc string IMO
          • I think we need to fix SolrTestCase to enable the random codecs for Solr right? Or is it using the default provider if no provider is configured?
          Show
          Simon Willnauer added a comment - great work robert here are some comments: SchemaCodecProvider#listAll should also sync on the delegate to be consistent CodecProviderFactory should get a jdoc string IMO I think we need to fix SolrTestCase to enable the random codecs for Solr right? Or is it using the default provider if no provider is configured?
          Hide
          Robert Muir added a comment -

          SchemaCodecProvider#listAll should also sync on the delegate to be consistent
          CodecProviderFactory should get a jdoc string IMO

          I'll update the patch to fix both of these.

          I think we need to fix SolrTestCase to enable the random codecs for Solr right? Or is it using the default provider if no provider is configured?

          This is still working fine, in SolrCore if you do not configure a CodecProviderFactory, then we use CodecProvider.getDefault(), which is set randomly by LuceneTestCase. I added some prints to check and this is still ok.

          Show
          Robert Muir added a comment - SchemaCodecProvider#listAll should also sync on the delegate to be consistent CodecProviderFactory should get a jdoc string IMO I'll update the patch to fix both of these. I think we need to fix SolrTestCase to enable the random codecs for Solr right? Or is it using the default provider if no provider is configured? This is still working fine, in SolrCore if you do not configure a CodecProviderFactory, then we use CodecProvider.getDefault(), which is set randomly by LuceneTestCase. I added some prints to check and this is still ok.
          Hide
          Simon Willnauer added a comment -

          I'll update the patch to fix both of these.

          everything else is looking fine! +1 to commit once updated

          This is still working fine, in SolrCore if you do not configure a CodecProviderFactory, then we use CodecProvider.getDefault(), which is set randomly by LuceneTestCase. I added some prints to check and this is still ok.

          cool, thanks for ensuring that

          Show
          Simon Willnauer added a comment - I'll update the patch to fix both of these. everything else is looking fine! +1 to commit once updated This is still working fine, in SolrCore if you do not configure a CodecProviderFactory, then we use CodecProvider.getDefault(), which is set randomly by LuceneTestCase. I added some prints to check and this is still ok. cool, thanks for ensuring that
          Hide
          Robert Muir added a comment -

          Updated patch with Simon's suggestions

          Show
          Robert Muir added a comment - Updated patch with Simon's suggestions
          Hide
          Robert Muir added a comment -

          Thanks for reviewing Simon, I'll commit this soon.

          Show
          Robert Muir added a comment - Thanks for reviewing Simon, I'll commit this soon.
          Hide
          Robert Muir added a comment -

          Committed revision 1127313.

          Show
          Robert Muir added a comment - Committed revision 1127313.
          Hide
          Simon Willnauer added a comment -

          Awesome! eventually its in...

          Show
          Simon Willnauer added a comment - Awesome! eventually its in...

            People

            • Assignee:
              Robert Muir
              Reporter:
              Yonik Seeley
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development