Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.2, 5.0
    • Component/s: None
    • Labels: None

      Description

      It would be nice if Solr supported DocValues:

      • for ID fields (fewer disk seeks when running distributed search),
      • for sorting/faceting/function queries (faster warmup time than fieldcache),
      • better on-disk and in-memory efficiency (you can use packed impls).
      1. SOLR-3855.patch
        133 kB
        Adrien Grand
      2. SOLR-3855.patch
        134 kB
        Adrien Grand
      3. SOLR-3855.patch
        57 kB
        Adrien Grand
      4. SOLR-3855.patch
        55 kB
        Adrien Grand
      5. SOLR-3855.patch
        60 kB
        Adrien Grand
      6. SOLR-3855.patch
        59 kB
        Adrien Grand
      7. SOLR-3855.patch
        100 kB
        Adrien Grand
      8. SOLR-3855.patch
        115 kB
        Adrien Grand
      9. SOLR-3855-2.patch
        13 kB
        Adrien Grand

        Issue Links

          Activity

          Adrien Grand added a comment -

          I have planned to start working on this when 4.0 is released.

          Adrien Grand added a comment -

          Initial patch:

          DocValues can be:

          • configured on a per-field-type basis (docValueType=...),
          • enabled on a per-field basis (docValues=true/false)

          and are available for the following field types:

          • StrField,
          • UUIDField,
          • Trie*Field,
          • BoolField.

          When doc values are enabled, they have precedence over the field cache for getValueSource and getSortField, however faceting and stats cannot use doc values yet (I would like to do this as a separate issue).

          I force fields that have doc values enabled to be single-valued and to be either required or have a default value.

          I also modified a lot of code (ReturnFields especially) to make DocValues behave like stored fields. I think this would be great for ID fields. In a cluster that has numShards shards, it would help decrease the number of disk seeks in the .fdt file (which is often too big to fit entirely in the OS cache) per request from (numShards * (start + rows) + rows) to rows.

          The patch is not committable yet, and I have BasicDistributedZkTest.testDistribSearch that always fails (not sure why yet...) but I'd love to have some feedback to know whether it is going in the right direction.

          Robert Muir added a comment -

          warning: just skimmed the patch.

          configured on a per-field-type basis (docValueType=...),
          enabled on a per-field basis (docValues=true/false)

          We could combine these? e.g. a docValueType of "none" or something? This would parallel the lucene apis and maybe make things a bit simpler.

          When doc values are enabled, they have precedence over the field cache for getValueSource and getSortField, however faceting and stats cannot use doc values yet (I would like to do this as a separate issue).

          Ultimately it would be really great if fieldcache and docvalues had the same API. I worry about the fact that its not this way currently. This shouldn't block this patch, its just a semi-related discussion... seems like fieldcache should be presented as "build docvalues on the fly for the field".

          Would be awesome if faceting etc could use docvalues: though I think there is likely some work for the multivalued case? e.g. we would have to encode multiple tokens at a level above into the single-valued StraightBytes or whatever ala DocTermOrds? or maybe we should think about an actual type for this that can allow for more efficient impls?

          I also modified a lot of code (ReturnFields especially) to make DocValues behave like stored fields. I think this would be great for ID fields. In a cluster that has numShards shards, it would help decrease the number of disk seeks in the .fdt file (which is often too big to fit entirely in the OS cache) per request from (numShards * (start + rows) + rows) to rows.

          I didn't look at this part, but is this really true? its numFields * rows right? If its some special case for ID fields where #idfields=1 for distributed search or whatever, I think thats a good optimization for that use-case. But in general if docvalues are presented like stored fields for general purposes I think thats not a great illusion to give to the user in case they have a lot of fields?

          Thanks for getting this started!

          Adrien Grand added a comment -

          We could combine these? e.g. a docValueType of "none" or something? This would parallel the lucene apis and maybe make things a bit simpler.

          Good point.

          Additionally I currently force doc values to be non-direct (ie. in-memory). Do you think it is fine or should we give people the choice? I wasn't sure when writing the patch because I think they would provide irregular performance depending on the good will of the I/O cache (I was thinking of people benchmarking with a read-only index, then going into production and performing a sort on a large result set while a background merge is running (eating all the I/O cache memory) and BOOM!). But maybe I'm too pessimistic.

          it would be really great if fieldcache and docvalues had the same API

          Yes it would make things so much easier... I also wish DocValues.Source and FunctionValues were the same class.

          Would be awesome if faceting etc could use docvalues: though I think there is likely some work for the multivalued case?

          Right, DocValues faceting has its own challenges. But that's clearly an issue where merging fieldcache, DocValues.Source and FunctionValues would make things easier: we would have only one code base that is independent from the source of "values" and SOLR-1581 would almost come free.

          I didn't look at this part, but is this really true? its numFields * rows right?

          I was thinking of non-direct doc values for ID fields. Correct me if I'm wrong but when doing a distributed search:

          1. createMainQuery: Solr first asks every shard for the IDs of the best (start + rows) docs
          2. createRetrieveDocs: Solr selects the rows IDs of documents to display and asks the shards they are stored on for their stored fields

          So step 1 requires (start + rows) seeks in the FDT file per shard (to know their IDs) and step 2 requires rows seeks overall. So the total is (numShards * (start + rows)) + rows. If we stored document IDs in memory I think this could help reduce this number to rows (only the second step), which would be great, especially for deep paging or large number of shards.
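
To make the arithmetic above concrete, here is a minimal sketch of the seek estimate. It only models the two phases as described in this comment; the function name and the example numbers (10 shards, start=1000, rows=10) are illustrative assumptions, not measurements.

```python
def stored_field_seeks(num_shards: int, start: int, rows: int, ids_in_memory: bool) -> int:
    """Estimate .fdt seeks for one distributed request (two-phase model from the comment)."""
    # Phase 1: every shard resolves the IDs of its best (start + rows) docs.
    # If IDs are held in memory (non-direct doc values), this phase needs no .fdt seeks.
    phase1 = 0 if ids_in_memory else num_shards * (start + rows)
    # Phase 2: stored fields are fetched only for the `rows` documents that are displayed.
    phase2 = rows
    return phase1 + phase2


# Deep paging over 10 shards: IDs read from stored fields vs. IDs held in memory.
print(stored_field_seeks(10, 1000, 10, ids_in_memory=False))  # 10110
print(stored_field_seeks(10, 1000, 10, ids_in_memory=True))   # 10
```

Under this model the saving grows with both the number of shards and the paging depth, which matches the deep-paging remark above.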

          But in general if docvalues are presented like stored fields for general purposes I think thats not a great illusion to give to the user in case they have a lot of fields?

          Of course it makes no sense to store all fields in DocValues, I think they are best used for ID fields, sorting, scoring factors (function queries) and (soon) faceting. I wanted them to behave like stored fields so that users don't make their fields stored in addition to DocValues for convenience (this is a waste of space, and the bigger the FDT file is, the more likely the I/O cache can't serve disk seeks in this file).

          Robert Muir added a comment -

          Additionally I currently force doc values to be non-direct (ie. in-memory). Do you think it is fine or should we give people the choice?

          I think we should provide direct: maybe even by default? I guess the exception would be things like sorted bytes (I think you cannot sort direct?). But besides that, this is why I raised questions about "acts like stored fields".

          I was thinking of non-direct doc values for ID fields.

          I was describing direct

          Adrien Grand added a comment -

          direct: maybe even by default?

          What is your motivation to make direct the default? I understand that it requires less memory but at the same time I'm worried that it would completely depend on the I/O cache for performance although it could be used on performance-critical paths (sorting, faceting, scoring factors...).

          Robert Muir added a comment -

          Actually another idea is maybe there should be no default? And it wouldnt be too bad if we combine docValuesType and docValues into one parameter.

          So this way there is no trap (slow-io, or huge ram) on anyone. you would just say docValuesType=packed docValuesMethod=(disk/memory?)

          Then down the road once docvalues is really worked out, we could have example fieldtypes for various use cases (id, sort, scoring factors, etc) configured reasonably with whatever makes sense?

          I guess it doesnt matter to me if there is a default, and what the default is, as long as we are careful to both offer and advertise the ram/disk option in the example (even if there is actually a default).

          for this initial patch, we could just make it so only "memory" actually works and you get a UOE if you ask for direct. I feel like this is a good step to remove uninversion times, etc and we could separately add the disk option... I'm just throwing out future ideas.

          Adrien Grand added a comment -

          We could combine these? e.g. a docValueType of "none" or something?

          Something I like about having two different parameters is that it gives the ability to specify a default DocValues.Type (used when docValues is true and docValueType is unset). For example, with some FieldTypes only one DocValues.Type makes sense (UUIDField -> FIXED_STRAIGHT) and it would make sense to make FIXED_INTS_32 the default docValueType for TrieIntField.

          Additionally, maybe that not having to learn about all DocValues types before using them would make their adoption smoother?

          So this way there is no trap (slow-io, or huge ram) on anyone.

          I like this argument (forcing the trade-off to be explicit)! But this makes me want to merge DocValues activation with the method instead of the type. For example we could say docValues=no|disk|memory (optional, defaults to "no", "disk" to enable direct doc values, "memory" otherwise) docValueType=${type} (optional, default value depends on the FieldType, only taken into account when "docValues" is set and is not "no").
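
The proposal above can be sketched as a small resolution function. This is only an illustration of the proposed semantics, not code from the patch: the function name, the returned dictionary and the DEFAULT_TYPES table are hypothetical, and the two defaults shown are simply the ones suggested earlier in this comment.

```python
# Hypothetical per-FieldType defaults, taken from the suggestions in this comment.
DEFAULT_TYPES = {
    "UUIDField": "FIXED_STRAIGHT",
    "TrieIntField": "FIXED_INTS_32",
}


def resolve_doc_values(field_type, doc_values="no", doc_value_type=None):
    """Resolve the proposed docValues / docValueType attributes for one field."""
    if doc_values not in ("no", "disk", "memory"):
        raise ValueError("docValues must be one of no|disk|memory, got %r" % doc_values)
    if doc_values == "no":
        # docValueType is only taken into account when doc values are enabled.
        return None
    dv_type = doc_value_type or DEFAULT_TYPES.get(field_type)
    if dv_type is None:
        raise ValueError("no default docValueType known for %s" % field_type)
    # "disk" means direct doc values; "memory" means they are loaded in memory.
    return {"type": dv_type, "direct": doc_values == "disk"}


print(resolve_doc_values("TrieIntField", "memory"))
print(resolve_doc_values("TrieIntField", "disk", "VAR_INTS"))
print(resolve_doc_values("UUIDField"))
```

The point of the design is that the memory/disk trade-off is always chosen explicitly, while the type can fall back to a per-FieldType default.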

          I don't like the fact that I needed to add createDocValues method in addition to the createField method, but only poly fields can return several fields. We should probably fix the API (maybe something like normal fields must have at most one indexed field but should be able to produce several stored fields?) but I'd rather do it in a different issue.

          for this initial patch, we could just make it so only "memory" actually works and you get a UOE if you ask for direct. I feel like this is a good step to remove uninversion times, etc and we could separately add the disk option...

          Adding support for direct doc values should be easy: DocValues value sources already support direct DocValues, we just need to fix oal.search.FieldComparator implementations to have an option to use direct DocValues (they currently force in-memory DocValues).

          Robert Muir added a comment -

          and it would make sense to make FIXED_INTS_32 the default docValueType for TrieIntField.

          Are you sure only one thing makes sense? What if i need integers that are larger than a short, but the range of values (max-min)
          is actually small. Then a Packed impl could make more sense. So we should think about this...

          Additionally, maybe that not having to learn about all DocValues types before using them would make their adoption smoother?

          Well I don't think there should be so many types

          There is a big todo about this in DocValues.java.

          In my opinion instead of IndexWriter streaming docvalues to the codec directly, only to have the codec buffer up in ram and use
          Counter for accounting, IndexWriter should buffer and things like STRAIGHT/VAR would just be optimizations...

          I guess i think the same as for ints, just like if you asked for packed and its going to need 64 bits, its implemented as that behind the scenes (but then still "pretends" to be a packed field, which is weird!).

          But this is a little off-topic

          I like this argument (forcing the trade-off to be explicit)! But this makes me want to merge DocValues activation with the method instead of the type. For example we could say docValues=no|disk|memory (optional, defaults to "no", "disk" to enable direct doc values, "memory" otherwise) docValueType=${type} (optional, default value depends on the FieldType, only taken into account when "docValues" is set and is not "no").

          I think this is good!

          I wanted them to behave like stored fields so that users don't make their fields stored in addition to DocValues for convenience (this is a waste of space, and the bigger the FDT file is, the more likely the I/O cache can't serve disk seeks in this file).

          I'm still worried about this case: I don't like them treated as stored fields. Its only going to be more seeks if people have disk-enabled dvs
          that we must fetch in addition to the stored fields.

          I havent looked at the relevant bits, but is it possible we could treat "*" as just meaning the stored fields still? Basically, if you CHOOSE to
          request them, you get them, but we don't do anything trappy.

          Adrien Grand added a comment -

          Are you sure only one thing makes sense? What if i need integers that are larger than a short, but the range of values (max-min) is actually small. Then a Packed impl could make more sense. So we should think about this...

          I understand your point, I am myself a big supporter of packed ints and plan to use them probably more often than fixed ints, but I still think that fixed_ints would be a good default (no one would be surprised if the doc values of a field which is an int in their schema require 4 bytes per value).

          But if Lucene was able to switch automatically from packed ints to fixed_ints if they have less than x% overhead, this would be great!
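
As a rough illustration of the trade-off being discussed, the sketch below compares the bits per value needed by a packed encoding with the nearest fixed width, and applies the "less than x% overhead" idea. It is not Lucene code: it assumes values are stored as offsets from the minimum, that the fixed widths are 8, 16, 32 and 64 bits, and the 25% threshold is an arbitrary placeholder for x.

```python
FIXED_WIDTHS = (8, 16, 32, 64)  # assumed fixed widths, in bits


def bits_required(min_value: int, max_value: int) -> int:
    """Bits per value if values are stored as offsets from min_value (assumption)."""
    if max_value < min_value:
        raise ValueError("max_value must be >= min_value")
    # The largest offset is (max - min); its bit length is the packed width.
    return max(1, (max_value - min_value).bit_length())


def choose_encoding(min_value: int, max_value: int, max_overhead: float = 0.25):
    """Pick fixed ints when they cost less than max_overhead extra space over packed ints."""
    packed = bits_required(min_value, max_value)
    if packed > 64:
        raise ValueError("value range does not fit in 64 bits")
    fixed = next(w for w in FIXED_WIDTHS if w >= packed)
    overhead = (fixed - packed) / packed
    return ("fixed", fixed) if overhead < max_overhead else ("packed", packed)


# Values larger than a short, but with a small range: packed wins clearly.
print(choose_encoding(100000, 100999))   # ('packed', 10)
# Values spanning almost the full int range: fixed 32-bit ints cost about 3% more.
print(choose_encoding(0, 2**31 - 1))     # ('fixed', 32)
```

The first case is the one raised in the quoted question: the values do not fit in a short, yet 10 bits per value are enough once the minimum is factored out.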

          Well I don't think there should be so many types

          If you want to sort on a String field, there are 6 available types. And I think it should be easy for people getting started with Solr to do simple things such as sorting data without having to understand the different trade-offs of these doc values types in order to choose one. Otherwise the risk is that they keep using the field cache instead because they find it more convenient.

          (I hate this argument because some people will certainly have troubles with SORTED doc values on a unique field of a very large index, but anyway it is still better than the field cache?)

          In my opinion instead of IndexWriter streaming docvalues to the codec directly, only to have the codec buffer up in ram and use Counter for accounting, IndexWriter should buffer and things like STRAIGHT/VAR would just be optimizations...

          +1

          I'm still worried about this case: I don't like them treated as stored fields. Its only going to be more seeks if people have disk-enabled dvs that we must fetch in addition to the stored fields.
          I havent looked at the relevant bits, but is it possible we could treat "*" as just meaning the stored fields still? Basically, if you CHOOSE to
          request them, you get them, but we don't do anything trappy.

          If we allow for direct doc values, this makes sense to not load them by default, but I think we should add documentation to the example schema.xml so that people know that it is wasteful to store fields if doc values are enabled and in memory, and that they can be added very easily to the response by adding the field name to the fl parameter.

          In case the unique key has doc values and is not stored, maybe it still makes sense to fetch it when fl=*?

          Robert Muir added a comment -

          If we allow for direct doc values, this makes sense to not load them by default, but I think we should add documentation to the example schema.xml so that people know that it is wasteful to store fields if doc values are enabled and in memory, and that they can be added very easily to the response by adding the field name to the fl parameter.

          just as wasteful as your example above: adding DocValues to a TrieIntField right? By adding docvalues, its implying that its a single-valued field, and you could do a DocValuesRangeQuery instead (works just like FieldCacheRangeQuery), so why invert it too: wasting space in the postings lists and term dictionary?

          Adrien Grand added a comment -

          New patch:

          • ability to have direct doc values,
          • doc values are not fetched by default; you need to explicitly add their names to the fl parameter to load them,
          • all tests pass except BasicDistributedZkTest.testDistribSearch, but it doesn't pass either without the patch applied on my (very slow...) laptop.

          This patch is not perfect... for example I am not happy that I had to add a new createDocValuesFields method in FieldType. The reason is that only poly fields are allowed to return several fields in createFields, but I think fixing this would require a more global refactoring and should not block this issue?

          If you want to play with doc values and Solr, I modified the example schema.xml so that popularity and inStock have doc values enabled. You can try to display their values, sort on them and/or use function queries on them.

          When a field is indexed and has doc values, the patch always tries to use doc values instead of the field cache.

          Yonik Seeley added a comment -

          Great stuff Adrien!

          I think we should model the docvalues as stored fields (that's how I always planned on doing it - but never got around to it). fl=* should still return these fields. Think about optimistic concurrency, etc. Having to know all of the field names to actually get all of the field values is not a good thing.

          I wonder if we could pick a better name? "doc values" isn't very descriptive at a higher abstract level. I always considered CSF/DocValues to be about storing the values separately for better caching by the OS. "stored separately" seems closer to the real description. One thing to consider is that even if Lucene changes the names of the methods later, we try to stick with external APIs longer in Solr - hence it can be less important that the names exactly match what is in lucene and more important that they are something we want for the long haul.

          Robert Muir added a comment -

          DocValues aren't stored fields though: especially when kept on disk.

          It's a bad idea to mislead users into thinking they are the same.

          Adrien Grand added a comment -

          Having to know all of the field names to actually get all of the field values is not a good thing.

          Good point. I should rework this part of the patch.

          I think we should model the docvalues as stored fields (that's how I always planned on doing it - but never got around to it). fl=* should still return these fields.

          This was my first idea too, but at that time I forced doc values to be memory-resident. I think Robert's point about not fetching doc values automatically, because they could potentially imply a crazy number of random disk seeks, makes sense.

          A trade-off could be to only fetch automatically memory-resident doc values but I think it would be very confusing for users.

          I wonder if we could pick a better name?

          I used this name because it is the name in the Lucene API and because this feature has been "marketed" with this name in various blog posts and conference talks. Do you have an idea for another name?

          Yonik Seeley added a comment -

          Regarding performance - it seems like for most users, the number of docvalue fields should be relatively small.
          One of the big advantages to DocValues is the better caching by the OS - so "seeks" should often never hit the disk.
          For those users where performance is a concern, they should set "fl" to retrieve only those fields they absolutely need.
          Also consider existing working clients where the solr server changes the storage type of the field for better performance - that shouldn't be visible to the client (just as changing the precisionStep of a trie type should not be visible).

          Naming:
          At some time in the past I was considering storeSep=true/false (until there were multiple ways to store separately), but I was never crazy about the name. But since we're just trying to say how the field should be stored, perhaps just overload that parameter?

          stored=true // same as today
          stored=[docValues method] // store separately using the given method

          I'm not sold on it or anything... just throwing out ideas.

          I like the separate param for "disk"/"memory" or "direct"/"memory" - the default access method for the field really is different from how it's stored.
          But it seems like that should just be a default and one should be able to access the field via direct or memory depending on the situation?
          For simply adding additional return fields, direct seems the right approach, unless it's already been loaded into memory, in which case it would be a nice optimization to use that.

          Adrien Grand added a comment -

          Regarding performance - it seems like for most users, the number of docvalue fields should be relatively small.
          One of the big advantages to DocValues is the better caching by the OS - so "seeks" should often never hit the disk.

          I agree that it is unlikely to affect performance for many users but on the other hand I don't like the fact that Solr could suddenly get insanely slow if doc values fields grow larger than the size of the I/O cache.

          stored=[docValues method] // store separately using the given method

          I'm afraid it could be confusing for users: doc values are very different from stored fields feature-wise (sorting, function values) and performance-wise (up to 1 seek per doc vs. up to 1 seek per field) so I think we should use another parameter name?
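          To make the seek comparison concrete, here is a rough back-of-the-envelope sketch. It assumes the worst case where nothing is in the I/O cache and doc values are read from disk; the class and method names are made up for illustration and are not part of the patch.

```java
final class SeekEstimate {
    // Stored fields keep all fields of a document together:
    // in the worst case, one seek per returned document.
    static long storedFieldSeeks(long docs) {
        return docs;
    }

    // Disk-resident doc values are laid out per field (column-wise):
    // in the worst case, one seek per requested field for each document.
    static long docValuesSeeks(long docs, long fields) {
        return docs * fields;
    }

    public static void main(String[] args) {
        long docs = 10;
        long fields = 5;
        System.out.println("stored fields: up to " + storedFieldSeeks(docs) + " seeks");
        System.out.println("doc values:    up to " + docValuesSeeks(docs, fields) + " seeks");
    }
}
```

          With a single requested field the two worst cases are the same; the gap only opens up as more doc values fields are fetched per document.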

          But it seems like that should just be a default and one should be able to access the field via direct or memory depending on the situation?

          To avoid surprises (OOM on the one hand / extreme slowness on the other hand) I think we should stick to an explicit access method specified in the schema? (I've planned to fix SortField/FieldComparator so that it doesn't force doc values to be memory-resident when sorting.)

          The question of loading or not doc values fields by default seems to raise lots of concerns. Maybe we should fix this issue with no promise that doc values fields would be loaded by default and open another issue to find out whether it is reasonable or not to do so? (I'm just afraid that consensus might be hard to obtain while everyone seems to agree that DocValues support is an improvement?)

          Yonik Seeley added a comment -

          FYI, Adrien & I chatted a while about this (we're both at ApacheCon now), and came to agreement with roughly what we think addresses both of our concerns. He'll add the details here when he gets a chance.

          Adrien Grand added a comment -

          Here are the details:

          • stored fields and doc values are considered very different features (with different parameter names in the schema to enable/disable them)
          • fl=* would only load stored fields by default (so fields that have doc values would also need to be stored if you want them to be loaded with fl=*),
          • the fl parameter can accept field names that have doc values but are not stored (and this would work as expected),
          • DocValues would be useful for documents' unique keys and version numbers (because having fast random access to these fields is important)
          • if the fl parameter only contains one field that has both doc values and stored fields enabled, it makes sense to use doc values (same number of disk seeks in the worst case). Otherwise, it should stick to stored fields by default. It might make sense to add an option to force doc values with several fields in the field list (fl.docValues=true?) but we should add appropriate warnings about it in the docs.

          I'll update my patch soon. Yonik, please correct me if I'm wrong!
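          The rules above could be sketched roughly as follows. This is only an illustration of the agreed behaviour, not code from the patch: FieldListPlanner, FieldInfo and Source are hypothetical names, the schema is reduced to two boolean flags per field, and the proposed fl.docValues option is not modelled.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FieldListPlanner {
    enum Source { STORED, DOC_VALUES }

    /** Hypothetical per-field schema flags; not Solr's SchemaField. */
    static final class FieldInfo {
        final boolean stored;
        final boolean docValues;

        FieldInfo(boolean stored, boolean docValues) {
            this.stored = stored;
            this.docValues = docValues;
        }
    }

    /** Decides, for each field to return, where its value would be read from. */
    static Map<String, Source> plan(List<String> fl, Map<String, FieldInfo> schema) {
        Map<String, Source> plan = new LinkedHashMap<>();
        boolean singleField = fl.size() == 1 && !fl.contains("*");
        for (String name : fl) {
            if (name.equals("*")) {
                // fl=* only expands to stored fields.
                for (Map.Entry<String, FieldInfo> e : schema.entrySet()) {
                    if (e.getValue().stored) {
                        plan.putIfAbsent(e.getKey(), Source.STORED);
                    }
                }
                continue;
            }
            FieldInfo info = schema.get(name);
            if (info == null) {
                continue; // unknown field: ignored in this sketch
            }
            if (info.stored && info.docValues) {
                // Doc values only win when a single field is requested
                // (same worst-case number of seeks as stored fields).
                plan.put(name, singleField ? Source.DOC_VALUES : Source.STORED);
            } else if (info.docValues) {
                // Explicitly requested field that has doc values but is not stored.
                plan.put(name, Source.DOC_VALUES);
            } else if (info.stored) {
                plan.put(name, Source.STORED);
            }
        }
        return plan;
    }
}
```

          For example, with a schema where id is stored and has doc values, name is only stored, and popularity only has doc values, fl=* would return id and name from stored fields, while fl=popularity would read popularity from doc values.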

          Adrien Grand added a comment -

          The DocValues API might change a lot because of LUCENE-4547, I'll wait for it to reintegrate trunk before working again on this issue (but please give me enough time so that this issue can make it into 4.1).

          Yonik Seeley added a comment -

          stored fields and doc values are considered very different features

          I think we see things a bit differently at the conceptual level - but it currently amounts to the same concrete decisions (i.e. I agree with all your other list items).
          stored=true says that a field is conceptually stored (and will be returned by fl=*), while docValues=??? specifies a different mechanism for storage than the default.

          Specifying both stored=true and docValues=??? will currently add the value to both normal stored fields and docValues, optimizing for number of seeks for fl=*, while being sub-optimal as far as index size is concerned (the value will be stored in two places.) Future configuration options should be able to control this trade-off.

          Adrien Grand added a comment -

          New patch based on the new DV API:

          • Doc values can be enabled on a per-field basis (add docValues="true" to your field definition).
          • I added support for doc values to StrField, UUIDField and all Trie*Field.
          • The doc values type can be configured on the fieldType. For example, if your string fields are unique (or almost unique), you can add docValuesType="binary" instead of "sorted" to the fieldType definition.
          • When a field has doc values, it needs to be single-valued and to be either required or have a default value. So things like sortMissingFirst/Last don't make sense when a field has doc values.
          • By using a SchemaCodecFactory, you can configure the DocValuesFormat you want to use (Lucene42/Disk/SimpleText/CheapBastard/AddYourOwnHere).

          The good news is that thanks to the refactoring of the FieldCache API, faceting will work on fields with doc values out of the box.

          Robert Muir added a comment -

          I like the size of this patch!

          One thing that's a little confusing is the multivalued check against a boolean 'hasDocValues'.

          In Lucene this is not a limitation of docvalues per se; instead it's controlled per type. It just happens that all 3 types today have consumers that check for this and throw an exception, but we might add a multi-valued type later (LUCENE-4765).

          Separately there is also the possibility that someone can lay down their own multi-valued encoding over e.g. a binary dv (like lucene facets/). But if we have a private check called from SchemaField's ctor and checks in e.g. DocumentBuilder, it seems like this would be difficult to change. Maybe as a start we should just let IndexWriter handle the check here?

          Adrien Grand added a comment - - edited

          Good point Robert. My initial goal was to fail as soon as possible but you're right, we should not prevent multi-valued fields from having doc values.

          Robert Muir added a comment -

          Another thing to keep in mind is enforcement of default value. So if this could be controlled per-type like lucene too, this would be ideal.

          For example, a multi-valued dv type is conceptually a sorted set of ordinals per document. It can be empty (doc has 0 ords).

          Adrien Grand added a comment -

          New patch which forces fields to be single-valued / required / have a default value in FieldType.checkSchemaField so that it can be changed on a per-FieldType basis.
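          A minimal sketch of the idea, assuming simplified stand-in classes: Field, BaseType and MultiValuedType below are hypothetical and only mimic the shape of a per-type checkSchemaField hook; they are not Solr's SchemaField or FieldType.

```java
final class SchemaCheckSketch {
    /** Stand-in for a schema field definition (hypothetical). */
    static final class Field {
        final String name;
        final boolean docValues;
        final boolean multiValued;
        final boolean required;
        final String defaultValue; // null when no default is configured

        Field(String name, boolean docValues, boolean multiValued,
              boolean required, String defaultValue) {
            this.name = name;
            this.docValues = docValues;
            this.multiValued = multiValued;
            this.required = required;
            this.defaultValue = defaultValue;
        }
    }

    /** Default rule: doc values need exactly one value per document. */
    static class BaseType {
        void checkSchemaField(Field f) {
            if (!f.docValues) {
                return; // nothing to enforce
            }
            if (f.multiValued) {
                throw new IllegalArgumentException(
                    "Field " + f.name + " has doc values and must be single-valued");
            }
            if (!f.required && f.defaultValue == null) {
                throw new IllegalArgumentException(
                    "Field " + f.name + " has doc values and must be required or have a default value");
            }
        }
    }

    /** A type with its own multi-valued encoding can relax the rule by overriding the check. */
    static class MultiValuedType extends BaseType {
        @Override
        void checkSchemaField(Field f) {
            // No restriction: zero or more values per document are acceptable here.
        }
    }
}
```

          Because the check lives on the type rather than in a private check in the schema code, a future multi-valued doc values type only has to override one method.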

          Yonik Seeley added a comment -

          Looking good!

          Regarding the default schema, we try to avoid default values for fields since it slows down indexing and makes the index bigger for those not using them. It's nice if people can just start using the stock schema w/o having to change anything (and people tend to benchmark this way too). Not sure what to do about that... except perhaps to only enable docValues in the stock schema that have zero cost when unused (and none do yet?)

          Robert Muir added a comment -

          Maybe for now, for the example, we just have two simple unused string_dv/numeric_dv field types, and two commented-out fields using them with an explanation of what dv is inside the comment?

          Adrien Grand added a comment -

          I modified the example schema to disable doc values on all fields and added a comment to say that it might be a good idea to enable doc values on popularity and manu_exact. Yonik, Robert, what do you think?

          Robert Muir added a comment -

          Damn, I hate our binary type. If it weren't for facets....

          if (docValuesType != DocValuesType.SORTED && docValuesType != DocValuesType.BINARY) {
            throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
              "StrField only supports binary and sorted doc values");
          }
          
          

          <!-- Use this field type in conjunction with a field with doc values to sort
          efficiently on a field which has a lot of unique terms. -->
          <fieldType name="unique_string_sort" class="solr.StrField" docValuesType="binary" />

          
          

          Can we either:

          1. NOT suggest this and fix the check to only allow sorted values by default.
          2. fix getSortField() and getValueSource() to do the right thing and not call FieldCache.getDocTermsIndex

          The same goes for any other field types. Such insanity should be avoided.

          Robert Muir added a comment -

          We also need to fix the existing checks for sort/valuesource/etc/etc.

          These currently throw an exception if a field is not indexed, but that is irrelevant if the field has doc values.
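          In other words, the precondition for sorting or building a value source would become "indexed or has doc values" rather than just "indexed". A tiny sketch, with a made-up helper name and plain booleans standing in for the schema flags:

```java
final class SortCheckSketch {
    /** Rejects a field only if it has neither an inverted index nor doc values. */
    static void checkSortable(String field, boolean indexed, boolean docValues) {
        if (!indexed && !docValues) {
            throw new IllegalArgumentException(
                "cannot sort on a field that is neither indexed nor has doc values: " + field);
        }
    }
}
```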

          Adrien Grand added a comment -

          Some progress: I removed support for DocValuesType.BINARY and modified faceting and stats to use the numeric field caches when the field has doc values.

          Adrien Grand added a comment -

          New patch, all tests passed.

          • I made DV integration in the stats component a little cleaner,
          • numeric faceting now works even if facet.mincount=0 but it requires the field to be indexed.

          I think it's ready?

          Yonik Seeley added a comment -

          modified faceting and stats to use the numeric field caches when the field has doc values.

          Nice! I notice a bunch of great cleanups that have long been on my TODO list too (esp in stats)!

          Robert Muir added a comment -

          patch looks good. Awesome!

          Commit Tag Bot added a comment -

          [trunk commit] Adrien Grand
          http://svn.apache.org/viewvc?view=revision&revision=1446922

          SOLR-3855: Doc values support.

          Commit Tag Bot added a comment -

          [branch_4x commit] Adrien Grand
          http://svn.apache.org/viewvc?view=revision&revision=1446934

          SOLR-3855: Doc values support (mergd from r1446922).

          Adrien Grand added a comment -

          Committed!

          Nice! I notice a bunch of great cleanups that have long been on my TODO list too (esp in stats)!

          Yes. But, there are still a few things to clean up. In particular it would be better if this component supported custom field types and didn't always box numbers. (This would likely require a large refactoring of this component.) Maybe LUCENE-4765 could also help factor more code between the single-valued and multi-valued cases.

          Gopal Patwa added a comment -

          Is there an example or test case that updates a DocValues field without updating the index or reopening the index searcher? Is this even possible?

          Adrien Grand added a comment -

          Unfortunately doc values are not updateable.

          Shawn Heisey added a comment -

          The branch_4x commit has a compiler error in TrieField. It attempts to override the longToString method in LongFieldSource, but that method isn't there in branch_4x. If I copy the method over from trunk in LongFieldSource, it seems to fix the compiler error.

          Adrien Grand added a comment -

          Thanks Shawn, I committed from the wrong directory! This should be OK now.

          Commit Tag Bot added a comment -

          [branch_4x commit] Adrien Grand
          http://svn.apache.org/viewvc?view=revision&revision=1446951

          SOLR-3855: Fix compilation.

          Mark Miller added a comment -

          This is great, thanks Adrien!

          Adrien Grand added a comment -

          Reopening: I just found that faceting is broken on single-valued trie fields that have a precision step < Integer.MAX_VALUE when facet.mincount is 0 (it adds counts for all indexed terms instead of keeping only the "main" ones).
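
          To illustrate why the precision step matters here, the following is a minimal sketch of a toy model, not Lucene's actual term encoding. It assumes that a trie field indexes one full-precision term (shift 0) plus one coarser term per precision step, and that the "main" terms are the ones with shift 0. The class TrieTermsSketch and its methods are hypothetical names made up for this illustration. With precisionStep = Integer.MAX_VALUE there is only one term per value, which would be consistent with the problem only showing up for smaller steps.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; none of these names come from Lucene or Solr.
public class TrieTermsSketch {

    /** One indexed term in this toy model: the value with its low `shift` bits dropped. */
    static final class Term {
        final int shift;
        final long prefix;

        Term(int shift, long prefix) {
            this.shift = shift;
            this.prefix = prefix;
        }
    }

    /**
     * Toy model of trie indexing for a 64-bit value: one full-precision term
     * (shift 0) plus one coarser term per precision step. Assumes precisionStep >= 1.
     */
    static List<Term> indexedTerms(long value, int precisionStep) {
        List<Term> terms = new ArrayList<>();
        // A long counter avoids int overflow when precisionStep is Integer.MAX_VALUE.
        for (long shift = 0; shift < 64; shift += precisionStep) {
            terms.add(new Term((int) shift, value >>> shift));
        }
        return terms;
    }

    /** Keeps only the full-precision ("main") terms, i.e. those with shift 0. */
    static List<Term> mainTerms(List<Term> terms) {
        List<Term> main = new ArrayList<>();
        for (Term t : terms) {
            if (t.shift == 0) {
                main.add(t);
            }
        }
        return main;
    }

    public static void main(String[] args) {
        List<Term> all = indexedTerms(42L, 16);
        System.out.println("all terms:  " + all.size());
        System.out.println("main terms: " + mainTerms(all).size());
    }
}
```

          In this model, a facet listing that walks every term of the field (as happens when facet.mincount is 0) would report the coarser terms as extra values unless it applies something like mainTerms first.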

          Adrien Grand added a comment -

          Patch that fixes the problem.

          I also modified StatsComponent and FacetComponent to use the numeric field cache instead of UnInvertedField on single-valued trie fields (Solr used to use UnInvertedField when precisionStep was < Integer.MAX_VALUE).

          Commit Tag Bot added a comment -

          [trunk commit] Adrien Grand
          http://svn.apache.org/viewvc?view=revision&revision=1449360

          SOLR-3855: Fix faceting on numeric fields with precisionStep < Integer.MAX_VALUE, facet.mincount=0 and facet.method=fcs.

          Commit Tag Bot added a comment -

          [branch_4x commit] Adrien Grand
          http://svn.apache.org/viewvc?view=revision&revision=1449365

          SOLR-3855: Fix faceting on numeric fields with precisionStep < Integer.MAX_VALUE, facet.mincount=0 and facet.method=fcs.

          Uwe Schindler added a comment -

          Closed after release.


            People

            • Assignee:
              Adrien Grand
              Reporter:
              Adrien Grand
            • Votes:
              4
              Watchers:
              13