Lucene - Core
  1. Lucene - Core
  2. LUCENE-3509

Add settings to IWC to optimize IDV indices for CPU or RAM respectivly

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 4.0-ALPHA
    • Fix Version/s: 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      spinnoff from LUCENE-3496 - we are seeing much better performance if required bits for PackedInts are rounded up to a 8/16/32/64. We should add this option to IWC and default to round up ie. more RAM & faster lookups.

      1. LUCENE-3509.patch
        11 kB
        Martijn van Groningen
      2. LUCENE-3509.patch
        12 kB
        Martijn van Groningen
      3. LUCENE-3509.patch
        26 kB
        Martijn van Groningen

        Issue Links

          Activity

          Hide
          Martijn van Groningen added a comment -

          Committed to trunk.

          Show
          Martijn van Groningen added a comment - Committed to trunk.
          Hide
          Martijn van Groningen added a comment -

          Updated the patch to the trunk. I think it is ready. For now this is only an optimization for doc values of type sorted bytes. I will commit this in the coming days if everyone is fine with it.

          Show
          Martijn van Groningen added a comment - Updated the patch to the trunk. I think it is ready. For now this is only an optimization for doc values of type sorted bytes. I will commit this in the coming days if everyone is fine with it.
          Hide
          Simon Willnauer added a comment -

          I think fasterButMoreRam is fine, since it is a dv codec parameter now.

          +1 go ahead

          Show
          Simon Willnauer added a comment - I think fasterButMoreRam is fine, since it is a dv codec parameter now. +1 go ahead
          Hide
          Martijn van Groningen added a comment -

          I think fasterButMoreRam is fine, since it is a dv codec parameter now.

          Show
          Martijn van Groningen added a comment - I think fasterButMoreRam is fine, since it is a dv codec parameter now.
          Hide
          Simon Willnauer added a comment -

          maybe just call it optimizeForSpeed or fasterButMoreRam or alignPackedInts?

          Show
          Simon Willnauer added a comment - maybe just call it optimizeForSpeed or fasterButMoreRam or alignPackedInts?
          Hide
          Michael McCandless added a comment -

          Maybe just packedIntsFasterButMoreRAM?

          Show
          Michael McCandless added a comment - Maybe just packedIntsFasterButMoreRAM?
          Hide
          Martijn van Groningen added a comment -

          We currently only use it for types BYTES_VAR_SORTED and BYTES_FIXED_SORTED hence the name. I think we can use it for the other types as well. This option basically decides whether DirectInt or PacketInt is used (in PacketInt#getReader()). The DirectInt classes are faster for when using BYTES_VAR_SORTED and BYTES_FIXED_SORTED types. I'll try and test if this is also the case for for the other types (I think it will be faster for the other types as well).

          Show
          Martijn van Groningen added a comment - We currently only use it for types BYTES_VAR_SORTED and BYTES_FIXED_SORTED hence the name. I think we can use it for the other types as well. This option basically decides whether DirectInt or PacketInt is used (in PacketInt#getReader()). The DirectInt classes are faster for when using BYTES_VAR_SORTED and BYTES_FIXED_SORTED types. I'll try and test if this is also the case for for the other types (I think it will be faster for the other types as well).
          Hide
          Simon Willnauer added a comment -

          Updated the patch. The option with name "sortedBytesFasterButMoreRam" is now in DocValuesWriterBase class and defaults to true.

          it might make sense to give it a more general name since we use packed ints in various places? The question is if we want to use this option elsewhere... not sure though

          Show
          Simon Willnauer added a comment - Updated the patch. The option with name "sortedBytesFasterButMoreRam" is now in DocValuesWriterBase class and defaults to true. it might make sense to give it a more general name since we use packed ints in various places? The question is if we want to use this option elsewhere... not sure though
          Hide
          Martijn van Groningen added a comment -

          Updated the patch. The option with name "sortedBytesFasterButMoreRam" is now in DocValuesWriterBase class and defaults to true.

          Show
          Martijn van Groningen added a comment - Updated the patch. The option with name "sortedBytesFasterButMoreRam" is now in DocValuesWriterBase class and defaults to true.
          Hide
          Martijn van Groningen added a comment -

          I also prefer to have a default that matches with the FieldCache. I will change the patch so that the option is at the codec impl level (DefaultDocValuesConsumer).

          Show
          Martijn van Groningen added a comment - I also prefer to have a default that matches with the FieldCache. I will change the patch so that the option is at the codec impl level (DefaultDocValuesConsumer).
          Hide
          Michael McCandless added a comment -

          I think enabling at the codec impl level makes sense.

          But I'd prefer to have the defaulting match what we do for FieldCache, ie default to "fasterButMoreRAM".

          Show
          Michael McCandless added a comment - I think enabling at the codec impl level makes sense. But I'd prefer to have the defaulting match what we do for FieldCache, ie default to "fasterButMoreRAM".
          Hide
          Simon Willnauer added a comment -

          We should expose this via low level DocValues implementation and maybe not via IWC. I think a consistent way would be enableing this in MemoryCodec and use the more ram efficient variant by default. This is just like CFS which is disabled in SepCodec.

          Show
          Simon Willnauer added a comment - We should expose this via low level DocValues implementation and maybe not via IWC. I think a consistent way would be enableing this in MemoryCodec and use the more ram efficient variant by default. This is just like CFS which is disabled in SepCodec.
          Hide
          Martijn van Groningen added a comment -

          So I think it would be best if this was only a parameter to this particular implementation... otherwise we end out with a ton of options across all possible implementations, which won't make a lot of sense at all.

          Makes sense if you write it down like that

          I think it should be an option inside DefaultDocValuesConsumer (since this creates the writer) that defaults to true and if users want to change this behaviour they will need to change their codec.

          Show
          Martijn van Groningen added a comment - So I think it would be best if this was only a parameter to this particular implementation... otherwise we end out with a ton of options across all possible implementations, which won't make a lot of sense at all. Makes sense if you write it down like that I think it should be an option inside DefaultDocValuesConsumer (since this creates the writer) that defaults to true and if users want to change this behaviour they will need to change their codec.
          Hide
          Robert Muir added a comment -

          well the same applies to having an IndexDocValuesConfig.

          If my implementation doesn't use packed integers, it doesnt make a lot of sense?

          So I think it would be best if this was only a parameter to this particular implementation... otherwise we end out with a ton of options across all possible implementations, which won't make a lot of sense at all.

          I think at the end of the day, we should imagine having tons of implementations for the different index parts (look how many term dictionaries we have), and let impls have their own parameters.

          If someone wants to toggle these crazy parameters, thats something they can do in their codec definition. For solr users etc, thats something they can tweak in the xml on a per-implementation basis just like when they provide crazy parameters to various tokenfilters in their analysis chain.

          Show
          Robert Muir added a comment - well the same applies to having an IndexDocValuesConfig. If my implementation doesn't use packed integers, it doesnt make a lot of sense? So I think it would be best if this was only a parameter to this particular implementation... otherwise we end out with a ton of options across all possible implementations, which won't make a lot of sense at all. I think at the end of the day, we should imagine having tons of implementations for the different index parts (look how many term dictionaries we have), and let impls have their own parameters. If someone wants to toggle these crazy parameters, thats something they can do in their codec definition. For solr users etc, thats something they can tweak in the xml on a per-implementation basis just like when they provide crazy parameters to various tokenfilters in their analysis chain.
          Hide
          Martijn van Groningen added a comment -

          Maybe this makes sense IWC is high level. Maybe this should be just an option somewhere in IndexDocValues with the name "fasterButMoreRAM". If this option is set to true an implementation will choose performance over RAM. In the case that an implementation uses SimpleText this option wouldn't have an influence, but for packed integers it would have an influence.

          Maybe we should have IndexDocValuesConfig class for DV based settings. I can imagine that we will have more DV based settings in the future...

          Show
          Martijn van Groningen added a comment - Maybe this makes sense IWC is high level. Maybe this should be just an option somewhere in IndexDocValues with the name "fasterButMoreRAM". If this option is set to true an implementation will choose performance over RAM. In the case that an implementation uses SimpleText this option wouldn't have an influence, but for packed integers it would have an influence. Maybe we should have IndexDocValuesConfig class for DV based settings. I can imagine that we will have more DV based settings in the future...
          Hide
          Robert Muir added a comment -

          well one thing that confuses me about the long name is, should it be in IndexWriterConfig at all?

          The reason I say this, it seems to be very specific to a certain implementation of DV.

          For example, this (expert) option doesnt make much sense if I make my own implementation of sortedbytes (maybe for SimpleText ), that doesnt use packed integers.

          Show
          Robert Muir added a comment - well one thing that confuses me about the long name is, should it be in IndexWriterConfig at all? The reason I say this, it seems to be very specific to a certain implementation of DV. For example, this (expert) option doesnt make much sense if I make my own implementation of sortedbytes (maybe for SimpleText ), that doesnt use packed integers.
          Hide
          Martijn van Groningen added a comment -

          For naming consistency that would be better, but this option is only for DV of type sorted bytes (BYTES_VAR_SORTED and BYTES_FIXED_SORTED).
          So maybe something like "sortedBytesDocValuesFasterButMoreRam". I know it is a long name but it describes when the option is applicable. Or maybe just "docValuesFasterButMoreRam" but then the jdoc should clearly state that is only applicable for sorted bytes DV.

          Show
          Martijn van Groningen added a comment - For naming consistency that would be better, but this option is only for DV of type sorted bytes (BYTES_VAR_SORTED and BYTES_FIXED_SORTED). So maybe something like "sortedBytesDocValuesFasterButMoreRam". I know it is a long name but it describes when the option is applicable. Or maybe just "docValuesFasterButMoreRam" but then the jdoc should clearly state that is only applicable for sorted bytes DV.
          Hide
          Robert Muir added a comment -

          I think it would be good to make the naming consistent with "fasterButMoreRAM" in fieldcache?

          Show
          Robert Muir added a comment - I think it would be good to make the naming consistent with "fasterButMoreRAM" in fieldcache?
          Hide
          Martijn van Groningen added a comment -

          Attached initial patch. IWC has the following added methods:

          IWC.setOptimizeBytesDocValuesForSpeed(...)
          IWC.isOptimizeBytesDocValuesForSpeed()
          
          Show
          Martijn van Groningen added a comment - Attached initial patch. IWC has the following added methods: IWC.setOptimizeBytesDocValuesForSpeed(...) IWC.isOptimizeBytesDocValuesForSpeed()

            People

            • Assignee:
              Martijn van Groningen
              Reporter:
              Simon Willnauer
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development