Lucene - Core
  1. Lucene - Core
  2. LUCENE-5063

Allow GrowableWriter to store negative values

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Trivial Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.4
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      For some use-cases, it would be convenient to be able to store negative values in a GrowableWriter, for example to use it in FieldCache: The first term is the minimum value and one could use a GrowableWriter to store deltas between this minimum value and the current value. (The need for negative values comes from the fact that maxValue - minValue might be larger than Long.MAX_VALUE.)

      1. LUCENE-5063.patch
        28 kB
        Adrien Grand
      2. LUCENE-5063.patch
        17 kB
        Adrien Grand

        Activity

        Hide
        Adrien Grand added a comment -

        Here is a patch which makes GrowableWriter able to store negative values and makes FieldCache.DEFAULT.get(Ints|Longs) use it. In order to not make field cache loading too slow, the GrowableWriters are created with an acceptable overhead ratio of 50% so that they can grow the number of bits per value quickly in order not to perform too much resizing.

        Show
        Adrien Grand added a comment - Here is a patch which makes GrowableWriter able to store negative values and makes FieldCache.DEFAULT.get(Ints|Longs) use it. In order to not make field cache loading too slow, the GrowableWriters are created with an acceptable overhead ratio of 50% so that they can grow the number of bits per value quickly in order not to perform too much resizing.
        Hide
        Robert Muir added a comment -

        On one hand we pay the price of an add:

             @Override
             public long get(int docID) {
        -      return values[docID];
        +      return minValue + values.get(docID);
             }
        

        But we get no benefit...

        + * <p>Beware that this class will accept to set negative values but in order
        + * to do this, it will grow the number of bits per value to 64.
        

        This doesn't seem right...

        Show
        Robert Muir added a comment - On one hand we pay the price of an add: @Override public long get( int docID) { - return values[docID]; + return minValue + values.get(docID); } But we get no benefit... + * <p>Beware that this class will accept to set negative values but in order + * to do this , it will grow the number of bits per value to 64. This doesn't seem right...
        Hide
        Robert Muir added a comment -

        i see, so we only need negatives in growablewriter for the case where we'd use 64 bpv for longs anyway.
        Can we add a comment?

        Also, we start at 4bpv here, but we don't bitpack for byte/short too? it could be a little unintuitive that using long takes less ram than byte

        Or, maybe FC should only have a 'long' API to better match DV?

        In order to not make field cache loading too slow, the GrowableWriters are created with an acceptable overhead ratio of 50% so that they can grow the number of bits per value quickly in order not to perform too much resizing.

        This is consistent with SortedDocValuesImpl, except SortedDocValuesImpl has a 'startBPV' of 1, whereas its 4 here. Maybe we should use 1 here too?

        Show
        Robert Muir added a comment - i see, so we only need negatives in growablewriter for the case where we'd use 64 bpv for longs anyway. Can we add a comment? Also, we start at 4bpv here, but we don't bitpack for byte/short too? it could be a little unintuitive that using long takes less ram than byte Or, maybe FC should only have a 'long' API to better match DV? In order to not make field cache loading too slow, the GrowableWriters are created with an acceptable overhead ratio of 50% so that they can grow the number of bits per value quickly in order not to perform too much resizing. This is consistent with SortedDocValuesImpl, except SortedDocValuesImpl has a 'startBPV' of 1, whereas its 4 here. Maybe we should use 1 here too?
        Hide
        Adrien Grand added a comment -

        i see, so we only need negatives in growablewriter for the case where we'd use 64 bpv for longs anyway.

        Exactly. Negative values in a GrowableWriter are more 64-bits unsigned values than actual negative values.

        Or, maybe FC should only have a 'long' API to better match DV?

        Are you talking about removing all get(Bytes|Shorts|Ints|Floats|Doubles) and only have getLongs which would return a NumericDocValues instance? Indeed I think it would make things simpler and more consistent (eg. comparators and FieldCacheRangeFilter) but this looks like a big change!

        This is consistent with SortedDocValuesImpl, except SortedDocValuesImpl has a 'startBPV' of 1, whereas its 4 here. Maybe we should use 1 here too?

        Agreed.

        Show
        Adrien Grand added a comment - i see, so we only need negatives in growablewriter for the case where we'd use 64 bpv for longs anyway. Exactly. Negative values in a GrowableWriter are more 64-bits unsigned values than actual negative values. Or, maybe FC should only have a 'long' API to better match DV? Are you talking about removing all get(Bytes|Shorts|Ints|Floats|Doubles) and only have getLongs which would return a NumericDocValues instance? Indeed I think it would make things simpler and more consistent (eg. comparators and FieldCacheRangeFilter) but this looks like a big change! This is consistent with SortedDocValuesImpl, except SortedDocValuesImpl has a 'startBPV' of 1, whereas its 4 here. Maybe we should use 1 here too? Agreed.
        Hide
        Robert Muir added a comment -

        Indeed I think it would make things simpler and more consistent (eg. comparators and FieldCacheRangeFilter) but this looks like a big change!

        It doesnt need to hold up this issue. we can make a followup issue for that. Maybe we should do something about the Bytes/Shorts though here...

        Show
        Robert Muir added a comment - Indeed I think it would make things simpler and more consistent (eg. comparators and FieldCacheRangeFilter) but this looks like a big change! It doesnt need to hold up this issue. we can make a followup issue for that. Maybe we should do something about the Bytes/Shorts though here...
        Hide
        Adrien Grand added a comment -

        Maybe we should do something about the Bytes/Shorts though here...

        Given that we don't even have numeric support (they are just encoded/decoded as strings) for these types, maybe we should just remove or deprecate them?

        Show
        Adrien Grand added a comment - Maybe we should do something about the Bytes/Shorts though here... Given that we don't even have numeric support (they are just encoded/decoded as strings) for these types, maybe we should just remove or deprecate them?
        Hide
        Michael McCandless added a comment -

        Maybe we should do something about the Bytes/Shorts though here...

        Given that we don't even have numeric support (they are just encoded/decoded as strings) for these types, maybe we should just remove or deprecate them?

        +1

        Show
        Michael McCandless added a comment - Maybe we should do something about the Bytes/Shorts though here... Given that we don't even have numeric support (they are just encoded/decoded as strings) for these types, maybe we should just remove or deprecate them? +1
        Hide
        Michael McCandless added a comment -

        +1, patch looks good!

        Show
        Michael McCandless added a comment - +1, patch looks good!
        Hide
        Adrien Grand added a comment -

        Same patch with added deprecation warnings:

        • FieldCache.get(Byte|Short)s
        • FieldCache.DEFAULT_*_PARSER (because they assume numeric data is encoded as strings)
        • SortField.Type.(Byte|Short)
        • (Byte|Short)FieldSource
        • Solr's ByteField and ShortField
        Show
        Adrien Grand added a comment - Same patch with added deprecation warnings: FieldCache.get(Byte|Short)s FieldCache.DEFAULT_*_PARSER (because they assume numeric data is encoded as strings) SortField.Type.(Byte|Short) (Byte|Short)FieldSource Solr's ByteField and ShortField
        Hide
        Commit Tag Bot added a comment -

        [branch_4x commit] jpountz
        http://svn.apache.org/viewvc?view=revision&revision=1495146

        LUCENE-5063: ...continuation.

        Show
        Commit Tag Bot added a comment - [branch_4x commit] jpountz http://svn.apache.org/viewvc?view=revision&revision=1495146 LUCENE-5063 : ...continuation.
        Hide
        Commit Tag Bot added a comment -

        [trunk commit] jpountz
        http://svn.apache.org/viewvc?view=revision&revision=1495156

        LUCENE-5063: Compress integer and long field caches and remove FieldCache.get(Byte|Short)s, default parsers and related class/methods (merged from r1494753 and r1495146).

        Show
        Commit Tag Bot added a comment - [trunk commit] jpountz http://svn.apache.org/viewvc?view=revision&revision=1495156 LUCENE-5063 : Compress integer and long field caches and remove FieldCache.get(Byte|Short)s, default parsers and related class/methods (merged from r1494753 and r1495146).
        Hide
        Steve Rowe added a comment -

        Bulk close resolved 4.4 issues

        Show
        Steve Rowe added a comment - Bulk close resolved 4.4 issues

          People

          • Assignee:
            Adrien Grand
            Reporter:
            Adrien Grand
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development