Lucene - Core / LUCENE-3862

DocValues getInt() returns long, getFloat() returns double

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 4.0-ALPHA
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      I think this is a bit confusing, especially for something like norms,
      where the value is really an 8-bit byte: getting a long back is confusing.

      I think we should have the usual getFloat/getDouble/getInt/getShort/getByte instead?

        Issue Links

          Activity

          Ryan McKinley added a comment -

          In looking at replacing FieldCache with DocValues, I came to this same question.

          I also wonder how/if there should be any relationship with the FunctionValues.java class.

          Uwe Schindler added a comment -

          I already opened an issue about that!

          Simon Willnauer added a comment -

          I should have been more specific. The reason I did it that way is that you always have to downcast explicitly, once you are sure you are not losing any precision. If you offer a getShort, somebody could accidentally downcast 64-bit values into 16-bit values without realizing it. However, if it allows us to make similarities generic, maybe we need to rethink that.

          I kind of like the semantics we have right now, i.e. you get full 64-bit values no matter what you have encoded.
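
To illustrate the concern about accidental downcasts, here is a minimal Java sketch. It is not Lucene code: the class `NarrowingDemo` and its methods are hypothetical names invented for this example. It only shows how a plain cast from a 64-bit value to a short silently drops the high bits, while an explicit range check fails loudly.

```java
// Hypothetical demo class for illustration only; not part of Lucene.
final class NarrowingDemo {

    // Unchecked narrowing: keeps only the low 16 bits, with no warning.
    static short uncheckedToShort(long value) {
        return (short) value;
    }

    // Checked narrowing: throws if the value does not fit in 16 bits.
    static short checkedToShort(long value) {
        if (value < Short.MIN_VALUE || value > Short.MAX_VALUE) {
            throw new ArithmeticException("value does not fit in a short: " + value);
        }
        return (short) value;
    }

    public static void main(String[] args) {
        long stored = 70_000L; // a value that needs more than 16 bits

        // 70000 - 65536 = 4464: the high bits are silently discarded.
        System.out.println(uncheckedToShort(stored)); // prints 4464

        try {
            checkedToShort(stored);
        } catch (ArithmeticException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a long-returning accessor the caller has to write the cast, so the narrowing is at least visible at the call site.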

          Robert Muir added a comment -

          Let me try to explain my line of reasoning... I don't know if it will work, but...

          It's been mentioned before on previous issues that it would be nice if people could have norms impls that aren't a huge byte[] or whatever.
          Currently, all of our provided Similarities will not work if hasArray() is false. So if you want to have an alternative norms data structure,
          perhaps some space tradeoff, or based on something you know about your data, it currently requires you to write a custom Similarity too.

          I've been curious to test: if the norms impl is really just a byte[], would scoring via the docvalues apis (rather than hasArray) really
          slow things down?

          Because if we just had a getByte(int doc), I think it's feasible it would cost nothing over getArray() and byte[doc]... Then people could
          make alternative implementations without also making custom Similarities.

          But I'm nervous about all the casting of bytes to longs and such. I also feel the API is a little confusing...

          "With those methods in place we are just going to cast around, which should be done by the user of the API."

          But how are we not casting to long now (e.g. in the single-byte norms case)?
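
A rough Java sketch of the getByte(int doc) idea follows. Everything in it is an assumption made for illustration: `ByteNorms`, `ArrayByteNorms`, `SparseByteNorms` and `sumNorms` are hypothetical names, not the Lucene DocValues or Similarity API. It shows scoring-style code written once against a per-document accessor, with a byte[]-backed implementation and an alternative structure plugged in behind it. It says nothing about the actual cost compared to direct array access, which would still need to be measured.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these types exist in Lucene.
final class NormsDemo {

    // Hypothetical per-document accessor for single-byte norms.
    interface ByteNorms {
        byte getByte(int doc);
    }

    // Array-backed implementation: the same data as reading norms[doc] directly.
    static final class ArrayByteNorms implements ByteNorms {
        private final byte[] norms;

        ArrayByteNorms(byte[] norms) {
            this.norms = norms;
        }

        @Override
        public byte getByte(int doc) {
            return norms[doc];
        }
    }

    // Alternative structure: one default norm plus a map of exceptions.
    static final class SparseByteNorms implements ByteNorms {
        private final byte defaultNorm;
        private final Map<Integer, Byte> exceptions = new HashMap<>();

        SparseByteNorms(byte defaultNorm) {
            this.defaultNorm = defaultNorm;
        }

        void put(int doc, byte norm) {
            exceptions.put(doc, norm);
        }

        @Override
        public byte getByte(int doc) {
            Byte norm = exceptions.get(doc);
            return norm != null ? norm : defaultNorm;
        }
    }

    // Stand-in for scoring code: it depends only on the accessor,
    // not on whether a backing array exists.
    static int sumNorms(ByteNorms norms, int maxDoc) {
        int sum = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            sum += norms.getByte(doc);
        }
        return sum;
    }

    public static void main(String[] args) {
        ByteNorms dense = new ArrayByteNorms(new byte[] {1, 2, 3});
        SparseByteNorms sparse = new SparseByteNorms((byte) 5);
        sparse.put(1, (byte) 7);

        System.out.println(sumNorms(dense, 3));
        System.out.println(sumNorms(sparse, 3));
    }
}
```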

          Simon Willnauer added a comment -

          Actually, I really don't want this! It increases the complexity. The interface says it can provide Bytes, Integers and Floats independent of their internal encoding or max size. With those methods in place we are just going to cast around, which should be done by the user of the API.
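
As a small illustration of why returning the widest type is safe for the caller, the Java sketch below checks that widening a byte to long, or a float to double, and casting back gives the original value. The class `WideningDemo` and its method names are hypothetical and exist only for this example.

```java
// Hypothetical demo class for illustration only; not part of Lucene.
final class WideningDemo {

    // byte -> long is a widening conversion, so casting back is exact.
    static boolean byteRoundTripsThroughLong(byte b) {
        long widened = b;
        return (byte) widened == b;
    }

    // float -> double is a widening conversion, so casting back is exact.
    static boolean floatRoundTripsThroughDouble(float f) {
        double widened = f;
        return Float.compare((float) widened, f) == 0;
    }

    // Checks every possible byte value.
    static boolean allBytesRoundTrip() {
        for (int i = Byte.MIN_VALUE; i <= Byte.MAX_VALUE; i++) {
            if (!byteRoundTripsThroughLong((byte) i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allBytesRoundTrip());
        System.out.println(floatRoundTripsThroughDouble(0.1f));
    }
}
```

The cast back to the narrow type is then the caller's decision, made by code that knows how the values were encoded.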

          Michael McCandless added a comment -

          +1


            People

            • Assignee: Unassigned
            • Reporter: Robert Muir
            • Votes: 0
            • Watchers: 0
