  1. Solr
  2. SOLR-8831

allow _version_ field to be retrievable via docValues

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.0
    • Component/s: None
    • Labels:
      None

      Description

      Right now, one is prohibited from having an unstored _version_ field, even if docValues are enabled.

      1. SOLR-8831.patch
        0.8 kB
        Yonik Seeley

        Activity

        yseeley@gmail.com Yonik Seeley added a comment -

        simple patch attached.

        dsmiley David Smiley added a comment -

        +1

        varunthacker Varun Thacker added a comment -

        Hi Yonik,

        Was this committed as part of SOLR-5670?

        ichattopadhyaya Ishan Chattopadhyaya added a comment -

        +1, this is much needed. I used to keep something like this in my patch for SOLR-5944 ( https://github.com/chatman/lucene-solr/commit/627b9ac9b46796f20be78b04ebbdfa4299b96ab7#diff-040ec312b12294ee52a94eac00766ea7L77 ).

        jkrupan Jack Krupansky added a comment - - edited

        Can we come up with a nice clean term for "stored or docValues are enabled"?

        I mean, the issue title here is misleading, as the description then indicates - "if docValues are enabled." So, it should be "allow _version_ field to be unstored if docValues are enabled."

        Traditional database nomenclature is no help here since the concept of non-stored data is meaningless in a true database.

        Personally, I'd be happier if Solr hid a lot of the byzantine complexity of Lucene, including this odd distinction between stored and docValues. I mean, to me they are just two different implementations of the logical concept of storing data for later retrieval - how the data is stored rather than whether it is stored.

        I'll offer two suggested simple terms to be used at the Solr level even if Lucene insists on remaining byzantine: "xstored" or "retrievable", both meaning that the field attributes make it possible for Solr to retrieve data after indexing, either because the field is stored or has docValues enabled. This is not a proposal for a feature, but simply terminology to be used to talk about fields which are... "either stored or have docValues enabled." (If I wanted a feature, it might be to have a new attribute like retrieval_storage="{by_field|by_document|none}" or... stored="{yes|no|docValues|fieldValues}".)

        I'm not proposing any feature here since that would be out of the scope of the issue, but since this issue needs doc, I am just proposing new terminology for that doc.

        Again, to summarize more briefly, I am proposing that the term "retrievable" be used to refer to fields that are either stored or have docValues enabled.
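
        As an illustration of the proposed term (a sketch only; the field names below are hypothetical and not taken from any shipped schema), the first two definitions would count as "retrievable" under this definition and the third would not:

        ```xml
        <!-- retrievable: the value is stored -->
        <field name="example_stored" type="long" indexed="true" stored="true" docValues="false"/>
        <!-- retrievable: not stored, but docValues are enabled -->
        <field name="example_dv" type="long" indexed="true" stored="false" docValues="true"/>
        <!-- not retrievable: neither stored nor docValues, searchable only -->
        <field name="example_indexed_only" type="long" indexed="true" stored="false" docValues="false"/>
        ```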

        yseeley@gmail.com Yonik Seeley added a comment -

        they are just two different implementations of the logical concept of storing data for later retrieval

        I agree - I've been occasionally using the term "row stored" and "column stored".
        While we won't be able to totally squash the terms "stored" or "docValues" (too much history), in certain contexts it will certainly be easier to use an all-encompassing term like "retrievable". I'll update this patch to reflect that unless someone comes up with a better word for it.

        jira-bot ASF subversion and git services added a comment -

        Commit ff8cedcb11638ee52f91bf81bad2ee01f3c3d59a in lucene-solr's branch refs/heads/branch_6_0 from Yonik Seeley
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ff8cedc ]

        SOLR-8831: allow _version_ field to be retrievable via docValues

        yseeley@gmail.com Yonik Seeley added a comment -

        I think that was for searchability purposes... it allowed indexed OR docValues (and said nothing about stored)

        jkrupan Jack Krupansky added a comment -

        Now that docValues is supported for _version_, the question arises as to which is preferred (faster, less memory), stored or docValues. IOW, which should be the default. I presume it should be docValues, but I have no real clue.

        Also, the doc for Atomic Update has this example as a Power Tip, that has BOTH stored and docValues set:

        <field name="_version_" type="long" indexed="false" stored="true" required="true" docValues="true"/>
        

        See:
        https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents

        Should that be changed to stored="false"? Or, is there actually some additional hidden benefit to stored="true" AND docValues="true"?
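
        For concreteness, the variant being asked about would look like the following. It is a sketch derived from the wiki example above with only the stored attribute changed; whether it is the better default is exactly the open question.

        ```xml
        <!-- _version_ kept only in docValues: not indexed, not stored -->
        <field name="_version_" type="long" indexed="false" stored="false" required="true" docValues="true"/>
        ```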


          People

          • Assignee:
            yseeley@gmail.com Yonik Seeley
            Reporter:
            yseeley@gmail.com Yonik Seeley
          • Votes:
            0
            Watchers:
            7
