Lucene - Core
LUCENE-5119

DiskDV SortedDocValues shouldn't hold doc-to-ord in heap memory

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.5, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      The doc-to-ord mappings are accessed sequentially, e.g. when faceting, and can be a fairly large amount of data (depending on the number of docs and the number of unique terms).

      I think this was done so that conceptually "random" access to a specific docid would be faster than e.g. stored fields, but I think we should instead target the DV data structures towards real use cases (faceting, sorting, grouping, ...).
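
As a rough illustration of the trade-off described above, the sketch below keeps a doc-to-ord mapping in a memory-mapped file instead of a heap array, so the data lives in the file-system cache and can still be read both sequentially (as in faceting) and by docid. It is not the LUCENE-5119 patch and does not use Lucene's API: the names `write_doc_to_ord`, `DiskDocToOrd` and `facet_counts` are hypothetical, and the fixed-width 32-bit ordinal layout is an assumption made for simplicity (an actual codec would likely use a more compact encoding).

```python
import mmap
import os
import struct
import tempfile

ORD_BYTES = 4  # assumption: one fixed-width 32-bit ordinal per document


def write_doc_to_ord(path, ords):
    """Write one ordinal per document, in docid order (hypothetical layout)."""
    with open(path, "wb") as f:
        for o in ords:
            f.write(struct.pack("<i", o))


class DiskDocToOrd:
    """Reads ordinals through a memory map rather than a heap-resident array."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self.max_doc = len(self._map) // ORD_BYTES

    def get_ord(self, docid):
        # "Random" access: one positioned read at docid * ORD_BYTES.
        return struct.unpack_from("<i", self._map, docid * ORD_BYTES)[0]

    def __iter__(self):
        # Sequential access in docid order, the pattern faceting uses.
        for docid in range(self.max_doc):
            yield self.get_ord(docid)

    def close(self):
        self._map.close()
        self._file.close()


def facet_counts(reader):
    """Count how many documents carry each ordinal."""
    counts = {}
    for o in reader:
        counts[o] = counts.get(o, 0) + 1
    return counts


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "doc_to_ord.bin")
        write_doc_to_ord(path, [2, 0, 1, 2, 0])
        reader = DiskDocToOrd(path)
        print(facet_counts(reader))
        print(reader.get_ord(3))
        reader.close()
```

Whether reads through the map are fast depends on how much of the file the operating system keeps cached, which is outside the process's control.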

        Activity

        David Smiley added a comment -

        Would it be easy to add random access as an option? Looking at your patch, which was pretty simple, it doesn't appear that it'd be hard to support random access should an application want this.

        A realistic example in my mind is a spatial filter in which a potentially large binary geometry representation of a shape is encoded for each document in DiskDV. Some fast leading filters narrow down the applicable documents, but some documents' shape geometry needs to be consulted in the DiskDV afterwards. Does that make sense?

        Robert Muir added a comment -

        I don't plan to do this. That's why we have a codec API...

        Adrien Grand added a comment -

        +1 I think it makes sense to make DiskDV deserve its name and store everything on disk.

        Adrien Grand added a comment -

        David, I think your use-case would still work pretty well with this change. In particular, if you had enough memory to store your ordinals mapping in memory, this means that the file-system cache will likely be able to cache the whole ordinals mapping as well (you may just need to slightly decrease the amount of memory given to the JVM), so random access should remain fast?

        Michael McCandless added a comment -

        +1 to move ords to disk.

        ASF subversion and git services added a comment -

        Commit 1504868 from Robert Muir in branch 'dev/trunk'
        [ https://svn.apache.org/r1504868 ]

        LUCENE-5119: DiskDV SortedDocValues shouldnt hold doc-to-ord in heap

        ASF subversion and git services added a comment -

        Commit 1504873 from Robert Muir in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1504873 ]

        LUCENE-5119: DiskDV SortedDocValues shouldnt hold doc-to-ord in heap

        Adrien Grand added a comment -

        4.5 release -> bulk close


          People

          • Assignee:
            Unassigned
            Reporter:
            Robert Muir
          • Votes:
            0
            Watchers:
            6