  1. Lucene - Core
  2. LUCENE-5119

DiskDV SortedDocValues shouldn't hold doc-to-ord in heap memory

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.5, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      The doc-to-ord mappings are accessed sequentially when e.g. faceting, and can be a fairly large amount of data (based on # of docs and # of unique terms).

      I think this was done so that conceptually "random" access to a specific docid would be faster than e.g. stored fields, but I think we should instead target the DV data structures towards real use cases (faceting, sorting, grouping, ...).
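The sequential access pattern described above can be sketched in plain Java. This is only an illustration of the idea, not Lucene's DiskDV implementation: the class `OrdFacetSketch`, its methods, and the fixed 4-byte-per-document file layout are all hypothetical. It streams a doc-to-ord file from disk once and counts documents per ordinal, so the only heap structure is the per-ordinal counter array.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical sketch: facet counting over an on-disk doc-to-ord mapping.
public class OrdFacetSketch {

    // Writes one 4-byte ordinal per document, in docid order (assumed layout).
    public static Path writeOrds(int[] docToOrd) {
        try {
            Path file = Files.createTempFile("doc-to-ord", ".bin");
            try (DataOutputStream out = new DataOutputStream(
                    new BufferedOutputStream(Files.newOutputStream(file)))) {
                for (int ord : docToOrd) {
                    out.writeInt(ord);
                }
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the ordinals sequentially and counts documents per ordinal.
    // No per-document array is held on the heap.
    public static long[] countByOrd(Path file, int numDocs, int numOrds) {
        long[] counts = new long[numOrds];
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            for (int doc = 0; doc < numDocs; doc++) {
                counts[in.readInt()]++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] docToOrd = {2, 0, 2, 1, 2};
        Path file = writeOrds(docToOrd);
        // prints [1, 1, 3]
        System.out.println(Arrays.toString(countByOrd(file, docToOrd.length, 3)));
    }
}
```

Because the scan touches each document's ordinal exactly once and in order, buffered reads from disk are a natural fit for this kind of workload.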

        Activity

        dsmiley David Smiley added a comment -

        Would it be easy to add random access as an option? Looking at your patch, which was pretty simple, it doesn't appear that it'd be hard to support random access should an application want this.

        A realistic example in my mind is a spatial filter in which a potentially large binary geometry representation of a shape is encoded for each document into DiskDV. Some fast leading filters narrow down the applicable documents, but some documents' shape geometries need to be consulted in the DiskDV afterwards. Does that make sense?

        rcmuir Robert Muir added a comment -

        I don't plan to do this. That's why we have a codec API...

        jpountz Adrien Grand added a comment -

        +1 I think it makes sense to make DiskDV deserve its name and store everything on disk.

        jpountz Adrien Grand added a comment -

        David, I think your use-case would still work pretty well with this change. In particular, if you had enough memory to store your ordinals mapping in memory, this means that the file-system cache will likely be able to cache the whole ordinals mapping as well (you may just need to slightly decrease the amount of memory given to the JVM), so random access should remain fast?
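The file-system cache point can be illustrated with a memory-mapped file. This is a minimal sketch, not Lucene's DiskDV code: the class `MappedOrdsSketch` and its 4-byte-per-document layout are hypothetical. The ordinals live in a file mapped via `FileChannel.map`, so lookups by docid go through the operating system's page cache instead of a Java heap array.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: random access to an on-disk doc-to-ord mapping.
public class MappedOrdsSketch {
    private final MappedByteBuffer ords;

    private MappedOrdsSketch(MappedByteBuffer ords) {
        this.ords = ords;
    }

    // Writes one big-endian 4-byte ordinal per document to a temp file,
    // then maps the file read-only.
    public static MappedOrdsSketch write(int[] docToOrd) {
        try {
            Path file = Files.createTempFile("doc-to-ord", ".bin");
            ByteBuffer buf = ByteBuffer.allocate(docToOrd.length * Integer.BYTES);
            for (int ord : docToOrd) {
                buf.putInt(ord);
            }
            Files.write(file, buf.array());
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                // The mapping stays valid after the channel is closed.
                return new MappedOrdsSketch(
                        ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size()));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Random access: absolute read at the document's byte offset.
    public int ord(int docId) {
        return ords.getInt(docId * Integer.BYTES);
    }
}
```

With this layout, `MappedOrdsSketch.write(new int[] {2, 0, 2, 1, 2}).ord(3)` returns 1. Once the pages are resident in the file-system cache, such lookups avoid disk I/O, which is the trade-off described above: memory is taken from the OS cache rather than from the JVM heap.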

        mikemccand Michael McCandless added a comment -

        +1 to move ords to disk.

        jira-bot ASF subversion and git services added a comment -

        Commit 1504868 from Robert Muir in branch 'dev/trunk'
        [ https://svn.apache.org/r1504868 ]

        LUCENE-5119: DiskDV SortedDocValues shouldnt hold doc-to-ord in heap

        jira-bot ASF subversion and git services added a comment -

        Commit 1504873 from Robert Muir in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1504873 ]

        LUCENE-5119: DiskDV SortedDocValues shouldnt hold doc-to-ord in heap

        jpountz Adrien Grand added a comment -

        4.5 release -> bulk close


          People

          • Assignee:
            Unassigned
            Reporter:
            rcmuir Robert Muir
          • Votes:
            0
            Watchers:
            6
