Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 4.0
    • Fix Version/s: 4.1, Trunk
    • Component/s: core/codecs
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Currently the user need to make sure that a direct source is not shared between threads and each time someone calls getDirectSource we create a new source which has a reasonable overhead. We can certainly reduce the overhead (maybe different issue) but it should be easier for the user to get a direct source and handle it. More than that is should be consistent with getSource / loadSource.

      1. LUCENE-4538.patch
        12 kB
        Simon Willnauer
      2. LUCENE-4538.patch
        22 kB
        Simon Willnauer

        Activity

        Hide
        Simon Willnauer added a comment -

        a patch including loadDirectSource in the SourceCache caching DirectSources in a CloseableThreadLocal

        Show
        Simon Willnauer added a comment - a patch including loadDirectSource in the SourceCache caching DirectSources in a CloseableThreadLocal
        Hide
        Robert Muir added a comment -

        I haven't looked in detail (will try to do a real review later) but I think something like this is really needed.

        Its currently absurd: we make it easy as pie to load up enormous data structures in RAM but nearly impossible to use stuff from disk.

        It should be the other way around.

        We can certainly reduce the overhead (maybe different issue)

        We should do this too!

        Show
        Robert Muir added a comment - I haven't looked in detail (will try to do a real review later) but I think something like this is really needed. Its currently absurd: we make it easy as pie to load up enormous data structures in RAM but nearly impossible to use stuff from disk. It should be the other way around. We can certainly reduce the overhead (maybe different issue) We should do this too!
        Hide
        Adrien Grand added a comment -

        We can certainly reduce the overhead (maybe different issue)

        Are you speaking about the PackedInts headers?

        Show
        Adrien Grand added a comment - We can certainly reduce the overhead (maybe different issue) Are you speaking about the PackedInts headers?
        Hide
        Robert Muir added a comment -

        Among other things: look at how expensive it is today to create a Packed ints direct source (its just an example, other DV impls also read lots of similar stuff)

        1. IndexInput.clone(): PackedIntValues:221 return new
          PackedIntsSource(datIn.clone(), true);
        2. readLong(): PackedIntValues:223 minValue = dataIn.readLong();
        3. readLong(); PackedIntValues:224 defaultValue = dataIn.readLong();
        4. readInt(); Codec magic <-- PackedInts.getDirectReader()
        5. readString(); Codec name
        6. readInt(); Codec version
        7. readVInt(); bits per value
        8. readVInt(); value count
        9. readVint(); format

        I opened LUCENE-4539 for this. Taking a cursory look at e.g. the packedints impl, i think we can fix
        it by just fixing our impl (e.g. no file formats change).

        Show
        Robert Muir added a comment - Among other things: look at how expensive it is today to create a Packed ints direct source (its just an example, other DV impls also read lots of similar stuff) IndexInput.clone(): PackedIntValues:221 return new PackedIntsSource(datIn.clone(), true); readLong(): PackedIntValues:223 minValue = dataIn.readLong(); readLong(); PackedIntValues:224 defaultValue = dataIn.readLong(); readInt(); Codec magic <-- PackedInts.getDirectReader() readString(); Codec name readInt(); Codec version readVInt(); bits per value readVInt(); value count readVint(); format I opened LUCENE-4539 for this. Taking a cursory look at e.g. the packedints impl, i think we can fix it by just fixing our impl (e.g. no file formats change).
        Hide
        Simon Willnauer added a comment -

        here is a new patch makeing the loadSource & loadDirectSource protected. This is really confusing if you have two ways to get a Source instance and you need to take care of if it is cached or not. This really should not have been public at all.

        I will commit this soon

        Show
        Simon Willnauer added a comment - here is a new patch makeing the loadSource & loadDirectSource protected. This is really confusing if you have two ways to get a Source instance and you need to take care of if it is cached or not. This really should not have been public at all. I will commit this soon
        Hide
        Simon Willnauer added a comment -

        committed to trunk in rev. 1406153
        backported to 4.x in rev. 1406169

        Show
        Simon Willnauer added a comment - committed to trunk in rev. 1406153 backported to 4.x in rev. 1406169
        Hide
        Robert Muir added a comment -

        Thanks Simon!

        Show
        Robert Muir added a comment - Thanks Simon!
        Hide
        Commit Tag Bot added a comment -

        [branch_4x commit] Simon Willnauer
        http://svn.apache.org/viewvc?view=revision&revision=1406169

        LUCENE-4538: Cache DocValues DirectSource

        Show
        Commit Tag Bot added a comment - [branch_4x commit] Simon Willnauer http://svn.apache.org/viewvc?view=revision&revision=1406169 LUCENE-4538 : Cache DocValues DirectSource

          People

          • Assignee:
            Simon Willnauer
            Reporter:
            Simon Willnauer
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development