Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      In LUCENE-2723, a lot of work was done to speed up Lucene's bulk postings read API.

      There were some upsides:

      • you could specify things like 'I don't care about frequency data' up front.
        This made things like multitermquery->filter and other consumers that don't
        care about freqs faster. But this is unrelated to 'bulkness', and we now have a
        separate patch for it on LUCENE-2929.
      • the buffer size for standardcodec was increased to 128, improving performance
        for TermQueries, but this was unrelated too.

      But there were serious downsides/nocommits:

      • the API was hairy because it tried to be 'one-size-fits-all'. This made consumer code crazy.
      • the API could not really be specialized to your codec: e.g. it could never take advantage of the fact
        that docs and freqs are aligned.
      • the API forced codecs to implement delta encoding for things like documents and positions.
        But how to encode is totally up to the codec! Some codecs might not use delta encoding at all.
      • using such an API for positions was only theoretical; it would have been super complicated, and I doubt
        it would ever have been performant or maintainable.
      • there was a regression with advance(), probably because the API forced you to first do a linear scan through
        the remaining buffer and then refill...

      I think a cleaner approach is to let codecs do whatever they want to implement the DISI
      contract. This gives codecs the freedom to implement whatever compression/buffering they want
      for the best performance, and keeps consumers simple. If a codec uses delta encoding, or if it wants
      to defer this to the last possible minute or do it at decode time, that's its own business. Maybe a codec
      doesn't want to do any buffering at all.
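
      To make the contrast concrete, here is a minimal sketch (not part of any patch on this issue) of what a
      freq-agnostic consumer looks like against the plain DISI contract; DocsEnum, nextDoc() and NO_MORE_DOCS
      are the 4.0-era iterator API, but the helper class itself is hypothetical:

      {code:java}
import java.io.IOException;

import org.apache.lucene.index.DocsEnum;
import org.apache.lucene.search.DocIdSetIterator;

// Hypothetical helper, for illustration only: a consumer that only needs doc IDs
// works against the plain iterator contract and never sees how the codec
// buffers or delta-decodes its postings.
final class DocCountingConsumer {

  /** Counts the documents a postings list matches using only nextDoc(). */
  static int countDocs(DocsEnum docs) throws IOException {
    int count = 0;
    for (int doc = docs.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = docs.nextDoc()) {
      count++; // whatever buffering/decoding the codec does stays behind nextDoc()
    }
    return count;
  }
}
      {code}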

        Activity

        Yonik Seeley added a comment -

        I'm traveling this week and don't have access to that box, but I should be able to get to it next week sometime.

        Simon Willnauer added a comment -

        Yonik, can you check if you see the same thing with your benchmark if you apply LUCENE-3648?

        Simon Willnauer added a comment -

        My first question would be: did you flush the FS caches and warm up your JVM? If you didn't flush the caches, it would be interesting to know which one you ran first. Are those two indexes the same?

        Yonik Seeley added a comment -

        where is the code to your benchmark? I don't trust it.

        I'm always skeptical of benchmarks too

        No benchmark code this time, I just hit Solr directly from the browser, waiting for the times to stabilize and picking the lowest (and making sure that I could hit very near that low again and it wasn't a fluke). Results are very repeatable though (and I killed the JVM and retried to make sure hotspot would do the same thing again).

        The index is from a 10M row CSV file I generated years ago. For example, the field with 10 terms is simply a single valued field with a random number between 0 and 9, padded out to 10 chars.

        Oh, this is Linux on a Phenom II, JDK 1.6.0_29.

        Robert Muir added a comment -

        Yonik, where is the code to your benchmark? I don't trust it.
        hotspot likes to change how it compiles readVInt, so be sure to use lots of JVM iterations.

        I tested this change with luceneutil (lots of iterations, takes an hour to run) and everything
        was the same, with disjunction queries looking better every time I ran it.

        I think everything is just fine.

        Task            QPS trunk  StdDev trunk  QPS patch  StdDev patch    Pct diff
        IntNRQ              10.44          0.69       9.80          0.88   -19% -  9%
        Wildcard            24.93          0.41      24.23          0.44    -6% -  0%
        Prefix3             48.83          1.14      47.45          1.09    -7% -  1%
        TermBGroup1M1P      43.29          1.08      42.28          1.31    -7% -  3%
        PKLookup           187.88          4.49     186.43          5.07    -5% -  4%
        AndHighHigh         15.10          0.25      14.99          0.54    -5% -  4%
        SpanNear            15.96          0.43      15.87          0.43    -5% -  4%
        TermBGroup1M        32.30          0.87      32.14          0.64    -4% -  4%
        SloppyPhrase        14.53          0.50      14.47          0.55    -7% -  7%
        TermGroup1M         24.07          0.54      24.01          0.48    -4% -  4%
        Respell             87.11          3.74      86.91          4.05    -8% -  9%
        Fuzzy1              94.79          3.18      94.58          4.05    -7% -  7%
        Fuzzy2              48.13          1.92      48.10          2.45    -8% -  9%
        Phrase               9.10          0.41       9.11          0.41    -8% -  9%
        Term               135.52          4.74     137.26          2.91    -4% -  7%
        AndHighMed          51.64          0.92      53.20          1.90    -2% -  8%
        OrHighHigh          10.75          0.62      11.79          0.60    -1% - 22%
        OrHighMed           12.20          0.75      13.40          0.71    -1% - 23%
        Yonik Seeley added a comment -

        I tested Solr's faceting code (the enum method that steps over terms and uses the filterCache), with minDf set high enough so that the filterCache wouldn't be used (i.e. it directly uses DocsEnum to calculate the count for the term). % increase when we were using the bulk API = r208282/trunk time (i.e. performance is measured as change in throughput... so going from 400ms to 200ms is expressed as a 100% increase in throughput).

        number of terms   documents per term   bulk API performance increase
        10000000                   1                        2.1
        1000000                    10                       3.0
        1000                       10000                    8.9
        10                         1000000                 51.6

        So when terms match many documents, we've had quite a drop-off due to the removal of the bulk API.
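
        For reference, a rough sketch (not Solr's actual code) of the facet.method=enum path described above when
        minDf keeps a term off the filterCache: step over the terms of the field and count each term's documents
        directly from its DocsEnum. The exact TermsEnum.docs(...) signature moved around during 4.0 development,
        so the two-argument form below is an assumption:

        {code:java}
import java.io.IOException;

import org.apache.lucene.index.DocsEnum;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BytesRef;

// Hypothetical sketch of per-term counting, for illustration only.
final class EnumFacetSketch {

  static void countPerTerm(TermsEnum termsEnum) throws IOException {
    DocsEnum docs = null;
    for (BytesRef term = termsEnum.next(); term != null; term = termsEnum.next()) {
      docs = termsEnum.docs(null, docs); // no live-docs filter, reuse the enum across terms
      int count = 0;
      while (docs.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
        count++; // this inner loop is what the removed bulk API used to batch
      }
      System.out.println(term.utf8ToString() + " -> " + count);
    }
  }
}
        {code}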

        Michael McCandless added a comment -

        +1

        Simon Willnauer added a comment -

        +1

        Uwe Schindler added a comment - edited

        +1

        Maybe we should also add buffering to the 3.x codec, but that's not so important.

        Robert Muir added a comment -

        patch: nuking the bulk API and implementing buffering for standardcodec so we have the same performance.
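
        As a rough illustration of the buffering idea (not the actual patch; the class, field and method names here
        are made up), the doc-delta decode can be batched into a small int[] block, with nextDoc()-style consumers
        served from the array until it is exhausted:

        {code:java}
import java.io.IOException;

import org.apache.lucene.store.IndexInput;

// Hypothetical sketch only: batch-decode doc deltas into a fixed-size block,
// refilling from the postings stream when the block runs out.
final class BufferedDocDeltas {
  private static final int BUFFER_SIZE = 128; // same block size mentioned for standardcodec

  private final int[] docDeltas = new int[BUFFER_SIZE];
  private final IndexInput in;
  private final int docFreq;

  private int read;   // deltas decoded so far for this term
  private int upto;   // current position inside the block
  private int filled; // valid entries in the block

  BufferedDocDeltas(IndexInput in, int docFreq) {
    this.in = in;
    this.docFreq = docFreq;
  }

  /** Returns the next doc delta, refilling the block in bulk when needed, or -1 when exhausted. */
  int nextDelta() throws IOException {
    if (upto == filled) {
      filled = Math.min(BUFFER_SIZE, docFreq - read);
      if (filled == 0) {
        return -1; // postings exhausted
      }
      for (int i = 0; i < filled; i++) {
        docDeltas[i] = in.readVInt(); // batch the vInt decode for this block
      }
      read += filled;
      upto = 0;
    }
    return docDeltas[upto++];
  }
}
        {code}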


          People

          • Assignee:
            Robert Muir
          • Reporter:
            Robert Muir
          • Votes:
            0
          • Watchers:
            3

            Dates

            • Created:
              Updated:
              Resolved:
