Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-265

[PATCH] to remove synchronized code from TermVectorsReader

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core/index
    • Labels:
      None
    • Environment:

      Operating System: All
      Platform: All

    • Bugzilla Id:
      30736

      Description

      Otis,

      here the latest and last patch to get rid of all synchronized code from
      TermVectorsReader. It should include at least 3 files, TermVectorsReader.diff,
      SegmentReader.diff and the new junit test case TestMultiThreadTermVectors.java.
      The patch was generated against the current CVS version of TermVectorsReader and
      SegmentReader. All lucene related junit tests pass fine.

      best regards
      Bernhard

        Activity

        Hide
        bernhard.messer@intrafind.de Bernhard Messer added a comment -

        Created an attachment (id=12477)
        [PATCH] new version of TermVectorsReader without synchronized code.

        Show
        bernhard.messer@intrafind.de Bernhard Messer added a comment - Created an attachment (id=12477) [PATCH] new version of TermVectorsReader without synchronized code.
        Hide
        bernhard.messer@intrafind.de Bernhard Messer added a comment -

        Created an attachment (id=12478)
        [PATCH] new version of SegmentReader using ThreadLocal for accessing TermVectorsReader

        Show
        bernhard.messer@intrafind.de Bernhard Messer added a comment - Created an attachment (id=12478) [PATCH] new version of SegmentReader using ThreadLocal for accessing TermVectorsReader
        Hide
        bernhard.messer@intrafind.de Bernhard Messer added a comment -

        Created an attachment (id=12479)
        [PATCH} new multithreaded TermVectors test class

        Show
        bernhard.messer@intrafind.de Bernhard Messer added a comment - Created an attachment (id=12479) [PATCH} new multithreaded TermVectors test class
        Hide
        otis@apache.org Otis Gospodnetic added a comment -

        Bernhard,

        Thanks for the patch. The unit test requires class o.a.lucene.util.English.
        This is not in CVS. Is this something that should be in the CVS? What is it?

        I am also wondering about this piece of code:

        • termVectorsReader = new TermVectorsReader(cfsDir, segment, fieldInfos);
          + final Directory dir = cfsDir;
          + termVectorsLocal = new ThreadLocal() {
          + protected synchronized Object initialValue()
          Unknown macro: {+ try { + return new TermVectorsReader(dir, segment, fieldInfos); + } catch (IOException ioe) { + ioe.printStackTrace(); + return null; + }+ }

          + };

        Is is a good thing to 'eat' that IOException and quietly return null? The
        method where this code is, is already throwing IOException, so why not let the
        IOException pop up?

        Finally, it looks like diffs contain tabs. Could you please change tabs to 2
        spaces?

        Thanks,
        Otis

        Show
        otis@apache.org Otis Gospodnetic added a comment - Bernhard, Thanks for the patch. The unit test requires class o.a.lucene.util.English. This is not in CVS. Is this something that should be in the CVS? What is it? I am also wondering about this piece of code: termVectorsReader = new TermVectorsReader(cfsDir, segment, fieldInfos); + final Directory dir = cfsDir; + termVectorsLocal = new ThreadLocal() { + protected synchronized Object initialValue() Unknown macro: {+ try { + return new TermVectorsReader(dir, segment, fieldInfos); + } catch (IOException ioe) { + ioe.printStackTrace(); + return null; + }+ } + }; Is is a good thing to 'eat' that IOException and quietly return null? The method where this code is, is already throwing IOException, so why not let the IOException pop up? Finally, it looks like diffs contain tabs. Could you please change tabs to 2 spaces? Thanks, Otis
        Hide
        bernhard.messer@intrafind.de Bernhard Messer added a comment -

        Created an attachment (id=12501)
        [PATCH] all source and diff files necessary for the patch

        Show
        bernhard.messer@intrafind.de Bernhard Messer added a comment - Created an attachment (id=12501) [PATCH] all source and diff files necessary for the patch
        Hide
        bernhard.messer@intrafind.de Bernhard Messer added a comment -

        Otis,

        here is the patch to get rid of synchronized parts in TermVectorsReader. As
        described in my previous mail, TermVectorReader is cloneable now. In
        SegmentReader there is still the original TermVectorReader object created,. This
        object is necessary so that we can close opened streams. All threads get there
        own local copy, which is a clone from the original and stored within the
        ThreadLocal.

        There is also a change in Exception handling. I noticed that IndexReader already
        throws an IOException in the two getTermFreqVectors(...) methods. In the
        implementation of the cvs version, TermVectorsReader class catches all
        exceptions and prints either stacktrace or swallowed it. Exception within
        TermVecvtorsReader never popped up at the caller. The patch now catches all
        possible exceptions in TermVectorsReader and is throwing a new IOException which
        is passed to SegmentReader.

        The zip attachment includes 5 files:
        TermVectorsReader.diff
        SegmentReader.diff
        TestMultiThreadTermVectors.java – this is a new junit test to test multithreading
        TestSegmentReader.diff – IOException has to be caught
        TestTermVectorsReader.diff – adjust to the new behaviour

        all junit tests passed fine. Although a test indexing 25.000 text files passed.

        best regards
        Bernhard

        Show
        bernhard.messer@intrafind.de Bernhard Messer added a comment - Otis, here is the patch to get rid of synchronized parts in TermVectorsReader. As described in my previous mail, TermVectorReader is cloneable now. In SegmentReader there is still the original TermVectorReader object created,. This object is necessary so that we can close opened streams. All threads get there own local copy, which is a clone from the original and stored within the ThreadLocal. There is also a change in Exception handling. I noticed that IndexReader already throws an IOException in the two getTermFreqVectors(...) methods. In the implementation of the cvs version, TermVectorsReader class catches all exceptions and prints either stacktrace or swallowed it. Exception within TermVecvtorsReader never popped up at the caller. The patch now catches all possible exceptions in TermVectorsReader and is throwing a new IOException which is passed to SegmentReader. The zip attachment includes 5 files: TermVectorsReader.diff SegmentReader.diff TestMultiThreadTermVectors.java – this is a new junit test to test multithreading TestSegmentReader.diff – IOException has to be caught TestTermVectorsReader.diff – adjust to the new behaviour all junit tests passed fine. Although a test indexing 25.000 text files passed. best regards Bernhard
        Hide
        otis@apache.org Otis Gospodnetic added a comment -

        Bernhard,

        I'll try looking at this some time this week. Did you run your performance
        tests against the code with these changes applied? What is the performance gain
        now?

        Thanks.

        Show
        otis@apache.org Otis Gospodnetic added a comment - Bernhard, I'll try looking at this some time this week. Did you run your performance tests against the code with these changes applied? What is the performance gain now? Thanks.
        Hide
        bernhard.messer@intrafind.de Bernhard Messer added a comment -

        yes i did. The performance improve will be visible having multiple threads
        fetching term vectors and sharing the same IndexReader. It's hardly to test on a
        single processor machine and thats all i have for the moment. The tests i did,
        runs against an index with 25.000 text files (avg. 4KB per file). The test is
        firing 500 calls per thread to fetch term vectors with a random generated
        document id. Before i start the real measurement, i'm running several thousand
        queries against the index to warm up lucene and the file system cache. What you
        see is that the performance boost is between 0% running a single thread and 30%
        running 10 threads. I'm sure it would be even better running it on a multiple
        processor machine.
        Where i have no explanation for, is that the index creation process is about 10%
        faster using the patch.

        best regards
        Bernhard

        Show
        bernhard.messer@intrafind.de Bernhard Messer added a comment - yes i did. The performance improve will be visible having multiple threads fetching term vectors and sharing the same IndexReader. It's hardly to test on a single processor machine and thats all i have for the moment. The tests i did, runs against an index with 25.000 text files (avg. 4KB per file). The test is firing 500 calls per thread to fetch term vectors with a random generated document id. Before i start the real measurement, i'm running several thousand queries against the index to warm up lucene and the file system cache. What you see is that the performance boost is between 0% running a single thread and 30% running 10 threads. I'm sure it would be even better running it on a multiple processor machine. Where i have no explanation for, is that the index creation process is about 10% faster using the patch. best regards Bernhard
        Hide
        goller@detego-software.de Christoph Goller added a comment -

        Thank you very much for your patch.
        I reviewed and applied it.

        best regards,
        Christoph

        Show
        goller@detego-software.de Christoph Goller added a comment - Thank you very much for your patch. I reviewed and applied it. best regards, Christoph

          People

          • Assignee:
            java-dev@lucene.apache.org Lucene Developers
            Reporter:
            bernhard.messer@intrafind.de Bernhard Messer
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development