Lucene - Core / LUCENE-2159

Tool to expand the index for perf/stress testing.

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0
    • Fix Version/s: None
    • Component/s: modules/other
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Sometimes it is useful to take a small-ish index and expand it into a large index with K segments for perf/stress testing.

      This tool does that. See attached class.

        Activity

        John Wang added a comment -

        I have put it under contrib/misc, in package org.apache.lucene.index

        Shai Erera added a comment -

        This looks like a nice tool. But all it does is create multiple copies of the same segment(s), right? So what exactly do you want to test with it? What worries me is that we'll be multiplying the lexicon, posting lists, statistics, etc., so I'm not sure how reliable the tests will be (whatever they are), except for measuring things related to a large number of segments (like merge performance). Am I right?

        I also think this class fits better in benchmark than in misc, as it's really for perf testing/measurements and not a generic utility ... You can create a Task out of it, like ExpandIndexTask, which users can include in their algorithm.
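The concern about multiplied statistics can be illustrated with a toy model. The sketch below is not Lucene code: `ExpansionStats`, `Segment`, and every method in it are hypothetical names invented for this illustration, and a "segment" is reduced to a list of documents, each holding a set of terms. It only shows that copying one segment K times scales document counts and document frequencies by K while the set of distinct terms stays the same.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model only (assumption): none of these types are Lucene classes. */
final class ExpansionStats {

    /** Stand-in for a segment: each document is just its set of terms. */
    static final class Segment {
        final List<Set<String>> docs = new ArrayList<>();

        Segment addDoc(String... terms) {
            docs.add(new HashSet<>(Arrays.asList(terms)));
            return this;
        }
    }

    /** Builds an "index" that consists of k copies of the same segment. */
    static List<Segment> expand(Segment source, int k) {
        List<Segment> index = new ArrayList<>();
        for (int i = 0; i < k; i++) {
            index.add(source); // identical content, counted as another segment
        }
        return index;
    }

    /** Total number of documents across all segments. */
    static int numDocs(List<Segment> index) {
        int total = 0;
        for (Segment segment : index) {
            total += segment.docs.size();
        }
        return total;
    }

    /** Number of documents, across all segments, that contain the term. */
    static int docFreq(List<Segment> index, String term) {
        int df = 0;
        for (Segment segment : index) {
            for (Set<String> doc : segment.docs) {
                if (doc.contains(term)) {
                    df++;
                }
            }
        }
        return df;
    }

    /** Number of distinct terms in the whole index. */
    static int uniqueTerms(List<Segment> index) {
        Set<String> terms = new HashSet<>();
        for (Segment segment : index) {
            for (Set<String> doc : segment.docs) {
                terms.addAll(doc);
            }
        }
        return terms.size();
    }

    /** A three-document sample segment used by main. */
    static Segment sample() {
        return new Segment()
                .addDoc("lucene", "index")
                .addDoc("lucene", "merge")
                .addDoc("segment");
    }

    public static void main(String[] args) {
        List<Segment> small = expand(sample(), 1);
        List<Segment> big = expand(sample(), 5);
        System.out.println("docs: " + numDocs(small) + " vs " + numDocs(big));
        System.out.println("docFreq(lucene): " + docFreq(small, "lucene") + " vs " + docFreq(big, "lucene"));
        System.out.println("unique terms: " + uniqueTerms(small) + " vs " + uniqueTerms(big));
    }
}
```

With the sample segment, five copies take the index from 3 to 15 documents and the document frequency of "lucene" from 2 to 10, while the number of distinct terms stays at 4. The ratio of document frequency to document count is unchanged, which is one way to see why such an index mostly stresses segment-count behavior rather than term distribution.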

        John Wang added a comment -

        Shai:

        You are right; we found this tool useful for testing the performance implications of index segmentation. I understand having a general performance suite to test regression is a good thing. But we found having a more focused test for segmentation and merge is important.

        -John

        Shai Erera added a comment -

        "I understand having a general performance suite to test regression is a good thing. But we found having a more focused test for segmentation and merge is important."

        Are you saying that because of the benchmark proposal? I still think that an ExpandIndexTask will be useful for benchmark and fits better there than in contrib/misc. We can have that task together w/ a predefined .alg for using it ...

        John Wang added a comment -

        Shai:

        I am just stating our experiences. I am not commenting on how it should affect the benchmark proposal at all.

        Whether it should be in benchmark or contrib/misc is a call for the committers.

        Thanks

        -John

        Shai Erera added a comment -

        Which is fine - I think this would be a neat task to add to benchmark, w/ specific documentation on how to use it and for what purposes. If you can also write a sample .alg file which e.g. creates a small index and then expands it, that'd be great.

        I've looked at the different PerfTask implementations in benchmark, and I'm wondering whether we should do the following:

        • Create an AddIndexesTask which receives one or more Directories as input and calls writer.addIndexesNoOptimize
        • Optionally, add an OptimizeTask call afterwards.
        • Write an expandIndex.alg which initially creates an index of size N from one content source and then calls the AddIndexesTask several times. The .alg file is meant as an example; people can change it to create bigger or smaller indexes, use other content sources, and switch between RAM/FS directories.
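The sequence of steps in this proposal can be sketched with a toy model. Nothing below is Lucene or benchmark API: `ExpandIndexSketch`, `ToyIndex`, `Task`, and the two task classes are hypothetical stand-ins that only track how many documents each segment holds. The sketch also assumes that adding indexes never merges segments on its own, which ignores any merging a real merge policy might perform.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the proposed workflow (assumption): not Lucene or benchmark code. */
final class ExpandIndexSketch {

    /** Stand-in for an index: one entry per segment, holding its document count. */
    static final class ToyIndex {
        final List<Integer> segmentDocCounts = new ArrayList<>();

        int numSegments() {
            return segmentDocCounts.size();
        }

        int numDocs() {
            int total = 0;
            for (int count : segmentDocCounts) {
                total += count;
            }
            return total;
        }
    }

    /** Stand-in for a benchmark task: runs one step and returns a work count. */
    interface Task {
        int doLogic(ToyIndex target);
    }

    /** Models the proposed AddIndexesTask: appends the source segments as they are. */
    static final class AddIndexesTask implements Task {
        private final ToyIndex source;

        AddIndexesTask(ToyIndex source) {
            this.source = source;
        }

        @Override
        public int doLogic(ToyIndex target) {
            target.segmentDocCounts.addAll(source.segmentDocCounts);
            return source.numDocs(); // number of documents added
        }
    }

    /** Models an optimize step: collapses all segments into a single one. */
    static final class OptimizeTask implements Task {
        @Override
        public int doLogic(ToyIndex target) {
            int total = target.numDocs();
            target.segmentDocCounts.clear();
            target.segmentDocCounts.add(total);
            return 1;
        }
    }

    /** Creates an index of n documents, adds `copies` more copies, then optionally optimizes. */
    static ToyIndex expand(int n, int copies, boolean optimize) {
        ToyIndex source = new ToyIndex();
        source.segmentDocCounts.add(n);

        ToyIndex target = new ToyIndex();
        target.segmentDocCounts.add(n); // the initial index of size N

        Task addIndexes = new AddIndexesTask(source);
        for (int i = 0; i < copies; i++) {
            addIndexes.doLogic(target);
        }
        if (optimize) {
            new OptimizeTask().doLogic(target);
        }
        return target;
    }

    public static void main(String[] args) {
        ToyIndex expanded = expand(1000, 3, false);
        System.out.println(expanded.numSegments() + " segments, " + expanded.numDocs() + " docs");

        ToyIndex optimized = expand(1000, 3, true);
        System.out.println(optimized.numSegments() + " segments, " + optimized.numDocs() + " docs");
    }
}
```

In this model, starting from 1000 documents and adding three copies gives 4 segments and 4000 documents; the optional optimize step leaves 4000 documents in a single segment. Keeping the optimize step separate mirrors the proposal, where the caller decides whether the expanded index should stay multi-segment.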

        How's that sound?

        John Wang added a comment -

        Yeah, that sounds great!
        I will need to learn how to write .alg files.

        Mark Miller added a comment -

        There is an excellent section on it in LIA2

        Shai Erera added a comment -

        "There is an excellent section on it in LIA2"

        Indeed!

        Ok so to create a task, you just extend PerfTask. You can look under contrib/benchmark/src/java/o.a.l/benchmark/byTask/tasks for many examples. OptimizeTask seems relevant here (i.e. it calls an IW API and receives a parameter).

        Writing .alg files is SUPER simple; just look under contrib/benchmark/conf for many existing examples. You can post a patch once you feel comfortable enough with it, and I can help with any struggles you run into. Another great source (besides LIA2) on writing .alg files is the package.html under contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask.


          People

          • Assignee: Unassigned
          • Reporter: John Wang
          • Votes: 0
          • Watchers: 0
