Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      We already have support for fixed block size int block codecs, making it very simple to create a codec from a int encoder algorithms like FOR/PFOR.

      But algorithms like Simple9/16 are not fixed – they encode a variable number of adjacent ints at once, depending on the specific values of those ints.

      1. LUCENE-2589.patch
        2 kB
        Michael McCandless
      2. LUCENE-2589.patch
        47 kB
        Michael McCandless

        Activity

        Hide
        Michael McCandless added a comment -

        Attached patch.

        Just like for the fixed int block case, I added a MockVariableIntBlockCodec (in src/test), with a stupid variable-sized int block encoding.

        These MockVariable/FixedIntBlockCodec serve as a good example of how one can take any low-level int encoder and turn it into a Lucene codec.

        I also increased randomness of the codecs picked for testing, by adding params like block size (for both fixed & variable mock intblock codecs) and the freq cutoff for Pulsing. So these configurations are now also randomly picked when running tests (= spikes on the monster's back).

        Show
        Michael McCandless added a comment - Attached patch. Just like for the fixed int block case, I added a MockVariableIntBlockCodec (in src/test), with a stupid variable-sized int block encoding. These MockVariable/FixedIntBlockCodec serve as a good example of how one can take any low-level int encoder and turn it into a Lucene codec. I also increased randomness of the codecs picked for testing, by adding params like block size (for both fixed & variable mock intblock codecs) and the freq cutoff for Pulsing. So these configurations are now also randomly picked when running tests (= spikes on the monster's back).
        Hide
        Michael McCandless added a comment -

        I want to make it possible to pass params to the test codecs, eg -Dtest.codecs=Pulsing(4)

        Show
        Michael McCandless added a comment - I want to make it possible to pass params to the test codecs, eg -Dtest.codecs=Pulsing(4)
        Hide
        Michael McCandless added a comment -

        Simple patch – uses regexp to parse out a single int param.

        Show
        Michael McCandless added a comment - Simple patch – uses regexp to parse out a single int param.
        Hide
        Simon Willnauer added a comment -

        Simple patch - uses regexp to parse out a single int param.

        Looks good to me - makes sense to have the size configurable. I wonder if we should start some documentation either in src/test/../package.html or on the wiki which holds information about how we test and which properties are recognized in the unit test.

        Thoughts?

        Show
        Simon Willnauer added a comment - Simple patch - uses regexp to parse out a single int param. Looks good to me - makes sense to have the size configurable. I wonder if we should start some documentation either in src/test/../package.html or on the wiki which holds information about how we test and which properties are recognized in the unit test. Thoughts?
        Hide
        Robert Muir added a comment -

        Looks good to me - makes sense to have the size configurable. I wonder if we should start some documentation either in src/test/../package.html or on the wiki which holds information about how we test and which properties are recognized in the unit test.

        Actually i would prefer we just fix LuceneTestCase etc so that each test class has a single static random seed, then there would be less parameters. Then we can change the failure message to just say 'reproduce with -D....' and I think it would be best.

        Show
        Robert Muir added a comment - Looks good to me - makes sense to have the size configurable. I wonder if we should start some documentation either in src/test/../package.html or on the wiki which holds information about how we test and which properties are recognized in the unit test. Actually i would prefer we just fix LuceneTestCase etc so that each test class has a single static random seed, then there would be less parameters. Then we can change the failure message to just say 'reproduce with -D....' and I think it would be best.
        Hide
        Simon Willnauer added a comment -

        Actually i would prefer we just fix LuceneTestCase etc so that each test class has a single static random seed, then there would be less parameters. Then we can change the failure message to just say 'reproduce with -D....' and I think it would be best.

        A single random seed per TestCase / Class would make things way easier IMO and I would agree that we should have that per Class. Nevertheless, would that deprecate all parameters? When I want to use randomized tests but need to force a certain codec that wouldn't work. Anyway, a documentation of whatever we do here would help people new to lucene to get started with patches and test.

        Show
        Simon Willnauer added a comment - Actually i would prefer we just fix LuceneTestCase etc so that each test class has a single static random seed, then there would be less parameters. Then we can change the failure message to just say 'reproduce with -D....' and I think it would be best. A single random seed per TestCase / Class would make things way easier IMO and I would agree that we should have that per Class. Nevertheless, would that deprecate all parameters? When I want to use randomized tests but need to force a certain codec that wouldn't work. Anyway, a documentation of whatever we do here would help people new to lucene to get started with patches and test.
        Hide
        Robert Muir added a comment -

        When I want to use randomized tests but need to force a certain codec that wouldn't work.

        Yes it would, as random codec selection would be determined by the same random seed (so if you use the same seed, you force the same codec).

        Anyway, a documentation of whatever we do here would help people new to lucene to get started with patches and test.

        I dont think we should add a lot of documentation (which will only become obselete as i know i will be adding even more dimensions to the test ASAP). I think its better to simplify and use a single seed for selecting all parameters!

        Show
        Robert Muir added a comment - When I want to use randomized tests but need to force a certain codec that wouldn't work. Yes it would, as random codec selection would be determined by the same random seed (so if you use the same seed, you force the same codec). Anyway, a documentation of whatever we do here would help people new to lucene to get started with patches and test. I dont think we should add a lot of documentation (which will only become obselete as i know i will be adding even more dimensions to the test ASAP). I think its better to simplify and use a single seed for selecting all parameters!
        Hide
        Michael McCandless added a comment -

        I think we should do both! (Single static random seed and some basic docs about all the neat props our tests now accept...). But I think these are separate issues? I'll commit this one (enabling you to specify codec & its param on the ant command-line) shortly.

        Show
        Michael McCandless added a comment - I think we should do both! (Single static random seed and some basic docs about all the neat props our tests now accept...). But I think these are separate issues? I'll commit this one (enabling you to specify codec & its param on the ant command-line) shortly.

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Michael McCandless
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development