Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Won't Fix
    • Affects Version/s: 0.1
    • Fix Version/s: None
    • Component/s: Math
    • Labels:
      None

      Description

      A selection of these algorithms would be very nice to have:

      www.cs.rmit.edu.au/~jz/fulltext/compjour99.pdf

        Activity

        Hide
        Grant Ingersoll added a comment -

        Can we leverage some of Lucene's capabilities here? It doesn't have all the algorithms mentioned, but does have delta encodings.

        Show
        Grant Ingersoll added a comment - Can we leverage some of Lucene's capabilities here? It doesn't have all the algorithms mentioned, but does have delta encodings.
        Hide
        Ted Dunning added a comment -

        First version. 100% test coverage except for BitInputStream at 95%.

        Needs review for API and obvious speed improvements.

        Show
        Ted Dunning added a comment - First version. 100% test coverage except for BitInputStream at 95%. Needs review for API and obvious speed improvements.
        Hide
        Ted Dunning added a comment -

        I considered that, but it seemed easier to write a version from scratch. Somebody should look at the Lucene code to see if they did anything earth-shakingly clever for speed.

        A delta-code, btw, includes gamma and unary codes so that is complete enough to be interesting.

        It should be very little work to add Golomb and byte variable codes to this. Only the byte-variable code seems like a good candidate because it can be made faster than these other forms pretty easily.

        Show
        Ted Dunning added a comment - I considered that, but it seemed easier to write a version from scratch. Somebody should look at the Lucene code to see if they did anything earth-shakingly clever for speed. A delta-code, btw, includes gamma and unary codes so that is complete enough to be interesting. It should be very little work to add Golomb and byte variable codes to this. Only the byte-variable code seems like a good candidate because it can be made faster than these other forms pretty easily.
        Hide
        Grant Ingersoll added a comment -

        I think you forgot to attach the patch

        Show
        Grant Ingersoll added a comment - I think you forgot to attach the patch
        Hide
        Ted Dunning added a comment -

        How very silly to over look this. I think that changing the status to "patch available" counted as an upload in my brain (or something)

        Show
        Ted Dunning added a comment - How very silly to over look this. I think that changing the status to "patch available" counted as an upload in my brain (or something)
        Hide
        Sean Owen added a comment -

        Question, is this commitable? Looks so. My only question is do we have a use case for this and does it support the project goals.

        Show
        Sean Owen added a comment - Question, is this commitable? Looks so. My only question is do we have a use case for this and does it support the project goals.
        Hide
        Ted Dunning added a comment -

        Then intent for this was for improving the storage of integer arrays,
        especially in the context of sparse vectors. This would have the largest
        impact on binary immutable vectors.

        There are a few impinging factors. The first is that Lucene now has an
        implementation of PFOR and PFOR-delta that shoudl provide much better
        performance and slightly better compression. The second is the adoption of
        colt and attendant digestion load. Jake would be the most likely consumer
        of these routines, but he won't be looking for them for some time.

        My vote is to mothball it.


        Ted Dunning, CTO
        DeepDyve

        Show
        Ted Dunning added a comment - Then intent for this was for improving the storage of integer arrays, especially in the context of sparse vectors. This would have the largest impact on binary immutable vectors. There are a few impinging factors. The first is that Lucene now has an implementation of PFOR and PFOR-delta that shoudl provide much better performance and slightly better compression. The second is the adoption of colt and attendant digestion load. Jake would be the most likely consumer of these routines, but he won't be looking for them for some time. My vote is to mothball it. – Ted Dunning, CTO DeepDyve

          People

          • Assignee:
            Ted Dunning
            Reporter:
            Ted Dunning
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development