Lucene - Core
LUCENE-706

Index File Format - Example for frequency file .frq is wrong

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: general/website
    • Labels:
      None
    • Environment:

      not applicable

    • Lucene Fields:
      New, Patch Available

      Description

      Reported by Johan Stuyts: http://www.nabble.com/Possible-documentation-error--p7012445.html

      Frequency file example says:

      For example, the TermFreqs for a term which occurs once in document seven and three times in document eleven would be the following sequence of VInts:
      15, 22, 3

      It should be:

      For example, the TermFreqs for a term which occurs once in document seven and three times in document eleven would be the following sequence of VInts:
      15, 8, 3
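      To double-check the corrected sequence, here is a minimal Java sketch of the TermFreqs rule as it is quoted from the file format documentation later in this issue: DocDelta/2 is the gap from the previous document number (or from zero for the first document), an odd DocDelta means a frequency of one, and an even DocDelta is followed by the frequency as a separate value. The class and method names are hypothetical, not Lucene API, and the sketch only produces the integer values; it does not write VInt bytes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the TermFreqs encoding rule (not Lucene API). */
public class TermFreqsEncoder {

    /** Encodes ascending doc numbers and their frequencies into TermFreqs values. */
    static int[] encode(int[] docs, int[] freqs) {
        List<Integer> out = new ArrayList<>();
        int previousDoc = 0; // the first gap is measured from zero
        for (int i = 0; i < docs.length; i++) {
            int docDelta = 2 * (docs[i] - previousDoc);
            if (freqs[i] == 1) {
                out.add(docDelta + 1); // odd value: frequency is implicitly one
            } else {
                out.add(docDelta);     // even value: frequency follows
                out.add(freqs[i]);
            }
            previousDoc = docs[i];
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        // Once in document 7, three times in document 11.
        int[] values = encode(new int[] {7, 11}, new int[] {1, 3});
        System.out.println(Arrays.toString(values)); // [15, 8, 3]
    }
}
```

      Doubling the absolute document number instead of the gap would give 2 * 11 = 22 for the second entry, which matches the incorrect value in the original example.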

        Activity

        Hide
        Doron Cohen added a comment -

        example fixed

        Grant Ingersoll added a comment -

        Just to double check the math:
        From the Website:
        "DocDelta determines both the document number and the frequency. In particular, DocDelta/2 is the difference between this document number and the previous document number (or zero when this is the first document in a TermFreqs). When DocDelta is odd, the frequency is one. When DocDelta is even, the frequency is read as another VInt."

        So 15 is the correct start: 15 / 2 in integer division is 7, and since 15 is odd, the frequency is one. The difference between doc 7 and doc 11 is 4, so the next value should be 8 (DocDelta/2 = 11 - 7). Since 8 is even, the frequency is read from the next VInt, in this case 3. So I would concur.
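        Grant's check can be replayed mechanically. The following is a minimal Java sketch of a decoder for the quoted rule, assuming the VInts have already been read into an int array and the input is well formed. TermFreqsDecoder and decode are hypothetical names, not Lucene classes; each posting is returned as a {doc, freq} pair.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical decoder for a sequence of TermFreqs values (not Lucene API). */
public class TermFreqsDecoder {

    /** Returns {doc, freq} pairs; assumes every even DocDelta is followed by a frequency. */
    static List<int[]> decode(int[] values) {
        List<int[]> postings = new ArrayList<>();
        int doc = 0;
        int i = 0;
        while (i < values.length) {
            int docDelta = values[i++];
            doc += docDelta >>> 1;   // DocDelta/2 is the gap from the previous doc
            int freq;
            if ((docDelta & 1) != 0) {
                freq = 1;            // odd: frequency is one
            } else {
                freq = values[i++];  // even: frequency is the next value
            }
            postings.add(new int[] {doc, freq});
        }
        return postings;
    }

    public static void main(String[] args) {
        for (int[] p : decode(new int[] {15, 8, 3})) {
            System.out.println("doc " + p[0] + ", freq " + p[1]);
        }
        // doc 7, freq 1
        // doc 11, freq 3
    }
}
```

        Running decode on the originally documented sequence 15, 22, 3 yields doc 7 (freq 1) and doc 18 (freq 3), so 22 cannot be right for documents 7 and 11.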

        Doron Cohen added a comment -

        Right -

        15 = 2 * 7 + 1 --> doc 7 with freq 1
        8 = 2 * (11 - 7) --> doc 11 with frequency > 1
        3 --> frequency = 3 for doc 11

        The actual .frq file content for a similar case also agrees with that; it starts like this (hex):

        0D 08 03 01 03

        (note: Hex: 0D = 15.)

        Steve Rowe added a comment -

        Hex: 0D is NOT the same as decimal 15. 0Dh = 13d. 15d = 0Fh.
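        Steve's correction is easy to confirm with the standard Java library: Integer.parseInt with radix 16 parses a hex byte, and String.format with %02X prints an int as two uppercase hex digits. HexCheck is just an illustrative class name.

```java
public class HexCheck {
    public static void main(String[] args) {
        // Parse the two hex bytes discussed in the comments.
        System.out.println(Integer.parseInt("0D", 16)); // 13
        System.out.println(Integer.parseInt("0F", 16)); // 15
        // Format decimal 15 back into two hex digits.
        System.out.println(String.format("%02X", 15));  // 0F
    }
}
```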

        Doron Cohen added a comment -

        Right, sorry, copied that hex data from an .frq of an index with a different example, where the frequencies were 1 in doc 6 and 3 in doc 10, so there you would get 2 * 6 + 1 = 13.

        For the correct example of freq 1 in doc 7 and 3 in doc 11 the .frq content is 0F 08 03 as it should be.

        (Meaning that the documentation should still be fixed.)
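        The byte dumps in this thread can be reproduced with a small VInt writer. This is a minimal Java sketch assuming Lucene's VInt layout (seven data bits per byte, low-order group first, high bit set on every byte except the last); VIntHex, writeVInt and toHex are hypothetical names. Every value in both examples is below 128, so each VInt is a single byte, which is why the TermFreqs values appear directly in the hex dump.

```java
import java.io.ByteArrayOutputStream;
import java.util.StringJoiner;

/** Hypothetical helper that renders values as VInt bytes in hex. */
public class VIntHex {

    /** Appends one value: 7 bits per byte, low bits first, high bit means "more follows". */
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // continuation byte
            value >>>= 7;
        }
        out.write(value); // last byte, high bit clear
    }

    /** Encodes the values as VInts and returns the bytes as space-separated hex. */
    static String toHex(int... values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            writeVInt(out, v);
        }
        StringJoiner hex = new StringJoiner(" ");
        for (byte b : out.toByteArray()) {
            hex.add(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Doc 6 (freq 1), doc 10 (freq 3): 2*6+1 = 13, 2*(10-6) = 8, then 3.
        System.out.println(toHex(13, 8, 3)); // 0D 08 03
        // Doc 7 (freq 1), doc 11 (freq 3): 2*7+1 = 15, 2*(11-7) = 8, then 3.
        System.out.println(toHex(15, 8, 3)); // 0F 08 03
        // A value of 128 or more needs a second byte.
        System.out.println(toHex(300));      // AC 02
    }
}
```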

        Grant Ingersoll added a comment -

        Applied. Thanks Johan and Doron.


          People

          • Assignee:
            Grant Ingersoll
            Reporter:
            Doron Cohen
          • Votes:
            0
            Watchers:
            0

            Dates

            • Created:
              Updated:
              Resolved:
