Uploaded image for project: 'Mahout'
  1. Mahout
  2. MAHOUT-299

Collocations: improve performance by making Gram BinaryComparable

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    Description

      Robin's profiling indicated that a large portion of a run was spent in readFields() in Gram due to the deserialization occuring as a part of Gram comparions for sorting. He pointed me to BinaryComparable and the implementation in Text.

      Like Text, in this new implementation, Gram stores its string in binary form. When encoding the string at construction time we allocate an extra character's worth of data to hold the Gram type information. When sorting Grams, the binary arrays are compared instead of deserializing and comparing fields.

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            drew.farris Drew Farris
            drew.farris Drew Farris
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment