Mahout / MAHOUT-391

Make vector more space efficient with variable-length encoding, et al


Details

    Description

      There are a few things we can do to make Vector representations smaller on disk:

      • Use variable-length encoding for integer values like size and element indices in sparse representations
      • Further, delta-encode indices in sequential representations
      • Let the caller specify that precision isn't crucial in values, allowing them to be stored as floats
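
The three ideas above can be sketched together in a few lines of Java. This is only an illustration under stated assumptions, not Mahout's actual serialization code: the class `VarIntDeltaSketch`, its method names, and the byte layout (count, precision flag, then delta/value pairs) are all hypothetical, and indices are assumed to be non-negative and sorted in ascending order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration; not part of Mahout's API.
final class VarIntDeltaSketch {

  // Writes an int using 7 payload bits per byte; a set high bit means "more bytes follow".
  static void writeUnsignedVarInt(int value, DataOutputStream out) throws IOException {
    while ((value & ~0x7F) != 0) {
      out.writeByte((value & 0x7F) | 0x80);
      value >>>= 7;
    }
    out.writeByte(value);
  }

  // Reads back a value written by writeUnsignedVarInt.
  static int readUnsignedVarInt(DataInputStream in) throws IOException {
    int value = 0;
    int shift = 0;
    int b;
    while (((b = in.readByte()) & 0x80) != 0) {
      value |= (b & 0x7F) << shift;
      shift += 7;
    }
    return value | (b << shift);
  }

  // Assumed layout: element count, precision flag, then (index delta, value) pairs.
  static byte[] encode(int[] sortedIndices, double[] values, boolean lowPrecision) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      writeUnsignedVarInt(sortedIndices.length, out);
      out.writeBoolean(lowPrecision);
      int previous = 0;
      for (int i = 0; i < sortedIndices.length; i++) {
        // Delta encoding: store the gap to the previous index, which is usually small.
        writeUnsignedVarInt(sortedIndices[i] - previous, out);
        previous = sortedIndices[i];
        if (lowPrecision) {
          out.writeFloat((float) values[i]); // 4 bytes, loses precision
        } else {
          out.writeDouble(values[i]);        // 8 bytes
        }
      }
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Decodes into a dense array of the given cardinality.
  static double[] decodeToDense(byte[] data, int cardinality) {
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
      int count = readUnsignedVarInt(in);
      boolean lowPrecision = in.readBoolean();
      double[] dense = new double[cardinality];
      int index = 0;
      for (int i = 0; i < count; i++) {
        index += readUnsignedVarInt(in); // undo the delta encoding
        dense[index] = lowPrecision ? in.readFloat() : in.readDouble();
      }
      return dense;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

In this sketch, any index gap below 128 costs a single byte, so sequential vectors with closely spaced indices benefit the most from combining delta encoding with the variable-length format.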

      Since indices are usually small, I'd guess variable-length encoding saves about 2 bytes per element on average, out of the current 12 bytes (a 4-byte int index plus an 8-byte double value).
      Using floats where applicable saves another 4 bytes. Not bad.
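
The estimate above can be checked with a little arithmetic. The sketch below assumes that the 12 bytes per element consist of a fixed 4-byte int index and an 8-byte double value, and that the variable-length format spends 7 payload bits per byte; the class `ElementSizeEstimate` and its methods are hypothetical names used only for this illustration.

```java
// Hypothetical helper for illustration; not part of Mahout's API.
final class ElementSizeEstimate {

  // Bytes needed to store a non-negative int at 7 payload bits per byte.
  static int varIntBytes(int value) {
    int bytes = 1;
    while ((value >>>= 7) != 0) {
      bytes++;
    }
    return bytes;
  }

  // Bytes for one sparse element: index (or index delta) plus value.
  static int bytesPerElement(int indexOrDelta, boolean variableLength, boolean asFloat) {
    int indexBytes = variableLength ? varIntBytes(indexOrDelta) : Integer.BYTES;
    int valueBytes = asFloat ? Float.BYTES : Double.BYTES;
    return indexBytes + valueBytes;
  }
}
```

Under these assumptions, an index below 16384 fits in 2 bytes, so an element shrinks from 12 to 10 bytes, and to 6 bytes when the value is also stored as a float. Indices or deltas below 128 save one byte more.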


          People

            srowen Sean R. Owen