  1. Mahout
  2. MAHOUT-1286

Memory-efficient DataModel, supporting fast online updates and element-wise iteration

Details

    Description

      Most DataModel implementations in the current CF component use hash maps to enable fast 2D indexing and updates. This is not memory-efficient for large datasets; e.g., the Netflix Prize dataset takes 11 GB of heap space as a FileDataModel.

      An improved implementation of DataModel should use more compact data structures (such as arrays). This trades a little time complexity in 2D indexing for a vast improvement in memory efficiency. In addition, any online recommender or online-to-batch converted recommender will not be affected by this trade-off during training.
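
      To illustrate the trade-off described above, here is a minimal sketch of an array-backed preference store. It is not the attached InMemoryDataModel.java and does not implement Mahout's DataModel interface; the class name CompactPreferences and all of its methods are hypothetical. It assumes user and item IDs have already been mapped to dense 0-based int indices. Each preference costs one int and one float in parallel arrays, and a 2D lookup is a binary search within one user's slice instead of a hash probe.

```java
import java.util.Arrays;

/** Hypothetical compact preference store (illustration only, not Mahout API). */
final class CompactPreferences {
    private final int[] rowStart; // user u's entries live in [rowStart[u], rowStart[u + 1])
    private final int[] itemIds;  // item indices, sorted within each user's slice
    private final float[] values; // ratings, parallel to itemIds

    /** Builds the store from (user, item, rating) triples with dense 0-based indices. */
    CompactPreferences(int numUsers, int[] users, int[] items, float[] ratings) {
        int n = users.length;
        rowStart = new int[numUsers + 1];
        for (int u : users) rowStart[u + 1]++;                 // count entries per user
        for (int u = 0; u < numUsers; u++) rowStart[u + 1] += rowStart[u]; // prefix sums
        itemIds = new int[n];
        values = new float[n];
        int[] next = Arrays.copyOf(rowStart, numUsers);        // next free slot per user
        for (int k = 0; k < n; k++) {
            int pos = next[users[k]]++;
            itemIds[pos] = items[k];
            values[pos] = ratings[k];
        }
        for (int u = 0; u < numUsers; u++) sortRow(rowStart[u], rowStart[u + 1]);
    }

    /** Insertion sort of one user's slice by item index, keeping values aligned. */
    private void sortRow(int from, int to) {
        for (int i = from + 1; i < to; i++) {
            int item = itemIds[i];
            float value = values[i];
            int j = i - 1;
            while (j >= from && itemIds[j] > item) {
                itemIds[j + 1] = itemIds[j];
                values[j + 1] = values[j];
                j--;
            }
            itemIds[j + 1] = item;
            values[j + 1] = value;
        }
    }

    /** Locates (user, item) by binary search in the user's slice; negative if absent. */
    private int find(int user, int item) {
        return Arrays.binarySearch(itemIds, rowStart[user], rowStart[user + 1], item);
    }

    /** 2D lookup in O(log k) for a user with k preferences; NaN if absent. */
    float get(int user, int item) {
        int pos = find(user, item);
        return pos >= 0 ? values[pos] : Float.NaN;
    }

    /** Overwrites an existing preference in place; returns false if it is absent. */
    boolean set(int user, int item, float rating) {
        int pos = find(user, item);
        if (pos < 0) return false;
        values[pos] = rating;
        return true;
    }

    int size() { return values.length; }

    /** Callback for element-wise iteration. */
    interface Visitor { void visit(int user, int item, float value); }

    /** Visits every stored preference with a sequential scan; no 2D indexing needed. */
    void forEach(Visitor visitor) {
        for (int u = 0; u + 1 < rowStart.length; u++) {
            for (int k = rowStart[u]; k < rowStart[u + 1]; k++) {
                visitor.visit(u, itemIds[k], values[k]);
            }
        }
    }
}
```

      In this sketch, set only overwrites preferences that already exist. Accepting new (user, item) pairs online would need an extra mechanism, such as a small overflow buffer that is merged periodically; that part is left out here.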

      Attachments

        1. benchmark.patch
          29 kB
          Gokhan Capan
        2. InMemoryDataModel.java
          24 kB
          Peng Cheng
        3. InMemoryDataModelTest.java
          6 kB
          Peng Cheng
        4. Semifinal-implementation-added.patch
          32 kB
          Peng Cheng

            People

              smarthi Suneel Marthi
              peng Peng Cheng
              Votes: 0
              Watchers: 5

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: 336h
                  Remaining: 336h
                  Logged: Not Specified