Flink / FLINK-2452

Add a playcount threshold to the MusicProfiles example


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • 0.10.0
    • 0.10.0
    • None

    Description

      In the MusicProfiles example, the user-user similarity graph gets an edge between any two users who have listened to the same song, even if only once. Depending on the input data, this can produce a projection graph with many more edges than the original user-song graph.
      To make this computation more efficient, this issue proposes adding a user-defined parameter that filters out songs a user has listened to only a few times. Essentially, it is a playcount threshold above which a user is considered to like a song.
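
To illustrate the idea, here is a minimal plain-Java sketch of the proposed filter. It does not use the Flink or Gelly API and is not the code of the MusicProfiles example; the class, the `Listen` record, the method names, and the sample data are all hypothetical. It assumes "above the threshold" means strictly greater than the threshold.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class PlaycountThresholdSketch {

    /** One (user, song, playcount) record; a hypothetical stand-in for the input triplets. */
    static final class Listen {
        final String user;
        final String song;
        final int playcount;

        Listen(String user, String song, int playcount) {
            this.user = user;
            this.song = song;
            this.playcount = playcount;
        }
    }

    /** Keeps only the records whose playcount is strictly above the threshold (assumption). */
    static List<Listen> filterByThreshold(List<Listen> listens, int threshold) {
        List<Listen> kept = new ArrayList<>();
        for (Listen l : listens) {
            if (l.playcount > threshold) {
                kept.add(l);
            }
        }
        return kept;
    }

    /** Builds undirected user-user edges ("a-b") between users who share at least one song. */
    static Set<String> similarityEdges(List<Listen> listens) {
        Map<String, TreeSet<String>> usersBySong = new HashMap<>();
        for (Listen l : listens) {
            usersBySong.computeIfAbsent(l.song, k -> new TreeSet<>()).add(l.user);
        }
        Set<String> edges = new TreeSet<>();
        for (TreeSet<String> users : usersBySong.values()) {
            List<String> sorted = new ArrayList<>(users);
            // Every pair of listeners of the same song becomes one edge.
            for (int i = 0; i < sorted.size(); i++) {
                for (int j = i + 1; j < sorted.size(); j++) {
                    edges.add(sorted.get(i) + "-" + sorted.get(j));
                }
            }
        }
        return edges;
    }

    /** Made-up sample data: u3 has only sampled both songs. */
    static List<Listen> sampleData() {
        List<Listen> data = new ArrayList<>();
        data.add(new Listen("u1", "s1", 50));
        data.add(new Listen("u2", "s1", 40));
        data.add(new Listen("u3", "s1", 1));
        data.add(new Listen("u2", "s2", 35));
        data.add(new Listen("u3", "s2", 2));
        return data;
    }

    public static void main(String[] args) {
        List<Listen> data = sampleData();
        // Threshold 0 keeps every record, so u3 is linked to both other users.
        System.out.println(similarityEdges(filterByThreshold(data, 0)));
        // Threshold 30 drops u3's records, leaving a single edge.
        System.out.println(similarityEdges(filterByThreshold(data, 30)));
    }
}
```

With this toy input, the unfiltered projection has three edges and the filtered one has a single edge; the filter runs before the pairwise step, which is where the edge count grows.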

      For reference, with a threshold value of 30, the whole Last.fm dataset is analyzed on my laptop in a few minutes, while no threshold results in a runtime of several hours.

      There are many solutions to this problem, but since this is just an example (not a library method), I think that keeping it simple is important.

      Thanks to andralungu for spotting the inefficiency!

            People

              Assignee: vkalavri Vasia Kalavri
              Reporter: vkalavri Vasia Kalavri