Metron (Retired) / METRON-1534

Typosquat Detection via Bloom filters overlaps

Details

    • Type: Bug
    • Status: To Do
    • Priority: Major
    • Resolution: Unresolved
    • 0.4.3
    • None
    • None

    Description

      The typosquat detection use case overpopulates the Bloom filter: generated variants that are themselves legitimate domains are added to it.

      For example, using the Alexa 10k set, cnn.com and bbc.co.uk are both detected as typosquats. Although legitimate in themselves, they appear in the DNS twists of other legitimate domains (e.g. bbc is a typosquat of rbc).

      The problem gets worse with a larger set of legitimate domains, such as the Alexa 1M.

      Values that are included in the raw 'good' source need to be kept out of the Bloom filter. This is hard to do in a space- and compute-efficient way with the current implementation, because it effectively requires joining the full input set (smallish) with the generated set (very large).
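
      One possible direction, sketched here under stated assumptions and not taken from Metron's code: because the 'good' set is the small side of the join, it could be held in memory as a hash set and checked while the generated variants are streamed, so the large generated set is never materialized. The BloomFilter class, the substitutions generator (single-character substitutions only, far simpler than real DNS twists) and build_filter below are all hypothetical names used for illustration.

```python
import hashlib
import string


class BloomFilter:
    """Tiny illustrative Bloom filter (an assumption, not Metron's implementation)."""

    def __init__(self, num_bits=1 << 20, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive k bit positions by salting a single hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def substitutions(name):
    """Hypothetical variant generator: every single-character substitution."""
    for i, original in enumerate(name):
        for ch in string.ascii_lowercase:
            if ch != original:
                yield name[:i] + ch + name[i + 1:]


def build_filter(legit_names):
    legit = set(legit_names)  # the small 'good' set stays in memory
    bloom = BloomFilter()
    for name in legit:
        for variant in substitutions(name):  # variants are streamed, never stored
            if variant not in legit:  # skip variants that are themselves legitimate
                bloom.add(variant)
    return bloom


bloom = build_filter(["bbc", "rbc", "cnn"])
print("rbd" in bloom)  # "rbd" is a generated variant of "rbc", so it was added
print("bbc" in bloom)  # "bbc" was skipped at insert time; only a false positive could match
```

      Whether this fits depends on where the variants are generated. If generation is distributed, the 'good' set would have to be shipped to every worker, which is plausible for the 10k list but would need checking for the 1M list.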

      People

        Assignee: Unassigned
        Reporter: Simon Elliston Ball (simonellistonball)