The list of HTML elements from which outlinks are extracted (in DOMContentUtils of parse-html and in DOMContentUtils of parse-tika) needs to be updated and completed to include elements that are common in HTML5. See also a related question on Stack Overflow about the <object> element.
A list of HTML elements that is (mostly?) up to date could be taken from the link extractor of iipc/webarchive-commons.
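For illustration, here is a minimal sketch of what an extended element-to-attribute table might look like. The class Html5LinkAttributes, the LINK_ATTRIBUTES map and the extractLinks method are hypothetical. They do not mirror the actual DOMContentUtils code or the webarchive-commons extractor. The attribute choices follow the HTML5 specification, for example object@data, video@poster and track@src.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch, not Nutch's DOMContentUtils: maps HTML element names
 * to the attributes whose values may be outlinks, including HTML5 elements.
 */
final class Html5LinkAttributes {

    // Element name (lower case) -> attributes that may hold a single URL.
    // srcset is deliberately omitted: it holds a list of URL candidates.
    static final Map<String, List<String>> LINK_ATTRIBUTES = Map.ofEntries(
        Map.entry("a", List.of("href")),
        Map.entry("area", List.of("href")),
        Map.entry("link", List.of("href")),
        Map.entry("img", List.of("src")),
        Map.entry("script", List.of("src")),
        Map.entry("frame", List.of("src")),
        Map.entry("iframe", List.of("src")),
        Map.entry("embed", List.of("src")),        // HTML5
        Map.entry("object", List.of("data")),
        Map.entry("form", List.of("action")),
        Map.entry("input", List.of("src", "formaction")),
        Map.entry("button", List.of("formaction")), // HTML5
        Map.entry("video", List.of("src", "poster")), // HTML5
        Map.entry("audio", List.of("src")),         // HTML5
        Map.entry("source", List.of("src")),        // HTML5
        Map.entry("track", List.of("src")),         // HTML5
        Map.entry("blockquote", List.of("cite")),
        Map.entry("q", List.of("cite")),
        Map.entry("del", List.of("cite")),
        Map.entry("ins", List.of("cite")));

    private Html5LinkAttributes() {
    }

    /**
     * Returns the trimmed, non-blank values of the link-bearing attributes of
     * one element, in table order. Attribute names in the map are assumed to
     * be lower case already (as produced by an HTML parser).
     */
    static List<String> extractLinks(String elementName, Map<String, String> attributes) {
        List<String> links = new ArrayList<>();
        List<String> names = LINK_ATTRIBUTES.get(elementName.toLowerCase(Locale.ROOT));
        if (names == null) {
            return links; // element is not known to carry outlinks
        }
        for (String name : names) {
            String value = attributes.get(name);
            if (value != null && !value.isBlank()) {
                links.add(value.trim());
            }
        }
        return links;
    }
}
```

The attribute srcset (on img and source) is left out on purpose. Its value is a comma-separated list of URL candidates with width or density descriptors, so it would need its own parser rather than a plain attribute lookup.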