Description
Currently, the default document language passed to an ExtractionContext is taken only from the "xml:lang" attribute of the HTML node.
However, after reading this W3C article on document language declaration and this W3C article on meta declarations, it appears that we should also check the "lang" attribute on the HTML node and, as a fallback, any META http-equiv="Content-Language" element.
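A rough sketch of the lookup order proposed above, using only the JDK's org.w3c.dom API. The class and method names (DefaultLanguageSketch, findDefaultLanguage, parse) are hypothetical and not existing Any23 code. Checking "xml:lang" before "lang", and taking only the first entry of a comma-separated Content-Language value, are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class; not part of the existing code base.
final class DefaultLanguageSketch {

    /** Returns the declared document language, or null if none is found. */
    static String findDefaultLanguage(Document doc) {
        Element root = doc.getDocumentElement();
        if (root == null) {
            return null;
        }
        // 1. and 2.: attributes on the root element (order is an assumption).
        for (String attr : new String[] {"xml:lang", "lang"}) {
            // getAttribute returns "" when the attribute is absent.
            String value = root.getAttribute(attr).trim();
            if (!value.isEmpty()) {
                return value;
            }
        }
        // 3.: fallback to <meta http-equiv="Content-Language" content="...">.
        // Note: tag name matching is case-sensitive here, so "META" is not matched.
        NodeList metas = doc.getElementsByTagName("meta");
        for (int i = 0; i < metas.getLength(); i++) {
            Element meta = (Element) metas.item(i);
            String httpEquiv = meta.getAttribute("http-equiv").trim();
            if ("Content-Language".equalsIgnoreCase(httpEquiv)) {
                // The value may list several languages; keep only the first (assumption).
                String first = meta.getAttribute("content").split(",", 2)[0].trim();
                if (!first.isEmpty()) {
                    return first;
                }
            }
        }
        return null;
    }

    /** Parses well-formed XHTML text into a DOM document (test convenience only). */
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse input", e);
        }
    }
}
```

With something like this in place, the value could be computed once when the ExtractionContext is built, so individual extractors would not need to repeat the lookup.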
Also: there seems to be some overlap here with (at least) the HTMLMetaExtractor, which, conversely, appears to check the "lang" attribute, and not the "xml:lang" attribute. Could the HTMLMetaExtractor just retrieve the default document language from the ExtractionContext rather than looking it up in the document all over again?