Details
-
Improvement
-
Status: Resolved
-
Minor
-
Resolution: Fixed
-
0.5
-
None
-
None
Description
Some web pages (such as makler.su, see attached) have lots of script data before the body of the HTML.
When this happens, the sniffing code fails to find the charset info in the meta tag, because it currently only sniffs the first 4K.
Bumping it to 8K would cover all of the cases that I (Ken) have seen during a test crawl.