Description
We extract Javascript as text content while instead it is actually a script tag with base64 inline. This inline code is decoded and reported in the characters() method of our custom ContentHandler, and ends up as text being extracted, but it seems the Javascript start tag itself is never reported to startElement(). The Javascript is reported to characters() after we left the head and entered the body.
HTML file is attached
The following script tag:
<script src="data:text/javascript;base64,Oyh3aW5kb3cuanExODN8fGpRdWVyeSkoZnVuY3Rpb24oJCl7bmV3IEltcHJvdmVkQUpBWExvZ2luKHsNCmlkOiAxNTcsDQppc0d1ZXN0OiAxLA0Kb2F1dGg6IHsiZmFjZWJvb2siOiJodHRwczpcL1wvd3d3LmZhY2Vib29rLmNvbVwvZGlhbG9nXC9vYXV0aD9zY29wZT1lbWFpbCZyZXNwb25zZV90eXBlPWNvZGUmZGlzcGxheT1wb3B1cCZjbGllbnRfaWQ9MTcyODk0MjQzMDY1MDQ4NiZyZWRpcmVjdF91cmk9aHR0cCUzQSUyRiUyRnBldHJvbGljaW91cy5jb20lMkZpbmRleC5waHAlM0ZvcHRpb24lM0Rjb21faW1wcm92ZWRfYWpheF9sb2dpbiUyNnRhc2slM0RmYWNlYm9vayIsImdvb2dsZSI6Imh0dHBzOlwvXC9hY2NvdW50cy5nb29nbGUuY29tXC9vXC9vYXV0aDJcL2F1dGg/c2NvcGU9aHR0cHMlM0ElMkYlMkZ3d3cuZ29vZ2xlYXBpcy5jb20lMkZhdXRoJTJGdXNlcmluZm8uZW1haWwraHR0cHMlM0ElMkYlMkZ3d3cuZ29vZ2xlYXBpcy5jb20lMkZhdXRoJTJGdXNlcmluZm8ucHJvZmlsZSZyZXNwb25zZV90eXBlPWNvZGUmZGlzcGxheT1wb3B1cCZjbGllbnRfaWQ9ODQ5NDk3NjQ3ODUzLW1mOThqNGdlOGkwYzlkaTFrbG9zc2YxbmdibWI2cG12LmFwcHMuZ29vZ2xldXNlcmNvbnRlbnQuY29tJnJlZGlyZWN0X3VyaT1odHRwJTNBJTJGJTJGcGV0cm9saWNpb3VzLmNvbSUyRmluZGV4LnBocCUzRm9wdGlvbiUzRGNvbV9pbXByb3ZlZF9hamF4X2xvZ2luJTI2dGFzayUzRGdvb2dsZSJ9LA0KYmdPcGFjaXR5OiAwLjQsDQpyZXR1cm5Vcmw6ICcvaXMtdGhpcy1kdXRjaC1jbGFzc2ljLWZpbmFsbHktYXMtY29vbC1hcy1hLWJtdycsDQpib3JkZXI6IHBhcnNlSW50KCdmNWY1ZjV8KnwzfCp8YzRjNGM0fCp8Nycuc3BsaXQoJ3wqfCcpWzFdKSwNCnBhZGRpbmc6IDQsDQp1c2VBSkFYOiAwLA0Kb3BlbkV2ZW50OiAnb25jbGljaycsDQp3bmRDZW50ZXI6IDAsDQpyZWdQb3B1cDogMSwNCmR1cjogMzAwLA0KdGltZW91dDogMCwNCmJhc2U6ICcvJywNCnRoZW1lOiAncGV0cm9saWNpb3VzJywNCnNvY2lhbFByb2ZpbGU6ICcnLA0Kc29jaWFsVHlwZTogJ2J0bkljbycsDQpjc3NQYXRoOiAnL21vZHVsZXMvbW9kX2ltcHJvdmVkX2FqYXhfbG9naW4vY2FjaGUvMTU3LzNkNDE4Mzk2NDk2N2Y2ZWVlYjI5MTdhOTI2OGM2MTIxLmNzcycsDQpyZWdQYWdlOiAnam9vbWxhJywNCmNhcHRjaGE6ICcnLA0Kc2hvd0hpbnQ6IDAsDQpnZW9sb2NhdGlvbjogZmFsc2UsDQp3aW5kb3dBbmltOiAnJw0KfSl9KTs=" type="text/javascript"></script>
gets reported outside the head (in html.p) as:
;(window.jq183||jQuery)(function($){new ImprovedAJAXLogin({ id: 157, isGuest: 1, oauth: {"facebook":"https:\/\/www.facebook.com\/dialog\/oauth?scope=email&response_type=code&display=popup&client_id=1728942430650486&redirect_uri=http%3A%2F%2Fpetrolicious.com%2Findex.php%3Foption%3Dcom_improved_ajax_login%26task%3Dfacebook","google":"https:\/\/accounts.google.com\/o\/oauth2\/auth?scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.profile&response_type=code&display=popup&client_id=849497647853-mf98j4ge8i0c9di1klossf1ngbmb6pmv.apps.googleusercontent.com&redirect_uri=http%3A%2F%2Fpetrolicious.com%2Findex.php%3Foption%3Dcom_improved_ajax_login%26task%3Dgoogle"}, bgOpacity: 0.4, returnUrl: '/is-this-dutch-classic-finally-as-cool-as-a-bmw', border: parseInt('f5f5f5|*|3|*|c4c4c4|*|7'.split('|*|')[1]), padding: 4, useAJAX: 0, openEvent: 'onclick', wndCenter: 0, regPopup: 1, dur: 300, timeout: 0, base: '/', theme: 'petrolicious', socialProfile: '', socialType: 'btnIco', cssPath: '/modules/mod_improved_ajax_login/cache/157/3d4183964967f6eeeb2917a9268c6121.css', regPage: 'joomla', captcha: '', showHint: 0, geolocation: false, windowAnim: '' })});