Bug 31858 - regular expression matching broken on amd64
Status: RESOLVED FIXED
Product: Apache httpd-1.3
Classification: Unclassified
Component: Auth/Access
Version: 1.3.31
Hardware: Other other
Importance: P3 major with 3 votes
Target Milestone: ---
Assigned To: Apache HTTPD Bugs Mailing List
Keywords: PatchAvailable
Duplicates: 32067 33478 34010 34172 35151
 
Reported: 2004-10-22 23:42 UTC by Alex Krohn
Modified: 2007-08-02 14:28 UTC (History)

Attachments
amd64 safety for regex code (1.77 KB, patch)
2004-11-05 06:56 UTC, Glenn Strauss
amd64 safety for regex code (take 2) (1.74 KB, patch)
2004-11-05 07:09 UTC, Glenn Strauss

Description Alex Krohn 2004-10-22 23:42:25 UTC
On Apache 1.3.31 on amd64 running Linux 2.6.8.1, adding:

<FilesMatch "\.(gif|jpg|mp3|css|js|png)$">
     Options All -Indexes
     AllowOverride All
     deny from all
</FilesMatch>

is causing Apache to match files with any extension.

To reproduce, I used a stock 1.3.31 and did:

./configure --prefix=/tmp/apache
make
make install

Then add the above configuration to the end of the default httpd.conf and point your browser to:

http://localhost/index.html

which will give a 403 forbidden. I also get a forbidden for:

http://localhost/index.html
http://localhost/index.htm
http://localhost/index.ht
http://localhost/index.h
http://localhost/index.

but I get a 404 for:

http://localhost/index

If I remove the FilesMatch block, then index.html shows up as expected. Also,
oddly, if I change the FilesMatch to:

<FilesMatch "\.(gif|jpg|mp3|css|js)$">

everything also works as expected. 

This only happens on our amd64 systems. x86 works fine.

If you need any more info, please let me know.

Thanks!
Comment 1 Scott Beck 2004-11-02 08:16:23 UTC
I've been looking into this issue tonight. I haven't uncovered much, except that
it's not an issue with the system's regex library. I created this simple test,
which is pretty much what Apache seems to be trying to do:

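#include <stdio.h>
#include <stdlib.h>   /* malloc(), free() */
#include <sys/types.h>
#include <regex.h>

int main(int argc, char *argv[])
{
    int ret;
    regex_t *r;
    const char *regex = "\\.(gif|jpg|mp3|css|js|png)$";
    int flags = REG_EXTENDED;

    r = malloc(sizeof(regex_t));
    if (r == NULL) {
        printf("Out of memory\n");
        return 1;
    }
    /* regcomp() returns non-zero on failure */
    if (regcomp(r, regex, flags)) {
        printf("Error compiling regex\n");
        free(r);
        return 1;
    }
    /* regexec() returns 0 on a match; "index.html" should not match */
    ret = regexec(r, "index.html", 0, NULL, 0);
    printf("match: %d\n", ret == 0);
    regfree(r);
    free(r);
    return 0;
}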

This works as it should. I then added some printf() debugging to
src/main/http_core.c, which looks like this:

  printf("compiled '%s' into %p at %d\n", cmd->path, r, __LINE__);

That's at line 1760 after it compiles the regex. I also added a printf() in
src/main/http_request.c:

  printf("regex match on %s [%p]\n", test_file, entry_core->r);

on line 704 before it does the ap_regexec() call and a couple of prints to see
if it matches or not.

All of this is with a download of apache_1.3.31 non-patched, I compiled it
myself with the only options being --prefix=/tmp/test_apache. I didn't change
the conf file except to remove the other match on .htaccess and set the port to
8080. Here is the output I get:

gossamer test_apache # ./bin/httpd -X
compiled '\.(gif|jpg|mp3|css|js|png)$' into 0x58ae98 at 1760
compiled '\.(gif|jpg|mp3|css|js|png)$' into 0x58dd18 at 1760

Then when I make a request for / on that server:

regex match on htdocs [0x58dd18]
no match!
regex match on index.html [0x58dd18]
match!
regex match on favicon.ico [0x58dd18]
match!

As you can see, index.html matches the precompiled regex when it shouldn't.
I'm going to continue to investigate this. If anyone has any suggestions, I'd
love to hear them.

Cheers,

Scott
Comment 2 Joe Orton 2004-11-02 09:06:44 UTC
I notice the hsregex test suite segfaults on amd64 whereas it passes on i386,
so that might be a good place to start looking: cd src/regex && make r
Comment 3 Scott Beck 2004-11-03 19:06:24 UTC
Thanks Joe. I've been messing with that code for the last day and found a
solution. The version of the engine.c functions used for "small" regexes does
not work with 64-bit integers. A simple fix is to change regexec.c line 137 from:
if (g->nstates <= CHAR_BIT*sizeof(states1) && !(eflags&REG_LARGE))
to:
if (g->nstates <= CHAR_BIT*4 && !(eflags&REG_LARGE))

On an Opteron sizeof(states1) is always 8, but the smatcher function, for some
unknown reason, will not handle a regex with more than 32 states. A better fix
would be to find out why smatcher will not work with more than 32 states on an
Opteron, but this fix is fine for me.

Cheers,

Scott
Comment 4 André Malo 2004-11-04 21:56:40 UTC
*** Bug 32067 has been marked as a duplicate of this bug. ***
Comment 5 Joe Orton 2004-11-04 22:18:06 UTC
Very weird that this code survives however many years and then two people hit it
in a week; it's not like people don't use 1.3 on 64-bit platforms already... the
only thing different is that amd64 is little-endian, I suppose.

But that change does look more like a workaround than a fix.  From googling around, 

http://www.mysql-websource.com/mysql4020/source-regexec.htm

is interesting; it looks like MySQL has made some 64-bit-cleanliness changes, e.g.

#define onestate long /* Changed from int by Monty */

but that alone doesn't fix the segfaults.
Comment 6 Glenn Strauss 2004-11-05 06:56:17 UTC
Created attachment 13336
amd64 safety for regex code
Comment 7 Glenn Strauss 2004-11-05 07:09:02 UTC
Created attachment 13337
amd64 safety for regex code (take 2)
Comment 8 Glenn Strauss 2004-11-05 07:15:58 UTC
The second patch above is a bit more pedantic than the first since the return
value of ISSETBACK() is stored into an (int) and used as a boolean in the step()
function in src/regex/engine.c.  Patch passes tests in src/regex (`make r`) on a
dual Opteron.
Comment 9 Joe Orton 2005-02-10 10:24:22 UTC
*** Bug 33478 has been marked as a duplicate of this bug. ***
Comment 10 Joe Stump 2005-03-07 07:44:36 UTC
Something is still wrong here. I applied the patch and recompiled and was quite
delighted to see some of my regexps come back online. However, some of them are
still showing signs of problems. 

RewriteRule ^/guides/([^/]*)/Titles/([A-Z]).html$ /jax/index.php/enotes/lookup/type=$1/sort=Titles/letter=$2 [L]

The above rule loads the page, but continues to show about 2% left on the
progress bar in Mozilla (while apache goes nuts in top). Restarting apache fixes
the problem (until you go to that URL again). However, other rules work fine,
like the following:

    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteRule ^/([a-z0-9|-]+)/s([0-9]+)$ /jax/index.php/enotes/works/notes=$1/sectionID=$2

    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteRule ^/([a-z0-9|-]+)/?([a-z0-9]+)?/?([a-z|-]+)?/?$ /jax/index.php/enotes/works/notes=$1/id=$2/tail=$3 [L]

Oddly enough, these work despite the first rewrite rule being placed above these
working ones. The problem only arises on a match of the rewrite rule.

--Joe
Comment 11 Joe Stump 2005-03-08 23:48:00 UTC
Never mind my earlier reply. I figured out that the issue is resolved in Apache
2.0.x on my AMD64 box. The problem was I was running a patched 1.3.33 Apache
with PHP5 and my PHP code isn't ready for PHP5 yet. 

At any rate the patch appears to work fine.

Comment 12 Joe Orton 2005-03-19 19:03:23 UTC
*** Bug 34010 has been marked as a duplicate of this bug. ***
Comment 13 Joe Orton 2005-03-25 14:35:23 UTC
*** Bug 34172 has been marked as a duplicate of this bug. ***
Comment 14 Paul Querna 2005-03-27 23:06:16 UTC
This is being patched by downstream vendors like Gentoo:
http://bugs.gentoo.org/show_bug.cgi?id=70177 

The 2nd patch seems to work for them.  Consider committing it?
Comment 15 Pete Harlan 2005-05-20 21:01:35 UTC
Apache consistently hangs (apparently in a tight loop) on our Linux 2.6 amd64
machines without this patch (rewrite rules, I'm guessing); with it everything
appears fine.
Comment 16 Joe Orton 2005-06-01 11:50:30 UTC
*** Bug 35151 has been marked as a duplicate of this bug. ***