|
SA Bugzilla – Full Text Bug Listing |
Summary: | [review] DnsResolver.pm/connect_sock() fails on IPv6 resolver address | ||
---|---|---|---|
Product: | Spamassassin | Reporter: | Mark Martinec <Mark.Martinec> |
Component: | Libraries | Assignee: | SpamAssassin Developer Mailing List <dev> |
Status: | RESOLVED FIXED | ||
Severity: | normal | ||
Priority: | P5 | ||
Version: | SVN Trunk (Latest Devel Version) | ||
Target Milestone: | 3.1.0 | ||
Hardware: | PC | ||
OS: | All | ||
Whiteboard: | ready to commit | ||
Attachments: |
IPV6 support
update of 3074 rev3 better error handling and work around IO::Socket::INET6 family quirk merge of 3085 and 3087 redo of 3088 |
Description
Mark Martinec
2005-06-20 18:07:22 UTC
I've been working on DnsResolver, but I'm not familiar with IPv6. Can you point me to some documentation? What does the "::1" mean in the nameservers array? How should it be used to create a socket? Sidney -- iirc, "::1" is equivalent to "127.0.0.1" in IPv6 formatting. (there's a whole host of abbreviation formats in ipv6 btw.) there's a little doco on IPv6 hostname formatting in lib/Mail/SA/Constants.pm I think. ::1 is an IP address, the IPv6 address similar to the IPv4 address 127.0.0.1. See RFC 2373 sections 2.2.2 and 2.5.3. To create a socket, you should probably use IO::Socket::INET6 if that module is installed, falling back to IO::Socket::INET if not. I would recommend surrounding the nameserver's IP address with square brackets, passing that as the the PeerAddr argument to IO::Socket::INET(6)?::new. The call to sockaddr_in() I would replace with a call to the socket's peerhost() method. I looked at the code in sub send_udp in Net::DNS::Resolver::Base.pm Probably I would do it the way it is now done there in the latest version. I'll be working on finishing my final term paper of the semester for the next four or five days and will not get to this before that's finished. If anyone else wants to pick up on this, patches will be gratefully appreciated. Do make sure that you look at Net::DNS in case there is anything in there that is differnt from what you have in mind. The latest svn version of Net::DNS can be pulled down using svn co -r 386 http://www.net-dns.org/svn/net-dns/trunk btw this is NOT a mass-check blocker (afaik). *** Bug 4417 has been marked as a duplicate of this bug. *** should this block 3.1.0? It's now documented in the FAQ so imo no, but it'd be nice to fix. Created attachment 3074 [details]
IPV6 support
I'm not sure if this will fully fix this bug. One gotcha that I can pick out is
if the system is configured to use ipv6, has an ipv6 address in
/etc/resolv.conf but doesn't have IO::Socket::INET6 installed. Personally, I
don't really see the point, but it's possible, and this patch won't help in
that situation.
It's been a while since I hacked Perl, so be gentle!! :)
BTW, I have spamd listening on ::1 on my system, and spamc talking to it, and
all seems to be working ok. I've to clean up the patches, but I'll upload them
in the next few days.
Created attachment 3078 [details]
update of 3074
thanks John! I've made a couple of changes to that with this rev:
- used constants instead of a variable to indicate module presence,
that's more "the SA way"
- added an entry to lib/Mail/SpamAssassin/Util/DependencyInfo.pm so
that it'll be flagged at Makefile.PL time
- added entry about the module to INSTALL
- added some code to produce an explanatory warning for the situation where
someone's using ipv6 systemwide, has an ipv6 addr in resolv.conf, but
doesn't have IO::Socket::INET6 installed.
The latter situation now looks like this:
: jm 116...; RES_NAMESERVERS=::1 ./spamassassin.raw -D --lint
[31283] dbg: logger: adding facilities: all
[31283] dbg: logger: logging level is DBG
[31283] dbg: generic: SpamAssassin version 3.2.0-r232569
[31283] dbg: config: score set 0 chosen.
[31283] dbg: util: running in taint mode? no
[31283] dbg: dns: is Net::DNS::Resolver available? yes
[31283] dbg: dns: Net::DNS version: 0.48
IO::Socket::INET6 module is required to use IPv6 nameservers such as '::1':
IO::Socket::INET: Bad hostname '::1'
Note to reviewers: that die() call in _ipv6_ns_warning() is required instead of
a warn() -- without that, it'll loop, producing the warning infinitely until
CTRL-C'd.
I think this is reviewable for 3.1.0. I'd be OK with leaving it
for 3.1.1, though.
This should be tested with Net::DNS version > 0.48. Version fifty-something added support for IPv6 that may give this different behavior. I don't know that for sure, but I want to raise the flag to make sure this gets reviewed using the latest Net::DNS. I'll try to get to that tonight. The patch passes make test on my system, but I don't understand what is going on with this part of the patch that removes two lines of code - $self->{dest} = sockaddr_in($self->{res}->{port}, - inet_aton($self->{res}->{nameservers}[0])); I see that $self->{dest} is used later in sub bgsend in the following snippet: my $data = $pkt->data; my $dest = $self->{dest}; $self->connect_sock() if !$self->{sock}; if (!$self->{sock}->send ($pkt->data, 0, $self->{dest})) { warn "dns: sendto() failed: $@"; return; } And then $data and $dest are never used, which is a separate issue. The code in Net::DNS::Resolver::Base.pm version 0.53 in sub send_udp that computes the equivalent of $self->{dest} here distinguishes between IPv6 and IPv4 doing something like the following to compute what it names $dst_sockaddr: In the IPv4 case it looks like $dst_sockaddr = sockaddr_in($dstport, inet_aton($ns_address)); In the IPv6 case it looks like # we can use getaddrinfo no strict 'subs'; # Because of the eval statement in the BEGIN # AI_NUMERICHOST is not available at compile time. # The AI_NUMERICHOST surpresses lookups. my $old_wflag = $^W; #circumvent perl -w warnings about 'udp' $^W = 0; my @res = getaddrinfo($ns_address, $dstport, AF_UNSPEC, SOCK_DGRAM, 0, AI_NUMERICHOST); $^W=$old_wflag ; use strict 'subs'; my ($sockfamily, $socktype_tmp, $proto_tmp, $dst_sockaddr, $canonname_tmp) = @res; So do we need equivalent code to compute $self->{dest}? Why were those lines just removed in the patch if $self->{dest} is later used in sub bgsend? I don't have time right now to figure out a test case to demonstrate bgsend breaking with this patch or to figure out why it doesn't break, but this doesn't look to me like it could be correct. Subject: Re: [review] DnsResolver.pm/connect_sock() fails on IPv6 resolver address > The patch passes make test on my system, but I don't understand what is going on > with this part of the patch that removes two lines of code > > - $self->{dest} = sockaddr_in($self->{res}->{port}, > - inet_aton($self->{res}->{nameservers}[0])); The OP mentioned that inet_aton as coded was incompatable with IPV6. I suspect that may be related to why the code was removed. Perhaps the correct solution is similar to the code you posted for IPV6, and a test to decide which method to use. Yes, the code that was removed is incompatible with IPv6, but it does look like it is needed for bgsend and so cannot simply be removed. It would be nice to figure out a test case that demonstrates what removing it breaks, as make test does work with this patch, at least on my IPv4 only system. Without understanding this well enough to show such a test case I'm wary of just putting in some code that looks like it should work that is based on the snippet from Net::DNS that I posted. btw I'd really prefer not to use the code from Net::DNS::Resolver -- look at it, it's terrifying! let's just use IO::Socket::INET6 and keep it simple. having said that, yes, I meant to fix the bgsend case and it looks like I forgot. it's harmless -- IO::Socket::INET6 and ::INET do the right thing afaik -- but it's unclear. basically, with UDP sockets you have two choices: 1. specify the dest endpoint in the IO::Socket::INET constructor, or 2. specify it at send() time that works for IPv4, and #2 is what we were using in Resolver. However, for IPv6 with IO::Socket::INET6, only #1 works, #2 seems to be no longer supported. therefore we have to change our code to use the #1 approach instead of #2. Since we only ever used 1 endpoint anyway, this is no biggie for us anyway. Created attachment 3085 [details]
rev3
easy fix; just remove the optional 'dest' arg from the send() call.
I'm glad we can get away with code that is so much simpler than what Net::DNS has to do. I see that my $data = $pkt->data; still seems to be unnecessary, with $data never being referenced. Can it be removed as part of the patch? I have a question about something I noticed while checking over the code for the patch which exceeds my knowledge of perl. In sub connect_sock there is a line $self->{sock}->close() if $self->{sock}; In sub finish_socket I see if ($self->{sock}) { $self->{sock}->close(); delete $self->{sock}; } Is the delete necessary in connect_sock too? Should connect_sock just call finish_socket there instead of {sock}->close? Subject: Re: [review] DnsResolver.pm/connect_sock() fails on IPv6 resolver address -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > I see that my $data = $pkt->data; still seems to be unnecessary, with $data > never being referenced. Can it be removed as part of the patch? yeah, trivial, we can do that at checkin ;) > I have a question about something I noticed while checking over the code > for the patch which exceeds my knowledge of perl. In sub connect_sock > there is a line > $self->{sock}->close() if $self->{sock}; > > In sub finish_socket I see > > if ($self->{sock}) { > $self->{sock}->close(); > delete $self->{sock}; > } > > Is the delete necessary in connect_sock too? No, it's not necessary -- when $self->{sock} is set later on in the method, it GC's the old $self->{sock} ref first, in other words doing what the delete does, implicitly. It'd possibly be nice to make it explicit, but really, no big deal. > Should connect_sock just call > finish_socket there instead of {sock}->close? It could do -- I'd go either way. On one hand, it'd be nice to share the code; OTOH, I've often run into situations where a public "close" API needs to do slightly different things compared to a "open, closing if already open first" API. I'm fine leaving it the way it is. - --j. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.5 (GNU/Linux) Comment: Exmh CVS iD8DBQFDA8YyMJF5cimLx9ARAj6+AKCVrF9Hqs1U0tWLeIYc+hs2Ywb8lACfRuu8 dKXZWhbQHsbx4CuVw/ZoMUM= =AWmh -----END PGP SIGNATURE----- +1, with the caveat that I don't have a system running IPv6 to test it with. Runs fine on my tests with just IPv4. All seems ok with my tests using IPv6 here. Created attachment 3087 [details]
better error handling and work around IO::Socket::INET6 family quirk
The patch applies on top of Justin's rev3 patch and irons out
some remaining IPv4/IPv6 issues in DnsResolver.pm (it didn't work
out of the box):
- properly report error status on unexpected failures to create socket;
(previously undef $socket went on undetected and caused subsequent
failure elsewhere);
- avoid 65000 attempts on other than EADDRINUSE failure;
- work around a quirk in IO::Socket::INET6 where a creation of a
socket fails because LocalAddr is auto-asssigned but happens to
belong to other address family; author has been notified; either
specifying Domain explicitly or providing LocalAddr of the correct
family explicitly avoids the problem - I chose to provide the family;
Now it works in my environment, with or without IO::Socket::INET6
(missing module properly reported when appropriate),
with IPv4 or IPv6 resolver address.
(you may want to adjust the coding style)
Mark
Mark, thanks for the corrections. Can you please submit a CLA ( http://www.apache.org/licenses/#clas ) to make it easier to use your patch? Also, what failures did you see that you addressed with the change "- avoid 65000 attempts on other than EADDRINUSE failure;" And how does your patch avoid that? I only see that it checks for EIADDRINUSE and if that's not the error, it calls _ipv6_ns_warning which only dies if the error is a problem with IPv6 nameserver. I originally did not check for EIADDRINUSE because I thought that if it is something else and it works on trying again on another port, then why not just keep on trying? If there is some common failure that we would want to die immediately on, then we can check for that. Justin, aside from CLA issues, this is simple enough that you can rewrite Mark's patch from scratch and may want to just because of style issues, perhaps wanting something other than die in case of error, etc. The changes I see in the patch are: 1) Add a Domain argument to the IO::Socket constructor call set to: If PeerAddr is in IPv6 family and we have IO::Socket::INET6 set it to AF_INET6, else set it to AF_INET 2) In the loop that looks for an unused port, skip the call to _ipv6_ns_warning if $! is EIADDRINUSE because that's the error we intend to loop over. I must have hit a control key by accident when typing my last comment. I didn't intend to mark this bug fixed. Reopening. actually, Mark has a CLA on-file -- updating the bz list to match that. > Also, what failures did you see that you addressed with the change > "- avoid 65000 attempts on other than EADDRINUSE failure;" Sorry for being too sketchy... > I originally did not check for EIADDRINUSE because I thought that > if it is something else and it works on trying again on another port, > then why not just keep on trying? If there is some common failure that > we would want to die immediately on, then we can check for that. That is what I had in mind. The reasoning is that if something unexpected happens, it is better to be reported as soon as possible and with as much information as possible (the $!), instead of being possibly masked by further attempts. An imaginary scenario to be avoided is that some day someone would report that a DNS connect takes enormous amount of time or allocates much system resources, only to be discovered that it was due to original problem not being noticed and reported. The background story is what happened when I initially tried that code with rev3 patch (I had IPv4 address in resolv.conf, and the module IO::Socket::INET6 was installed): the 'make test' of SA was very slow and failing: PERL_DL_NONLAZY=1 /usr/local/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.tt/basic_lint .... Can't use an undefined value as a symbol reference at ../blib/lib/Mail/SpamAssassin/DnsResolver.pm line 428. It turned out the "Can't use an undefined value as a symbol reference" was caused by fileno(undef), because the @fhlist in sub fhs_to_vec had one element which was undef. Tracking this back brought me to sub connect_sock, where the IO::Socket::INET6->new was failing with EINVAL because of the wrong address family implicitly chosen. Unfortunately it didn't report the failure, and wasted the whole round of 64k attempts to not do anything useful but return undef in $sock. WHICH MADE ME FIX THE ERROR REPORTING FIRST. Ignoring status, or not reporting the full error status is always bad, at least in the long run, when one has to deal with hordes of users, and remote troubleshooting is very hard to do. > And how does your patch avoid that? By calling die with $! in the message. > I only see that it checks for EIADDRINUSE and if that's not the error, > it calls _ipv6_ns_warning which only dies if the error is a problem > with IPv6 nameserver ... AND THEN if _ipv6_ns_warning does not report and dies by itself, then the: die "Error creating a DNS resolver socket: $errno"; does this instead. > The changes I see in the patch are: > > 1) Add a Domain argument to the IO::Socket constructor call set to: > If PeerAddr is in IPv6 family and we have IO::Socket::INET6 set it > to AF_INET6, else set it to AF_INET True. Perhaps the Domain attribute can even be left unspecified if HAS_SOCKET_INET6 is false (just in case if comeone does not even have the AF_INET6 constant). > 2) In the loop that looks for an unused port, skip the call to > _ipv6_ns_warning if $! is EIADDRINUSE because that's the error > we intend to loop over. That was secondary. The main point is in reporting the $! when IO::Socket::INET*->new fails and the $! is not some expected status like EIADDRINUSE, which we know how to deal with. > The main point is in reporting the $! when
> IO::Socket::INET*->new fails and the $! is not some
> expected status like EIADDRINUSE
That makes sense. WHat I was trying to avoid in not checking for EIADDRINUSE is
having it all fail due to some other relatively benign error that is like
EIADDRINUSE.
Which is more correct would depend on whether you are confident that anything
other than EIADDRINUSE and the INET6 problem is a hard error that will persist
across further attempts on different ports, or whether you aren't sure if there
might be such temporary errors other than EIADDRINUSE.
Since I don't know much abo9ut the various error status codes when trying to
open a socket, I chose the latter. However, the difficulty you had in debugging
this certainly demonstrates the benefits of your approach when an unforseen hard
error does happen.
I'll leave the decision on which way to go with this to Justin or other
committers who want to express an opinion about it.
I initially sided with Sidney, but Mark's statements all make sense to me. We have often had a bit of a thing for premature workarounds; going out of our way to fix stuff that we don't know is actually going to break. So let's go with the improved error reporting, and only later see if there's a problem left that we need to worry about... One thing I'm worried about btw -- the use of AF_INET6 as a bare word. There's a possibility that some platform may not have that defined -- but I cannot think what we can do about that without tying ourselves in knots. Again, let's see if there are any reports of that being a problem, before trying to fix it. (I've left in an explanatory comment just in case.) I'll repackage that patch as against SVN trunk and let's get some votes. I'll also check it into trunk so buildbot can see how we do on the AF_INET6 bareword issue. Created attachment 3088 [details]
merge of 3085 and 3087
ok, patch 3088 is in trunk as r233398. the drawing board calls. ;) http://spamassassin.zones.apache.org:8010/t-red-hat-73/builds/20/compile/0 http://spamassassin.zones.apache.org:8010/t-sol10-561/builds/23/compile/0 The error was: version.h.pl: version.h.pl: version.h.pl: version.h.pl: Bareword "AF_INET6" not allowed while "strict subs" in use at ../lib/Mail/SpamAssassin/DnsResolver.pm line 163. BEGIN not safe after errors--compilation aborted at ../lib/Mail/SpamAssassin/DnsResolver.pm line 476. Compilation failed in require at ../lib/Mail/SpamAssassin.pm line 75. BEGIN failed--compilation aborted at ../lib/Mail/SpamAssassin.pm line 75. I believe Net::DNS went through some iterations of fixes for AF_INET6 bare word problem. See if you can make sense of the code around the use of AF_INET6 in Net::DNS::Resolver::Base.pm If I remember correctly, there was some conditional expression that had to be put in place everywhere that AF_INET6 appeared. Now I remember. Every use of AF_INET6 was replaced with AF_INET6() Does that help? The reason for doing that exceeds my current knowledge of perl. it helps a little -- I think trial and error will be required. Subject: Re: [review] DnsResolver.pm/connect_sock() fails on IPv6 resolver address -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On (18/08/05 17:16), bugzilla-daemon@bugzilla.spamassassin.org didst pronounce: > it helps a little -- I think trial and error will be required. > Could you not redefine it if IO::Socket::INET6 isn't available? if(!HAS_SOCKET_INET6) { use constant AF_INET6 => 0; } Or something like that? AF_INET6 depends on HAS_SOCKET_INET6 being set, and if it's not set, AF_INET6 is never needed. - -- Chat ya later, John. - -- BOFH excuse #159: Stubborn processes -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) iD8DBQFDBapcQBw+ZtKOvTIRAlJXAJ9cF+66In9/MaiYzFTx0oxnGviTpwCeLdf3 pTePW0lK+Fsvgq5B6fBtpLQ= =uGLf -----END PGP SIGNATURE----- easiest thing that was most likely to work, was to use eval 'AF_INET6'. that's now in svn. Created attachment 3090 [details] redo of 3088 OK, this looks a lot more promising -- the buildbots are chugging away without a failure so far: http://spamassassin.zones.apache.org:8010/ this patch is from svn trunk from those 2 checkins, and should apply to b3_1_0. assuming the Buildbots pass, please vote... +1 (btw to follow on from my last comment: all tests passed, the patch works. and it still needs 1 more vote ;) +1 jm's on vacation, so I'll apply this shortly Committed revision 261907. I'm adding a commment here to document a change related to commment 24 that I just checked in as revision 325836 for bug 4619 in case a problem leads back to this bug report. That change contains a different fix for this bug's comment 24 that does not require using the AF_INET6 macro and therefore doesn't neeed the various tests for AF_INET6 being defined. The problem brought up in comment 24 is that on some platforms creating a socket using AF_UNSPEC doesn't work if the source and destination addresses are in different domains, with the symptom being an EINVAL error on the send. Instead of replacing AF_UNSPEC with AF_INET6 or AF_INET, as was done here, the new fix replaces the LocalAddr with "::" or "0.0.0.0" to match the PeerAddr family. |