Squeeze: sometimes, bind times out (backgrounded) at boot time
Dear Sirs,

I have a bunch of squeeze boxes running with nis and autofs. All are working well, with no performance issues. However, at boot time, sporadically, bind times out, and the machine comes up without nis. Since home folders are NFS via autofs, the machine becomes useless, and a reboot is required (I know that restarting nis and autofs will solve it, but that requires root access).

Is there any way to increase the timeout of bind at boot time?

Joao
Joao Roscoe wrote:
> I have a bunch of squeeze boxes running with nis and autofs. All are working
> well, no performance issues. However, at boot time, sporadically, bind times
> out, and the machine goes up without nis.

Your words say "bind times out" and "nis" fails, but what does bind have to do with nis? When NIS/YP was written, it was written for systems that did not use BIND nor even have it installed. In a pure NIS/YP system they used NIS/YP for host name resolution. NIS by itself does not depend upon BIND. There isn't an intrinsic dependency of one upon the other unless you have created one in your configuration.

> Since home folders are NFS via autofs, the machine becomes useless,
> and a reboot is required (I know that restarting nis and autofs
> will solve it, but that requires root access).

This reads to me that you have an NIS problem, not a BIND problem. Probably your BIND configuration is okay. Instead look for your problem in your NIS configuration.

> Is there any way to increase the timeout of bind at boot time?

First find the root cause of the problem. It seems unlikely that it is BIND.

What do you have in your /etc/yp.conf file? Are you specifying to find the nis server by broadcast, by IP address, or by server name?

Note that the default Debian yp.conf file contains the following warning:

  # IMPORTANT: For the "ypserver", use IP addresses, or make sure that
  #            the host is in /etc/hosts. This file is only interpreted
  #            once, and if DNS isn't reachable yet the ypserver cannot
  #            be resolved and ypbind won't ever bind to the server.

It seems likely to me that you have placed host names in that file but failed to heed the warning and place the host names in your /etc/hosts. But keeping host names in /etc/hosts isn't wonderful. Neither is using IP addresses. I recommend avoiding names there and using the broadcast protocol to find the nis servers.

  domain example.com broadcast

That would allow a client to associate with any of the nis master and slaves as they become available.

Bob
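The yp.conf warning quoted above can be checked mechanically. What follows is only a rough sketch, not something shipped with the Debian nis package: the function name ypconf_unresolved is made up for illustration, and it assumes the three common yp.conf line forms (ypserver HOST, domain NAME server HOST, domain NAME broadcast). It prints every server that is given by name but has no entry in the hosts file. Anything that is not a dotted-quad IPv4 address is treated as a name.

```shell
#!/bin/sh
# Hypothetical helper: list yp.conf servers that are given by name
# but are missing from the hosts file. Both paths can be overridden
# so the check can be tried on copies of the real files.
ypconf_unresolved() {
    conf=${1:-/etc/yp.conf}
    hosts=${2:-/etc/hosts}
    # Extract HOST from "ypserver HOST" and "domain NAME server HOST".
    # Comment lines and "domain NAME broadcast" lines produce nothing.
    awk '
        /^[ \t]*#/ { next }
        $1 == "ypserver" && NF >= 2 { print $2 }
        $1 == "domain" && $3 == "server" && NF >= 4 { print $4 }
    ' "$conf" |
    while read -r server; do
        case $server in
            *[!0-9.]*) ;;        # has a non-digit, non-dot character: a name
            *) continue ;;       # digits and dots only: an address, no lookup needed
        esac
        # The name counts as known only if it is a whole field
        # (canonical name or alias) on a non-comment hosts line.
        if ! awk -v h="$server" '
                /^[ \t]*#/ { next }
                {
                    for (i = 2; i <= NF; i++) {
                        if ($i ~ /^#/) break
                        if ($i == h) found = 1
                    }
                }
                END { exit !found }
            ' "$hosts"; then
            echo "$server"
        fi
    done
}
```

Called with no arguments it reads /etc/yp.conf and /etc/hosts. Empty output means every server named in yp.conf is also listed in the hosts file, so ypbind should not need DNS to find it.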
Ok, I really mixed things up. I'm sorry (and I'm also very sorry for the *huge* delay in answering this thread).

I meant that **ypbind** fails to bind to ypserver.

And yes, the NIS domain servers are specified in yp.conf by their fully qualified names, and those names are hardcoded in the /etc/hosts file. Also, /etc/nsswitch.conf has the hosts line as below:

  hosts: files nis mdns4_minimal [NOTFOUND=return] dns mdns4

So, ypbind should get the correct IPs for the servers immediately. But what I see in practice is: most times, the machine(s) come up properly. Other times, I see a timeout notice at boot ("...backgrounded"), and the system comes up unable to mount the remote users' "home" directories. When that happens, normally rebooting several times doesn't solve anything. Restarting nis and autofs, in this order, *does* solve the issue.

Best regards,
Joao Roscoe

On Thu, Sep 22, 2011 at 8:43 PM, Bob Proulx <bob@proulx.com> wrote:
> [...]
Ok, I really mixed things up. I'm sorry (and I'm also very sorry for the *huge* delay in answering this thread).

I meant that **ypbind** fails to bind to ypserver.

And yes, the NIS domain servers are specified in yp.conf by their fully qualified names, and those names are hardcoded in the /etc/hosts file. Also, /etc/nsswitch.conf has the hosts line as below:

  hosts: files nis mdns4_minimal [NOTFOUND=return] dns mdns4

So, ypbind should get the correct IPs for the servers immediately. But what I see in practice is: most times, the machine(s) come up properly. Other times, I see a timeout notice at boot ("...backgrounded"), and the system comes up unable to mount the remote users' "home" directories. When that happens, normally rebooting several times doesn't solve anything. Restarting nis and autofs, in this order, *does* solve the issue.

Best regards,
Joao Roscoe

PS. This is the second time I have sent this message. The first time, I got a weird automatic response, something about "Case 80324" (I googled for that; it's something about a bug in the php4 package). I hope that was not me doing something really wrong. I double-checked the "To" address, just in case.

> On Thu, Sep 22, 2011 at 8:43 PM, Bob Proulx <bob@proulx.com> wrote:
>> [...]
Joao Roscoe wrote:
> Ok, I really mixed things up. I'm sorry (and I'm also very sorry for
> the *huge* delay in answering to this thread).

There was quite a long delay in that message! But what is a year among friends? :-)

> I meant that **ypbind** fails to bind to ypserver.

A critical difference. Thanks for clarifying that.

> And yes, the NIS domain servers are specified in yp.conf by their
> fully qualified names, and those names are hardcoded in /etc/hosts
> file.

Seems reasonable. I still use the broadcast protocol instead. But what you are doing is supposed to work okay and I can only assume that it does.

> Also, /etc/nsswitch.conf has hosts line as below:
>
> hosts: files nis mdns4_minimal [NOTFOUND=return] dns mdns4

The contents there tell me that you have one of the zero-conf packages installed, libnss-mdns IIRC or possibly avahi, and that inserts those mdns entries into that file. I have had inconsistent behavior in that configuration. Some systems behave fine with the mdns configuration. But others have really odd and problematic DNS lookup behavior. I haven't determined the root cause, other than to say that if libnss-mdns is removed (or the nsswitch.conf file modified / cleaned) then the problems stop. And so when I run into the problem the easy solution is to remove libnss-mdns or clean nsswitch.conf to make the problem stop.

In either case, I use the following configuration line for hosts in /etc/nsswitch.conf.

  hosts: files dns

You might try it that way and test your error case again.

> So, the ypbind should get the correct IPs for the servers immediately.
> But what I see, in practice is: most times, the machine(s) goes up
> properly. Other times, I see a timeout notice at boot
> ("...backgrounded"), and the system comes up unable to mount the
> remote users' "home" directories. When that happens, normally
> rebooting several times doesn't solve anything.

I would try the simplified nsswitch.conf hosts line configuration and see if it improves things for you.
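To see at a glance which lookup sources a machine is using for hosts, something like the following might help. It is a small sketch only: the function names nss_hosts_sources and nss_hosts_uses_mdns are invented here, and the parsing assumes a single "hosts:" line in the usual nsswitch.conf layout.

```shell
#!/bin/sh
# Hypothetical helper: print the lookup sources from the "hosts:" line
# of an nsswitch.conf-style file, one per line, in order.
nss_hosts_sources() {
    file=${1:-/etc/nsswitch.conf}
    awk '$1 == "hosts:" {
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^#/) break        # rest of the line is a comment
            if ($i ~ /^\[/) continue    # skip actions such as [NOTFOUND=return]
            print $i
        }
        exit
    }' "$file"
}

# Hypothetical helper: succeed if any mdns source is configured for hosts.
nss_hosts_uses_mdns() {
    nss_hosts_sources "$1" | grep -q '^mdns'
}
```

For example, `nss_hosts_uses_mdns && echo "mdns is in the hosts lookup path"` flags the configuration described above. Running `getent hosts` with the NIS server name then shows what the configured sources actually return for it.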
> Restarting nis and autofs, in this order *does* solve the issue.

If forcing a different start order works, then that sounds like some incorrectly specified dependency in the /etc/init.d/* scripts.

> PS. This is the second time I send this message, In the first time, I
> got an weird automatic
> response, something about "Case 80324" (googled for that, it's
> something about a bug in php4 package).
> Hope that was not me doing something really wrong. Double-checked the
> "To" address content, just in case.

We all got your first message. The problem you saw was not with the mailing list. The problem is that for unknown reasons bad people sometimes subscribe the mailing list to addresses that go to automated bug trackers, or forward to RT accounts, or generate vacation replies, or all manner of bad behavior. Why? I don't know. Why does anyone do bad things? When they do these things, every message that anyone sends to the mailing list generates backscatter spam from that bad place. As soon as these are noticed the listmasters remove the offending address. If you can debug these quickly then it helps the listmaster to report them. But because it is so annoying, before too long someone will have debugged it and gotten the offenders removed from the mailing list.

Bob
> There was quite a long delay in that message! But what is a year
> among friends? :-)

Thanks for your patience :-)

> Seems reasonable. I still use the broadcast protocol instead. But
> what you are doing is supposed to work okay and I can only assume that
> it does.

Tried the broadcast protocol. Unfortunately, no deal :-(

I have around 20 boxes here. All of them were built as images from a reference machine, which received a clean squeeze install. For each machine, the image was dumped (with partimage), the hostname was changed, and the file /etc/udev/rules.d/70-persistent-net.rules was removed. So, all of them should behave the same way. However, some of them boot ok most of the time, while others hit the NIS server bind timeout every time. Quite confusing...

> In either case, I use the following configuration line for hosts in
> /etc/nsswitch.conf.

Tried that also. No improvement. In fact, I started getting some DNS trouble with a few older hosts. Looks like our DNS infrastructure is completely messed up.

Now, what really puzzles me: as I said before, "Restarting nis and autofs, in this order *does* solve the issue", and that's quite fast! Why doesn't it work at boot time?

> I...sounds like
> some incorrectly specified dependency in the /etc/init.d/* scripts.

I agree with you, but I took a look at the scripts, and they look fine: autofs seems to depend on nis (I'm afraid I don't know this new init scheme very well, however). Anyway, this kind of issue would probably break things for a lot of people...

> But because it is so annoying before too long someone
> will have debugged it and gotten the offenders removed from the
> mailing list.

Got a probe email a few days ago, so someone worked on it. Hope the issue is already solved.

Best regards,
João
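The dependencies mentioned above are declared in the LSB comment header at the top of each init script, between the "### BEGIN INIT INFO" and "### END INIT INFO" markers. A minimal sketch for pulling them out follows; lsb_header and lsb_field are names made up for this example, and nothing here says what the real nis or autofs scripts contain.

```shell
#!/bin/sh
# Hypothetical helper: print the LSB header block of an init script.
lsb_header() {
    sed -n '/^### BEGIN INIT INFO/,/^### END INIT INFO/p' "$1"
}

# Hypothetical helper: print the value of one header field,
# for example Required-Start or Should-Start.
lsb_field() {
    script=$1
    field=$2
    lsb_header "$script" |
        awk -v f="# $field:" 'index($0, f) == 1 {
            v = substr($0, length(f) + 1)
            sub(/^[ \t]+/, "", v)       # drop the padding after the colon
            print v
        }'
}
```

Comparing `lsb_field /etc/init.d/autofs Required-Start` and `lsb_field /etc/init.d/autofs Should-Start` with the Provides field of /etc/init.d/nis shows whether autofs is really ordered after nis on a given machine.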
Joao Roscoe wrote:
> > Seems reasonable. I still use the broadcast protocol instead. But
> > what you are doing is supposed to work okay and I can only assume that
> > it does.
>
> Tried the broadcast protocol. Unfortunately, no deal :-(

Don't know. Works for me. I like it since that way any of the servers may be down/up and the client will bind to any of them. That combination gives a nice bit of failover redundancy. (Shrug.)

> I have around 20 boxes here. All of them were built as images from a
> reference machine, which received a clean squeeze install.
> For each machine, the image was dumped (with partimage), the hostname
> was changed, and the file /etc/udev/rules.d/70-persistent-net.rules
> was removed.

Seems reasonable. I do a little bit more than that, but mostly things specific to what I have installed, such as configuring Postfix for the new hostname and so forth. Both /etc/hostname and /etc/mailname get updated. I assign static addresses and therefore /etc/network/interfaces is updated. I use a single ssh server key among the collective because they are intended to be identical. So I ensure that /etc/ssh/ssh_host_*_key* files are updated appropriately. And I think that is sufficient.

> So, all of them should behave the same way. However, some
> of them boot ok most of the times, others present NIS serve bind
> timeout everytime. Quite confusing...

If the hardware isn't completely identical then it is reasonable to have differences in the parallel boot timings. With the new parallel boot there will be forks and joins of the process flow during boot time. IIRC it is implemented using 'make -jX' to achieve parallel operation when possible. And since the behavior is new there are bound to be bugs that will affect people using it out of the mainstream paths. Using it with NIS/YP is not so common, so I think it not unlikely that there is a bug related to it there.
In particular I think I have seen cases, unverified, where even though an init.d script completed, the service it started wasn't yet ready to serve. For example, I am pretty sure I have seen problems with bind starting up and not being ready to serve immediately. Can't confirm this though. But it seems suspicious given your symptoms. Or nis starting up may be similar.

> > In either case, I use the following configuration line for hosts in
> > /etc/nsswitch.conf.
>
> Tried that also. No improvement. In fact, I started getting some DNS
> trouble with a few older hosts. Looks like our DNS infrastructure is
> completely messed up

That seems like a completely separate issue. Probably should separate the two problems and address each one individually. Would be happy to help with the DNS configuration too. Describe how it is set up and the list could provide feedback on how to improve it.

DNS is a marvelously designed distributed database system. It isn't perfect. There are a few problems. They didn't think of everything when it was designed. It is a huge improvement over the previous system. But it is only as good as the configured network around it.

> Now, what really puzzles me: as I told before, "Restarting nis and
> autofs, in this order *does* solve the issue", and that's quite fast!
> Why doesn't it work at boot time?

Try this experiment. At the last point in the /etc/init.d/nis startup script add a short sleep. That will give the daemons time to finish and get ready to go. It is possible that they are not quite ready yet, and so immediately after the end of the script the next one to run hits them too early. I suggest changing this in file /etc/init.d/nis:

  case "$1" in
    start)
          do_start
          ;;
    stop)

To this as an experiment:

  case "$1" in
    start)
          do_start
          sleep 5  # <-- Add this sleep to give things more time.
          ;;
    stop)

I would do the same thing for /etc/init.d/bind9 too. Then see if that resolves the problem.
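A variation on the same experiment is to poll until the service answers rather than sleep for a fixed time. The sketch below is not what the nis init script does; wait_until is a made-up helper that simply retries a command once per second until it succeeds or a timeout (in seconds) runs out.

```shell
#!/bin/sh
# Hypothetical helper: retry a command once per second until it succeeds.
# Usage: wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as the command succeeds, 1 once the timeout is reached.
wait_until() {
    timeout=$1
    shift
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

In the experiment above, `wait_until 30 ypwhich` could stand in for `sleep 5`. That assumes ypwhich exits with a non-zero status while the domain is still unbound, which is worth confirming by hand on an affected machine before relying on it.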
I am not proposing this as a full solution, nor even saying that this must be the problem. But I would definitely try it as an experiment to gain data and characterize the problem. And if it works then that might be a good enough workaround for you until the problem really is resolved. (Or it might be the 'allow-hotplug' described below.)

> > I...sounds like
> > some incorrectly specified dependency in the /etc/init.d/* scripts.
>
> I agree with you, but I took a look at the scripts, and they look fine
> - autofs seems to depend on nis (I'm afraid I don't know this new init
> scheme very well, however).

Traditionally Sun systems would store automount maps in nis files, making them available through nis/yp to client machines, such as through 'ypcat -k auto.master' and other files. The autofs startup script obtains the configuration files this way dynamically at start time. This is optional. It isn't required. You may have configured it either using real files on disk or using files in networked nis/yp files. If in the nis/yp files then the autofs script will try to use them from nis.

> Anyway, this kind of issue would probably
> break things for a lot of people...

I have something else to try that I have learned in the last year since your first note. :-)

In /etc/network/interfaces it probably says:

  allow-hotplug eth0

Change that to:

  auto eth0

The allow-hotplug enables the event-driven startup. The auto enables the traditional startup. I have had similar issues with the event-driven startup, where things will block for a long time at boot waiting for various events to happen. Using auto instead forces the previously hard-set flow and avoids the problem, specifically when using nfs mounts in /etc/fstab.

Again, as an experiment I would switch to 'auto' for the network startup. That by itself might be your solution. (Or it might be the startup sleep delay described above.)
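With around 20 imaged machines it may be easier to check for this with a small script than by eye. This is a sketch under simple assumptions: hotplug_ifaces and hotplug_to_auto are invented names, the stanza is assumed to be a plain "allow-hotplug IFACE" line, and the second function only prints a modified copy to standard output without touching the file.

```shell
#!/bin/sh
# Hypothetical helper: list interfaces brought up via allow-hotplug.
hotplug_ifaces() {
    awk '$1 == "allow-hotplug" { for (i = 2; i <= NF; i++) print $i }' \
        "${1:-/etc/network/interfaces}"
}

# Hypothetical helper: print a copy of the file in which
# "allow-hotplug IFACE" is replaced by "auto IFACE".
# The original file is left unchanged.
hotplug_to_auto() {
    iface=$1
    file=${2:-/etc/network/interfaces}
    awk -v ifc="$iface" '
        $1 == "allow-hotplug" && $2 == ifc && NF == 2 { print "auto " ifc; next }
        { print }
    ' "$file"
}
```

`hotplug_ifaces` prints nothing on a machine that already uses auto. `hotplug_to_auto eth0` writes the proposed file to standard output so it can be reviewed before it replaces /etc/network/interfaces.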
> > But because it is so annoying before too long someone
> > will have debugged it and gotten the offenders removed from the
> > mailing list.
>
> Got a probe email a few days ago - someone worked on it. Hope the
> issue is already solved.

Unfortunately the problem persists. I conversed briefly with the listmasters and they are aware of it, but no one has been able to deduce the offender. The joe1assistly spam has also affected some of the Cygwin mailing lists. I have examined the spam coming my direction and I can't deduce a clear solution to it. Of course I could block it for myself by blocking any Message-Id: with joegiglio.org in it, but that wouldn't help the mailing list at large.

Bob