Squeeze: sometimes, bind times out (backgrounded) at boot time (http://www.linux-archive.org/debian-user/578810-squeeze-sometimes-bind-times-out-backgrounded-boot-time.html)

Joao Roscoe 09-22-2011 02:13 PM

Squeeze: sometimes, bind times out (backgrounded) at boot time
 
Dear Sirs,

I have a bunch of squeeze boxes running with nis and autofs. All are working well, with no performance issues. However, at boot time, sporadically, bind times out, and the machine goes up without nis. Since home folders are NFS via autofs, the machine becomes useless, and a reboot is required (I know that restarting nis and autofs will solve it, but that requires root access).


Is there any way to increase the timeout of bind at boot time?

Joao

Bob Proulx 09-22-2011 11:43 PM

Squeeze: sometimes, bind times out (backgrounded) at boot time
 
Joao Roscoe wrote:
> I have a bunch of squeeze boxes running with nis and autofs. All are working
> well, no performance issues. However, at boot time, sporadically, bind times
> out, and the machine goes up without nis.

Your words say "bind times out" and "nis" fails but what does bind
have to do with nis? When NIS/YP was written it was written for
systems that did not use BIND nor even have it installed. In a pure
NIS/YP system they used NIS/YP for host name resolution. NIS by
itself does not depend upon BIND. There isn't an intrinsic dependency
of one upon the other unless you have created one in your configuration.

> Since home folders are NFS via autofs, the machine becomes useless,
> and a reboot is required (I know that restarting nis and autofs
> will solve it, but that requires root access).

This reads to me as an NIS problem, not a BIND problem.
Probably your BIND configuration is okay. Instead, look for your
problem in your NIS configuration.

> Is there any way to increase the timeout of bind at boot time?

First find the root cause of the problem. It seems unlikely that it
is BIND.

What do you have in your /etc/yp.conf file? Are you telling ypbind to
find the NIS server by broadcast, by IP address, or by server name?

Note that the default Debian yp.conf file contains the following warning:

# IMPORTANT:    For the "ypserver", use IP addresses, or make sure that
#               the host is in /etc/hosts. This file is only interpreted
#               once, and if DNS isn't reachable yet the ypserver cannot
#               be resolved and ypbind won't ever bind to the server.

It seems likely to me that you have placed host names in that file but
failed to heed the warning and place the host names in your
/etc/hosts. But keeping host names in /etc/hosts isn't wonderful.
Neither is using IP addresses. I recommend avoiding names there and
using the broadcast protocol to find the nis servers.

domain example.com broadcast

That would allow a client to associate with the NIS master or any of
the slaves as they become available.
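Whichever form is used, the failure mode described in that warning can be checked for mechanically. The following is a minimal POSIX sh sketch, not part of any Debian package (the function name 'unresolved_ypservers' is hypothetical), that prints each host name given to "ypserver" or "domain ... server" in a yp.conf-style file which has no entry in an /etc/hosts-style file. It assumes dotted IPv4 literals need no lookup and does not try to handle IPv6.

```shell
#!/bin/sh
# Sketch: list yp.conf server names that have no /etc/hosts entry.
# Both paths are parameters so copies of the files can be checked.
unresolved_ypservers() {
    ypconf=${1:-/etc/yp.conf}
    hosts=${2:-/etc/hosts}
    # Take the server from "ypserver NAME" and "domain DOM server NAME";
    # comment lines and "domain DOM broadcast" lines yield nothing.
    awk '
        /^[ \t]*#/ { next }
        $1 == "ypserver" { print $2 }
        $1 == "domain" && $3 == "server" { print $4 }
    ' "$ypconf" |
    while read -r name; do
        # Skip dotted IPv4 literals; they need no name lookup.
        case $name in
            *[!0-9.]*) ;;
            *) continue ;;
        esac
        # Search every name field of the hosts file, ignoring comments.
        if ! awk -v n="$name" '
            { sub(/#.*/, "") }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }
        ' "$hosts"; then
            echo "$name"
        fi
    done
}
```

Run as 'unresolved_ypservers /etc/yp.conf /etc/hosts'; no output means every named server is covered by /etc/hosts. A broadcast-only yp.conf names no server, so it also produces no output.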

Bob

Joao Roscoe 05-07-2012 03:44 PM

Squeeze: sometimes, bind times out (backgrounded) at boot time
 
Ok, I really mixed things up. I'm sorry (and I'm also very sorry for
the *huge* delay in answering to this thread).

I meant that **ypbind** fails to bind to ypserver.
And yes, the NIS domain servers are specified in yp.conf by their
fully qualified names, and those names are hardcoded in /etc/hosts
file. Also, /etc/nsswitch.conf has hosts line as below:

hosts: files nis mdns4_minimal [NOTFOUND=return] dns mdns4

So, the ypbind should get the correct IPs for the servers immediately.
But what I see, in practice is: most times, the machine(s) go up
properly. Other times, I see a timeout notice at boot
("...backgrounded"), and the system comes up unable to mount the
remote users' "home" directories. When that happens, normally
rebooting several times doesn't solve anything. Restarting nis and
autofs, in this order *does* solve the issue.

Best regards,
Joao Roscoe




--
To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/CAKaijp0Ai_04eZLY8ifBWNYs-yQPaGJwVQ0MvG4tbjhFo3mcsA@mail.gmail.com

Joao Roscoe 08-08-2012 07:43 PM

Squeeze: sometimes, bind times out (backgrounded) at boot time
 
Ok, I really mixed things up. I'm sorry (and I'm also very sorry for
the *huge* delay in answering to this thread).

I meant that **ypbind** fails to bind to ypserver.
And yes, the NIS domain servers are specified in yp.conf by their
fully qualified names, and those names are hardcoded in /etc/hosts
file. Also, /etc/nsswitch.conf has hosts line as below:

hosts: files nis mdns4_minimal [NOTFOUND=return] dns mdns4

So, the ypbind should get the correct IPs for the servers immediately.
But what I see, in practice is: most times, the machine(s) go up
properly. Other times, I see a timeout notice at boot
("...backgrounded"), and the system comes up unable to mount the
remote users' "home" directories. When that happens, normally
rebooting several times doesn't solve anything. Restarting nis and
autofs, in this order *does* solve the issue.

Best regards,
Joao Roscoe

PS. This is the second time I send this message. The first time, I
got a weird automatic response, something about "Case 80324" (googled
for that, it's something about a bug in the php4 package).
Hope that was not me doing something really wrong. Double-checked the
"To" address content, just in case.




--
Archive: http://lists.debian.org/CAKaijp0=_WSNnmMTpxoJ2t4Af2ocwPvL7KLQwFQwSX8B9MHDsw@mail.gmail.com

Bob Proulx 08-09-2012 09:26 PM

Squeeze: sometimes, bind times out (backgrounded) at boot time
 
Joao Roscoe wrote:
> Ok, I really mixed things up. I'm sorry (and I'm also very sorry for
> the *huge* delay in answering to this thread).

There was quite a long delay in that message! But what is a year
among friends? :-)

> I meant that **ypbind** fails to bind to ypserver.

A critical difference. Thanks for clarifying that.

> And yes, the NIS domain servers are specified in yp.conf by their
> fully qualified names, and those names are hardcoded in /etc/hosts
> file.

Seems reasonable. I still use the broadcast protocol instead. But
what you are doing is supposed to work okay and I can only assume that
it does.

> Also, /etc/nsswitch.conf has hosts line as below:
>
> hosts: files nis mdns4_minimal [NOTFOUND=return] dns mdns4

The contents there tell me that you have one of the zero-conf packages
installed, libnss-mdns IIRC or possibly avahi, and that it inserted those
mdns entries into that file. I have had inconsistent behavior in that
configuration. Some systems behave fine with the mdns configuration,
but others have really odd and problematic DNS lookup behavior. I
haven't pinned down the root cause, except to say that if libnss-mdns
is removed (or the nsswitch.conf file modified / cleaned) then the
problems stop. So when I run into the problem, the easy fix is to do
exactly that.

In either case, I use the following configuration line for hosts in
/etc/nsswitch.conf.

hosts: files dns

You might try it that way and test your error case again.
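Trying that suggestion amounts to a one-line edit. Below is a minimal sketch, assuming root access and that nothing else on the box relies on the nis or mdns4 entries (the function name 'simplify_hosts_line' is hypothetical). Note that, exactly like the line above, it also drops 'nis' from the hosts lookup order.

```shell
#!/bin/sh
# Sketch: replace the "hosts:" line of an nsswitch.conf-style file with
# the plain "files dns" order, keeping a .bak copy for rollback.
simplify_hosts_line() {
    conf=${1:-/etc/nsswitch.conf}
    cp -p "$conf" "$conf.bak" || return 1
    # Rewrite from the backup; ">" truncates in place, so the file keeps
    # its original inode and permissions.
    sed 's/^hosts:.*/hosts: files dns/' "$conf.bak" > "$conf"
}
```

After the change, 'getent hosts' followed by a server name shows what the resolver actually returns for that name, which makes it easy to compare the behavior before and after.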

> So, the ypbind should get the correct IPs for the servers immediately.
> But what I see, in practice is: most times, the machine(s) go up
> properly. Other times, I see a timeout notice at boot
> ("...backgrounded"), and the system comes up unable to mount the
> remote users' "home" directories. When that happens, normally
> rebooting several times doesn't solve anything.

I would try the simplified nsswitch.conf hosts line configuration and
see if it improves things for you.

> Restarting nis and autofs, in this order *does* solve the issue.

If forcing a different start order works, then that sounds like
some incorrectly specified dependency in the /etc/init.d/* scripts.
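On Squeeze the boot order is computed by insserv from the LSB comment header at the top of each init script, so that header is where a missing dependency would show up. A minimal sketch for printing the relevant lines side by side follows; the function name 'show_start_deps' and the INIT_DIR override are hypothetical, added only so the function can be pointed at copies of the scripts.

```shell
#!/bin/sh
# Sketch: print the LSB header lines insserv uses to order boot scripts.
# INIT_DIR defaults to /etc/init.d; override it to inspect copies.
show_start_deps() {
    dir=${INIT_DIR:-/etc/init.d}
    for svc in "$@"; do
        echo "== $svc"
        # Only look inside the INIT INFO block, then keep the keys that
        # name what a script provides and what must start before it.
        sed -n '/^### BEGIN INIT INFO/,/^### END INIT INFO/p' "$dir/$svc" |
            grep -E '^# *(Provides|Required-Start|Should-Start):'
    done
}
```

For example, 'show_start_deps nis autofs' shows whether a name from the nis script's Provides: line appears in the Required-Start: or Should-Start: line of autofs. If it does not, nothing guarantees that autofs starts after nis.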

> PS. This is the second time I send this message. The first time, I
> got a weird automatic response, something about "Case 80324" (googled
> for that, it's something about a bug in the php4 package).
> Hope that was not me doing something really wrong. Double-checked the
> "To" address content, just in case.

We all got your first message. The problem you saw was not with the
mailing list. The problem is that, for unknown reasons, bad people
sometimes subscribe addresses to the mailing list that go to automated
bug trackers, forward to RT accounts, generate vacation replies, or
exhibit all manner of other bad behavior. Why? I don't know. Why does
anyone do bad things?

When they do these things, every message that anyone sends to the
mailing list generates backscatter spam from that bad place. As
soon as these are noticed, the listmasters remove the offending
address. If you can debug these quickly, reporting them helps the
listmasters. But because it is so annoying, before too long someone
will have debugged it and gotten the offenders removed from the
mailing list.

Bob

Joao Roscoe 08-15-2012 07:41 PM

Squeeze: sometimes, bind times out (backgrounded) at boot time
 
> There was quite a long delay in that message! But what is a year
> among friends? :-)

Thanks for your patience :-)

> Seems reasonable. I still use the broadcast protocol instead. But
> what you are doing is supposed to work okay and I can only assume that
> it does.

Tried the broadcast protocol. Unfortunately, no deal :-(
I have around 20 boxes here. All of them were built as images from a
reference machine, which received a clean squeeze install.
For each machine, the image was dumped (with partimage), the hostname
was changed, and the file /etc/udev/rules.d/70-persistent-net.rules
was removed. So, all of them should behave the same way. However, some
of them boot ok most of the time, while others present the NIS server bind
timeout every time. Quite confusing...

> In either case, I use the following configuration line for hosts in
> /etc/nsswitch.conf.

Tried that also. No improvement. In fact, I started getting some DNS
trouble with a few older hosts. Looks like our DNS infrastructure is
completely messed up.

Now, what really puzzles me: as I told before, "Restarting nis and
autofs, in this order *does* solve the issue", and that's quite fast!
Why doesn't it work at boot time?

> I...sounds like
> some incorrectly specified dependency in the /etc/init.d/* scripts.

I agree with you, but I took a look at the scripts, and they look fine
- autofs seems to depend on nis (I'm afraid I don't know this new init
scheme very well, however). Anyway, this kind of issue would probably
break things for a lot of people...

> But because it is so annoying before too long someone
> will have debugged it and gotten the offenders removed from the
> mailing list.

Got a probe email a few days ago - someone worked on it. Hope the
issue is already solved.

Best regards,
João


--
Archive: http://lists.debian.org/CAKaijp3KeG4vT9UtGzwaco9ivmVB+wq=skiXMn2m3nbc6QHP7w@mail.gmail.com

Bob Proulx 08-15-2012 09:03 PM

Squeeze: sometimes, bind times out (backgrounded) at boot time
 
Joao Roscoe wrote:
> > Seems reasonable. I still use the broadcast protocol instead. But
> > what you are doing is supposed to work okay and I can only assume that
> > it does.
>
> Tried the broadcast protocol. Unfortunately, no deal :-(

Don't know. Works for me. I like it since that way any of the
servers may be down/up and the client will bind to any of them. That
combination gives a nice bit of failover redundancy. (Shrug.)

> I have around 20 boxes here. All of them were built as images from a
> reference machine, which received a clean squeeze install.
> For each machine, the image was dumped (with partimage), the hostname
> was changed, and the file /etc/udev/rules.d/70-persistent-net.rules
> was removed.

Seems reasonable. I do a little bit more than that but mostly things
specific to what I have installed. Such as configuring Postfix for
the new hostname and so forth. Both /etc/hostname and /etc/mailname
get updated. I assign static addresses and therefore
/etc/network/interfaces is updated. I use a single ssh server key
among the collective because they are intended to be identical. So I
ensure that /etc/ssh/ssh_host_*_key* files are updated appropriately.
And I think that is sufficient.
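The per-clone steps mentioned so far in this thread can be collected into one function so that no clone misses one. This is a minimal sketch under stated assumptions: the function name 'personalize_clone' and its arguments are hypothetical, /etc/mailname is assumed to take the fully qualified name, and the optional third argument lets the same steps run against a mounted image instead of the live system.

```shell
#!/bin/sh
# Sketch of the per-clone steps discussed in this thread.
personalize_clone() {
    name=$1      # short host name, e.g. "ws07" (hypothetical)
    fqdn=$2      # fully qualified name, assumed to go into /etc/mailname
    root=${3:-}  # empty means the running system
    echo "$name" > "$root/etc/hostname"
    echo "$fqdn" > "$root/etc/mailname"
    # Let udev regenerate interface names for the new hardware.
    rm -f "$root/etc/udev/rules.d/70-persistent-net.rules"
    # Not covered here: Postfix settings, /etc/network/interfaces for
    # static addresses, and the /etc/ssh/ssh_host_*_key* files.
}
```

For example, 'personalize_clone ws07 ws07.example.com /mnt/image' would prepare an image mounted at /mnt/image (all three values hypothetical).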

> So, all of them should behave the same way. However, some
> of them boot ok most of the time, while others present the NIS server bind
> timeout every time. Quite confusing...

If the hardware isn't completely identical then it is reasonable to
have differences in the parallel boot timings. With the new parallel
boot there will be forks and joins of the process flow during boot
time. IIRC it is implemented using 'make -jX' to achieve parallel
operation when possible. And since the behavior is new, there are
bound to be bugs that will affect people using it outside the
mainstream paths. Using it with NIS/YP is not so common, so I think it
not unlikely that there is a related bug there.

In particular, I think I have seen cases (unverified) where an init.d
script had completed but the service it started wasn't yet ready to
serve. For example, I am pretty sure I have seen problems with bind
starting up and not being ready to serve immediately. I can't confirm
this. But it seems suspicious given your symptoms. nis may behave
similarly when starting up.

> > In either case, I use the following configuration line for hosts in
> > /etc/nsswitch.conf.
>
> Tried that also. No improvement. In fact, I started getting some DNS
> trouble with a few older hosts. Looks like our DNS infrastructure is
> completely messed up.

That seems like a completely separate issue. You should probably separate
the two problems and address each one individually. I would be happy to
help with the DNS configuration too. Describe how it is set up and
the list could provide feedback on how to improve it.

DNS is a marvelously designed distributed database system. It isn't
perfect. There are a few problems. They didn't think of everything
when it was designed. It is a huge improvement over the previous
system. But it is only as good as the configured network around it.

> Now, what really puzzles me: as I told before, "Restarting nis and
> autofs, in this order *does* solve the issue", and that's quite fast!
> Why doesn't it work at boot time?

Try this experiment. At the end of the start action in the /etc/init.d/nis
script, add a short sleep. That will give the daemons time to finish
starting and get ready to go. It is possible that they are not quite
ready yet, and so the next script to run immediately after this one
hits them too early.

I suggest changing this in file /etc/init.d/nis:

case "$1" in
    start)
        do_start
        ;;
    stop)

To this as an experiment:

case "$1" in
    start)
        do_start
        sleep 5    # <-- Add this sleep to give things more time.
        ;;
    stop)

I would do the same thing for /etc/init.d/bind9 too. Then see if that
resolves the problem. I am not proposing this as a full solution nor
even saying that must be the problem. But I would definitely try it
as an experiment to gain data and characterize the problem. And if it
works then that might be a good enough workaround for you until the
problem really is resolved. (Or it might be the 'allow-hotplug' change
described below.)
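If the fixed sleep does help, one possible refinement (not something proposed in the thread) is to wait for the binding itself instead of a fixed time. The sketch below assumes that 'ypwhich' exits non-zero while ypbind has no server bound, and polls it once a second up to a limit. The function name 'wait_for_ypbind', the 30-second default, and the YP_PROBE override (which exists only so the loop can be exercised without NIS) are all hypothetical.

```shell
#!/bin/sh
# Sketch: instead of a fixed "sleep 5", wait until ypbind reports a
# binding, giving up after a bounded number of seconds.
wait_for_ypbind() {
    max=${1:-30}                 # hypothetical upper bound, in seconds
    probe=${YP_PROBE:-ypwhich}   # assumed to exit non-zero while unbound
    tries=0
    until $probe >/dev/null 2>&1; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max" ]; then
            return 1             # still unbound after max attempts
        fi
        sleep 1
    done
    return 0
}
```

With the function defined near the top of /etc/init.d/nis, 'wait_for_ypbind 30' would take the place of 'sleep 5' in the start) branch. A fixed sleep is simpler and is the better first experiment; the loop only matters if the required delay turns out to vary between machines.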

> > I...sounds like
> > some incorrectly specified dependency in the /etc/init.d/* scripts.
>
> I agree with you, but I took a look at the scripts, and they look fine
> - autofs seems to depend on nis (I'm afraid I don't know this new init
> scheme very well, however).

Traditionally, Sun systems would store automount maps in NIS,
making them available to client machines through nis/yp, for example
through 'ypcat -k auto.master' and other maps. The autofs startup
script obtains its configuration this way dynamically at start
time. This is optional; it isn't required. You may have configured
it using either real files on disk or maps served by nis/yp.
If the maps are in nis/yp, then the autofs script will try to read
them from nis.
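One quick way to see which case applies is to check whether nis is listed as a source for automount maps. This minimal sketch assumes autofs consults the 'automount:' line of /etc/nsswitch.conf. A '+auto.master' line in /etc/auto.master is another way to pull in the NIS map, and the sketch does not check for it. The function name 'automount_uses_nis' is hypothetical.

```shell
#!/bin/sh
# Sketch: does the "automount:" line of an nsswitch.conf-style file list
# nis as a source? Exit status 0 means yes, 1 means no (or no such line).
automount_uses_nis() {
    awk '
        $1 == "automount:" {
            for (i = 2; i <= NF; i++) if ($i == "nis") found = 1
        }
        END { exit !found }
    ' "${1:-/etc/nsswitch.conf}"
}
```

If it succeeds, running 'ypcat -k auto.master' right after a bad boot shows whether the map was actually reachable over NIS at that point.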

> Anyway, this kind of issue would probably
> break things for a lot of people...

I have something else to try that I have learned in the last year
since your first note. :-)

In /etc/network/interfaces it probably says:

allow-hotplug eth0

Change that to:

auto eth0

The allow-hotplug keyword enables the event-driven startup. The auto
keyword enables the traditional startup. I have had issues with the
event-driven startup similar to yours, where things block for a long
time at boot waiting for various events to happen, specifically when
using NFS mounts in /etc/fstab. Using auto instead forces the previous,
fixed startup order and avoids the problem. Again, as an experiment I
would switch to 'auto' for the network startup. That by itself might be
your solution. (Or it might be the startup sleep delay described above.)
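On many machines the allow-hotplug to auto switch is easy to script. A minimal sketch follows, assuming root access and an allow-hotplug line that names a single interface; lines naming several interfaces are left untouched. The function name 'use_auto_for' is hypothetical.

```shell
#!/bin/sh
# Sketch: turn "allow-hotplug IFACE" into "auto IFACE" in an
# interfaces(5)-style file, keeping a .bak copy for rollback.
use_auto_for() {
    iface=$1
    conf=${2:-/etc/network/interfaces}
    cp -p "$conf" "$conf.bak" || return 1
    # Match only a line naming exactly this one interface.
    sed "s/^allow-hotplug[[:space:]][[:space:]]*$iface[[:space:]]*\$/auto $iface/" \
        "$conf.bak" > "$conf"
}
```

For example, 'use_auto_for eth0' changes only the eth0 stanza's allow-hotplug line and leaves any other interfaces as they were.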

> > But because it is so annoying before too long someone
> > will have debugged it and gotten the offenders removed from the
> > mailing list.
>
> Got a probe email a few days ago - someone worked on it. Hope the
> issue is already solved.

Unfortunately the problem persists. I conversed briefly with the
listmasters and they are aware of it but no one has been able to
deduce the offender. The joe1assistly spam has also affected some of
the Cygwin mailing lists. I have examined the spam coming my
direction and I can't deduce a clear solution to it. Of course I
could block it for myself by blocking any Message-Id: with
joegiglio.org in it but that wouldn't help the mailing list at large.

Bob


All times are GMT.
