this might be a bit brief as it's really late and i'm already in
trouble :-/ ... will expand as needed.
i upgrade my machines and VMs very regularly, at least once a week.
this last batch of updates broke all of my VMs in particular ...
physical machines still seem to function correctly. the VMs are not
really broken, but i am unable to regenerate `/etc/ld.so.cache` on any
of them (3 ATM).
... the locale stuff is just a side effect of the ldconfig failure
IIRC -- locales are a little borked because the archive file was blown
away, not a problem tho. for some reason ldconfig refuses to update
on all the VMs (each has worked without issue until today, for several
months):
# ldconfig
Aborted
... i tried blacklisting some virtio modules (balloon in particular)
and tripling memory, no change, and not convinced it's 100% related to
virtio yet. i tried removing `/var/cache/ldconfig/aux-cache` to force
ldconfig to rescan everything (vs. stat checks) -- again, works on
hardware but not VM. i tried removing the last library it processes
before failing (per strace), no change, it just fails on another
(libgcrypt -> libsysfs). i tried reinstalling glibc and whatnot ...
these VMs are all pure 64bit, no multilib, and pure systemd, original
initscript stuff purged. the only thing i didn't try was downgrading,
because i would have to use the ARM ... i use 9p2000.L passthru for
the rootfs of each VM, bindmount a local mirror into each VM's VFS,
then configure pacman to use the `pool` directory of the bound mirror
as a cachedir -- the net effect is pacman never downloads anything,
ever, because it believes it already has :-) slightly odd perhaps,
but working well for quite some time.
any ideas? i can't find anything out of place, or any significant
differences, and i'm not sure what to try next -- nothing unusual in
dmesg or logs, on the VMs or the host. host is completely current as
of Dec 14 00:00 CST. reduced strace follows, limited to files and
signals. thanks for your time so far, if you made it to this point
legitimately :-)
and my machine is upgraded using those repositories.
Did you get the bug by using only those repositories?
Regards,
Ralf
12-14-2011, 11:01 AM
C Anthony Risinger
ldconfig -> Aborted.
On Wed, Dec 14, 2011 at 5:46 AM, C Anthony Risinger <anthony@xtfx.me> wrote:
>
> any ideas? i can't find anything out of place, or any significant
> differences, and i'm not sure what to try next -- nothing unusual in
> dmesg or logs, on the VMs or the host. host is completely current as
> of Dec 14 00:00 CST. reduce strace follows, limited to files and
> signals.
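at the last second i looked at the locale-gen stuff again, the trace
shows mmap() failing with EINVAL:
# strace -ff -s256 -etrace=mmap localedef -i en_US -c -f ISO-8859-1 -A /usr/share/locale/locale.alias en_US
......
mmap(NULL, 536870912, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb3aac63000
mmap(0x7fb3aac63000, 103860, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, 3, 0) = -1 EINVAL (Invalid argument)
cannot map archive header: Invalid argument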
... i'm thinking it's probably related to 9p2000.L passthru at this
point (ehm, under KVM if i didn't already mention it), but if anyone
has some additional input, or better debug commands (e.g. strace) that
would be awesome. ldconfig does *not* report any errors at all before
aborting, and its trace doesn't show any either (other than ENOENT
for missing files/etc).
i might have created one of these from scratch on 9p2000.L, but i
think they were all rsync'ed from existing installs on LVM partitions
(as i was converting my setup to use passthru for its many benefits) ...
it's possible this is the first time glibc/locale-gen has been run
since the conversion.
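One way to narrow this down is to check whether the filesystem accepts a shared, writable file mapping at all, since that is the call failing in the localedef trace. Below is a minimal sketch in Python, not a definitive test. `shared_mmap_works` is a hypothetical helper name, and unlike localedef it uses neither MAP_FIXED nor a large reservation, so a pass here does not clear the 9p mount completely.

```python
import mmap
import os
import tempfile


def shared_mmap_works(directory, length=4096):
    """Return True if a MAP_SHARED read/write mapping of a scratch file
    created in `directory` succeeds, False if mmap() itself fails."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # The file needs a non-zero size before it can be mapped.
        os.ftruncate(fd, length)
        try:
            mapping = mmap.mmap(fd, length, flags=mmap.MAP_SHARED,
                                prot=mmap.PROT_READ | mmap.PROT_WRITE)
        except OSError as exc:
            # e.g. EINVAL on a filesystem without shared writable mappings
            print(f"mmap failed in {directory}: {exc}")
            return False
        mapping[:4] = b"test"   # touch the mapping so it is really used
        mapping.flush()
        mapping.close()
        return True
    finally:
        os.close(fd)
        os.unlink(path)
```

Running it once against a directory on the 9p root and once against a tmpfs or disk-backed directory inside the same VM should show whether the two behave differently.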
--
C Anthony
12-14-2011, 03:56 PM
Leonid Isaev
ldconfig -> Aborted.
On Wed, 14 Dec 2011 06:01:37 -0600
C Anthony Risinger <anthony@xtfx.me> wrote:
> On Wed, Dec 14, 2011 at 5:46 AM, C Anthony Risinger <anthony@xtfx.me> wrote:
>
> at the last second i looked at the locale-gen stuff again, the trace
> shows mmap() failing with EINVAL:
>
> # strace -ff -s256 -etrace=mmap localedef -i en_US -c -f ISO-8859-1 -A
> /usr/share/locale/locale.alias en_US
>
> ......
> mmap(NULL, 536870912, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
> 0x7fb3aac63000
> mmap(0x7fb3aac63000, 103860, PROT_READ|PROT_WRITE,
> MAP_SHARED|MAP_FIXED, 3, 0) = -1 EINVAL (Invalid argument)
> cannot map archive header: Invalid argument
>
> ... i'm thinking it's probably related to 9p2000.L passthru at this
> point (ehm, under KVM if i didn't already mention it), but if anyone
> has some additional input, or better debug commands (eg. strace) that
> would be awesome. ldconfig does *not* fail with any errors at all, or
> trigger any whatsoever (other than ENOENT for missing files/etc).
>
> i might have created one of these from scratch on 9p2000.L, but i
> think they were all rsync'ed from existing installs on LVM partitions
> (as i was conviting my setup to use passthru for many benefits) ...
> it's possible this is the first time glibc/locale-gen has been ran
> since the conversion.
>
> --
>
> C Anthony
On Wed, Dec 14, 2011 at 10:56 AM, Leonid Isaev <lisaev@umail.iu.edu> wrote:
> On Wed, 14 Dec 2011 06:01:37 -0600
> C Anthony Risinger <anthony@xtfx.me> wrote:
>>
>> ... i'm thinking it's probably related to 9p2000.L passthru at this
>> point (ehm, under KVM if i didn't already mention it), but if anyone
>> has some additional input, or better debug commands (eg. strace) that
>> would be awesome. ldconfig does *not* fail with any errors at all, or
>> trigger any whatsoever (other than ENOENT for missing files/etc).
>>
>> i might have created one of these from scratch on 9p2000.L, but i
>> think they were all rsync'ed from existing installs on LVM partitions
>> (as i was conviting my setup to use passthru for many benefits) ...
>> it's possible this is the first time glibc/locale-gen has been ran
>> since the conversion.
>
> Erm, have you actually tried to run ldconfig -v?
heh ... uh, no. no i didn't. i guess my mind skipped right to the
heavy artillery.
# ldconfig -v
ldconfig: Can't stat /usr/lib64: No such file or directory
/usr/lib/libfakeroot:
        libfakeroot-0.so -> libfakeroot.so
/usr/lib/perl5/core_perl/CORE:
        libperl.so -> libperl.so
/lib:
Aborted
... nothing useful i'm afraid :-(
--
C Anthony
12-14-2011, 08:41 PM
Leonid Isaev
ldconfig -> Aborted.
On Wed, 14 Dec 2011 14:56:25 -0600
C Anthony Risinger <anthony@xtfx.me> wrote:
> On Wed, Dec 14, 2011 at 10:56 AM, Leonid Isaev <lisaev@umail.iu.edu> wrote:
> > On Wed, 14 Dec 2011 06:01:37 -0600
> > C Anthony Risinger <anthony@xtfx.me> wrote:
> >>
> >> ... i'm thinking it's probably related to 9p2000.L passthru at this
> >> point (ehm, under KVM if i didn't already mention it), but if anyone
> >> has some additional input, or better debug commands (eg. strace) that
> >> would be awesome. ldconfig does *not* fail with any errors at all, or
> >> trigger any whatsoever (other than ENOENT for missing files/etc).
> >>
> >> i might have created one of these from scratch on 9p2000.L, but i
> >> think they were all rsync'ed from existing installs on LVM partitions
> >> (as i was conviting my setup to use passthru for many benefits) ...
> >> it's possible this is the first time glibc/locale-gen has been ran
> >> since the conversion.
> >
> > Erm, have you actually tried to run ldconfig -v?
>
> heh ... uh, no. no i didn't. i guess my mind skipped right to the
> heavy artillery.
>
> # ldconfig -v
> ldconfig: Can't stat /usr/lib64: No such file or directory
> /usr/lib/libfakeroot:
> libfakeroot-0.so -> libfakeroot.so
> /usr/lib/perl5/core_perl/CORE:
> libperl.so -> libperl.so
> /lib:
> Aborted
>
> ... nothing useful i'm afraid :-(
>
So it basically receives SIGABRT. Have you already run strace on ldconfig, or
only locale-gen? If not try this and also try removing /var/cache/ldconfig...
> # ldconfig -v
> ldconfig: Can't stat /usr/lib64: No such file or directory
> /usr/lib/libfakeroot:
> libfakeroot-0.so -> libfakeroot.so
> /usr/lib/perl5/core_perl/CORE:
> libperl.so -> libperl.so
> /lib:
> Aborted
I think there's no harm in "mkdir /usr/lib64".
To me this sounds as if the VM balloons out of memory. How much RAM is
allocated to the VMs?
clemens
12-27-2011, 08:35 AM
C Anthony Risinger
ldconfig -> Aborted.
On Thu, Dec 22, 2011 at 4:01 PM, clemens fischer
<ino-news@spotteswoode.dnsalias.org> wrote:
>
>> # ldconfig -v
>> ldconfig: Can't stat /usr/lib64: No such file or directory
>> /usr/lib/libfakeroot:
>>         libfakeroot-0.so -> libfakeroot.so
>> /usr/lib/perl5/core_perl/CORE:
>>         libperl.so -> libperl.so
>> /lib:
>> Aborted
>
> I think there's no harm in "mkdir /usr/lib64".
>
> To me this sounds as if the VM balloons out of memory. How much RAM is
> allocated to the VM's?
(fair amount of debug output ... summary at end)
yeah i originally tried upping the mem to 1024M+, preventing the
balloon module from loading (since it's an opt-in kernel module), and
not even using mem ballooning -- no changes at all. the /usr/lib64
stuff isn't a prob, my guess is everyone's machine does that (/lib64
is created by glibc for compat reasons only, not in filesystem
package) ...
... though, after rebuilding glibc with debug syms, i was able to
trace the issue. `ldconfig` is consistently receiving the correct,
then incorrect(?) inode, twice(!), for an arbitrary library; `ldconfig`
detects this anomaly just before adding the entry to its aux-cache,
then explicitly calls abort().
while the problem library is `libgcrypt.so.11`, it's not specific to
that lib (if i remove that library, it just fails on a different one)
... possibly a pattern here but not yet sure.
i ran `gdb --args ldconfig -v` (breakpoint, output, backtrace, and
source context provided below):
----------------------------------------------------------------------------
Reading symbols from /sbin/ldconfig...done.
(gdb) break cache.c:620 if (soname!=0x0 && strcmp(soname, "libgcrypt.so.11")==0)
Breakpoint 1 at 0x402e14: file cache.c, line 620.
(gdb) commands
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>silent
>printf "\n---- soname: %s\n---- inode: %i\n---- hash: %i\n", soname, id->ino, hash
>continue
>end
(gdb) run
Starting program: /sbin/ldconfig -v
/sbin/ldconfig: Can't stat /usr/lib64: No such file or directory
Program received signal SIGABRT, Aborted.
0x000000000044f4fc in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x000000000044f4fc in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x000000000040c20e in abort () at abort.c:93
#2  0x0000000000402e57 in insert_to_aux_cache (id=0x7fffffffd1b0, flags=771, osversion=0, soname=0x6db360 "libgcrypt.so.11", used=1) at cache.c:625
#3  0x0000000000403dea in add_to_aux_cache (stat_buf=<optimized out>, flags=<optimized out>, osversion=<optimized out>, soname=<optimized out>) at cache.c:650
#4  0x00000000004023cd in search_dir (entry=0x6d09d0) at ldconfig.c:880
#5  0x0000000000402d09 in search_dirs () at ldconfig.c:1023
#6  main (argc=2, argv=<optimized out>) at ldconfig.c:1372
(gdb) list cache.c:620,625
620       for (entry = aux_hash[hash]; entry; entry = entry->next)
621         if (id->ino == entry->id.ino
622             && id->ctime == entry->id.ctime
623             && id->size == entry->id.size
624             && id->dev == entry->id.dev)
625           abort ();
----------------------------------------------------------------------------
... before adding a new entry to the cache, `ldconfig` loops thru
existing entries and aborts if an *exact* match (same inode, ctime,
size, and device) is found ... and in this case there appear to
somehow be 2 entries for the same library (with different inodes):
the first is bogus (from the VM's perspective anyway) and the second
is added twice, triggering the abort.
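The check at cache.c:621-624 treats inode, ctime, size, and device together as a file's identity. A rough way to look from userspace for files that collide on that tuple is sketched below in Python. The function names are hypothetical, and this only approximates what `ldconfig` does: hard links to the same file legitimately share an identity and are reported as well.

```python
import os
from collections import defaultdict


def file_identity(path):
    """Return the (inode, ctime, size, device) tuple for `path`,
    i.e. the same four fields the listed cache.c check compares."""
    st = os.stat(path)
    return (st.st_ino, int(st.st_ctime), st.st_size, st.st_dev)


def find_identity_collisions(directory):
    """Group the regular files in `directory` by identity tuple and
    return only the groups that contain more than one path."""
    seen = defaultdict(list)
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Skip symlinks and non-regular files; only real files are compared.
        if os.path.islink(path) or not os.path.isfile(path):
            continue
        seen[file_identity(path)].append(path)
    return {key: paths for key, paths in seen.items() if len(paths) > 1}
```

Calling `find_identity_collisions("/lib")` inside the guest and on the host export would show whether distinct files report the same identity on one side only.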
i don't know if v9fs or QEMU is supposed to be changing the inode, but
every file i test is "off by two", example (host/VM, resp):
# stat --format="%i %n" ./lib/libgcrypt.so.11.7.0
15344348 ./lib/libgcrypt.so.11.7.0
# stat --format="%i %n" /lib/libgcrypt.so.11.7.0
15344350 /lib/libgcrypt.so.11.7.0
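To check whether the offset really is constant across a whole directory rather than a handful of files, the inode numbers can be collected on each side and compared by file name. Below is a minimal sketch in Python with hypothetical helper names. It assumes the host export and the guest directory hold the same file names, and that the listing from one side is carried over to the other before comparing.

```python
import os


def inode_map(directory):
    """Map each entry name in `directory` to its inode number
    (lstat, so symlinks report their own inode)."""
    return {name: os.lstat(os.path.join(directory, name)).st_ino
            for name in os.listdir(directory)}


def inode_offsets(host_inodes, guest_inodes):
    """Return {name: guest_inode - host_inode} for names on both sides."""
    common = sorted(host_inodes.keys() & guest_inodes.keys())
    return {name: guest_inodes[name] - host_inodes[name] for name in common}


# Values taken from the stat output above (host first, then guest).
host = {"libgcrypt.so.11.7.0": 15344348}
guest = {"libgcrypt.so.11.7.0": 15344350}
print(inode_offsets(host, guest))  # prints {'libgcrypt.so.11.7.0': 2}
```

If every name maps to the same offset, the shift is systematic. A mix of offsets would point at something less regular.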
... `ldconfig` is attempting to add EACH as `libgcrypt.so.11.7.0` (see
gdb output)! very suspicious. the first (host?) version is somehow
detected before anything in ld.so.conf.d/* is tried (and gdb confirms
all other libs are found during this period as well) ...
... something is definitely wonky though, because `ldconfig` tries to
add inode `15344348` as `libgcrypt.so.11.7.0`, but that is totally
wrong from guest perspective:
# stat --format="%i %n" /lib/l* | grep 15344348
15344348 /lib/libext2fs.so.2.4
... i don't know how the !@#$ it's getting that, but i suspect some
kind of bad interaction between the host/VM page caches, or a bug in
ldconfig, the v9fs kernel module, the "virtfs" server implemented
within QEMU, or possibly something *very* odd about my setup. i'm also
using the "mapped" virtfs option (guest perms/etc are stored in xattrs
on the host) allowing QEMU to run as nobody:kvm instead of root ...
could be part of the problem ... i thought this was the recommended
way, perhaps not.
in conclusion ... the issue is very unlikely to be Arch-specific. i'll
debug a bit more and take the information to the proper sources, but
i figured i'd do a final update here for closure/interest. i will of
course still gladly accept any further advice or suggestions.