03-28-2010, 04:30 PM
Michael Schuerig

Nepomuk: re-checking the strigi index constantly

I've upgraded to the fresh 4.4.2 packages and now Nepomuk apparently
finishes its indexing work. In my case, it stops at a (reasonable)
143,492 files. Nice.

As far as I can tell, Nepomuk periodically re-checks the index. It
blazes through the indexed folders, not taking much time per folder, but
overall, considering the number of indexed files, it takes quite a
while. Unfortunately, about as long as the checking interval. So, in
effect, Nepomuk appears to be re-checking its index most of the time, in
the process using more than 100% CPU (dual core) among virtuoso, strigi,
and other nepomukservices.

Apparently, a new
/usr/bin/nepomukservicestub nepomukstrigiservice
is started every 8 or 9 minutes. From ~/.xsession-errors I can't see
anything that indicates that these processes are crashing.

Michael

--
Michael Schuerig
mailto:michael@schuerig.de
http://www.schuerig.de/michael/


--
To UNSUBSCRIBE, email to debian-kde-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/201003281830.58099.michael@schuerig.de
 
03-30-2010, 06:17 PM
Michael Schuerig

On Sunday 28 March 2010, Michael Schuerig wrote:

> Apparently, a new
> /usr/bin/nepomukservicestub nepomukstrigiservice
> is started every 8 or 9 minutes. from ~/.xsession-errors I can't see
> anything that indicates that these processes are crashing.

Well, strace and gdb know better. Indexing is interrupted when an
assert statement fails and raises SIGABRT:

# strigi-0.7.1/src/streamanalyzer/lineeventanalyzer.cpp:180
void
LineEventAnalyzer::handleUtf8Data(const char* data, uint32_t length) {
    assert(!(sawCarriageReturn && missingBytes > 0));

I haven't tried to understand the code intimately, but from looking
around a bit, I gather that this is to ensure that multi-byte characters
are complete when the end of a line is reached. I take it that one of my
files either contains broken UTF-8 or is mistaken by strigi for UTF-8
when it actually isn't.
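Michael's reading of the assertion can be tried out on a suspect file outside of strigi. The following Python sketch is not strigi's code: the function names are made up for illustration, and it assumes that the assertion fires when a line break arrives while a multi-byte UTF-8 character is still incomplete. It reports such lines and, as a cross-check, the offset of the first byte that Python's strict UTF-8 decoder rejects.

```python
def missing_bytes_at_end(line: bytes) -> int:
    """Return how many continuation bytes are missing at the end of a line."""
    i = len(line) - 1
    trailing = 0
    # Step back over at most three trailing continuation bytes (10xxxxxx).
    while i >= 0 and trailing < 3 and (line[i] & 0xC0) == 0x80:
        trailing += 1
        i -= 1
    if i < 0:
        return 0  # empty line, or nothing but continuation bytes
    lead = line[i]
    if (lead & 0xE0) == 0xC0:
        needed = 1  # lead byte of a 2-byte sequence
    elif (lead & 0xF0) == 0xE0:
        needed = 2  # lead byte of a 3-byte sequence
    elif (lead & 0xF8) == 0xF0:
        needed = 3  # lead byte of a 4-byte sequence
    else:
        return 0  # ASCII or not a lead byte
    return max(needed - trailing, 0)


def find_truncated_lines(data: bytes) -> list[tuple[int, int]]:
    """Return (line number, missing bytes) for lines ending mid-character."""
    hits = []
    # bytes.splitlines() breaks on \n, \r and \r\n.
    for number, line in enumerate(data.splitlines(), start=1):
        missing = missing_bytes_at_end(line)
        if missing:
            hits.append((number, missing))
    return hits


def first_invalid_utf8_offset(data: bytes):
    """Return the offset of the first invalid byte, or None if data is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None
```

To check a real file, read it in binary mode and pass the bytes to `find_truncated_lines()`. A file that passes the strict decoder cannot contain a truncated line, so the two functions should agree.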

Now, I'm wondering, is this something I ought to report as a bug against
strigi or is the problem with Nepomuk for not logging abnormal
termination of child processes? Or is it pdftotext for apparently
producing invalid UTF-8 from a PDF (iconv doesn't complain about it,
though)?
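On the question of logging: a failed assert() calls abort(), which raises SIGABRT, and a parent process can see that in the child's exit status. The sketch below shows the idea in Python rather than in Nepomuk's own code; `describe_exit` and `run_and_describe` are hypothetical names, and nothing here reflects how nepomukservicestub is actually supervised.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a log-friendly description."""
    if returncode == 0:
        return "exited normally"
    if returncode > 0:
        return f"exited with status {returncode}"
    # On POSIX, subprocess reports death by signal N as return code -N.
    try:
        name = signal.Signals(-returncode).name
    except ValueError:
        name = f"signal {-returncode}"
    return f"killed by {name}"


def run_and_describe(argv: list[str]) -> str:
    """Run a child process to completion and describe how it ended."""
    result = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return describe_exit(result.returncode)


if __name__ == "__main__":
    # A child that aborts, as a failed assert() would make it do.
    print(run_and_describe([sys.executable, "-c", "import os; os.abort()"]))
```

A supervisor that logged this description for every child it restarts would have made the SIGABRT visible without strace or gdb.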

Michael

--
Michael Schuerig
mailto:michael@schuerig.de
http://www.schuerig.de/michael/


--
Archive: http://lists.debian.org/201003302017.16078.michael@schuerig.de
 
03-31-2010, 07:51 AM
Carsten Pfeiffer

On Tuesday, 30 March 2010, Michael Schuerig wrote:

> Now, I'm wondering, is this something I ought to report as a bug against
> strigi or is the problem with Nepomuk for not logging abnormal
> termination of child processes? Or is it pdftotext for apparently
> producing invalid UTF-8 from a PDF (iconv doesn't complain about it,
> though)?

All of the above ;-)

I'd say that
- Nepomuk or strigi should notice that it crashed on a file and put it on
  some blacklist until its mtime changes
- strigi should keep on indexing the other files instead of restarting
- pdftotext, as the originator of the file, ought to be fixed
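The first suggestion, a blacklist that expires when the file's mtime changes, can be sketched in a few lines. This is an illustration in Python, not code from Nepomuk or strigi; the class and method names are invented, and the list lives in memory only, whereas a real indexer would have to persist it across restarts.

```python
import os
import tempfile


class CrashBlacklist:
    """Remember files that crashed the indexer until they are modified."""

    def __init__(self):
        self._crashed = {}  # path -> mtime (ns) at the time of the crash

    def mark_crashed(self, path: str) -> None:
        """Record that indexing this file ended in a crash."""
        self._crashed[path] = os.stat(path).st_mtime_ns

    def should_skip(self, path: str) -> bool:
        """Return True while the file is unchanged since it crashed the indexer."""
        recorded = self._crashed.get(path)
        if recorded is None:
            return False
        try:
            current = os.stat(path).st_mtime_ns
        except FileNotFoundError:
            del self._crashed[path]  # file is gone, forget about it
            return False
        if current != recorded:
            del self._crashed[path]  # file was modified, give it another try
            return False
        return True


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        path = tmp.name
    blacklist = CrashBlacklist()
    blacklist.mark_crashed(path)
    print(blacklist.should_skip(path))  # file is unchanged since the crash
    os.remove(path)
```

The indexer would call `should_skip()` before opening a file and `mark_crashed()` from whatever component notices that the analyzer process died while working on it.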

Cheers,
Carsten
 
03-31-2010, 09:53 AM
Michael Schuerig

On Wednesday 31 March 2010, Carsten Pfeiffer wrote:
> On Tuesday, 30 March 2010, Michael Schuerig wrote:
> > Now, I'm wondering, is this something I ought to report as a bug
> > against strigi or is the problem with Nepomuk for not logging
> > abnormal termination of child processes? Or is it pdftotext for
> > apparently producing invalid UTF-8 from a PDF (iconv doesn't
> > complain about it, though)?
>
> All of the above ;-)
>
> I'd say that
> - nepopmuk or strigi should notice that it crashed on a file and put
>   it into some blacklist until its mtime changes
> - strigi should keep on indexing the other files instead of
>   restarting
> - pdftotext as the originator of the file ought to be fixed

Done.

https://sourceforge.net/tracker/?func=detail&aid=2979889&group_id=171000&atid=856302
https://bugs.kde.org/show_bug.cgi?id=232814

I'm not completely certain that pdftotext really does anything wrong.

Michael

--
Michael Schuerig
mailto:michael@schuerig.de
http://www.schuerig.de/michael/


--
Archive: http://lists.debian.org/201003311153.19221.michael@schuerig.de
 
03-31-2010, 06:27 PM
Martin Steigerwald

On Tuesday 30 March 2010, Michael Schuerig wrote:
> On Sunday 28 March 2010, Michael Schuerig wrote:
> > Apparently, a new
> > /usr/bin/nepomukservicestub nepomukstrigiservice
> > is started every 8 or 9 minutes. from ~/.xsession-errors I can't see
> > anything that indicates that these processes are crashing.
>
> Well, strace and gdb know better. Indexing is interrupted, when an
> assert statement fails and causes a SIGABRT
>
> # strigi-0.7.1/src/streamanalyzer/lineeventanalyzer.cpp:180
> void
> LineEventAnalyzer::handleUtf8Data(const char* data, uint32_t length) {
> assert(!(sawCarriageReturn && missingBytes > 0));

That sounds similar to my bug report (see the mail "nepomuk strigi service
crashed too often" on this list):

https://bugs.kde.org/232395

--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
 
03-31-2010, 06:46 PM
Michael Schuerig

On Wednesday 31 March 2010, Martin Steigerwald wrote:
> On Tuesday 30 March 2010, Michael Schuerig wrote:
> > On Sunday 28 March 2010, Michael Schuerig wrote:
> > > Apparently, a new
> > > /usr/bin/nepomukservicestub nepomukstrigiservice
> > > is started every 8 or 9 minutes. from ~/.xsession-errors I can't
> > > see anything that indicates that these processes are crashing.
> >
> > Well, strace and gdb know better. Indexing is interrupted, when an
> > assert statement fails and causes a SIGABRT
> >
> > # strigi-0.7.1/src/streamanalyzer/lineeventanalyzer.cpp:180
> > void
> > LineEventAnalyzer::handleUtf8Data(const char* data, uint32_t
> > length) {
> >
> > assert(!(sawCarriageReturn && missingBytes > 0));
>
> That sounds simular to my bug report (see mail "nepomuk strigi
> service crashed too often" in this list):
>
> https://bugs.kde.org/232395

Similar, but probably not exactly the same. In my case, there are no
indications in ~/.xsession-errors that anything went wrong. I've seen
messages of the "crashed to[o] often" kind for *other* problems, but not
for this specific one.

/usr/bin/nepomukservicestub and nepomukservices aren't very helpful as
process names. Have you tried looking at the complete command line? In
top, it's toggled by pressing 'c'.
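Outside of top, the full command line can also be read from /proc. A rough Python sketch, assuming a Linux /proc filesystem; the function names are illustrative only:

```python
import os


def split_cmdline(raw: bytes) -> list[str]:
    """Split the NUL-separated contents of /proc/<pid>/cmdline into arguments."""
    raw = raw.rstrip(b"\0")
    if not raw:
        return []  # kernel threads have an empty cmdline
    return [arg.decode("utf-8", errors="replace") for arg in raw.split(b"\0")]


def processes_matching(needle: str, proc_root: str = "/proc") -> dict[int, list[str]]:
    """Map PID to argument list for processes with needle in any argument."""
    matches = {}
    if not os.path.isdir(proc_root):
        return matches  # not a Linux system, or /proc is not mounted
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric entries are processes
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as fh:
                argv = split_cmdline(fh.read())
        except OSError:
            continue  # process exited meanwhile, or is not readable
        if any(needle in arg for arg in argv):
            matches[int(entry)] = argv
    return matches


if __name__ == "__main__":
    for pid, argv in sorted(processes_matching("nepomuk").items()):
        print(pid, " ".join(argv))
```

Run repeatedly, this shows which service each nepomukservicestub instance hosts, and a changing PID for the same service name is a hint that it was restarted.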

AFAIUI, Soprano is part of the storage backend used by Nepomuk, whereas
Strigi belongs to the frontend that retrieves the (meta)data. Your problem
appears to occur when Nepomuk tries to store data in Soprano. The
question then is where that data comes from and why it is malformed.

Michael

--
Michael Schuerig
mailto:michael@schuerig.de
http://www.schuerig.de/michael/


--
Archive: http://lists.debian.org/201003312046.49115.michael@schuerig.de
 
03-31-2010, 06:49 PM
Martin Steigerwald

On Wednesday 31 March 2010, Michael Schuerig wrote:
> On Wednesday 31 March 2010, Carsten Pfeiffer wrote:
> > On Tuesday, 30 March 2010, Michael Schuerig wrote:
> > > Now, I'm wondering, is this something I ought to report as a bug
> > > against strigi or is the problem with Nepomuk for not logging
> > > abnormal termination of child processes? Or is it pdftotext for
> > > apparently producing invalid UTF-8 from a PDF (iconv doesn't
> > > complain about it, though)?
> >
> > All of the above ;-)
> >
> > I'd say that
> > - nepopmuk or strigi should notice that it crashed on a file and put
> >   it into some blacklist until its mtime changes
> > - strigi should keep on indexing the other files instead of
> >   restarting
> > - pdftotext as the originator of the file ought to be fixed
>
> Done.
>
> https://sourceforge.net/tracker/?func=detail&aid=2979889&group_id=171000&atid=856302
> https://bugs.kde.org/show_bug.cgi?id=232814
>
> I'm not completely certain that pdftotext really does anything wrong.

See also these two Nepomuk-related bug reports of mine:

https://bugs.kde.org/show_bug.cgi?id=232395
https://bugs.kde.org/show_bug.cgi?id=232398

It seems you are also stumbling over a UTF-8 issue, but apparently a
different one from mine. I am not sure, though.

--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
 
03-31-2010, 06:53 PM
Martin Steigerwald

On Wednesday 31 March 2010, Michael Schuerig wrote:
> On Wednesday 31 March 2010, Martin Steigerwald wrote:
> > On Tuesday 30 March 2010, Michael Schuerig wrote:
> > > On Sunday 28 March 2010, Michael Schuerig wrote:
> > > > Apparently, a new
> > > > /usr/bin/nepomukservicestub nepomukstrigiservice
> > > > is started every 8 or 9 minutes. from ~/.xsession-errors I can't
> > > > see anything that indicates that these processes are crashing.
> > >
> > > Well, strace and gdb know better. Indexing is interrupted, when an
> > > assert statement fails and causes a SIGABRT
> > >
> > > # strigi-0.7.1/src/streamanalyzer/lineeventanalyzer.cpp:180
> > > void
> > > LineEventAnalyzer::handleUtf8Data(const char* data, uint32_t
> > > length) {
> > >
> > > assert(!(sawCarriageReturn && missingBytes > 0));
> >
> > That sounds simular to my bug report (see mail "nepomuk strigi
> > service crashed too often" in this list):
> >
> > https://bugs.kde.org/232395
>
> Similar, but probably not the exact same. In my case, there are no
> indications in ~/.xsession-errors that anything went wrong. I've seen
> messages of the "crashed to[o] often" kind for *other* problems, but
> not for this specific one.
>
> /usr/bin/nepomukservicestub and nepomukservices aren't very helpful as
> the process name, have you tried looking at the complete commandline?
> In top it's toggled by pressing 'c'.
>
> AFAIUI, Soprano is part of the storage backend used by Nepomuk, whereas
> Strigi belongs to the frontend retrieving the (meta)data. Your problem
> appears to occurs, when Nepomuk tries to store data in Soprano. Then
> the question is, where that data comes from and why it is malformed.

Yes, on second look the issues sound a bit different. Well, I put
some cross links in there; let the developers decide. I also added your
crash handling suggestions to my bug report

https://bugs.kde.org/show_bug.cgi?id=232398

as they are quite similar to what I suggested. I suggest you use your bug
report for information on the crashes you encounter and let us share bug
#232398 for the crash handling in Nepomuk.

--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
 
