Linux Archive > Gentoo > Gentoo Portage Developer

 
 
 
11-24-2008, 02:18 PM
tvali

search functionality in emerge

2008/11/24 René 'Necoro' Neumann <lists@necoro.eu>

> What you mentioned for the filesystem might be a nice thing (actually I
> started something like this some time ago [1], though it is now dead),
> but it does not help in the index/determine-changes thing. It is
> just another API.


My line of thought is that when this FS is mounted, it covers only the portage dir, so while it is mounted all changes are noticed, because you make all changes through that FS. When you unmount and remount it, though, some things might go wrong; that's what I'm still thinking about, but it's not a big problem.



> Perhaps the "index after sync" is sufficient for most of the
> userbase, but especially those who often deal with their own local overlays
> (like me) do not want to have to re-index manually, especially if re-indexing
> takes a long time. The best solution would be to have portage find a)
> THAT something has been changed and b) WHAT has been changed, so that it
> only has to update those parts of the index and thus is not something
> annoying for users (remember the "Generate Metadata" stuff (or
> whatever it was called) in older portage versions, which alone seemed to
> take longer than the rest of the sync process).
>
> Regards,
> René
>
> [1] https://launchpad.net/catapultfs






--
tvali

From some forum: http://www.cooltests.com, if you speak English. By the way, above 120 you're very smart, above 140 you're a genius, and at around 170 your head is about as full as a trash can...
 
11-24-2008, 04:15 PM
tvali

search functionality in emerge

Let me summarize briefly, since René didn't catch everything and I was unclear:

The Portage tree has automatically updated parts, which should not be changed by the user, and overlays, which will be. Thus, the index of the automatic part should be updated only after "emerge --sync".


The speedup could include a custom filesystem, called PortageFS for example. In the initial version, PortageFS would use the current portage tree and generate additional indexes.

So, when you boot up, you have the portage tree in /usr/portage. At some point, PortageFS is mounted on the same directory, /usr/portage. It maps the real /usr/portage directory onto the /usr/portage mount point and creates some additional folders like /usr/portage/search, containing files that perform real searches. /usr/portage/handler would be a file where you can write a query and read the result. It would also contain virtual files for checking dependencies and similar things, many of which you could use from your scripts.


When it's mounted, every change is noticed and the indexes are updated automagically (sometimes only after communicating with portage; for example, updates during "emerge --sync" should perhaps not happen automagically, as that makes things slower). When it's not mounted, you can still change user files, but you would probably have to run some notification script afterwards to rebuild the indexes.


The indexes are built into the FS.

If PortageFS is not mounted, for example after an emergency reboot, portage can work without the indexes, using the real directory instead of the mount point.
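Both of René's requirements (find THAT something changed and WHAT changed) and tvali's idea of a script to run after editing an unmounted tree can be approximated without a custom filesystem by comparing two snapshots of modification times. A minimal sketch, assuming a category/package/*.ebuild layout as in /usr/portage. `snapshot()` and `diff()` are hypothetical names, not portage API. Only the package directory and its .ebuild files are checked, so a change limited to, say, a patch under files/ would go unnoticed.

```python
import os
from typing import Dict, Set, Tuple


def snapshot(tree_root: str) -> Dict[str, float]:
    """Map each "category/package" directory that holds ebuilds to the newest
    mtime among the directory itself and its .ebuild files."""
    snap: Dict[str, float] = {}
    for category in os.listdir(tree_root):
        cat_dir = os.path.join(tree_root, category)
        if not os.path.isdir(cat_dir):
            continue
        for package in os.listdir(cat_dir):
            pkg_dir = os.path.join(cat_dir, package)
            if not os.path.isdir(pkg_dir):
                continue
            ebuilds = [f for f in os.listdir(pkg_dir) if f.endswith(".ebuild")]
            if not ebuilds:
                continue  # not a package directory (e.g. metadata/cache)
            mtimes = [os.path.getmtime(pkg_dir)]
            mtimes += [os.path.getmtime(os.path.join(pkg_dir, f)) for f in ebuilds]
            snap[category + "/" + package] = max(mtimes)
    return snap


def diff(old: Dict[str, float],
         new: Dict[str, float]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (added, removed, changed) package keys between two snapshots."""
    added = set(new) - set(old)
    removed = set(old) - set(new)
    changed = {cp for cp in set(old) & set(new) if old[cp] != new[cp]}
    return added, removed, changed
```

A notification script would keep the previous snapshot on disk, take a new one, and reindex only the packages in the three returned sets.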
 
11-30-2008, 10:42 PM
"Emma Strubell"

search functionality in emerge

You guys all have some great ideas, but I don't think I'd have enough time to implement them before my project is due... especially because they appear to be a bit beyond my current programming skills. I would love to devote a lot more time to this project, but I just can't right now because I already have a lot of other things on my plate. I am really interested in contributing to Gentoo and portage in the future, though. I'm thinking this summer I'll have a chance... Anyway, I'm going to try to keep it simple and just implement a suffix trie, and hope that it provides some measurable speed improvement :] Thanks again for everyone's help, though, and I'll definitely share the (amateur and minimal, sorry!) results of my project if you're interested.


Emma
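A suffix trie of the kind Emma mentions could look roughly like the following. This is a minimal sketch, assuming every suffix of each lowercased package name is inserted and each node remembers which packages pass through it, so a substring query becomes one walk down the trie. The class and method names are hypothetical. Memory grows roughly quadratically with name length, which is tolerable for short package names but is the usual reason to move to a compressed suffix tree.

```python
from typing import Dict, Set


class SuffixTrie:
    """Stores every suffix of every key; a node's `values` are the items
    whose key contains the path from the root to that node."""

    def __init__(self) -> None:
        self.children: Dict[str, "SuffixTrie"] = {}
        self.values: Set[str] = set()

    def insert(self, key: str, value: str) -> None:
        key = key.lower()
        for start in range(len(key)):
            node = self
            for ch in key[start:]:
                node = node.children.setdefault(ch, SuffixTrie())
                node.values.add(value)

    def search(self, query: str) -> Set[str]:
        """Return every value whose key contains `query` (case-insensitive)."""
        node = self
        for ch in query.lower():
            node = node.children.get(ch)
            if node is None:
                return set()
        return set(node.values)


if __name__ == "__main__":
    trie = SuffixTrie()
    for cp in ["app-editors/vim", "app-editors/gvim", "dev-lang/python"]:
        trie.insert(cp.split("/")[1], cp)  # index the package name only
    print(sorted(trie.search("vim")))  # ['app-editors/gvim', 'app-editors/vim']
```

Indexing only the package name (not the category) mirrors what a plain `emerge -s` matches by default.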

 
12-01-2008, 06:34 AM
Duncan

search functionality in emerge

"Emma Strubell" <emma.strubell@gmail.com> posted
5a8c638a0811301542s4aca92c3ie68ef427913c0523@mail.gmail.com, excerpted
below, on Sun, 30 Nov 2008 18:42:11 -0500:

> I am really
> interested in contributing to Gentoo and portage in the future, though.
> I'm thinking this summer I'll have a chance...

FWIW, Gentoo usually participates in the Google Summer of Code. Assuming
they have it again next year, if you're already considering spending some
time on Gentoo code this summer, might as well try to get paid a little
something for it. It could/should be a nice resume booster, too. =:^)

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
 
12-01-2008, 09:40 AM
"Emma Strubell"

search functionality in emerge

I completely forgot about Google's Summer of Code! Thanks for reminding me. Hopefully I won't forget again by the time summer rolls around; obviously I wouldn't mind getting a little extra money for doing something I'd do for free anyway.


On a more related note: What, exactly, does porttree.py do? And am I correct in thinking that my suffix tree(s) should somewhat replace porttree.py? Or, should I be using porttree.py in order to populate my tree? I think I have the suffix tree sufficiently figured out; I'm just trying to determine where, exactly, the tree will fit into the portage code, and what the best way to populate it (with package names and some corresponding metadata) would be.


 
12-01-2008, 04:52 PM
Zac Medico

search functionality in emerge


Emma Strubell wrote:
> I completely forgot about Google's Summer of Code! Thanks for reminding me.
> Hopefully I won't forget again by the time summer rolls around, obviously I
> wouldn't mind getting a little extra money for doing something I'd do for
> free anyway.
>
> On a more related note: What, exactly, does porttree.py do? And am I correct
> in thinking that my suffix tree(s) should somewhat replace porttree.py? Or,
> should I be using porttree.py in order to populate my tree?

You should use porttree.py to populate it. Specifically, you should
use portdbapi.aux_get() calls to access the package metadata that
you'll need, similar to how the code in the existing search class
accesses it.
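Following Zac's advice, the index would be filled through portdbapi rather than by parsing ebuilds directly. A minimal sketch, assuming an object that behaves like portage's portdbapi with `cp_all()`, `xmatch("bestmatch-visible", cp)` and `aux_get(cpv, keys)` (method names as found in portage; check them against the portage version in use). `build_description_index` is a hypothetical helper, and the test drives it with a fake object instead of a real tree.

```python
from typing import Dict, Tuple


def build_description_index(portdb) -> Dict[str, Tuple[str, str]]:
    """Map "category/package" -> (best visible cpv, DESCRIPTION).

    `portdb` is assumed to provide cp_all(), xmatch() and aux_get()
    like portage's portdbapi.
    """
    index: Dict[str, Tuple[str, str]] = {}
    for cp in portdb.cp_all():
        cpv = portdb.xmatch("bestmatch-visible", cp)
        if not cpv:
            continue  # nothing visible, e.g. every version is masked
        (description,) = portdb.aux_get(cpv, ["DESCRIPTION"])
        index[cp] = (cpv, description)
    return index
```

Against a real tree, the dbapi object is normally reached as `portage.db[portage.root]["porttree"].dbapi`; the resulting mapping can then be fed into whatever search structure is chosen.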

> I think I have
> the suffix tree sufficiently figured out, I'm just trying to determine
> where, exactly, the tree will fit in to the portage code, and what the best
> way to populate it (with package names and some corresponding metadata)
> would be.

There are three possible times that I imagine a person might want to
populate it:

1) Automatically after emerge --sync. This should not be mandatory
since it will be somewhat time consuming and some users are very
sensitive about --sync time. Note that FEATURES=metadata-transfer is
disabled by default in the latest versions of portage, specifically
to reduce --sync time.

2) On demand, when emerge --search is invoked. The calling user will
need appropriate file system permissions in order to update the
search index.

3) On request, by calling a command that is specifically designed to
generate the search index. This could be a subcommand of emaint.
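For option 2, emerge --search would have to decide whether the stored index is stale and whether the calling user may rewrite it. A minimal sketch, assuming the index is a single file and that a sync updates the mtime of some stamp file (metadata/timestamp.chk in an rsync-synced tree is one candidate). Both function names and the choice of stamp file are assumptions. Write access is checked on the directory because replacing a file by renaming a new one over it requires it.

```python
import os


def index_is_stale(index_path: str, stamp_path: str) -> bool:
    """True if the index is missing or older than the tree's stamp file."""
    if not os.path.exists(index_path):
        return True
    if not os.path.exists(stamp_path):
        return False  # nothing to compare against; keep using the index
    return os.path.getmtime(index_path) < os.path.getmtime(stamp_path)


def can_update_index(index_path: str) -> bool:
    """True if the calling user may create or replace the index file."""
    directory = os.path.dirname(os.path.abspath(index_path))
    return os.access(directory, os.W_OK)
```

A search would rebuild only when `index_is_stale()` is true and `can_update_index()` allows it, and otherwise fall back to the existing, slower search.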

For the index file format, it would be simplest to use a python
pickle file, but you might choose another format if you'd like the
index to be accessible without python and the portage API (probably
not necessary).
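A pickle-based index file along the lines Zac suggests could be handled roughly as follows. This is a minimal sketch with hypothetical helper names and a hypothetical `INDEX_FORMAT` constant. It writes to a temporary file in the same directory and renames it into place, so a concurrent search never reads a half-written index, and it stores a format version so an index written by an incompatible version is rebuilt rather than misread.

```python
import os
import pickle
import tempfile
from typing import Any, Optional

INDEX_FORMAT = 1  # hypothetical version number; bump when the layout changes


def save_index(index: Any, path: str) -> None:
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".search-index-")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump((INDEX_FORMAT, index), f, pickle.HIGHEST_PROTOCOL)
        os.replace(tmp_path, path)  # atomic rename within one filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise


def load_index(path: str) -> Optional[Any]:
    """Return the stored index, or None if it is missing, unreadable or outdated."""
    try:
        with open(path, "rb") as f:
            version, index = pickle.load(f)
    except (OSError, EOFError, pickle.UnpicklingError, ValueError, TypeError):
        return None
    return index if version == INDEX_FORMAT else None
```

Since unpickling can execute code, the index file should only be writable by trusted users (e.g. root or the portage group).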
--
Thanks,
Zac
 
12-01-2008, 08:25 PM
"Emma Strubell"

search functionality in emerge

Thanks for the clarification. I was planning on forcing an update of the index as a part of emerge --sync, and implementing a command that would update the search index (leaving it up to the user to update after making any manual changes to the portage tree). That way the search index should always be up-to-date when emerge -s is called. It does make sense for the update upon --sync to be optional, but I guess I don't see why the update should always be SO slow. Of course the first population of the tree will take quite a while, but assuming regular (daily?) --syncs (and therefore updates to the index), subsequent updates shouldn't take very long, since there will only be a few (hundred?) changes to be made to the tree.


And I do plan on pickling the search tree :]

Emma
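Emma's point that daily updates should only touch a few hundred packages can be made concrete: if each index entry records the version it was built from, only packages whose best version changed (or that are new) need a metadata lookup. A minimal sketch under that assumption. `update_index` and `fetch_description` are hypothetical names, and `current_best` stands for a cheap "category/package -> best version" pass over the tree. A description edited without a version bump would be missed by this check.

```python
from typing import Callable, Dict, Tuple

Index = Dict[str, Tuple[str, str]]  # "cat/pkg" -> (best cpv, description)


def update_index(old: Index, current_best: Dict[str, str],
                 fetch_description: Callable[[str], str]) -> Tuple[Index, int]:
    """Build the new index, calling fetch_description() only for packages
    whose best version changed or that are new; removed packages drop out.
    Returns the new index and the number of metadata lookups performed."""
    new: Index = {}
    fetched = 0
    for cp, cpv in current_best.items():
        previous = old.get(cp)
        if previous is not None and previous[0] == cpv:
            new[cp] = previous  # unchanged: reuse the stored entry
        else:
            new[cp] = (cpv, fetch_description(cpv))
            fetched += 1
    return new, fetched
```

The pass that computes `current_best` still visits every package, so the saving is in the metadata lookups, which are the expensive part when the cache is cold.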

 
12-01-2008, 08:52 PM
Tambet

search functionality in emerge

I would suggest a different way of handling updates. If you want to change
the portage tree manually, you have to make an overlay. An overlay, as it's
updated and managed by a human being, will always be small (unless someone
writes a script that creates a million overlay updates, but I don't think that
would be an efficient way to do anything). So, when you search, you can
search the Portage tree with an index, which is updated with --sync, and then
search the overlay, which is small and fast to search anyway. The overlay
should not have an index in that case. If anyone is going to change the
portage tree by hand, those changes will be lost with the next --sync, so
no one should do it anyway; this case should not be considered at
all.
Tambet - technique evolves to art, art evolves to magic, magic evolves to just doing.
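Tambet's split (an indexed main tree plus an unindexed overlay scanned at search time) could be sketched as follows. This assumes the main-tree index is exposed as any lookup function returning a set of "category/package" strings, and that overlays use the usual category/package/*.ebuild layout. `scan_overlay` and `search_all` are hypothetical names, and overlay matching is a plain case-insensitive substring test on the package name.

```python
import os
from typing import Callable, Iterable, Iterator, Set


def scan_overlay(overlay_root: str) -> Iterator[str]:
    """Yield "category/package" for every overlay directory holding an ebuild."""
    for category in sorted(os.listdir(overlay_root)):
        cat_dir = os.path.join(overlay_root, category)
        if not os.path.isdir(cat_dir):
            continue
        for package in sorted(os.listdir(cat_dir)):
            pkg_dir = os.path.join(cat_dir, package)
            if os.path.isdir(pkg_dir) and any(
                    f.endswith(".ebuild") for f in os.listdir(pkg_dir)):
                yield category + "/" + package


def search_all(query: str, indexed_lookup: Callable[[str], Set[str]],
               overlays: Iterable[str]) -> Set[str]:
    # Fast path: the main tree, whose index is rebuilt after --sync.
    results = set(indexed_lookup(query))
    # Slow path: overlays are assumed small, so scan them on every search.
    q = query.lower()
    for overlay in overlays:
        for cp in scan_overlay(overlay):
            if q in cp.split("/", 1)[1].lower():
                results.add(cp)
    return results
```

The scan stays cheap only while overlays are small; a large overlay such as sunrise, which René compares to a second portage tree, would probably deserve an index of its own.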



 
12-01-2008, 09:08 PM
"Emma Strubell"

search functionality in emerge

Good point. I may just ignore overlays completely because 1) I don't use them and 2) does anyone really need to search an overlay anyway? Aren't any packages added via an overlay added deliberately?


 
12-01-2008, 09:17 PM
René 'Necoro' Neumann

search functionality in emerge


Emma Strubell wrote:
> 2) does anyone really need to search an overlay anyway?

Of course. Take large (semi-)official overlays like sunrise. They can
easily be seen as a second portage tree.
 

All times are GMT.
