06-08-2011, 02:36 PM
Vikraman

Gentoo package statistics -- GSoC 2011

Hi everyone,

I'm working on the `Package statistics` project this year. So far, I
have written a client and server[0] that collect the following
information from hosts:

* Uname, portage profile, timestamp of portage tree
* ARCH, CHOST, CFLAGS, CXXFLAGS, FFLAGS, LDFLAGS, MAKEOPTS
* ACCEPT_KEYWORDS, FEATURES, USE, LANG, SYNC, GENTOO_MIRRORS
* Repository, keyword, USE flags (plus, minus, unset), counter, size,
and build time for each installed package
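
To make the list above concrete, here is a minimal sketch of how one host's report might be laid out before it is sent. The field names, the `build_report` helper, and the sample values are assumptions for illustration only; they are not taken from the gentoostats code.

```python
import json


def build_report(env, packages):
    """Assemble one host's report as a JSON-serialisable dict.

    `env` maps variable names (ARCH, CFLAGS, ...) to their values.
    `packages` maps a package identifier to its per-package details.
    All key names here are hypothetical.
    """
    return {
        "env": dict(env),
        "packages": {
            cpv: {
                "repo": info["repo"],
                "keyword": info["keyword"],
                "use": {
                    "plus": sorted(info["use_plus"]),
                    "minus": sorted(info["use_minus"]),
                    "unset": sorted(info["use_unset"]),
                },
                "counter": info["counter"],
                "size": info["size"],
                "build_time": info["build_time"],
            }
            for cpv, info in packages.items()
        },
    }


# Made-up sample data, only to show the shape of the result.
report = build_report(
    {"ARCH": "amd64", "CFLAGS": "-O2 -pipe"},
    {
        "app-editors/vim-7.3.189": {
            "repo": "gentoo",
            "keyword": "amd64",
            "use_plus": {"nls", "acl"},
            "use_minus": {"perl"},
            "use_unset": set(),
            "counter": 1234,
            "size": 2048,
            "build_time": 1307543760,
        }
    },
)
print(json.dumps(report, indent=2, sort_keys=True))
```

Keeping the three USE flag groups separate preserves the plus/minus/unset distinction from the list.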

Is there a need to collect the files installed by a package? Doesn't PFL[1]
already provide that?

Please provide some feedback on what other data should be collected, etc.

Also, I'm starting work on the webUI, and would like some
recommendations for stats pages, such as:

* Packages installed sorted by users
* Top arches, keywords, profiles
* Most enabled and disabled USE flags, per package and globally

[0]
http://git.overlays.gentoo.org/gitweb/?p=proj/gentoostats.git;a=commit;h=1b9697a090515d2a373e83b1094d6e08ec405c02
[1] http://www.portagefilelist.de/index.php/Main_Page

--
Vikraman
 
06-08-2011, 03:19 PM
"Paweł Hajdan, Jr."

On 6/8/11 4:36 PM, Vikraman wrote:
> I'm working on the `Package statistics` project this year. Till now, I
> have managed to write a client and server[0] to collect the following
> information from hosts:

Excellent, good luck with the idea! I think that better information
about how Gentoo is actually used will greatly help improve it.

> Is there a need to collect the files installed by a package? Doesn't PFL[1]
> already provide that?

Well, PFL is not an official Gentoo project. It might be useful, but I
wouldn't say it's a priority.

> Please provide some feedback on what other data should be collected, etc.

In my opinion it's *not* about collecting as much data as possible. I
think it's most important to get the core functionality working really
well, and to convince as large a percentage of users as possible to enable
reporting the statistics (to make the results - hopefully - accurately
represent the user base). Please note that in some cases it may mean
collecting _less_ data, or thinking more about the privacy of the users.

For me, as a developer, even a list of packages sorted by popularity
(aka Debian/Ubuntu popcon) would be very useful.
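
A popcon-style ranking of this kind could be computed roughly as follows. This is only a sketch: it assumes each host's report has already been reduced to a list of installed package names, and the `package_popularity` name and the sample data are made up for illustration.

```python
from collections import Counter


def package_popularity(reports):
    """Rank packages by the number of hosts that have them installed.

    `reports` is an iterable with one entry per host, each entry being an
    iterable of package names. A package is counted once per host, even if
    the host lists it several times (for example in different slots).
    """
    counts = Counter()
    for installed in reports:
        counts.update(set(installed))
    # Most popular first; ties are broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up sample: three hosts.
sample_reports = [
    ["sys-apps/portage", "app-editors/vim"],
    ["sys-apps/portage"],
    ["sys-apps/portage", "app-editors/vim", "www-servers/nginx"],
]
for name, hosts in package_popularity(sample_reports):
    print(f"{hosts:4d}  {name}")
```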

Ah, and maybe files in /etc/portage: package.keywords and so on. It
could be useful to see what people are masking/unmasking, that may be an
indication of stale stabilizations or brokenness hitting the tree.
Anyway, I'd call it an enhancement.

> Also, I'm starting work on the webUI, and would like some
> recommendations for stats pages, such as:
>
> * Packages installed sorted by users

Cool!

> * Top arches, keywords, profiles

And percentage of ~arch vs arch users?

> * Most enabled, disabled useflags per package/globally

Also great, especially the per-package variant. It would also be useful to
have per-profile data, to better tune the profile defaults.
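
As a rough sketch of the per-package variant, the enabled and disabled flags could be tallied like this. The report layout (package name mapped to "plus" and "minus" flag lists) and the `useflag_stats` name are assumptions for illustration, not the project's actual schema.

```python
from collections import Counter, defaultdict


def useflag_stats(reports):
    """Count, per package, how many hosts enable or disable each USE flag.

    `reports` is an iterable with one dict per host, mapping a package name
    to a dict with optional "plus" and "minus" flag lists.
    Returns two mappings: package -> Counter of enabled flags, and
    package -> Counter of disabled flags.
    """
    enabled = defaultdict(Counter)
    disabled = defaultdict(Counter)
    for report in reports:
        for package, flags in report.items():
            enabled[package].update(set(flags.get("plus", ())))
            disabled[package].update(set(flags.get("minus", ())))
    return enabled, disabled


# Made-up sample: three hosts reporting flags for one package.
sample = [
    {"www-servers/nginx": {"plus": ["ssl", "ipv6"], "minus": ["debug"]}},
    {"www-servers/nginx": {"plus": ["ssl"], "minus": ["debug", "ipv6"]}},
    {"www-servers/nginx": {"plus": ["ssl"], "minus": []}},
]
enabled, disabled = useflag_stats(sample)
print(enabled["www-servers/nginx"].most_common())
print(disabled["www-servers/nginx"].most_common())
```

Grouping the same counters by profile instead of by package would give the per-profile view.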

> [0]
> http://git.overlays.gentoo.org/gitweb/?p=proj/gentoostats.git;a=commit;h=1b9697a090515d2a373e83b1094d6e08ec405c02

I took a quick look at the code. Some random comments:

- it uses the Portage Python API a lot. But that API is not stable, or at
least not guaranteed to be stable. Have you considered using helpers like
portageq (or eventually enhancing those helpers)?

- make the licensing super-clear (a LICENSE file, possibly some header
in every source file, and so on)

- how about submitting the data over HTTPS and not HTTP to better protect
privacy?

- don't leave exception handling as a TODO; it should be a part of your
design, not an afterthought

- instead of or in addition to the setup.txt file, how about just
writing the real setup.py file for distutils?
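
The portageq, HTTPS, and exception-handling points above could be combined roughly as in the sketch below. `portageq envvar` is an existing portageq subcommand; the function names, the JSON payload, and the idea of a single submission URL are assumptions for illustration and do not describe the actual gentoostats client or server.

```python
import json
import subprocess
import urllib.error
import urllib.request


def portageq_envvar(name, run=subprocess.run):
    """Return a Portage variable via `portageq envvar NAME`, or None on failure.

    `run` is injectable so the function can be exercised without Portage.
    """
    try:
        result = run(
            ["portageq", "envvar", name],
            capture_output=True, text=True, check=True, timeout=30,
        )
    except (OSError, subprocess.SubprocessError):
        # portageq missing, non-zero exit status, or timeout.
        return None
    return result.stdout.strip()


def build_submission(url, payload):
    """Build a POST request, refusing anything that is not HTTPS."""
    if not url.lower().startswith("https://"):
        raise ValueError("refusing to submit statistics over a non-HTTPS URL")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


def submit(url, payload, timeout=30):
    """Send the payload; return True on a 2xx response, False on any error."""
    request = build_submission(url, payload)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Network failures and HTTP error statuses both end up here.
        return False
```

A client would call something like `submit(url, {"CFLAGS": portageq_envvar("CFLAGS")})` with whatever HTTPS endpoint the server ends up exposing. Returning None or False instead of raising keeps a failed collection or upload from breaking whatever invoked the client.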
 
06-08-2011, 03:19 PM
Gilles Dartiguelongue

Wasn't there a project like this a couple of years ago that tried to
use a cross-distro tool?

--
Gilles Dartiguelongue <eva@gentoo.org>
Gentoo
 
06-08-2011, 03:48 PM
Hans de Graaff

On Wed, 2011-06-08 at 17:19 +0200, "Paweł Hajdan, Jr." wrote:

> In my opinion it's *not* about collecting as much data as possible. I
> think it's most important to get the core functionality working really
> well, and to convince as large a percentage of users as possible to enable
> reporting the statistics (to make the results - hopefully - accurately
> represent the user base). Please note that in some cases it may mean
> collecting _less_ data, or thinking more about the privacy of the users.

+1 on this. Taking the extreme, I'd rather see a properly implemented
architecture that is installed on >50% of Gentoo systems and reports
only the arch than something that collects a lot more data and is
installed on 50 machines. Once the framework is in place and there is
user uptake then it is easy to slowly extend the statistics collection
and gather more useful data.

> For me, as a developer, even a list of packages sorted by popularity
> (aka Debian/Ubuntu popcon) would be very useful.

That would be useful.

> Ah, and maybe files in /etc/portage: package.keywords and so on. It
> could be useful to see what people are masking/unmasking, that may be an
> indication of stale stabilizations or brokenness hitting the tree.
> Anyway, I'd call it an enhancement.

I'd rather not see this in the initial GSoC project if that means we'll
sacrifice a big rollout.

Kind regards,

Hans
 
