12-04-2008, 10:52 AM
bardo

On Thu, Dec 4, 2008 at 9:54 AM, Kristoffer Fossgård <kfs1@online.no> wrote:
> You're all missing my point. I never said counting packages by
> download rate is a perfect solution but that IT IS GOOD ENOUGH _and_
> BETTER THAN THE VOTE SYSTEM.

That's what I thought. Even monitoring a single download mirror could
be enough, if it's not an obscure and unpopular one. At least gathered
data would be statistically *relevant*, even though not accurate. We
can think of a single mirror as a good approximation of the whole
community, excluding i18n/l10n packages, which are highly dependent
on the physical location of the mirror itself.

Corrado
 
12-04-2008, 11:06 AM
"Ronald van Haren"

On 12/4/08, bardo <ilbardo@gmail.com> wrote:
> On Thu, Dec 4, 2008 at 9:54 AM, Kristoffer Fossgård <kfs1@online.no> wrote:
>> You're all missing my point. I never said counting packages by
>> download rate is a perfect solution but that IT IS GOOD ENOUGH _and_
>> BETTER THAN THE VOTE SYSTEM.
>
> That's what I thought. Even monitoring a single download mirror could
> be enough, if it's not an obscure and unpopular one. At least gathered
> data would be statistically *relevant*, even though not accurate. We
> can think of a single mirror as a good approximation of the whole
> community, excluding i18n/l10n packages, which are highly dependent
> on the physical location of the mirror itself.
>
> Corrado
>

I would think that the usage of packages depends on geographical
location in the same way as distribution usage depends on geographical
location.

ronald
 
12-04-2008, 11:39 AM
Allan McRae

Ronald van Haren wrote:
> On 12/4/08, bardo <ilbardo@gmail.com> wrote:
>> On Thu, Dec 4, 2008 at 9:54 AM, Kristoffer Fossgård <kfs1@online.no> wrote:
>>> You're all missing my point. I never said counting packages by
>>> download rate is a perfect solution but that IT IS GOOD ENOUGH _and_
>>> BETTER THAN THE VOTE SYSTEM.
>>
>> That's what I thought. Even monitoring a single download mirror could
>> be enough, if it's not an obscure and unpopular one. At least gathered
>> data would be statistically *relevant*, even though not accurate. We
>> can think of a single mirror as a good approximation of the whole
>> community, excluding i18n/l10n packages, which are highly dependent
>> on the physical location of the mirror itself.
>>
>> Corrado
>
> I would think that the usage of packages depends on geographical
> location in the same way as distribution usage depends on geographical
> location.
>
> ronald

How would any one mirror be any less biased than people submitting data
via pkgstats? You are just sampling a different subset of people.


Allan
 
12-04-2008, 03:10 PM
"Aaron Griffin"

On Thu, Dec 4, 2008 at 5:52 AM, bardo <ilbardo@gmail.com> wrote:
> On Thu, Dec 4, 2008 at 9:54 AM, Kristoffer Fossgård <kfs1@online.no> wrote:
>> You're all missing my point. I never said counting packages by
>> download rate is a perfect solution but that IT IS GOOD ENOUGH _and_
>> BETTER THAN THE VOTE SYSTEM.
>
> That's what I thought. Even monitoring a single download mirror could
> be enough, if it's not an obscure and unpopular one. At least gathered
> data would be statistically *relevant*, even though not accurate. We
> can think of a single mirror as a good approximation of the whole
> community, excluding i18n/l10n packages, which are highly dependent
> on the physical location of the mirror itself.

Guys. I have to point out a flaw in this reasoning. We are talking
about packages _entering_ community. Not remaining there. For packages
not in community, there is no download except from the AUR website. We
*could*, in theory, track this, but there are 3 or 4 different ways one
can download things from the AUR.

Again, just downloading a package does not mean I like it or use it.
As someone previously stated: if you tell me you've never installed a
package, tried it, and removed it because you didn't like it, you're
probably lying.
 
12-04-2008, 05:30 PM
"Drew Frank"

On Thu, Dec 4, 2008 at 10:27 AM, Drew Frank <ajfrank@ics.uci.edu> wrote:
> On Thu, Dec 4, 2008 at 10:23 AM, Aaron Griffin <aaronmgriffin@gmail.com> wrote:
>> On Thu, Dec 4, 2008 at 12:20 PM, Drew Frank <goodgrue@archlinux.us> wrote:
>>> I do this all the time as well. One possible solution: if an "I use
>>> this!" message were sent to the stat-tracking server automatically
>>> from pacman upon installing a package, it would not be much of an
>>> extension to send an "Oops, not anymore" message when it is
>>> uninstalled. Thoughts?
>>
>> That's (a) a breach of privacy and (b) something that would never get
>> integrated into pacman.
>>
>> The pkgstats server doesn't keep track of user info. The old archstats
>> did, but we have proven time and time again that no one ever uses
>> archstats.... we've had it for ages, have you ever used it?
>>
>
> Nope, I've never heard of archstats =p. Maybe it could be integrated
> into yaourt, if not pacman. I know aurvote already is, but perhaps it
> could be just made automated unless a user specifically opts-out.
> Also, maybe anonymous usage info could be allowed from those users who
> don't have AUR accounts. It could be treated differently (as less
> reliable) but would probably be better than nothing. Would that
> address the breach of privacy issue?
>

Also, if my ideas are so off-base as to just be a waste of time, feel
free to tell me so =)...I certainly don't want to make this process
any harder for you all than it already is.

Drew
 
12-04-2008, 05:35 PM
"Aaron Griffin"

On Thu, Dec 4, 2008 at 12:30 PM, Drew Frank <goodgrue@archlinux.us> wrote:
> On Thu, Dec 4, 2008 at 10:27 AM, Drew Frank <ajfrank@ics.uci.edu> wrote:
>> On Thu, Dec 4, 2008 at 10:23 AM, Aaron Griffin <aaronmgriffin@gmail.com> wrote:
>>> On Thu, Dec 4, 2008 at 12:20 PM, Drew Frank <goodgrue@archlinux.us> wrote:
>>>> I do this all the time as well. One possible solution: if an "I use
>>>> this!" message were sent to the stat-tracking server automatically
>>>> from pacman upon installing a package, it would not be much of an
>>>> extension to send an "Oops, not anymore" message when it is
>>>> uninstalled. Thoughts?
>>>
>>> That's (a) a breach of privacy and (b) something that would never get
>>> integrated into pacman.
>>>
>>> The pkgstats server doesn't keep track of user info. The old archstats
>>> did, but we have proven time and time again that no one ever uses
>>> archstats.... we've had it for ages, have you ever used it?
>>>
>>
>> Nope, I've never heard of archstats =p. Maybe it could be integrated
>> into yaourt, if not pacman. I know aurvote already is, but perhaps it
>> could be just made automated unless a user specifically opts-out.
>> Also, maybe anonymous usage info could be allowed from those users who
>> don't have AUR accounts. It could be treated differently (as less
>> reliable) but would probably be better than nothing. Would that
>> address the breach of privacy issue?

We tried... man we tried. Archstats has been around for some time and
we've tried to get people using it. It was a simple matter of
installing it and adding a rc.d DAEMON.

But the pkgstats thing has gotten more users in a matter of weeks than
archstats.

I know everyone wants super awesome metrics here, but it's just not
possible. Between privacy issues, complacency, and many other things,
they will never ever ever be accurate. At some point we have to say
"ok, good enough".
 
12-04-2008, 05:37 PM
Allan McRae

Kristoffer Fossgård wrote:
>> On Thu, Dec 4, 2008 at 5:52 AM, bardo <ilbardo@gmail.com> wrote:
>>> On Thu, Dec 4, 2008 at 9:54 AM, Kristoffer Fossgård <kfs1@online.no>
>>> wrote:
>>>> You're all missing my point. I never said counting packages by
>>>> download rate is a perfect solution but that IT IS GOOD ENOUGH _and_
>>>> BETTER THAN THE VOTE SYSTEM.
>>>
>>> That's what I thought. Even monitoring a single download mirror could
>>> be enough, if it's not an obscure and unpopular one. At least
>>> gathered data would be statistically *relevant*, even though not
>>> accurate. We can think of a single mirror as a good approximation of
>>> the whole community, excluding i18n/l10n packages, which are highly
>>> dependent on the physical location of the mirror itself.
>>
>> Guys. I have to point out a flaw in this reasoning. We are talking
>> about packages _entering_ community. Not remaining there. For packages
>> not in community, there is no download except from the AUR website. We
>> *could*, in theory, track this, but there are 3 or 4 different ways one
>> can download things from the AUR.
>
> There's one way technically. You download the tarball. Where are all the
> other ways? Even if there are, why is this even relevant? It's not like
> a reasonably good-enough download counter is hard technically to
> accomplish (feel free to scold me if you think it is).
>
>> Again, just downloading a package does not mean I like it or use it.
>> As someone previously stated: if you tell me you've never installed a
>> package, tried it, and removed it because you didn't like it, you're
>> probably lying.
>
> You're still not getting it. The system doesn't have to be 100% perfect,
> it only has to offer a representation of which packages
> are "popular". That's it. We don't need to know how many "downloads"
> are really "conscientious" because the large majority of them will be.

The two systems we already have "offer a representation of which
packages are popular" but there is much debate about how good that
representation is. A third is really not going to help....


Allan
 
12-04-2008, 05:52 PM
"Drew Frank"

There has been a lot of good discussion, and it appears there are more
or less two "sides" here. Perhaps it would be a good idea for people
to try to summarize the argument of the "opposing side", to see if the
two groups really understand each other's positions. I've seen a
bunch of good points made by proponents of either side, but there's a
danger that they're being lost in the mailing list deluge. A concise
list of the pros and cons of the various courses of action might be
a helpful tool, too -- editable by all on a wiki page, perhaps.

Just an idea.

Drew

 
12-04-2008, 06:03 PM
w9ya

I would rather do this here on the mailing list. I do not have a wiki account and have never entered info into one before.

Further, while summarizing is a good idea, new positions come up every day. More people are speaking out in opposition to this proposal every day, more are asking for details on the metrics, and more are asking for better metrics to be created before this proposal is considered.

So it may be premature to offer up a summary today. It is my understanding that we have through next Sunday for a discussion period. Is that correct?

Bob F.

 
12-04-2008, 11:37 PM
Ondřej Kučera

Hello,


> We have mirrors. Almost 100 of them. Feel free to contact them all,
> have them write code to count downloads which then sends the stats to
> us, and then we can implement this.
>
> What you suggest is absolutely not feasible at all.

That's too bad. I wanted to suggest counting downloads too, because I
believe that the number of downloads of a particular version of a package
would after a while correlate quite well with the number of users who
actually use (i.e. upgrade) this package. It should more or less solve
the previously mentioned problem of people trying a package and removing
it shortly afterwards.
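
To make the per-version idea concrete, here is a minimal sketch in Python of
how one mirror could tally downloads by package and version. It is only an
illustration: the function names are made up, and it assumes package files
are named name-version-release-arch.pkg.tar.gz and that the requested paths
have already been extracted from whatever log format the mirror uses.

```python
from collections import Counter


def split_package_filename(filename):
    """Split 'name-version-release-arch.pkg.tar.gz' into (name, 'version-release').

    Assumption: the last three hyphen-separated fields are version,
    release and architecture; everything before them is the package name.
    """
    stem = filename.split(".pkg.tar", 1)[0]
    name, version, release, _arch = stem.rsplit("-", 3)
    return name, f"{version}-{release}"


def count_downloads(requested_paths):
    """Count requests per (package, version) from an iterable of request paths."""
    counts = Counter()
    for path in requested_paths:
        filename = path.rsplit("/", 1)[-1]
        if ".pkg.tar" not in filename:
            continue  # not a package file (index pages, databases, ...)
        try:
            counts[split_package_filename(filename)] += 1
        except ValueError:
            continue  # filename does not follow the assumed pattern
    return counts
```

Counting per version rather than per package is what separates a one-off
trial from continued use: someone who removed the package never fetches the
next version.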


Anyway I've been meaning to contribute with some ideas for the topic for
at least four days (since I read the first IRC log on Sunday),
unfortunately my job hasn't allowed it this week. I just wanted to do
some thinking out loud about both methods (voting/pkgstats) for both
packages already in community and those that might get there in the
future from a regular user's point of view (also with regards to
privacy/paranoia matters).


(1) pkgstats
The obvious problem with accuracy is that not everybody will use it (or
run it again from time to time to update their "contribution" to the
statistics). Some people don't know about it, some people won't be
bothered, some might be concerned about privacy. Even though an IP address
is not necessarily an identifier of a person, it is still "good enough"
information. I actually more or less trust the Arch devs that really only a
hash of the IP is stored together with the package list, but I can hardly
be sure, and there are much more paranoid users out there than myself.
(Their problem doesn't have to be only with privacy itself: when
someone knows the packages you use and even the exact versions, it makes
it so much easier to target some kind of attack on the system.)
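
For readers wondering what "only a hash of the IP is stored" could look like,
here is a small Python sketch. It is not the actual pkgstats server code,
which I have not seen; the function name, the record layout and the salt
parameter are all assumptions made for illustration.

```python
import hashlib


def anonymise_submission(ip_address, packages, salt):
    """Build a record that keeps a salted hash of the IP, not the IP itself.

    'salt' is a hypothetical server-side secret. The same IP always maps
    to the same digest, so a repeated submission can replace the earlier
    one instead of being counted twice.
    """
    digest = hashlib.sha256((salt + ip_address).encode("utf-8")).hexdigest()
    return {"submitter": digest, "packages": sorted(set(packages))}
```

Even then the concern above stands: the IPv4 address space is small, so
anyone who knows the salt (or if none is used) can recover the address by
hashing every possible IP and comparing.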


On the other hand it can be nicely used to promote a package that is in
unsupported. "Do you use this package? Do you want to see it in
community? Have you run pkgstats on your system then?" It would be nice
to see the statistics in AUR frontend, one could see how far the package
is from the magic number that makes the package a good candidate for
community (whatever the number will be).


As for pruning of community as it is now (if it still is an issue, I'm
not quite sure anymore). How about this. Pick a reasonable percentage
(it doesn't have to be the same number as the one for new packages
entering community, it can be lower) by whatever criteria (number of
packages to prune, number of MB to save, ...), create a list of all the
packages with usage below this number and create lists of these packages
grouped by their maintainers. Then send the individual maintainer-lists
to the maintainers with a note that they should consider whether or not
these particular packages are really good material for community. At
the same time put the list of all those packages on the web, announce
its existence in the latest news and tell people that if they see a
package/packages they use and haven't yet run pkgstats, they should
probably do it now, otherwise the package might be removed from
community. Then wait for some time and look at the change in statistics
(maybe there will be some, maybe there won't).
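
The list-building step of this suggestion can be sketched in a few lines of
Python. The names and data shapes are hypothetical: "usage" maps each
package to the share of reporting systems that have it installed, and
"maintainers" maps each package to its maintainer.

```python
from collections import defaultdict


def prune_candidates(usage, maintainers, threshold):
    """Group packages whose usage share is below 'threshold' by maintainer.

    usage:       dict of package name -> fraction of reporting systems (0..1)
    maintainers: dict of package name -> maintainer name
    Packages without a known maintainer are listed under 'orphan'.
    """
    by_maintainer = defaultdict(list)
    for package, share in sorted(usage.items()):
        if share < threshold:
            by_maintainer[maintainers.get(package, "orphan")].append(package)
    return dict(by_maintainer)
```

Each value is the list to mail to that maintainer, and the values taken
together are the list to publish on the web. Running it again after the
waiting period with fresh statistics shows whether the announcement changed
anything.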


(2) votes
Again, not everybody uses it. Especially since voting means that you
have to have an AUR account. Today everybody has tons of accounts at
different internet services, ideally one should have as many passwords
as possible, and people don't like to create yet another account (I know
I don't). Frankly, if I hadn't needed those about 15 packages I now
maintain in unsupported (because I hadn't found them there), I wouldn't
have created an AUR account either.


There's another problem with accuracy. Even users who have an account
and vote don't vote for every single package they use. Especially many
people (myself included) probably never voted for packages already in
community. This makes the system usable for dealing with the transition
unsupported -> community but not for the other way round. That, too,
could be helped by a similar approach as above: count the packages with the
least votes, create their list (lists) and urge people to vote for
packages on this list if they use them and want to see them still in
community in the future.


The problem is that this way the privacy concerns will be even bigger.
Right now if someone looked up which packages I voted for, it wouldn't
give them much of an idea which packages I actually use (because I only
voted for packages in unsupported and only for those that I had a reason
to believe that my vote might help push them to community). After
applying the above suggestion, anyone who gained access to AUR data
knows more or less about all community packages that a certain nickname
uses (which is much worse than knowing that this list of packages is
used by someone with this hash of IP address - which is the information
pkgstats provides). Moreover, each nickname is associated with an e-mail
which is then more or less associated with a particular person. Of
course, the e-mail can be fake (or completely or almost unused), on the
other hand if you also want to maintain some packages in unsupported,
you want to have a valid e-mail, so, if you're paranoid, you'd probably
have to have two AUR accounts - one connected to you for maintaining
packages and the other one as "anonymous" as possible just for voting.


Conclusion
Unfortunately, I don't have a solution. Both systems can be made more
accurate (and useful for pointing fingers at packages that really aren't
all that much used) but at the price of some amount of privacy or even
security. I still think that the best solution would be counting
downloads, because it would be quite accurate and also quite anonymous
(definitely more than pkgstats or voting) but sadly it's not an option.


I hope I haven't wasted too much time of those who have read it all. If
so, then I apologize :-), but I felt that when I spent some time
thinking about these matters on my way to work and back this week, I
should share the thoughts.


Ondřej


--
Cheers,
Ondřej Kučera
 
