04-18-2012, 02:39 PM
Emanuel Rietveld

Making Infrastructure httpd logs public

On Wed, Apr 18, 2012 at 2:15 AM, Ian Weller <ian@ianweller.org> wrote:
> As part of the statistics++ project [1] it is Infrastructure's plan to
> make data about visits to Fedora Project web servers public, in order to
> automate the information made available on the Statistics wiki page.
>
> The httpd logs currently contain personally-identifiable information:
> the IP address the request originated from and the user agent header.
>
> We think that at an absolute minimum we need to hash the IP address
> (with a seed, obviously) and leave the user agent header as is. But we
> wanted to make sure we got legal's opinion on this.
>
> [1]: h
>
> --
> Ian Weller <ian@ianweller.org>
>

(Moving thread to Infra list as my question is not a legal one)

What is the proposed hashing/anonymization scheme for the IP addresses?
How can you do this securely? Keep in mind that an attacker can
control some of the hashes in the public logs (by visiting the web
servers from various IP addresses).
_______________________________________________
infrastructure mailing list
infrastructure@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/infrastructure
 
04-18-2012, 03:56 PM
Kevin Fenzi

>
> (Moving thread to Infra list as my question is not a legal one)
>
> What is the proposed hashing/anonymization scheme for the IP addresses?
> How can you do this securely? Keep in mind that an attacker can
> control some of the hashes in the public logs (by visiting the web
> servers from various IP addresses).

http://stackoverflow.com/questions/4552566/logging-ip-address-for-uniqueness-without-storing-the-ip-address-itself-for-priv

has some ideas, but no great clear answer.

http://bug.st/mod_anonstats seems to use md5.

I'm assuming the consumer of these logs will process them after they
are hashed? In which case we do need to make sure the same IP always
hashes to the same value. Or could we process them first, then hash
the IP before making the data public?

kevin
 
04-18-2012, 04:27 PM
Ricky Zhou

On 2012-04-18 09:56:44 AM, Kevin Fenzi wrote:
> http://stackoverflow.com/questions/4552566/logging-ip-address-for-uniqueness-without-storing-the-ip-address-itself-for-priv
>
> has some ideas, but no great clear answer.
>
> http://bug.st/mod_anonstats seems to use md5.
>
> I'm assuming the consumer of these logs will process them after they
> are hashed? In which case we do need to make sure the same IP always
> hashes to the same value. Or could we process them first, then hash
> the IP before making the data public?

I think something like an HMAC is the correct way to hide IPs.
Unfortunately, information other than the IP address can still
leak private details, such as:
* rare/unique user agent strings
* URLs that can be linked to the person who's visiting them (a lot
  of mailman links contain emails, for example)
* potentially still-valid CSRF tokens
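
To make the HMAC idea concrete, here is a minimal sketch. It assumes a log format in which the client IP is the first space-separated field and the request line is the first double-quoted field, as in Apache's common and combined formats. The function names, the 16-character truncation, and the choice of SHA-256 are illustrative assumptions, not anything decided in this thread. It also drops the query string, which would remove tokens passed as URL parameters but not emails embedded in the path.

```python
import hashlib
import hmac


def pseudonymize_ip(ip: str, key: bytes) -> str:
    """Return a keyed, deterministic pseudonym for an IP address."""
    digest = hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()
    # Truncation is only for readability; the length is an arbitrary choice.
    return digest[:16]


def anonymize_line(line: str, key: bytes) -> str:
    """Replace the leading client-IP field and drop the query string."""
    ip, sep, rest = line.partition(" ")
    parts = rest.split('"')
    if len(parts) >= 3:
        # parts[1] is the request line, e.g. 'GET /path?x=1 HTTP/1.1'
        words = parts[1].split(" ")
        if len(words) >= 2:
            words[1] = words[1].split("?", 1)[0]
            parts[1] = " ".join(words)
    return pseudonymize_ip(ip, key) + sep + '"'.join(parts)
```

The same IP maps to the same pseudonym for as long as the key stays the same, so per-visitor counting still works. Without the key, nobody can recompute the pseudonym for a given address. The user agent field is left untouched here, so the first bullet above still applies.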

I think a lot more thought and user notification should happen before we
can consider making logs public. Alternatively, what do you think about
a system where somebody who wants to run statistics either gets access
to the logs, or gives us a script that we'll verify and then run in a
cronjob? I don't think we'll get so many requests that doing things
manually like this becomes a burden.
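
As a rough illustration of the kind of script that could be submitted for review, the sketch below reads raw log lines and emits only aggregate counts per URL path. It assumes the request line is the first double-quoted field of each log line. The function names and the `min_hits` threshold (which suppresses rarely visited paths that might identify a single visitor) are hypothetical choices, not part of any agreed process.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def hits_per_path(lines: Iterable[str]) -> Counter:
    """Count requests per URL path, ignoring query strings."""
    counts: Counter = Counter()
    for line in lines:
        parts = line.split('"')
        if len(parts) < 3:
            continue  # no quoted request line; skip malformed input
        words = parts[1].split(" ")
        if len(words) < 2:
            continue
        path = words[1].split("?", 1)[0]
        counts[path] += 1
    return counts


def report(counts: Counter, min_hits: int = 5) -> List[Tuple[str, int]]:
    """Return (path, hits) pairs, most popular first, above the threshold."""
    return [(path, n) for path, n in counts.most_common() if n >= min_hits]
```

Only the output of `report` would leave the server, so no IP address or user agent string is ever published.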

Maybe we can also take a look at how organizations like Wikipedia handle
these sorts of things.

Thanks,
Ricky
 
04-18-2012, 04:48 PM
Stephen John Smoogen

On 18 April 2012 10:27, Ricky Zhou <ricky@fedoraproject.org> wrote:
> On 2012-04-18 09:56:44 AM, Kevin Fenzi wrote:
>> http://stackoverflow.com/questions/4552566/logging-ip-address-for-uniqueness-without-storing-the-ip-address-itself-for-priv
>>
>> has some ideas, but no great clear answer.
>>
>> http://bug.st/mod_anonstats seems to use md5.
>>
>> I'm assuming the consumer of these logs will process them after they
>> are hashed? In which case we do need to make sure the same IP always
>> hashes to the same value. Or could we process them first, then hash
>> the IP before making the data public?
> I think something like an HMAC is the correct way to hide IPs.
> Unfortunately, information other than the IP address can still
> leak private details, such as:
> * rare/unique user agent strings
> * URLs that can be linked to the person who's visiting them (a lot
>   of mailman links contain emails, for example)
> * potentially still-valid CSRF tokens

Plus, with well-known IPs going to show up in any log, the salting
mechanism is not going to be much use.

[If you know that your IP address was 72.124.10.4 and you went looking
for stuff at this time, you can figure out the salt: the hash in the
log and your IP address are the knowns, the salt is the unknown, and a
script searches for it. Unless the salt is over 20 characters you will
figure it out within a month. Changing the salt periodically doesn't
work well either, because you need to be able to track some things
over time.]
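
The attack described above can be sketched as follows. The sketch assumes the published value is SHA-256 over the salt concatenated with the IP address, and that the salt is a short string of lowercase letters. Neither assumption comes from this thread, and `salted_hash` and `find_salt` are illustrative names.

```python
import hashlib
import itertools
import string
from typing import Optional


def salted_hash(salt: str, ip: str) -> str:
    """Assumed scheme: SHA-256 over salt followed by the IP address."""
    return hashlib.sha256((salt + ip).encode("ascii")).hexdigest()


def find_salt(
    known_ip: str,
    observed_hash: str,
    max_len: int = 3,
    alphabet: str = string.ascii_lowercase,
) -> Optional[str]:
    """Try every salt up to max_len characters against one known pair."""
    for length in range(max_len + 1):
        for candidate in itertools.product(alphabet, repeat=length):
            salt = "".join(candidate)
            if salted_hash(salt, known_ip) == observed_hash:
                return salt
    return None  # salt is longer than max_len or uses other characters
```

The number of candidates grows as `len(alphabet) ** length`, so how practical the search is depends entirely on how long and how random the salt is. Once the salt is recovered, every IP address can be hashed and looked up in the published logs.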

Add in the fact that there are multiple other factors which
de-anonymize a person in a log file (or across multiple log files),
and I don't see how it is reasonably possible to expect any strong
anonymization to occur [strong being defined as taking more than a
month to determine who did what when].

> I think a lot more thought and user notification should happen before we
> can consider making logs public. Alternatively, what do you think about
> a system where somebody who wants to run statistics either gets access
> to the logs, or gives us a script that we'll verify and then run in a
> cronjob? I don't think we'll get so many requests that doing things
> manually like this becomes a burden.
>
> Maybe we can also take a look at how organizations like Wikipedia handle
> these sorts of things.
>
> Thanks,
> Ricky



--
Stephen J Smoogen.
"The core skill of innovators is error recovery, not failure avoidance."
Randy Nelson, President of Pixar University.
"Years ago my mother used to say to me,... Elwood, you must be oh
so smart or oh so pleasant. Well, for years I was smart. I
recommend pleasant. You may quote me." -- James Stewart as Elwood P. Dowd
 
