Making Infrastructure httpd logs public
On 18 April 2012 10:27, Ricky Zhou <firstname.lastname@example.org> wrote:
> On 2012-04-18 09:56:44 AM, Kevin Fenzi wrote:
>> has some ideas, but no great clear answer.
>> http://bug.st/mod_anonstats seems to use md5.
>> I'm assuming the consumer of these logs will process them after they
>> are hashed? In which case we do need to make sure the same ip hashes to
>> the same hash ? Or could we process them first, then hash the ip before
>> making the data public?
> I think something like an HMAC is the correct way to hide IPs.
> Unfortunately, there is still information other than IP address that can
> potentially leak some privacy information, such as:
>  * rare/unique user agent strings
>  * URLs that can be linked to the person who's visiting them (a lot
>    of mailman links contain emails, for example)
>  * potentially still-valid CSRF tokens
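For reference, a minimal sketch of the HMAC idea Ricky mentions, in Python. The key handling and the function name pseudonymize_ip are my own assumptions, not anything we run today; the point is only that the same IP maps to the same token for as long as the key stays the same, which also covers Kevin's question about consistent hashes.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret key: generated once, stored privately, and never
# published alongside the logs. Anyone holding it can test guesses.
SECRET_KEY = secrets.token_bytes(32)

def pseudonymize_ip(ip: str, key: bytes = SECRET_KEY) -> str:
    """Return the hex HMAC-SHA256 of an IP address under the given key."""
    return hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()

# Addresses below are from the documentation range (192.0.2.0/24).
token_a = pseudonymize_ip("192.0.2.10")
token_b = pseudonymize_ip("192.0.2.10")
token_c = pseudonymize_ip("192.0.2.11")
```

Here token_a and token_b are identical while token_c differs, so per-visitor counts still work on the hashed logs. This only addresses the IP field; it does nothing for the user agent, URL, and token leaks listed above.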
Plus, with well-known IPs going to show up in any log, the salting
mechanism is not going to be much use.
[If you know that your IP address was 126.96.36.199 and you went looking
for stuff at this time, you can figure out the salt by treating the
salt as the unknown, the hash and your IP address as the knowns, and
running a script to find the salt. Unless the salt is over 20 characters
you will figure it out within a month. Changing the salt periodically
doesn't work either, because you need to be able to track some things
over time.]
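To make that concrete, here is a toy sketch of the salt search in Python. The scheme md5(salt + ip), the lowercase alphabet, and the three-character limit are all assumptions chosen so the example finishes instantly; they are not a description of any real setup.

```python
import hashlib
import itertools
import string

def salted_hash(salt: str, ip: str) -> str:
    # Assumed scheme for illustration only: md5(salt + ip).
    return hashlib.md5((salt + ip).encode("ascii")).hexdigest()

def recover_salt(known_ip, observed_hash, max_len=3,
                 alphabet=string.ascii_lowercase):
    """Try every salt up to max_len characters; return the match or None."""
    for length in range(max_len + 1):
        for candidate in itertools.product(alphabet, repeat=length):
            salt = "".join(candidate)
            if salted_hash(salt, known_ip) == observed_hash:
                return salt
    return None

# The attacker knows their own IP and can spot their own request in the
# published log, so they have one (ip, hash) pair to test against.
observed = salted_hash("abc", "192.0.2.10")
recovered = recover_salt("192.0.2.10", observed)
```

The number of candidates grows as len(alphabet) ** length, so the length and randomness of the salt are the only things standing between a known (ip, hash) pair and the salt itself.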
Add in the fact that there are multiple other factors which can
de-anonymize a person in a log file (or across multiple log files), and
I don't see how it is reasonably possible to expect any strong
anonymization to occur [strong meaning that it would take more than a
month to determine who did what and when.]
> I think a lot more thought and user notification should happen before we
> can consider making logs public. Alternatively, what do you think about
> a system where somebody who wanted to run statistics either gets access
> to the logs, or gives us a script that we'll verify and then run in a
> cronjob. I don't think we'll get enough requests to the point where
> doing things manually like this becomes a burden.
> Maybe we can also take a look at how organizations like Wikipedia handle
> these sorts of things.
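As a rough idea of what a "script that we verify and run for them" could look like, here is a Python sketch that only ever emits aggregate counts. It assumes Apache common/combined log format; the aggregate function, the min_count threshold, and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Assumes Apache common/combined log format. Captures the client IP,
# method, request URL, and status code.
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3}) '
)

def aggregate(lines, min_count=2):
    """Count hits per path, dropping query strings and rare paths."""
    paths = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not parse
        _ip, _method, url, _status = match.groups()
        # Query strings may carry emails or tokens, so discard them.
        paths[url.split("?", 1)[0]] += 1
    # Suppress paths seen fewer than min_count times.
    return {path: n for path, n in paths.items() if n >= min_count}

sample = [
    '192.0.2.10 - - [18/Apr/2012:10:27:00 +0000] "GET /index.html?token=abc HTTP/1.1" 200 512',
    '192.0.2.11 - - [18/Apr/2012:10:28:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    '192.0.2.12 - - [18/Apr/2012:10:29:00 +0000] "GET /rare/page HTTP/1.1" 404 128',
]
report = aggregate(sample)
```

Only the returned counts would leave our machines; the IPs, user agents, and query strings never do. Whether dropping rare paths is enough depends on what the requester is measuring, so this would still need a human look per request.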
> infrastructure mailing list
Stephen J Smoogen.
"The core skill of innovators is error recovery, not failure avoidance."
Randy Nelson, President of Pixar University.
"Years ago my mother used to say to me,... Elwood, you must be oh
so smart or oh so pleasant. Well, for years I was smart. I
recommend pleasant. You may quote me." -- James Stewart as Elwood P. Dowd