Date: | September 30, 2009 / year-entry #311 |
Tags: | other |
Orig Link: | https://blogs.msdn.microsoft.com/oldnewthing/20090930-01/?p=16543 |
Comments: | 7 |
Summary: | It's that time again: Sending some link love to my colleagues. Peter Torr explains why the anti-phishing filter operates on the original URL instead of a hash. Jamie Buckley from the MSN Search team explains why not every possible instant answer is offered. From our Microsoft Research Cambridge comes SenseCam, a wearable camera that takes... |
It's that time again: Sending some link love to my colleagues.
Comments (7)
Comments are closed.
(Posting anonymously because the spam filter apparently hates me)
He lost me there. It seems to me that he’s blowing off the possibility that Microsoft employees with access to the original URLs will use or release that data even though it is against Microsoft policy.
http://en.wikipedia.org/wiki/AOL_search_data_scandal
I don’t find any of the "but we wanna use the original URL" arguments compelling.
Off topic I know, but I met Betsy at Oz TechEd 2005, not that she’d remember me. Classy lady.
@Maurits, did you read the article? An employee acting against Microsoft policy is not "Microsoft itself", and he specifically mentions the possibility of malicious insiders.
Since you didn’t read the article, I don’t find your opinion on the arguments presented in the article compelling.
Maurits, if you had understood the article, you would know that hashing doesn’t protect privacy. Figuring out who you are based on what URLs you visit is not likely to be easier than figuring out what URLs you visit based on the hashes you send.
Furthermore, any phisher could just add a random component to the URL and be able to make unique hashes to evade detection. So not only does sending hashes not prevent abuse of the information, but it makes the whole system useless by being trivial to circumvent.
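The evasion the comment describes can be sketched in a few lines of Python (the URLs and the use of SHA-256 here are purely illustrative): an exact-hash blocklist catches the known URL, but a random per-victim token makes every hash unique.

```python
import hashlib
import secrets

def url_hash(url: str) -> str:
    """Hash a full URL, as an exact-match hash blocklist would."""
    return hashlib.sha256(url.encode()).hexdigest()

# Blocklist of hashes of known phishing URLs (hypothetical example URL).
blocklist = {url_hash("http://phish.example.com/login")}

# The known URL is caught by an exact hash match...
print(url_hash("http://phish.example.com/login") in blocklist)  # True

# ...but appending a random token per victim yields a hash the
# blocklist has never seen, so the check always misses.
evasive = "http://phish.example.com/login?x=" + secrets.token_hex(8)
print(url_hash(evasive) in blocklist)  # False
```

Since the phisher controls the URLs in the mail they send out, generating a fresh token per message costs them nothing.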
MS is making up excuses for retrieving the full URL; this is not an unsolvable problem IF YOU WANT to solve it. At least the protocol and the domain/site/IP-number part of the URL could be hashed.
What’s the point of hashing?
So instead of reporting you went to http://www.google.com, it reports you went to 0x7e1b567a?
Since anti-phishing requires known sites, hashing requires the bad URL. Well, it’s trivial when storing the hashes to store the original URL that goes with each one (say, in case a more efficient lookup method is found). Oh wait, when we log the hash, we can translate it to the real URL too! It’s just another column in the database, so it comes "for free".
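The "just another column" point can be made concrete with a small Python sketch (the URLs and table layout are illustrative, not how any real service is built): whoever compiles the hash blocklist already has the original URLs, so a reported hash translates straight back.

```python
import hashlib

def url_hash(url: str) -> str:
    return hashlib.sha256(url.encode()).hexdigest()

# Server-side table built while compiling the blocklist: the original
# URL sits right next to its hash, so it comes "for free".
hash_to_url = {url_hash(u): u for u in [
    "http://www.google.com/",            # illustrative entries
    "http://phish.example.com/login",
]}

# When a client reports a hash, the server can translate it back.
reported = url_hash("http://www.google.com/")
print(hash_to_url[reported])  # http://www.google.com/
```

So the hash hides the URL only from someone who does not hold the table, which the service operator by definition does.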
Also makes it hard to do similarity matches. If bankofamerica.example.com is bad, maybe bankofamerica.example.net is too?
Of course, the real problem with the argument is that the sort of attacker most people are worried about isn’t necessarily the kind that needs to "confirm that you’ve been to a specific site," as he assumes.
Another possibility would be to have the client download a massive list of bad URLs (or a section of it – maybe all the "g"’s) every time you visit a website, or as part of a periodic update.
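The sharded-download idea in the previous comment can be sketched as follows (a hypothetical scheme, with made-up URLs; the shard key here is simply the first letter of the hostname, matching the "all the 'g's" example): the client fetches only the shard it needs and checks locally, so the exact URL never leaves the machine.

```python
from urllib.parse import urlparse

# Hypothetical server-side list of known-bad URLs.
BAD_URLS = [
    "http://g00gle.example.net/login",
    "http://bankofamerica.example.com/verify",
]

def shard_key(url: str) -> str:
    """Shard by the first letter of the hostname."""
    return urlparse(url).hostname[0]

# Server side: partition the list into shards once.
shards: dict[str, set[str]] = {}
for url in BAD_URLS:
    shards.setdefault(shard_key(url), set()).add(url)

# Client side: download one shard (modeled here as a dict lookup)
# and test membership locally.
def is_bad(url: str) -> bool:
    shard = shards.get(shard_key(url), set())
    return url in shard

print(is_bad("http://g00gle.example.net/login"))  # True
print(is_bad("http://www.google.com/"))           # False
```

The trade-off the comment hints at is real: the list is large and changes constantly, so the client pays in bandwidth and staleness for what it gains in privacy.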
Of course, another possibility is that we shouldn’t all be having this discussion on Raymond Chen’s blog when he didn’t write the article.