Device Fingerprinting - The Good, The Bad, and The Ugly

Unread post by **RealityRipple** » 2020-08-08, 17:09

Everyone hates fingerprinting. It's creepy as fuck, it's dangerous, and it's really difficult to negate.

However, everyone also hates it when they check a box that says "Remember me on this device" and it turns out to mean "Remember me on this browser on this device on this network for like 15 days before your IP changes and we don't recognize you anymore". Even worse, older computer users who don't quite understand how strictly separate individual computers are wonder why logging in successfully on one computer in their home doesn't let them log in on all their other home computers without repeating some authenticator, email, or phone verification method.

Working on my own system has made me realize how important fingerprinting actually is to security, and that it's basically castrated by privacy concerns, and rightly so. In the end, the best a web developer can do is save a random ID as a cookie or local storage value to keep track of a single browser on a single device if they want to let a user skip authentication on subsequent logins.

I have an idea, but I have no idea if it would be accepted by browser developers or users, so I'm asking here - not as a direct feature request to Pale Moon, but as a kind of informal query as to where the developers and users on this site stand. So, here's my proposal:

Three new identification values: "browserID", "deviceID", and "lanID", which JavaScript can access only after a browser confirmation prompt, like the ones Geolocation data and Notifications require.

browserID could essentially be random, but should be saved as a constant somewhere outside the browser's profile data, so that it's not copied when a profile is copied and not lost when a different profile is used. deviceID should be generated deterministically, in exactly the same way by any browser, from things like the CPU speed, OS install date, hard drive size, MAC address, etc., to create a hash or other value that can't leak the specs of the device but can still be used to identify a device regardless of which browser is being used. And finally, lanID should be similar to deviceID but for the local network, using info about the default gateway, DHCP server, possibly even a router's MAC address, or whatever else is network-specific but not device-specific, to verify that any two devices are connected to the same network.

When prompted, the level of detail being requested should be spelled out very obviously, such as "This website wishes to keep track of your local network." or "This website wishes to recognize your computer." or the like. The website could possibly choose from a selection of prompt details for the browser to display to the user, such as "This website wishes to recognize your computer for future logins." or "This website wishes to track your browser for advertising purposes.", which, while not necessarily verifiable, would at least give users some more verbose details about the nature of the request. And so long as the prompt details are just numbers on the web development side of things, the user shouldn't ever get weird prompts like "This website wishes to recognize your computer otherwise terrible things will happen to you so press yes." or other potential security risks.
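As a rough illustration of two pieces of this proposal, the sketch below shows a deterministic deviceID hash and numeric prompt codes that map to browser-owned wording. Everything here is an assumption made for illustration: the names `PromptReason`, `prompt_text`, and `derive_device_id`, the attribute keys, and the choice of SHA-256 are hypothetical and not part of any existing browser API.

```python
import hashlib
from enum import IntEnum
from typing import Mapping


class PromptReason(IntEnum):
    """Hypothetical numeric codes a site passes instead of free text."""
    RECOGNIZE_FOR_LOGIN = 1
    TRACK_FOR_ADVERTISING = 2


# The browser, not the website, owns the wording shown to the user.
PROMPT_TEXT = {
    PromptReason.RECOGNIZE_FOR_LOGIN:
        "This website wishes to recognize your computer for future logins.",
    PromptReason.TRACK_FOR_ADVERTISING:
        "This website wishes to track your browser for advertising purposes.",
}


def prompt_text(reason_code: int) -> str:
    """Map a site-supplied code to fixed text; unknown codes are rejected."""
    try:
        return PROMPT_TEXT[PromptReason(reason_code)]
    except ValueError:
        raise ValueError(f"unknown prompt reason: {reason_code}") from None


def derive_device_id(attributes: Mapping[str, str]) -> str:
    """Hash device attributes in sorted key order (assumed canonical form),
    so any browser given the same attributes produces the same ID."""
    canonical = "\n".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because a site can only send a code, it cannot inject its own sentence into the prompt. Whether a plain hash of guessable hardware values hides those values well enough is a separate question this sketch does not address.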

The motivation here is that both legitimate and malicious developers will continue to attempt to fingerprint browsers and devices using whatever crumbs of comparative data they can scrape. If you give them a legitimate API to access, statistically many of them will be lazy and not continue to try data gathering via alternative methods. They'll risk a user saying "no" to tracking over having to write an entire class to accomplish a task that could be done by getting the return value of a single function. It should also lower total user interaction in the long run, with a "Remember me" checkbox and an "Always remember my device" button being the only two clicks they'll need to log in to a given website for the rest of that device's lifetime.

I know this is a highly controversial topic for web browsers, but I'd like some more focused thought here:
are there major risks to this concept?
what could go wrong with the system itself?
how difficult would it be to create this API?
what could go wrong with the value determination algorithms?
could this be accomplished on every device and OS?
would OS security features block access to relevant details without admin rights?

and the biggest questions:
how would users respond to knowledge of this feature?
how would users respond to the prompts?
would users even understand what they're allowing or blocking?

Again, I'm not asking Pale Moon to include this as a feature out of nowhere - I'm asking if this is a feature worth considering for the entire internet ecosystem.

Unread post by **Moonchild** » 2020-08-08, 18:41

Moved to dev discussion since it's not an actual feature request.

IMO there are a lot of pitfalls here, but I'm sure others with more time can educate you on the matter.

Unread post by **vannilla** » 2020-08-08, 20:01

I can see the usefulness of this feature, but I can also see the same end push notifications had: websites will simply spam a lot of requests (actually coming from third-party scripts) to the user, who will install additional software (or request browser developers to add a way) to stop the barrage of prompts.
I think in dedicated applications (i.e. not the general web) it could have a future, but you'd first need to find the right market for it.

Unread post by **athenian200** » 2020-08-08, 22:09

The problem is privacy advocates would never tolerate this in any form whatsoever and would call UXP spyware if we implemented such a feature.

If you look at the fine details, even the large companies that are sending telemetry all over the place do, in fact, try their best to anonymize the data to limit their liability. But when you see articles on the subject, you get the impression that actual employees of these companies are looking at your data and sending it to advertisers who also look at your data, and that they know everything about you. In reality, they are looking at anonymized IDs much like the ones you're proposing that are associated with a profile.

However, that's not the discussion that comes up. Privacy advocates sound the alarm and blow the situation somewhat out of proportion in order to get people talking about it. If you have a discussion with a more intelligent/technical advocate, they would say that they don't think users should put their faith in such safeguards and thus they don't choose to acknowledge a difference between anonymized data strings like you're proposing, and a computer sending your private data as clear text to a server somewhere.

But the real discussion when it comes down to nuts and bolts is more of a debate between companies that actually have a lot of data privacy safeguards in place (because they know they're under a lot of scrutiny) while still allowing some degree of tracking for convenience and advertising purposes, and watchdogs who will point out how safeguards have failed in the past, the fact that it requires a degree of trust, and why no safeguard of any kind a company puts in place to prevent your info being leaked should ever be trusted.

Anyway, the point is... if we implemented something like this unilaterally, we would be panned as being just like the big tech companies, and people would cite times when big tech companies talked about their use of anonymized data and say it's the same kind of BS. If it became a web standard proposed by the W3C and we implemented it because websites refused to work without it, it might be different because then it would be up to the W3C to defend the standard they drafted, and the implementer could just say they're following the specs.

I mean, it's true that it's not as risk-free or perfect as Google or Microsoft would have you believe, but it's also not the dark picture the EFF and other privacy watchdogs are painting. The more you look into it, the more you see how complicated it is to meet the demands of the various parties and the more you understand where both sides are coming from, to the point that you start thinking in terms of specific contexts, people's needs and expectations, and overall project requirements rather than in philosophical or ideological terms, because you understand that everything is a compromise and no solution will satisfy all parties.

Unread post by **RealityRipple** » 2020-08-08, 22:38

athenian200 wrote: ↑
2020-08-08, 22:09
everything is a compromise and no solution will satisfy all parties.

That's exactly where settings and preferences come into play, though, isn't it? By providing a single path for a problem and user-selection to control that path, you ensure that nobody needs to compromise on a single solution; each user can choose their own preferred solution. I agree entirely, though: implementing this concept in Pale Moon or any other minor-contender browser would be suicide, and that's not what I'm suggesting. I'm talking about a fully standardized spec, either by the consortium or the working group. Otherwise, no services would bother to use it anyway. The question is, is it a better or worse solution than the present situation or any other practical methods to tackle fingerprinting?

Unread post by **RealityRipple** » 2020-08-08, 22:43

vannilla wrote: ↑
2020-08-08, 20:01
I can see the usefulness of this feature, but I can also see the same end push notifications had: websites will simply spam a lot of requests (actually coming from third-party scripts) to the user, who will install additional software (or request browser developers to add a way) to stop the barrage of prompts.
I think in dedicated applications (i.e. not the general web) it could have a future, but you'd first need to find the right market for it.

Actually, on that front, it seems like there should be a couple changes to notification functionality, which should pass over to geolocation and this proposal: same-origin only requests, a single "deny" permanently blocking all future requests from the domain and any subdomains unless removed from a blacklist by hand, and an out-of-box experience for the notifications so people actually know what they're clicking on when they say "accept". That last one is definitely the hardest to accomplish. Users ignore anything that isn't their immediate task. One has to wonder if it'd be better to start off with all these prompted-data-request features disabled until the user reads a blurb about the technology and chooses to enable it for the browser.
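The first two rules above (same-origin requests only, and one "deny" covering a domain plus its subdomains) can be sketched roughly as follows. This is only an assumed model of the check a browser might run: `may_prompt`, `is_blocked`, and the `blocked_domains` set are hypothetical names, and the origin comparison is simplified (it does not normalize default ports).

```python
from typing import Iterable
from urllib.parse import urlsplit


def is_blocked(host: str, blocked_domains: Iterable[str]) -> bool:
    """A blocked domain also covers every one of its subdomains."""
    return any(host == d or host.endswith("." + d) for d in blocked_domains)


def may_prompt(page_url: str, script_url: str,
               blocked_domains: Iterable[str]) -> bool:
    """Allow a permission prompt only for same-origin, non-blocked requests."""
    page, script = urlsplit(page_url), urlsplit(script_url)
    # Simplified origin: scheme, host, and port as written in the URL.
    same_origin = (
        (page.scheme, page.hostname, page.port)
        == (script.scheme, script.hostname, script.port)
    )
    if not same_origin:
        return False  # third-party scripts never get to ask
    return not is_blocked((page.hostname or "").lower(), blocked_domains)
```

For example, with `blocked_domains = {"ads.example"}`, a script served from the page's own origin may prompt, while a script from another host, or any page under `ads.example`, may not.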

Unread post by **vannilla** » 2020-08-08, 23:13

Thinking about it though, stuff like this has been attempted already, with projects like OpenID Connect, re:claim or external devices like YubiKey.
I can't vouch for their effectiveness since I never used them, but I've read up a bit on how they work and they also have the property of existing in one form or another.

Unread post by **Isengrim** » 2020-08-08, 23:40

I think something like this would end up being used like a user agent (and in fact, the user agent is already used for things like this). Bad actor clients would spoof or blank it, which if we're just talking convenience wouldn't matter much. Bad actor servers would request this information for tracking purposes that the privacy nuts are so afraid of.

I also think these IDs would have to be profile specific, since (at least in UXP) multiple users could use the same system, and you would not want sites to remember a different user than who they're supposed to remember. The device ID should probably be the same across all profiles, if you're using a hash of hardware information that is repeatable. Also, I think lanID would be hard to implement - if you're using specific IP addresses, the user's local device address could be assigned by DHCP and could change at any time depending on network settings. If you're using a range, the user could bring their laptop from one home network to another, where both addresses could be in the same range of IPs, but they would obviously be different LANs.
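The range problem is easy to reproduce. In this small sketch, `naive_lan_id` is a hypothetical lanID that looks only at the address range a device sees, and the addresses are made-up examples of a common home-router default.

```python
import ipaddress


def naive_lan_id(interface: str) -> str:
    """Hypothetical lanID derived only from the local address range."""
    return str(ipaddress.ip_interface(interface).network)


# Laptop at home, then the same laptop on a different home network
# whose router happens to use the same default range.
at_home = naive_lan_id("192.168.1.23/24")
at_friends_house = naive_lan_id("192.168.1.57/24")

# Both collapse to the same range, so the two LANs look identical.
same_lan_claimed = at_home == at_friends_house
```

Here `same_lan_claimed` is true even though the networks are unrelated, while using the full device address instead would fail in the other direction whenever DHCP hands out a new lease.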

In truth, I think this is ultimately trying to solve what is really a server-side issue for services offering "remember me" functionality. These services should just issue a unique ID and be done with it. Asking users to log in from a new browser or device, or after clearing cookies is not really that unreasonable since it should not be a frequent occurrence. (In theory, users can also copy cookies from one device to another, thereby carrying their credentials over with them.) Trying to use other information about the user, such as the IP, UA, headers or anything else that the client could change at will is not really an accurate way of telling if they are the same user every time, since clients can spoof or change these values without warning.

Unread post by **Admin** » 2020-08-08, 23:56

One thing y'all missed is that trackers won't be satisfied with replacing what they have with something offered by user choice/permission. It will be used -in addition- to the tracking/fingerprinting already in place. So effectively all this would do would be offer the illusion of choice. If a user declines to share the unique ID, the tracker will simply continue to use what is already in place to ID the user.

Unread post by **Moonchild** » 2020-08-09, 11:28

RealityRipple wrote: ↑
2020-08-08, 17:09
how would users respond to knowledge of this feature?

With outrage, most likely. They'd come up with articles titled "X - The fingerprinting browser" "In bed with advertisers/trackers?" etc.

RealityRipple wrote: ↑
2020-08-08, 17:09
how would users respond to the prompts?

They would find ways to disable them.
Also, the prompting itself would be an issue: Website A uses resources from providers B, C, and D. Are you going to prompt the user for all of them? For just A, allowing B, C, and D silent access? Or always block B, C, and D which might break things?

RealityRipple wrote: ↑
2020-08-08, 17:09
would users even understand what they're allowing or blocking?

No, they likely wouldn't. But in essence it doesn't really matter if they do, because this kind of system only works in trusted environments that rely on single providers. The Internet is not that. You'd just be creating an easy way for additional fingerprinting, something that can't be circumvented or faked by the user aside from blocking.

I don't think the convenience will ever be worth the drawbacks of this.

Unread post by **adesh** » 2020-08-10, 09:43

This is already implemented in phones. Think about the device ID Android uses, and then the IMEI. I think those are also covered by permissions.

This is totally unnecessary for the web at least. If all you are trying to solve is the "remember me" problem, that can be solved by a simple text change on the website.

The biggest problem is that the same ID will be served to all websites, which is a privacy concern: a centralized server could track and associate a user's activity using the identifier. Also, if I share my system with someone, websites would know that we are both using the same machine; this is extra information which I wouldn't want to divulge. Any secure (and private) system should not transmit any more information than is needed.

Finally, like others said, there is convenience to the developers here, but given the pitfalls and concerns, this is easily solved by making better use of server-side identifiers.

Unread post by **Moonchild** » 2020-08-10, 11:55

I've given this some more thought and actually, there is absolutely no reason for this to be a thing "by design".
Any login system can store a "secret" in a secure cookie and match that against the user database (which holds the same key) to see if it is "the same user". This doesn't have to be a cross-site thing and rather should not be (because tracking). Cookies are subject to same-origin policies so can't be read anywhere else. So we already have everything in place that's needed; it just requires server implementations to create their own unique ID for a visiting user (which is easy). Uniquely identifying devices and browser instances/installations is not going to bring anything to the table that will benefit login solutions, and in fact can only be used to be -stricter- about logins, not more lenient.
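A minimal sketch of that server-side approach, assuming a Python backend: the in-memory `remembered` dict stands in for the user database, the cookie name `remember` and the 30-day lifetime are arbitrary choices, and storing only a hash of the secret is an added precaution rather than something the post asks for.

```python
import hashlib
import secrets
from http.cookies import SimpleCookie
from typing import Optional

# Stand-in for the user database: SHA-256 of the issued secret -> user name.
remembered = {}


def _digest(token: str) -> str:
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def issue_remember_cookie(user: str) -> str:
    """Create a random secret for this browser and return a Set-Cookie value."""
    token = secrets.token_urlsafe(32)
    remembered[_digest(token)] = user
    cookie = SimpleCookie()
    cookie["remember"] = token
    cookie["remember"]["secure"] = True        # only sent over HTTPS
    cookie["remember"]["httponly"] = True      # not readable from page scripts
    cookie["remember"]["samesite"] = "Strict"  # not sent on cross-site requests
    cookie["remember"]["max-age"] = 60 * 60 * 24 * 30
    cookie["remember"]["path"] = "/"
    return cookie["remember"].OutputString()


def user_for_cookie(cookie_header: str) -> Optional[str]:
    """Return the remembered user for a request's Cookie header, if any."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("remember")
    if morsel is None:
        return None
    return remembered.get(_digest(morsel.value))
```

The server sends the returned string in a `Set-Cookie` header after a successful login, and on later visits calls `user_for_cookie` with the incoming `Cookie` header. No device, browser, or network identifier is involved.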

Pale Moon forum
