They do, but the browser has a limited cache size. And because of the gigantic size of even the smallest web sites these days, the cache is maxed out every day, and your files are purged again and again. This is basically just a super cache of files you know you never want to invalidate. Also, it prevents OPTIONS and HEAD requests.
Are browsers really this stupid? Seems like an obvious strategy to have several cache buckets, with one dedicated to smaller assets with long expiry times.
It's not stupidity. There is really no way to know which files you want to keep longer than others without risking breaking things. CDNs are actually a good, manually updated source of files that have this quality. But basing yourself on proprietary CDNs is not a move any browser in their right mind would make.
That it has been looked at on /any level/ (e.g. this plugin) is great, and it does so without waiting for a decade of W3C and browser standards back-and-forth.
If the server sends an "Expires" header in the response, then the client doesn't even need to do that check. With an Expires header, the server has effectively told the client that the data won't change until at least a particular date, and so the client honours that information.
Last-Modified/If-Modified-Since is an optimisation trick which exists for the situation where the person running the website hasn't bothered to explicitly define expiry periods for content.
That depends on what type of caching headers the CDN uses. If it uses max-age and no ETags/Last-Modified, the browser won't send the If-Modified-Since request and will just use the cached resource without asking the server.
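For anyone who hasn't poked at these headers, a minimal sketch of the difference (Node's built-in http module; the path and the one-year lifetime are just examples):

    // Explicit expiry vs. falling back to revalidation. Illustrative only.
    import { createServer } from "http";
    import { readFileSync } from "fs";

    createServer((req, res) => {
      if (req.url === "/js/jquery.min.js") {
        const oneYear = 365 * 24 * 60 * 60; // seconds
        res.setHeader("Cache-Control", `public, max-age=${oneYear}`); // modern form
        res.setHeader("Expires", new Date(Date.now() + oneYear * 1000).toUTCString()); // legacy form
        res.end(readFileSync("./static/jquery.min.js"));
      } else {
        // No explicit expiry: on later visits the browser falls back to
        // Last-Modified / If-Modified-Since revalidation for this response.
        res.setHeader("Last-Modified", new Date().toUTCString());
        res.end(readFileSync("./static/index.html"));
      }
    }).listen(8080);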
This is brilliant, not only for privacy but for speed. Seeing this makes me wonder why I haven't built this yet. I've often thought that JavaScript loading tags could include a hash of the desired resource, so your browser can fetch it once and reuse it across a thousand page loads on a thousand websites. This is not that, but it is extra local caching and, on top of that, it stops most tracking by CDNs. Guess I always thought of it as something my browser should have instead of an addon.
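A rough sketch of that hash-keyed idea (the declared-hash attribute, the function name and the SHA-256 choice are all mine, not any existing mechanism):

    // A cache keyed by the content hash declared in the page, so the same
    // library is fetched once no matter which site references it. Illustrative.
    const cache = new Map<string, ArrayBuffer>();

    async function loadByHash(declaredHash: string, fallbackUrl: string): Promise<ArrayBuffer> {
      const hit = cache.get(declaredHash);
      if (hit) return hit; // already seen on any site: no network request at all

      const body = await (await fetch(fallbackUrl)).arrayBuffer();
      const digest = await crypto.subtle.digest("SHA-256", body);
      const hex = [...new Uint8Array(digest)].map(b => b.toString(16).padStart(2, "0")).join("");

      if (hex !== declaredHash) throw new Error("hash mismatch: refusing to use the asset");
      cache.set(declaredHash, body);
      return body;
    }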
IPFS does this by design. Everything is content-addressed so you immediately know if you've seen the resource before. This also enables chunk-level deduplication.
Of course the P2P nature of the project means other people can find out exactly which of those resources you're looking at...
Only your direct peers can, and they can't tell if you got the content to increase your fitness score or because you wanted the asset for yourself. Peers are incentivized to pull as many assets as they can (which prevents torrent death) in order to build reputation.
I think this is overly optimistic; as soon as you pull down a rare asset you have leaked information, since a peer that's farming would presumably work down the list of assets ranked by some measure of popularity, and would be unlikely to bother collecting obscure content.
This sort of system helps against some sorts of snooping, but certainly not nation-state adversaries.
> This is brilliant, not only for privacy but for speed.
But these resources are probably already cached by the browser anyway (using the appropriate http headers). So how can this solution add any improvements to that, once the resources have been loaded for the first time?
The libraries can't be that big; surely it would work fairly well if you just dedicated a preset portion of hard drive space and deleted the least used when it exceeds that size.
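A minimal sketch of that eviction idea (the 50 MB budget and the class shape are arbitrary):

    // Keep a byte budget and evict the least recently used asset when it's
    // exceeded, using a Map's insertion order as a cheap recency queue.
    class AssetCache {
      private store = new Map<string, Uint8Array>();
      private bytes = 0;
      constructor(private maxBytes = 50 * 1024 * 1024) {}

      get(url: string): Uint8Array | undefined {
        const asset = this.store.get(url);
        if (asset) {               // refresh recency: move to the back
          this.store.delete(url);
          this.store.set(url, asset);
        }
        return asset;
      }

      put(url: string, asset: Uint8Array): void {
        const existing = this.store.get(url);
        if (existing) {
          this.bytes -= existing.byteLength;
          this.store.delete(url);
        }
        this.store.set(url, asset);
        this.bytes += asset.byteLength;
        while (this.bytes > this.maxBytes) { // evict least recently used
          const [oldestUrl, oldest] = this.store.entries().next().value!;
          this.store.delete(oldestUrl);
          this.bytes -= oldest.byteLength;
        }
      }
    }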
Actually, now using this and seeing the results, I think Chrome caches assets much like this extension does. It gives a huge perceived speed boost to Firefox. Mozilla devs need to look at including this in Firefox by default. It's huge in terms of speed and Firefox's competitiveness with Chrome.
I believe it still sends a network request to check the status of the resource. But an extension can bypass this and assume that the asset has not changed.
Indeed. Or check it every x hours in the background. Or the extension author keeps up with jQuery news (and a few other big libraries) and pushes updates, or site developers push updates themselves. Many simple solutions that give a speed boost on many sites already.
Clean Links - removes the redirects that search engines, Facebook, Twitter etc. use to record the fact that you clicked a link. Google doesn't learn which link you clicked from the search results, so if you also block GA, it can't track you: https://addons.mozilla.org/en-US/firefox/addon/clean-links/
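Roughly, "cleaning" means pulling the real destination out of the redirect wrapper before the click ever reaches the wrapper's host. A sketch (the parameter names are the ones I remember these redirectors using; the add-on's own rules are far more thorough):

    function cleanLink(href: string): string {
      const url = new URL(href);
      // Google search result redirector
      if (url.hostname.endsWith("google.com") && url.pathname === "/url") {
        return url.searchParams.get("q") ?? url.searchParams.get("url") ?? href;
      }
      // Facebook outbound link wrapper
      if (url.hostname === "l.facebook.com" && url.pathname === "/l.php") {
        return url.searchParams.get("u") ?? href;
      }
      return href; // not a known redirector: leave it alone
    }

    // cleanLink("https://www.google.com/url?q=https%3A%2F%2Fexample.org%2F")
    //   -> "https://example.org/"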
I have used NoScript, AdBlock Plus and Ghostery before but found they were lacking in functionality, flexibility and performance.
I used Privacy Badger too but, if I remember well, it is based on the same engine as ABP and suffers from the same performance problems.
uMatrix provides a tabular view sorted by host, featuring toggleable category columns, e.g. dis/allow iframes, scripts etc. It's granular client-side resource whitelisting.
uMatrix offers more granularity. You can choose exactly what each third-party site can do in terms of cookies, CSS, images, plugins, javascript, XHR(!), frames and media (audio, video, pdf, etc).
After years of using uMatrix (formerly HTTP Switchboard), many sites "just work" with regard to YouTube, Vimeo and similar, even without first-party JavaScript enabled.
I've considered sharing parts of my global ruleset so others can just copy-paste the sections/sites they want to whitelist without having to discover what's required themselves.
Considering the number of variables that contribute to the browser fingerprint, you would be forced to conclude that the only way to prevent being so unique is to run a browser in a vanilla VM (although the OS is already a variable in itself).
I think this is a topic that gets discussed by (for example) the Firefox developers, but I get the feeling that this is one of the hardest problems to fix.
I would like to see a browser mode akin to the privacy mode most browsers feature that reduces the number of identifying variables (at the cost of features). So instead of telling the world that my time zone is CET and I prefer English (GB) as language, it would select a random time zone and locale (although this does inconveniently mean that sites might suddenly serve me content in Portuguese).
Come to think of it, Tor Browser probably does a couple of these things. Disabling JavaScript is surely the biggest factor, although that does make the modern web pretty much unusable.
> Considering the number of variables that contribute to the browser fingerprint, you would be forced to conclude that the only way to prevent being so unique is to run a browser in a vanilla VM (although the OS is already a variable in itself).
It'd have to be more like a VM running the OS with the highest market share (Windows), the browser with the highest market share (Internet Explorer), with the most common language used, with the most common time zone of users of the site you're accessing (varies by site and time of day), etc.
Anything else and you could stand out in the crowd. Using Linux or OS X, for example, really makes fingerprinting easier for sites, which is quite disturbing.
Randomizing the values of certain attributes, as you've described, may help a lot if more people adopt it, and it would make fingerprinting a futile exercise for those using it. :) If the people doing the fingerprinting see millions being successfully tracked and just a handful they're unable to track, they won't even care. It's kind of like ad blocking: a few do it and it's not seen as a problem; if the majority does it, then the sites take notice. For a larger-scale effect, browser makers should get into this. Mozilla, Apple, Microsoft and Google, in that order (with Opera somewhere in the middle), may be interested in thwarting browser fingerprinting.
A lot of factors that make up the total fingerprint have an influence on how sites react to your browser, so I would have it change per-session and per-domain to prevent weirdness.
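Something like this selection logic, I imagine (purely conceptual: the locale/timezone lists are arbitrary, and actually spoofing these values needs deep browser integration):

    // Derive the fake values from a hash of (session seed, domain) so the same
    // site always sees the same lie within one session, but different sites
    // and different sessions see different ones.
    const LOCALES = ["en-US", "en-GB", "de-DE", "pt-BR", "fr-FR", "ja-JP"];
    const TIMEZONES = ["America/New_York", "Europe/Berlin", "Europe/London", "Asia/Tokyo", "UTC"];
    const sessionSeed = Math.floor(Math.random() * 2 ** 31);

    function pickFingerprint(domain: string): { locale: string; timezone: string } {
      let h = sessionSeed;
      for (const ch of domain) h = (h * 31 + ch.charCodeAt(0)) >>> 0; // tiny string hash
      return {
        locale: LOCALES[h % LOCALES.length],
        timezone: TIMEZONES[(h >>> 8) % TIMEZONES.length],
      };
    }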
Request filters: These add-ons filter requests to 3rd party hosts, effectively blocking everything (if set to default to deny all). Most sites, other than web applications or ecommerce, only need to connect to at most a single remote host to pull down their CSS files; the next most common requirement is Google Hosted Libraries.
* RequestPolicy: No longer developed, but still works for me
uMatrix if you are a control freak. I can usually stand it for two weeks before giving up. And yup, I've trained it to know the sites I regularly visit. Thing is, I also surf NEW sites all the time.
Now I use Self Destructing Cookies, uBlock Origin and HTTPS Everywhere. That works just fine without taking the fun out of the web.
I've found NoScript a bigger part of my browsing habits lately. I wait for the moment when a site just goes haywire, maxing CPU cores, at which point I just nix its script privileges. This isn't nearly as disruptive as distrusting all websites.
There's really no need to use both Privacy Badger and Disconnect as they both do pretty much the same thing. I'd ditch them both and just use uBlock Origin with the "Privacy" filter lists enabled.
I thought Disconnect was based on preexisting lists and Privacy Badger automatically worked out which sites seemed to be setting cookies and using them for tracking across sites. I'll need to look into it more, thanks.
You're correct. I'm not sure where people get this idea that Privacy Badger's supposed "lists" are included in uBlock, but I've seen it around here a lot. Personally I use both.
CsFire is the result of academic research, available in the following publications: CsFire: Transparent client-side mitigation of malicious cross-domain requests (published at the International Symposium on Engineering Secure Software and Systems 2010) and Automatic and precise client-side protection against CSRF attacks (published at the European Symposium on Research in Computer Security 2011)
Firefox lets you do that natively too. Self destructing cookies deletes the cookies after you close the browser tab, not the entire window. If the browser doesn't provide complete isolation between tabs, this'll make it so the cookie isn't there to harvest from. You can set each site to tab/browser/never too.
There are DuckDuckGo privacy settings you can set. I'm not a big fan of a cloud store for your settings, and thankfully DuckDuckGo allows for settings parameters in the URL [0]. You can do things like require POST instead of GET, control redirecting, force HTTPS, etc. Once you get your search and privacy settings how you like them, take the resulting URL and make an OpenSearch plugin out of it, manually or with something like my Mycroft project [1]. Now any searches use your settings; no account/cloud settings needed. It's then easy to throw into every browser you use. Technically a plugin.
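The general idea as a sketch (only "q" and the "kl" region parameter are shown; the exact codes for POST, redirects and HTTPS are on the URL-parameters page [0], so look them up there rather than trusting my memory):

    // Bake your settings into the search URL itself: no account, no cloud sync.
    function buildSearchUrl(query: string, settings: Record<string, string>): string {
      const url = new URL("https://duckduckgo.com/");
      url.searchParams.set("q", query);
      for (const [key, value] of Object.entries(settings)) {
        url.searchParams.set(key, value); // each setting is just another query parameter
      }
      return url.toString();
    }

    // e.g. buildSearchUrl("firefox privacy add-ons", { kl: "uk-en" })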
I wish there was a way to isolate third-party cookies/HTML5 data to a SiteVisited/ThirdPartySite key instead of the current ThirdPartySite model. The third-party site could track you within the site visited, but you would appear as a different user when visiting a different site. There would be no way to track you across websites.
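Conceptually the change is just double-keying the cookie jar. A sketch of the keying (a real implementation would live inside the browser's network stack):

    type CookieJar = Map<string, string>; // cookie name -> value

    const jars = new Map<string, CookieJar>();

    function jarFor(siteVisited: string, thirdPartySite: string): CookieJar {
      const key = `${siteVisited}\u0000${thirdPartySite}`; // SiteVisited/ThirdPartySite key
      let jar = jars.get(key);
      if (!jar) {
        jar = new Map();
        jars.set(key, jar);
      }
      return jar;
    }

    // tracker.example gets a separate jar per first-party site:
    jarFor("news.example", "tracker.example").set("uid", "abc");
    jarFor("shop.example", "tracker.example").get("uid"); // -> undefined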
There's no need to use Disconnect anymore. Disconnect's list is now included in Firefox's native tracking protection feature (https://support.mozilla.org/en-US/kb/tracking-protection-pbm), and is also available through uBlock Origin subscriptions.
It wouldn't hurt to get SQLite Manager and periodically check what's in the browser databases. For example, if you buy anything online you might find your credit card number in there.
They claim to only sell your data if you opt-in to something called "GhostRank." [1]. It's proprietary software so there's no way to actually confirm that though.
There's really no reason for privacy conscious individuals to use Ghostery when uBlock Origin can do the exact same thing.
In addition, they ask you a series of invasive questions when you uninstall, which, I suspect, means they sell that data once you're no longer using the software.
Could someone please explain what this does in a clear, step-by-step way?
Here, I'll explain what I think it does so you can at least correct what I'm missing:
(1) User visits web site example.com and needs to get file foo.jpg from example.com.
(2) foo.jpg is available at some content delivery network, let's say Akamai.
(3) User's browser gets foo.jpg from Akamai.
(4) Akamai now knows the user's IP address, the Referer (example.com), and the user agent info (browser version, OS version, etc.)
So what does the Decentraleyes add-on do? I think it does the following:
First, this add-on apparently cuts out the Referer when the browser asks for foo.jpg, but Akamai would still get the IP address (and the user agent info unless the user is disguising that). With the IP address you've been tracked, so does this really help?
Second, this add-on apparently gives you a local copy of foo.jpg if it exists (i.e., a copy of foo.jpg already cached on your own computer). Well, the first copy of foo.jpg had to have come from somewhere (either example.com or Akamai), so you've already been tracked.
NOTE: I'm not criticizing the add-on at all! I'm just trying to understand it.
The extension bundles commonly requested files (jQuery etc.) together with a list of CDN URLs that return those files. Every time the browser makes a request to one of those URLs, the extension serves the local file back.
How does that help? It speeds up browsing, since you have a local version of the requested file. It also increases privacy: many sites are lazy and use, for example, the Google CDN for, let's say, jQuery. Normally, when you visit such a site, Google can still track you, because you make a request to them.
The only weakness with this approach is that it only works for URLs known to the author. A request to an unknown CDN, or even to a known CDN for a new file, will still be made (AFAIR there is an option to block unknown files on known CDNs, but that will often break many websites).
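In extension terms it's roughly this (not the actual source, just a minimal sketch using the blocking webRequest API; the mapping entry and bundled path are illustrative):

    declare const browser: any; // WebExtensions global (Firefox)

    // Known CDN URL -> file bundled inside the extension.
    const LOCAL_COPIES: Record<string, string> = {
      "https://ajax.googleapis.com/ajax/libs/jquery/2.1.4/jquery.min.js":
        "resources/jquery/2.1.4/jquery.min.js",
    };

    browser.webRequest.onBeforeRequest.addListener(
      (details: { url: string }) => {
        const local = LOCAL_COPIES[details.url];
        if (local) {
          // Serve the bundled copy; the CDN never sees the request.
          return { redirectUrl: browser.runtime.getURL(local) };
        }
        return {}; // unknown URL, CDN or version: let the request through
      },
      { urls: ["*://ajax.googleapis.com/*"] },
      ["blocking"]
    );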
I thought one of the benefits of using something like Google CDN to serve jQuery was meant to be that a person's browser was much more likely to have that in their cache than mylittlewebsite.com?
That's true, but you don't really have to let Google know which sites you visit in order to pull the jQuery library. This extension provides the files from a local cache so that you avoid the requests to a great extent and thus minimize any tracking. If mylittlewebsite.com and yourlittlewebsite.com both use the Google CDN for jQuery, Google would know that you visited these two sites. With this extension, there's less chance of Google or another CDN seeing all instances of the jQuery download requests (unless each site is using a different, newer version of jQuery that isn't locally cached yet).
Some of the things they mention, like the Google libs, are a CDN for a small, mostly-static set of stuff. So you could, say, hardcode most/all of the JavaScript libraries that Google hosts right into the plugin and never fetch them from Google.
With that said, I'm not 100% sure what it's doing for "normal" CDN files either. What you're saying sounds like a flaw, and I don't know if I don't see the obvious answer, or if you're right and that's a significant problem.
Ah! It looks like it also prevents additional "is this up to date?" requests for normal CDN resources. So you'll still load them on the first request, but you won't send a "hey, I'm using this" trackable request to the CDN after the first time.
That would be bad if the content changed, but in some cases you can be sure it won't.
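For reference, that "is this up to date?" round-trip looks something like this on the wire (header values made up; normally the browser adds these conditional headers itself):

    async function revalidate(): Promise<void> {
      const response = await fetch("https://cdn.example/libs/jquery.min.js", {
        headers: {
          "If-None-Match": '"33a64df5"',                        // ETag of the cached copy
          "If-Modified-Since": "Sat, 18 Apr 2015 00:00:00 GMT", // its Last-Modified date
        },
      });
      if (response.status === 304) {
        // Nothing changed: no body is sent, but the CDN still saw who asked.
      } else {
        // 200: a fresh copy replaces the cached one.
      }
    }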
It's not focused on unique assets like images; it's focused on stuff like jQuery, which uses the same file from the same CDNs across many websites. Many sites hook into a JS script CDN and nothing else from those domains, so this saves being tracked by them.
The method this extension uses would require one to download the ~1GB font archive upon installation for that to work, since it installs all the blocked libraries in the extension itself. Perhaps it could block the 100 or so most commonly used google fonts and serve those...
The problem this solves is even worse than it sounds - there's no reason why the NSA couldn't force a CDN to silently concatenate their own analytics onto a site's jQuery. Is there a good way of signing your assets?
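Pinning a digest of your known-good copy gets you most of the way there (the W3C's Subresource Integrity work is aimed at exactly this). A sketch with Node's crypto module; file names invented:

    import { createHash } from "crypto";
    import { readFileSync } from "fs";

    // Digest of your own known-good copy, computed once at build time.
    function sha384Base64(bytes: Buffer): string {
      return createHash("sha384").update(bytes).digest("base64");
    }

    const expected = sha384Base64(readFileSync("./vendor/jquery-2.1.4.min.js"));

    // Later, before trusting whatever the CDN actually served:
    const served = readFileSync("./downloaded-from-cdn.js");
    if (sha384Base64(served) !== expected) {
      throw new Error("CDN copy does not match the pinned digest");
    }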
Can someone experienced with the code speak to how this is a 5MB add-on? That smells like an order of magnitude more code than would be needed to block a fairly simple behaviour on fewer than 200 predefined URLs.
A version of Firefox with the modifications in the tor firefox bundle minus the tor network would be splendid. They fix privacy and solve fingerprinting as well. On top of that you don't have to deal with a website's decision to use certain fonts anymore. Is this available somewhere as a patchset/branch for mozilla-release.hg?
Does this work on localhost? Apart from the increased privacy, it sounds like this would also improve offline web development. I could be offline, and still have all the CDNJS libraries on the page load correctly.
The Tor Browser bundle clears cookies on exit. There are some plugins you can add to clear them on tab close too. As for something like CDN cookies, I'm actually not sure. But if you have cookies that are set by the CDN for their domain, then it's not trivial to link the loading page (assuming Referrer headers are stripped) to the resource being loaded because TBB uses different Tor circuits for different websites.
I think running any addons or plugins within Tor Browser is a bad idea. Even if it's from a "respected" source, the risk of it somehow becoming compromised is not worth it. IIRC the bundle even advises you that addons may be risky. Considering that the purpose of Tor is to remain anonymous, one should keep in mind that any addon could de-anonymize you.
> IIRC the bundle even advises you that addons may be risky.
The reason for this is that by installing various non-default addons, you're actually making your browser more unique. As a consequence, you're making it easier to link all of your Tor activity back to a single person.
Great, I've added it. I wonder whether a fully free service is trustworthy or not.
Question: where do you get your info from? I'm trying to gather twitter lists into this repo to know the best sources of info.
Please collaborate: https://github.com/davidpelayo/twitter-tech-lists
Edit: someone asked how this works:
1) It looks up the resource in the mapping linked above (matching the CDN and file path).
2) if found, it replaces it with the copy it includes: https://github.com/Synzvato/decentraleyes/tree/master/data/r...
So for those files, requests are never made to the CDN.
If the website uses a different CDN, a library that isn't recognized, or a version that isn't recognized, then the request is still made.
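To make step 1 concrete: the matching is basically normalising the request into a (library, version, file) triple and checking it against what's bundled. A guess at the shape of it (the regex, bundle layout and version list are mine, not the extension's real rules):

    const BUNDLED = new Set(["jquery@2.1.4", "backbone.js@1.1.2"]); // illustrative

    // cdnjs-style URLs: /ajax/libs/<library>/<version>/<file>
    const CDNJS = /^https?:\/\/cdnjs\.cloudflare\.com\/ajax\/libs\/([^\/]+)\/([\d.]+)\/(.+)$/;

    function lookUp(requestUrl: string): string | null {
      const match = CDNJS.exec(requestUrl);
      if (!match) return null;                  // different CDN: request goes out as usual
      const [, library, version, file] = match;
      if (!BUNDLED.has(`${library}@${version}`)) return null; // unrecognized lib/version
      return `data/resources/${library}/${version}/${file}`;  // bundled copy to serve instead
    }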