This repository has been archived by the owner on Mar 16, 2023. It is now read-only.

How do we avoid FLoC replicating algorithmic red-lining and the discriminatory behavioral targeting we've seen in other ML systems? #16

Open
AramZS opened this issue Jul 14, 2020 · 17 comments

Comments

@AramZS

AramZS commented Jul 14, 2020

One of the big concerns that has arisen from behavioral targeting of advertising is that the red-lining of communities once done via print ads is being replicated on the web with ever more precision and impact. I wrote briefly about this issue here.

I am concerned that the machine learning approach will only continue to create this issue in web advertising.

For one, it does not distinguish between content providers. Issue 12 discusses this with regard to some advertiser concerns, but I think there's a serious social impact that would degrade trust in this system: it would replicate the problems we see now that force advertisers to maintain large blocklists (resource-intensive) or small allowlists (bad for new publishers trying to enter the space), and create targeting issues that I know Turtledove proposes to bridge.

However, this also allows particular content providers to potentially affect users' ad views in a way that would create discriminatory targeting (e.g., would a user who primarily reads The Root and The Baltimore Afro-American be categorized in a racially specific group rather than an interest-based one?). It's unclear to me what degree of intervention the browser, or the user if they wish, has in how these groups are applied, and whether there is any built-in safeguard against this type of well-known algorithmic discrimination and its inevitable after-effects. If racially specific targeting is incidentally created by the ML system, what happens when advertisers target for or against it, and who ends up responsible? Is there a central controller who can make a correction? Do browsers become regulatory or legal targets if FLoC-based targeting turns out to be used primarily for restricting job offers in a discriminatory way? Or housing offers?

This is obviously a complex issue, and one on which there have been more technical examinations than I've given here, but I think moving forward on this proposal requires serious examination into how we can be sure it does not replicate the potentially illegal discriminatory behavior that algorithmic targeting of user behavior has created in the past.

@wqi1972

wqi1972 commented Jul 24, 2020

Good thinking. Here is my understanding of the issue.

The ML algorithm assigning users to different FLoCs is not supposed to know anything about what each FLoC stands for. (If it did, it couldn't avoid being manipulated to favour certain advertisers or ad platforms, even if it were open sourced.) It is supposed to treat each FLoC neutrally, and shouldn't expose any information about why a user ended up in a given FLoC.

Then each ad platform can build its own algorithm to learn the behaviour of each FLoC in its area of interest and label it. Different ad platforms could view the same FLoC completely differently (e.g., a shoe-selling advertiser and a political campaign platform could mark the same FLoC with entirely different labels). I think this is where discriminatory labelling could happen.
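As a rough sketch of that labelling step (hypothetical code; the field names and the scoring are assumptions, not anything the FLoC proposal specifies), the platform only ever sees an opaque cohort id, and whatever "meaning" a cohort acquires comes entirely from the platform's own outcome data:

```python
from collections import defaultdict

def score_cohorts(impressions):
    """impressions: iterable of (cohort_id, campaign, converted) tuples taken
    from the ad platform's own logs. Returns conversion rate per (cohort, campaign)."""
    counts = defaultdict(lambda: [0, 0])  # (cohort, campaign) -> [conversions, impressions]
    for cohort, campaign, converted in impressions:
        stats = counts[(cohort, campaign)]
        stats[0] += int(converted)
        stats[1] += 1
    return {key: conversions / total for key, (conversions, total) in counts.items()}
```

Two platforms running something like this over their own logs would end up assigning the same cohort id completely different meanings, which is exactly where the discriminatory labelling described above could creep in.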

There should be some mechanism to prevent abuse of FLoC by ad platforms, and a way to intervene if abuse does happen.

@michaelkleber
Collaborator

Hi Aram,

Check out the recent update describing the FLoC Proof-of-Concept experiment. In particular we plan to test the flocks we generate for excessive correlation with browsing activity in our existing list of sensitive categories, which Google Ads policies prohibit advertisers from using in personalized ads. (As we explain in the update, this analysis is designed to evaluate whether a flock may be sensitive, in the abstract, without learning why it is sensitive, i.e., without computing or otherwise inferring specific sensitive categories that may be associated with that flock.) Then we can block revealing those excessively-sensitive flocks, or come up with some other mitigation.
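A minimal sketch of what such a check might look like (this is a guess at the general shape of the analysis, not Chrome's actual implementation; the divergence measure, the threshold, and the data layout are all assumptions): compare each cohort's distribution of visits to sensitive-category pages against the population-wide baseline, and withhold any cohort that diverges too much, without ever recording which category was responsible.

```python
import math

def kl_divergence(p, q):
    """KL divergence between two distributions given as dicts over the same keys.
    Assumes q has nonzero mass wherever p does."""
    return sum(p[k] * math.log(p[k] / q[k]) for k in p if p[k] > 0)

def flag_sensitive_cohorts(sensitive_visits, baseline, threshold=0.1):
    """sensitive_visits: {cohort_id: {category_slot: visit_count}} over an opaque
    set of sensitive-category slots; baseline: the population-wide distribution
    over the same slots. Returns the cohort ids that should be withheld."""
    flagged = set()
    for cohort, counts in sensitive_visits.items():
        total = sum(counts.values()) or 1
        dist = {slot: counts.get(slot, 0) / total for slot in baseline}
        # A cohort whose mix of sensitive visits diverges too far from the
        # baseline is withheld; the check never surfaces *which* slot drove it.
        if kl_divergence(dist, baseline) > threshold:
            flagged.add(cohort)
    return flagged
```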

In addition to these protections, which relate to how flocks are generated, it will be important to think about acceptable and unacceptable uses of flocks by the parties requesting them from the browser. If you're building an ML model, then you need to think about the fairness of every input signal and of your training data. Models that use flock as a signal will require that kind of care too.
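To make "that kind of care" concrete, one common and deliberately simple audit is to compare a model's positive-decision rates across groups; the metric, the group labels, and the decision semantics below are illustrative assumptions, not anything FLoC itself prescribes.

```python
from collections import defaultdict

def positive_rate_by_group(decisions, groups):
    """decisions: list of 0/1 model outputs (e.g. "show this job ad");
    groups: parallel list of group labels used purely for auditing."""
    totals = defaultdict(lambda: [0, 0])  # group -> [positives, count]
    for decision, group in zip(decisions, groups):
        totals[group][0] += decision
        totals[group][1] += 1
    return {group: positives / count for group, (positives, count) in totals.items()}

def demographic_parity_gap(decisions, groups):
    """Largest difference in positive-decision rate between any two groups."""
    rates = positive_rate_by_group(decisions, groups)
    return max(rates.values()) - min(rates.values())
```

A large gap doesn't by itself prove discrimination, but it is the kind of signal any model that consumes flock ids would need to be checked for.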

@AramZS
Author

AramZS commented Feb 2, 2021

@michaelkleber Is there any data on how effective the attempt to recognize sensitive categories has been elsewhere at Google? We would have to rely on Chrome to determine that a page falls into a sensitive category, or that a FLoC is based on one, and I'm not sure I see a clear way to do that in the proposal.

@kuro68k

kuro68k commented Mar 12, 2021

Reading Google's list of "sensitive categories", it's clear that it is inadequate. It is very US-centric, and it both doesn't go far enough (e.g. in protecting people from economic discrimination) and goes too far (e.g. the ban on birth control products).

While Google may feel this list is what they want, given that this is going to be a standard, other browser manufacturers will doubtless feel differently. Each browser could use its own list of cohorts, but then compatibility and usefulness to advertisers would be greatly reduced.

@michaelkleber
Collaborator

We used Google Ads's preexisting list of sensitive categories to get started, and to leverage already-trained classifiers. Please contribute suggestions for other sources we might draw on.

@kuro68k

kuro68k commented Mar 13, 2021

@michaelkleber I don't think any kind of list will work; it will always be culture-specific and problematic for different groups of people.

This is an unsolved problem with targeted advertising.

@michaelkleber
Collaborator

Sure — if you believe that it's impossible to do any targeting without discrimination, then FLoC is absolutely not going to solve this problem.

But a key lesson from the last ~10 years of ML Fairness research is that even if all the signals you feed into your ML-based ads selection system are fair and unbiased, the decisions of the system can perpetuate bias nonetheless. This is what I was getting at in my initial reply to Aram on this subject:

If you're building an ML model, then you need to think about the fairness of every input signal and of your training data. Models that use flock as a signal will require that kind of care too.

@kuro68k

kuro68k commented Mar 13, 2021

So maybe you should reconsider this entire project, and focus your efforts on a) blocking all tracking and b) finding alternative ways for sites to be funded.

@michaelkleber
Collaborator

@kuro68k Unfortunately, the best data we have suggests that if we limit ads to only contextual+1p and get rid of the types of targeting that take people's interests into account, most web sites lose 50-70% of their revenue. That's why Chrome's "Block all tracking" work is tied to efforts like FLoC, which aim to preserve the revenue even while preventing the tracking.

Feel free to propose some other way for sites to double or triple their cookieless revenue! But FLoC and FLEDGE are the best ideas we've seen so far.

@kuro68k

kuro68k commented Mar 13, 2021

I appreciate it's a difficult problem, but remember that a 70% loss of revenue for a site has to be balanced against the very dire consequences to individuals of being tracked. It sounds very much like "we can't shut down the polluting toxic chemical plant that is making people sick, it would impact the company revenue."

Chrome is one of several browsers, and hopefully Chrome will allow FLoC to be blocked or poisoned with bogus data via add-ons in a way similar to current privacy enhancers. So it seems wise to consider how much adoption FLoC will see if a good proportion of browsers and users are actively working to subvert it.

Here's an idea for you. Instead of sending the user's cohort to the site, only allow the site to ask for an ad from an approved supplier to be displayed by the browser in a given space. The approved suppliers would be carefully vetted and located in the user's jurisdiction and cultural setting, and the site wouldn't need to get any indication of the user's interests. The ad gets displayed and the site gets paid.
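Sketched out just to pin the idea down (everything here is hypothetical; no such browser API exists, and the helper functions stand in for browser internals): the page names a slot and a supplier, and the browser does the rest without passing any interest signal to the site.

```python
APPROVED_SUPPLIERS = {"ads.supplier.example"}  # assumed: vetted per jurisdiction

def browser_fill_ad_slot(site_origin, supplier, slot_size,
                         fetch_creative, render, credit_site):
    """fetch_creative, render, and credit_site are placeholders for browser internals."""
    if supplier not in APPROVED_SUPPLIERS:
        raise PermissionError("supplier is not on the vetted list")
    # The request carries only the slot size and the requesting site:
    # no cohort, no browsing history, no other user signal.
    creative = fetch_creative(supplier, {"size": slot_size, "site": site_origin})
    render(creative)          # shown in a frame the page itself cannot read
    credit_site(site_origin)  # "the ad gets displayed and the site gets paid"
```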

@michaelkleber
Collaborator

...balanced against the very dire consequences to individuals of being tracked.

Just for clarity, FLoC is indeed about not allowing users to be tracked, as is the rest of Chrome's Privacy Sandbox effort.

Here's an idea for you. Instead of sending the user's cohort to the site, only allow the site to ask for an ad from an approved supplier to be displayed by the browser in a given space...

If you wish to make new proposals for more private ways that the advertising industry could work, the W3C's Web Advertising Business Group may be an interested audience. The GitHub repo for one particular existing proposal probably isn't a useful place to design a second unrelated one, though.

(Also, re. "Here's an idea for you": this repository is part of the W3C's Web Platform Incubation Community Group, and contributors to these proposals need to have signed the W3C Community Contributor License Agreement. To contribute, please do join WICG and update your GitHub profile to indicate your name and organizational affiliation.)

@dczysz

dczysz commented Apr 14, 2021

Feel free to propose some other way for sites to double or triple their cookieless revenue! But FLoC and FLEDGE are the best ideas we've seen so far.

I may not be smart enough to come up with a new better idea, but I'm at least smart enough to know that just because they're the best you've seen so far does not make them good ideas worth pursuing. It just makes them the best ideas you've seen so far, nothing else.

@michaelkleber
Collaborator

@dczysz The idea we're pursuing is getting rid of 3rd-party cookies and replacing them with things that are much better for privacy. I absolutely believe that is a worthwhile intermediate step, even if someone smarter comes up with even better ideas in the future.

@dczysz

dczysz commented Apr 14, 2021

I agree 100% with the goal of getting rid of 3rd party cookies. Although FLoC may be "much better for privacy", it still is nowhere remotely close to being "good for privacy". I fail to understand how one can honestly state that any tracking method is good or better for privacy.

I don't think we'll ever agree on this though. Google and regular users have very different priorities and that will never change as long as the business model relies on generating ad revenue by tracking users.

@kuro68k

kuro68k commented Apr 14, 2021

@michaelkleber why not just get rid of third party cookies and replace them with nothing? The industry will have to adapt to inferring interest just from the site being visited, and it's the best thing for privacy.

The web is headed that way anyway. Many browsers won't implement FLoC, and many users will block it. The solution is not technical, it's a business issue where advertisers simply need to accept that the days of knowing people's browsing history are over and they must adapt.

@michaelkleber
Collaborator

Unfortunately, if we just get rid of the cookies without adding FLoC / FLEDGE (or something else to support some key advertising use cases), then the best data indicates that most of the web sites in the world would lose something like 50%-70% of their revenue.

I'm quite on board with advertisers needing to "accept that the days of knowing people's browsing history are over and they must adapt." These efforts are all about how to let dollars still flow from advertisers to publishers even without anyone learning your browsing history.

@kuro68k

kuro68k commented Apr 15, 2021

If 50-70% of their revenue is generated by violating people's privacy, then I'm not sure there is a case for allowing them to continue to do that, even in a reduced form. You have set the bar at a level of privacy invasion that you feel is acceptable, but many others feel it is too high, which will make adoption difficult.

GDPR and similar laws may also require FLoC to be opt-in for many users, and I expect many will choose not to accept it given that the benefit to them is extremely dubious and most independent advice out there is that it's a bad idea.

Maybe a better use of your time and energy would be to find ways to generate more revenue without privacy invasion. Google already offers an option to disable targeted advertising on its platform and Chrome even blocks the most abusive ads by default now. I'm sure the industry predicted dire loss of revenue when those measures were introduced too.
