3 ways to make polling great again

Ahead of the 2016 U.S. Presidential election, polling and the voter projection industry predicted Hillary Clinton as the clear favorite to win over Donald Trump. They failed spectacularly. Determining why the pollsters missed the mark so badly has been a common theme of the post-mortem election reports (“Trump’s Win Isn’t the Death of Data—It Was Flawed All Along,” Wired).

Clearly the methods and models used to predict the voting outcome failed to deliver an accurate forecast. The Hill went further, calling it an industry-shattering embarrassment, a sign that the polling industry faces inevitable disruption if it is to regain trust.

So what went wrong and what should the polling industry do about it?

1. Start using implicit process data. Pollsters rely too heavily on explicit data, literally the same old exam-style Q&A surveys. Even common sense says, “What you say is important; how you say it is even more important.” Explicit data is easy to collect but known to be plagued with quality issues. Yet decades have passed and the same problematic methods of collecting poll data are still in use. Today’s technology makes it possible to collect better explicit data together with implicit process data at scale. The explicit data shows what people are saying; the implicit process data shows how they are saying it.

2. Do more than tallying: use real data analytics. Political polling is probably the oldest segment under the broad umbrella of market research, with many established players who have become set in their ways. “Margin of error” became a universal quality metric, and many things that should have mattered stopped mattering. For example, tallying should have been the start of the analysis, but too often it became the end as well. Predicting requires more analysis. Diagnosis requires more analysis. Developing action plans requires more analysis too. Tallying alone won’t tell you the why or the what to do, and without that key information you can’t, and shouldn’t, make predictions. As Cade Metz at Wired summarized, “this wasn’t so much a failure of the data as it was a failure of the people using the data.”

3. Build profiles using people’s priorities. The practice of treating demographic information as meaningful “labels” is not only misleading to decision makers and the general public, but also problematic in other ways. A person’s past experiences uniquely define their identity and shape their joys, wishes, worries, and pains, and the causes behind them, in tangible and intangible ways. Every person is unique.

Relying only on convenient demographic labels hides these profiles. What a fatal mistake! “White college educated” means very little. “Millennials” means very little. Pick any of these segments and you will find it far more diverse than expected. Action plans aimed at those labels seem specific and directed, but they miss the true targets.

What do they care about most, and what drives their intention to act? These are the two questions pollsters should have used as the primary dimensions for understanding voters. In this context, demographic information ought to be just a natural outcome at the end. By using the wrong primary dimensions to segment voter data, pollsters dismissed as noise what should have stood out as clear trends. They missed, and lost.

There are significant reasons why the polling industry needs to make these three fundamental changes right away if it wants any chance of regaining trust.

First, the Internet and digital gadgets have transformed our world. The identity of today’s media and communication has shifted (rightly or wrongly, justified or unjustified) from being a source of facts to a source of opinions. Short-spurted, always-on but always-multitasking day-to-day habits have made individuals less interested in exchanging information or trying to understand another person’s perspective. The quick and easy thing to do is to share my feelings and pass on my opinions; disguised as greater awareness of diversity, people judge people more than before. Together these factors magnify the nuanced but deep relationship between injunctive norms and descriptive norms. That’s a fancy way of saying that people today are more aware of “what they should say” than ever before. This causes, and will exacerbate, data quality issues if you use only explicit data. “Undercover voters” may be a convenient term that falls out of the news cycle soon, but the phenomenon behind it has deep roots.

Second, data analytics goes way beyond tallying. Among the descriptive, diagnostic, predictive, and prescriptive uses of data analytics, descriptive is only the first step: it tells you what is happening. The national frenzy over “margin of error” in polls exists in part because the polling industry limited itself to descriptive analytics. That’s the problem, because in such a contentious election, the moment a poll result hits the news it is already history. The valuable things to a campaign have to include why things are happening this way (diagnostic), what we expect to see next week (predictive), and what options we have and how to approach each one (prescriptive). As modern businesses use more data analytics by the day, why is the polling industry still fixated on “margin of error” and stuck in the age of pen-and-paper questionnaires?

Furthermore, and specific to this election, “margin of error” not only lacked analytical value, it actively misled everyone. It is false precision, not informed accuracy. While people may hold differing opinions, very sadly, the atmosphere of this election made them share the same idea of “what I am not supposed to say.” As a result, their intentions differed but their explicit answers could be the same. That is where tallying things up and reporting the margin of error gave everyone a false picture of where things stood. Ditch tallying and questionnaires. Use a real feedback analytics platform that can capture implicit process data and infer a deeper level of feeling from people. For example, as one of the vendors in this market niche, Survature has helped NASCAR races learn more about their fan base (in real time) than our whole nation knew about the voters in this presidential election, and all of that was done by one person using a piece of software. It is the methodology that led to better data, which in the end made the difference.
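To make the false-precision point concrete, here is a minimal Python sketch. It assumes a simple random sample of 1,000 respondents and a hypothetical 4-point “shy voter” bias; neither number comes from any actual poll, they are chosen only to show that a modest systematic response bias can exceed the reported margin of error entirely.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """95% margin of error for a sample proportion (simple random sample)."""
    return z * math.sqrt(p * (1 - p) / n)

# A hypothetical poll of 1,000 respondents reporting 48% support.
moe = margin_of_error(0.48, 1000)
print(f"Reported: 48% +/- {moe * 100:.1f} points")

# If, say, 4% of respondents give the "socially acceptable" answer
# instead of their true intention, true support is 48% + 4% = 52%:
# a bias larger than the sampling margin of error, and invisible
# to tallying, however carefully the margin of error is computed.
shy_bias = 0.04
true_support = 0.48 + shy_bias
print(f"True support under the hypothetical 4-point bias: {true_support:.0%}")
```

The margin of error here comes out to about 3.1 points, which only describes sampling noise; it says nothing about whether the answers themselves were candid.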

Third, profiling voter segments along the wrong dimensions wastes resources and misses opportunities. The dimensions have to be the issues that matter most to people. Knowing those, you can find better ways to reach people, develop better marketing messages, and, best of all, truly deliver and serve the right set of needs for each crowd of voters. $80 million can be spent quickly on TV ads, but with better information about your audience those ads can make a far more substantial impact.

Dr. Jian Huang is a co-founder and CEO of Survature, a revolutionary feedback collection and analytics platform that captures and analyzes implicit process data from survey taker behavior. For the first time ever, decision makers can see what respondents say, how they say it, and what matters most to them (in a matter of minutes). Visit survature.com to schedule a demo.

Tim McCarthy

Helping people to find their data and use it to drive measurable value in their businesses

7y

Excellent article. Side note: I heard on NPR last night that most of the polls still obtain their samples by calling landline phone numbers to survey people. I found this shocking, so I went digging and read that the pollsters weight the 18-30-year-old age segment more heavily to account for this. Here's the thing, though: everyone is dropping their landlines nowadays, even my septuagenarian parents (Trump supporters). I don't know if pollsters are truly accounting for all the different ways that such respondents differ from the general population (it's not just age).

Jim Bryson

Commissioner, Finance & Administration

7y

Nice article Jian.

Donald W.

Workplace Automation and Digital Transition Professional

7y

I had some eventual Trump supporters say that Trump was essentially a populist (but not the Populist Party of yesteryear) and supporters were not likely to respond to polls. The "silent majority" were basically also populists in this sense, and I think Stephen's comment really expounds on this. For the Trump supporters I knew, political correctness as well as economic issues (real estate namely) were big, and they'd rather have someone who is not PC and understands, at least, real estate well enough to rectify those economic issues. At the very least, it makes for a great presentation or consulting session about what can happen when over-relying on the same factors when doing a predictive analysis.

Stephen Daniel

President @ Daniel Research Group | Market Research Expert

7y

This morning (Wed 11/9) I received this email from my girlfriend, a staunch Trump supporter. “Tell me, I would like to understand how can the polls all over be so wrong? Maybe a lot of hidden Trump voters? My daughter is not happy also like you But America has spoken!” I decided to take a day off from technology market research and try to answer her. Note that this reply is to someone with no training in statistics or polling methodologies.

Why Were the Polls So Wrong? There are two primary reasons why the polls spectacularly failed to get even close to accurately predicting the outcome: first, what the pollsters felt was the correct way to segment the population, and second, relying on self-reporting. The idea behind polling is to ask a small (relative to the total population) number of people who they are going to vote for, and then apply the percentages to the total population. This works if the sample matches the population proportionally for the demographic variables that are most important in influencing the future decision. If 85% of the sample is male, then it is not a good sample to project directly onto the total population. In practice, weights are used to correct for differences between the sample demographic distributions and the population distributions. The problem, and the cause of the polling mistake in this case, is the choice of which demographic variables to use. The model that was applied over-weighted gender and race and, as far as I know, excluded the most important variable: child-rearing phase. The sample design also over-weighted self-interest at the expense of group interest. The second flaw in the process was a failure to recognize that self-reporting, always a high-variance risk, was further compromised by the relationship between the subject and the observer. For many subjects, asking whom you are going to vote for is akin to asking them how often they kick their dog. Because the media was active in linking Trump to perceived negative behavior, and the pollsters are perceived to be linked to the media, under-reporting of intention to vote for Trump was inevitable, and should have been accounted for.

Why Did Trump Win? Over the course of designing and executing hundreds of consumer market research studies, I observed that the single most important factor influencing future household and personal purchasing decisions was the child-rearing phase: pre-, early-, late-, and post-. Why should voting behavior differ? There was an inherent assumption made by the pollsters, and many political strategists, that personal self-interest would trump (not intentional) group self-interest, specifically the household. This turned out to be hugely (again, not intentional) wrong. Women were faced with the choice of voting for Clinton based on gender and political-correctness issues, or for Trump based on the promise of increased economic and physical security. More women chose increased economic and physical security than anticipated. Similarly, more African-Americans had to choose between a candidate who offered more personal safety (less “law and order”) and a candidate who offered more family economic security: jobs. Group identity will more often prevail over individual identity, and in this case, genes beat memes. Steve Daniel. President, Daniel Research Group www.danielresearchgroup.com
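The weighting mechanism described in the comment above can be sketched in a few lines of Python. All the figures are hypothetical, built around the 85%-male example: the population shares and support levels are invented purely to show how post-stratification weights are formed and applied.

```python
# Post-stratification weighting: re-weight a demographically skewed
# sample so its mix matches the population before projecting results.
# All figures below are hypothetical, for illustration only.
sample = {"male": 850, "female": 150}               # 85% male, as in the comment
population = {"male": 0.49, "female": 0.51}         # assumed population shares
support_in_sample = {"male": 0.40, "female": 0.60}  # hypothetical candidate support

n = sum(sample.values())
# Weight for each group = population share / sample share.
weights = {g: population[g] / (sample[g] / n) for g in sample}

# Unweighted tally vs. the weighted estimate.
raw = sum(sample[g] * support_in_sample[g] for g in sample) / n
weighted = sum(sample[g] * weights[g] * support_in_sample[g] for g in sample) / n
print(f"Raw estimate: {raw:.1%}  Weighted estimate: {weighted:.1%}")
```

Note what this sketch also shows: weighting can only repair skew along the variables you chose to weight on. If, as the comment argues, the decisive variable (here, child-rearing phase) is left out of the model entirely, no amount of re-weighting on gender or race fixes the estimate.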
