Data Analysis Is Changing the Face of Political Campaigning

2016 election campaigners look to Big Data analysis to gain an edge in intelligently reaching voters

The next BriefingsDirect Voice of the Customer digital transformation case study explores how data-analysis services startup BlueLabs in Washington, DC helps presidential election campaigns better know and engage with potential voters.

We'll learn how BlueLabs relies on high-performing analytics platforms that allow a democratization of querying, of opening the value of vast data resources to discretely identify more of those in the need to know.

Here to describe how big data is being used creatively by contemporary political organizations for two-way voter engagement, we're joined by Erek Dyskant, Co-Founder and Vice President of Impact at BlueLabs Analytics in Washington. The discussion is moderated by BriefingsDirect's Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: Obviously, this is a busy season for the analytics people who are focused on politics and campaigns. What are some of the trends that are different in 2016 from just four years ago? It’s a fast-changing technology set, and it's also a fast-changing methodology. And of course, the trends about how voters think, react, use social, and engage are also dynamic. So what's different this cycle?

Dyskant: From a voter-engagement perspective, in 2012, we could reach most of our voters online through a relatively small set of social media channels -- Facebook, Twitter, and a little bit on the Instagram side. Moving into 2016, we see a fragmentation of the online and offline media consumption landscape and many more folks moving toward purpose-built social media platforms.

If I'm at the HPE Conference and I want my colleagues back in D.C. to see what I'm seeing, then maybe I'll use Periscope, maybe Facebook Live, but probably Periscope. If I see something that I think one of my friends will think is really funny, I'll send that to them on Snapchat.

Where political campaigns have traditionally broadcast messages out through the news-feed style social-media strategies, now we need to consider how it is that one-to-one social media is acting as a force multiplier for our events and for the ideas of our candidates, filtered through our campaign’s champions.

Gardner: So, perhaps a way to look at that is that you're no longer focused on precincts physically and you're no longer able to use broadcast through social media. It’s much more of an influence within communities and identifying those communities in a new way through these apps, perhaps more than platforms.

Social media

Dyskant: That's exactly right. Campaigns have always organized voters at the door and on the phone. Now, we think of one more way. If you want to be a champion for a candidate, you can be a champion by knocking on doors for us, by making phone calls, or by making phone calls through online platforms.

You can also use one-to-one social media channels to let your friends know why the election matters so much to you and why they should turn out and vote, or vote for the issues that really matter to you.

Gardner: So, we're talking about retail campaigning, but it's a bit more virtual. What’s interesting though is that you can get a lot more data through the interaction than you might if you were physically knocking on someone's door.

Dyskant: The data is different. We're starting to see a shift from demographic targeting. In 2000, we were targeting on precincts. A little bit later, we were targeting on combinations of demographics, on soccer moms, on single women, on single men, on rural, urban, or suburban communities separately.

Moving to 2012, we looked at everything that we knew about a person and built individual-level predictive models, so that we knew how each person's individual set of characteristics made that person more or less likely to be someone with whom our candidate could have an engaging conversation through a volunteer.

Now, what we're starting to see is behavioral characteristics trumping demographic or even consumer data. You can put whiskey drinkers in your model, you can put cat owners in your model, but isn't it a lot more interesting to put in your model the fact that this person has an online profile on our website and this is their clickstream? Isn't it much more interesting to put into a model that this person is likely to consume media via TV, is likely to be a cord-cutter, is likely to be a social media trendsetter, is likely to view multiple channels, or to use both Facebook and media on TV?

That lets us have a really broad reach or really broad set of interested voters, rather than just creating an echo chamber where we're talking to the same voters across different platforms.

Gardner: So, over time, the analytics tools have gone from semi-blunt instruments to much more precise, and you're also able to better target what you think would be the right voter for you to get the right message out to.

One of the things you mentioned that struck me is the word "predictive." I suppose I think of campaigning as looking to influence people, and that polling then tries to predict what will happen as a result. Is there somewhat less daylight between these two than I am thinking, that being predictive and campaigning are much more closely associated, and how would that work?

Predictive modeling

Dyskant: When I think of predictive modeling, what I think of is predicting something that the campaign doesn't know. That may be something that will happen in the future or it may be something that already exists today, but that we don't have an observation for it.

In the case of the role of polling, what I really see about that is understanding what issues matter the most to voters and how it is that we can craft messages that resonate with those issues. When I think of predictive analytics, I think of how is it that we allocate our resources to persuade and activate voters.

Over the course of elections, what we've seen is an exponential trajectory in the amount of data that is considered by predictive models. Even more important than that is an exponential growth in the use cases of models. Today, we see that every time a predictive model is used, it’s used in a million and one ways, whereas in 2012 it might have been used in 50, 20, or 100 sessions about each voter contact.

Gardner: It’s a fascinating use case to see how analytics and data can be brought to bear on the democratic process and to help you get messages out, probably in a way that's better received by the voter or the prospective voter, like in a retail or commercial environment. You don’t want to hear things that aren’t relevant to you, and when people do make an effort to provide you with information that's useful or that helps you make a decision, you benefit and you respect and even admire and enjoy it.

Dyskant: What I really want is for the voter experience to be as transparent and easy as possible, that campaigns reach out to me around the same time that I'm seeking information about who I'm going to vote for in November. I know who I'm voting for in 2016, but in some local elections, I may not have made that decision yet. So, I want a steady stream of information to be reaching voters, as they're in those key decision points, with messaging that really is relevant to their lives.

I also want to listen to what voters tell me. If a voter has a conversation with a volunteer at the door, that should inform future communications. If somebody has told me that they're definitely voting for the candidate, then the next conversation should be different from someone who says, "I work in energy. I really want to know more about the Secretary’s energy policies."

Gardner: Just as if a salesperson is engaging with process, they use customer relationship management (CRM), and that data is captured, analyzed, and shared. That becomes a much better process for both the buyer and the seller. It's the same thing in a campaign, right? The better information you have, the more likely you're going to be able to serve that user, that voter.

Dyskant: There definitely are parallels to marketing, and that’s how we at BlueLabs decided to found the company and work across industries. We work with Fortune 100 retail organizations that are interested in how, once someone buys one item, we can bring them back into the store to buy the follow-on item or maybe to buy the follow-on item through that same store’s online portal. How it is that we can provide relevant messaging as users engage in complex processes online? All those things are driven from our lessons in politics.

Politics is fundamentally different from retail, though. It's a civic decision, rather than an individual-level decision. I always want to be mindful that I have a duty to voters to provide extremely relevant information to them, so that they can be engaged in the civic decision that they need to make.

Gardner: Suffice it to say that good quality comparison shopping is still good quality comparison decision-making.

Dyskant: Yes, I would agree with you.

Relevant and speedy

Gardner: Now that we've established how really relevant, important, and powerful this type of analysis can be in the context of the 2016 campaign, I'd like to learn more about how you go about getting that analysis and making it relevant and speedy across a large variety of data sets and content sets. But first, let’s hear more about BlueLabs. Tell me about your company, how it started, why you started it, maybe a bit about yourself as well.

Dyskant: Of the four of us who started BlueLabs, some of us met in the 2008 elections and some of us met during the 2010 midterms working at the Democratic National Committee (DNC). Throughout that pre-2012 experience, we had the opportunity as practitioners to try a lot of things, sometimes just once or twice, sometimes things that we operationalized within those cycles.

Jumping forward to 2012, we had the opportunity to scale all that research and development, to say that we did this one thing that was a different way of building models, and it worked in this congressional race. We decided to make this three people’s full-time jobs and scale that up.

Moving past 2012, we got to build potentially one of the fastest-growing startups, one of the most data-driven organizations, and we knew that we built a special team. We wanted to continue working together with ourselves and the folks who we worked with and who made all this possible. We also wanted to apply the same types of techniques to other areas of social impact and other areas of commerce. This individual-level approach to identifying conversations is something that we found unique in the marketplace. We wanted to expand on that.

Increasingly, what we're working on is this segmentation-of-media problem. It's this idea that some people watch only TV, and you can't ignore a TV. It has lots of eyeballs. Some people watch only digital and some people consume a mix of media. How is it that you can build media plans that are aware of people's cross-channel media preferences and reach the right audience with their preferred means of communications?

Gardner: That’s fascinating. You start with the rigors of the demands of a political campaign, but then you can apply in so many ways, answering the types of questions anticipating the type of questions that more verticals, more sectors, and charitable organizations would want to be involved with. That’s very cool.

Let’s go back to the data science. You have this vast pool of data. You have a snappy analytics platform to work with. But, one of the things that I am interested in is how you get more people whether it's in your organization or a campaign, like the Hillary Clinton campaign, or the DNC to then be able to utilize that data to get to these inferences, get to these insights that you want.

What is it that you look for and what is it that you've been able to do in that form of getting more people able to query and utilize the data?

Dyskant: Data science happens when individuals have direct access to ask complex questions of a large, gnarly, but well-integrated data set. If I have 30 terabytes of data across online contacts, off-line contacts, and maybe a sample of clickstream data, and I want to ask things like of all the people who went to my online platform and clicked the password reset because they couldn't remember their password, then never followed up with an e-mail, how many of them showed up at a retail location within the next five days? They tried to engage online, and it didn't work out for them. I want to know whether we're losing them or are they showing up in person.

That type of question maybe would make it into a business-intelligence (BI) report a few months from then, but people who are thinking about what we do every day would say, "I wonder about this," turn it into a query, and say, "I think I found something." If we give these customers phone calls, maybe we can reset their passwords over the phone and reengage them.
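As a rough illustration of the password-reset question described above, here is a minimal Python sketch. The event names, the sample records, and the `abandoned_reset_outcomes` helper are all hypothetical and invented for this example; the interview does not describe BlueLabs' actual schema or tooling.

```python
from datetime import datetime, timedelta

# Hypothetical interaction-level events: (user_id, event_type, timestamp).
EVENTS = [
    ("u1", "password_reset_click", datetime(2016, 9, 1, 10, 0)),
    ("u1", "store_visit", datetime(2016, 9, 3, 15, 0)),
    ("u2", "password_reset_click", datetime(2016, 9, 1, 11, 0)),
    ("u2", "reset_email_followup", datetime(2016, 9, 1, 11, 5)),
    ("u3", "password_reset_click", datetime(2016, 9, 2, 9, 0)),
    ("u3", "store_visit", datetime(2016, 9, 20, 9, 0)),
]


def abandoned_reset_outcomes(events, window_days=5):
    """Split users who clicked reset but never followed up into those who
    visited a store within the window and those who did not."""
    by_user = {}
    for user, kind, ts in events:
        by_user.setdefault(user, []).append((kind, ts))

    showed_up, lost = [], []
    for user, rows in sorted(by_user.items()):
        clicks = [ts for kind, ts in rows if kind == "password_reset_click"]
        followed_up = any(kind == "reset_email_followup" for kind, _ in rows)
        if not clicks or followed_up:
            continue  # never tried to reset, or completed the reset online
        first_click = min(clicks)
        deadline = first_click + timedelta(days=window_days)
        visited = any(
            kind == "store_visit" and first_click <= ts <= deadline
            for kind, ts in rows
        )
        (showed_up if visited else lost).append(user)
    return showed_up, lost


showed_up, lost = abandoned_reset_outcomes(EVENTS)
print("showed up in person:", showed_up)
print("possibly lost:", lost)
```

The "possibly lost" list is the group that might be worth a phone call; whether that outreach actually helps is a separate question the data would still have to answer.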

Human intensive

That's just one tiny, micro example, which is why data science is truly a human-intensive exercise. You get 50-100 people working at an enterprise solving problems like that and what you ultimately get is a positive feedback loop of self-correcting systems. Every time there's a problem, somebody is thinking about how that problem is represented in the data. How do I quantify that? If it’s significant enough, then how is it that the organization can improve in this one specific area?

All that can be done with business logic is the interesting piece. You need very granular data that's accessible via query and you need reasonably fast query time, because you can’t ask questions like that when you're going to get coffee every time you run a query.

Layering predictive modeling allows you to understand the opportunity for impact if you fix that problem. That one hypothesis with those users who cannot reset their passwords is that maybe those users aren't that engaged in the first place. You fix their password but it doesn’t move the needle.

The other hypothesis is that it's people who are actively trying to engage with your service and are unsuccessful because of this one very specific barrier. If you have a model of user engagement at an individual level, you can say that these are really high-value users that are having this problem, or maybe they aren’t. So you take data science, align it with really smart individual-level business analysis, and what you get is an organization that continues to improve without needing an executive-level decision for each one of those things.

Gardner: So a great deal of inquiry, experimentation, iterative improvement, and feedback loops can all come together very powerfully. I'm all for the data scientist full-employment movement, but we need to do more than have people go through a data scientist to use, access, and develop these feedback insights. What is it about the SQL, natural language, or APIs? What is it that you like to see that allows for more people to be able to directly relate and engage with these powerful data sets?

Dyskant: One of the things is the product management of data schemas. So whenever we build an analytics database for a large-scale organization I think a lot about an analyst who is 22, knows VLOOKUP, took some statistics classes in college, and has some personal stories about the industry that they're working in. They know, "My grandmother isn't a native English speaker, and this is how she would use this website."

So it's taking that hypothesis that’s driven from personal stories, and being able to, through a relatively simple query, translate that into a database query, and find out if that hypothesis proves true at scale.

Then, potentially take the results of that query, dump them into a statistical-analysis language, or use database analytics to answer that in a more robust way. What that means is that we favor very wide schemas, because I want someone to be able to write a three-line SQL statement, no joins, that answers a business question that I wouldn't have thought to put in a report. So that’s the first line -- analyst-friendly schemas that are accessed via SQL.
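To make the "three-line SQL statement, no joins" idea concrete, here is a small sketch using Python's built-in `sqlite3` module as a stand-in for an analytics database. The `transactions` table, its columns, and the sample rows are assumptions made up for illustration; the point is only that attributes which would normally live in separate store, customer, or clickstream tables have already been appended to the wide table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One wide, denormalized table: store, customer, and clickstream attributes
# are stored as extra columns on each transaction row (hypothetical schema).
conn.execute(
    """CREATE TABLE transactions (
           txn_id INTEGER,
           amount REAL,
           store_region TEXT,
           customer_language TEXT,
           used_site_search INTEGER
       )"""
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?, ?)",
    [
        (1, 20.0, "urban", "en", 1),
        (2, 35.0, "urban", "es", 0),
        (3, 15.0, "rural", "es", 0),
        (4, 50.0, "rural", "en", 1),
    ],
)

# The analyst's hypothesis ("non-native English speakers use the site
# differently") becomes a three-line query with no joins.
QUERY = """
SELECT customer_language, AVG(used_site_search) AS search_rate
FROM transactions
GROUP BY customer_language ORDER BY customer_language
"""
rows = conn.execute(QUERY).fetchall()
for language, search_rate in rows:
    print(language, search_rate)
```

The trade-off is a larger table with repeated values, which the analyst accepts in exchange for queries simple enough to write without knowing how the underlying sources relate.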

The next line is deep key performance indicators (KPIs). Once we step out of the analytics database, consumers drop into the wider organization that’s consuming data at a different level. I always want reporting to report on opportunity for impact, to report on whether we're reaching our most valuable customers, not how many customers are we reaching.

"Are we reaching our most valuable customers" is much more easily addressable; you just talk to different people. Whereas, when you ask, "Are we reaching enough customers," I don’t know how to find out. I can go over to the sales team and yell at them to work harder, but ultimately, I want our reporting to facilitate smarter working, which means incorporating model scores and predictive analytics into our KPIs.
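One way to read "incorporating model scores into our KPIs" is to weight reach by each customer's predicted value instead of simply counting heads. The sketch below assumes a hypothetical per-customer `model_score` between 0 and 1 and made-up sample data; it is one possible formulation, not a description of BlueLabs' reporting.

```python
# Hypothetical customers: (customer_id, model_score, was_reached).
CUSTOMERS = [
    ("c1", 0.9, True),
    ("c2", 0.8, False),
    ("c3", 0.2, True),
    ("c4", 0.1, True),
]


def raw_reach(customers):
    """Share of customers reached, ignoring how valuable each one is."""
    return sum(1 for _, _, reached in customers if reached) / len(customers)


def value_weighted_reach(customers):
    """Share of total predicted value that was reached."""
    total = sum(score for _, score, _ in customers)
    reached = sum(score for _, score, reached in customers if reached)
    return reached / total


print(f"raw reach: {raw_reach(CUSTOMERS):.2f}")
print(f"value-weighted reach: {value_weighted_reach(CUSTOMERS):.2f}")
```

With this sample, three of four customers were reached, yet the value-weighted figure is lower because one of the two highest-scoring customers was missed. That gap is the kind of signal that points to talking to different people, not just more of them.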

Getting to the core

Gardner: Let’s step back from the edge, where we engage the analysts, to the core, where we need to provide the ability for them to do what they want and which gets them those great results.

It seems to me that when you're dealing in a campaign cycle that is very spiky, you have a short period of time where there's a need for a tremendous amount of data, but that could quickly go down between cycles of an election, or in a retail environment, be very intensive leading up to a holiday season.

Do you therefore take advantage of the cloud models for your analytics that make a fit-for-purpose approach to data and analytics pay as you go? Tell us a little bit about your strategy for the data and the analytics engine.

Dyskant: All of our customers have a cyclical nature to them. I think that almost every business is cyclical, just some more than others. Horizontal scaling is incredibly important to us. It would be very difficult for us to do what we do without using a cloud model such as Amazon Web Services (AWS).

Also, one of the things that works well for us with HPE Vertica is the licensing model where we can add additional performance with only the cost of hardware or hardware provisioned through the cloud. That allows us to scale up our clusters during the busy season. We'll sometimes even scale them back down during slower periods so that we can have those 150 analysts asking their own questions about the areas of the program that they're responsible for during busy cycles, and then during less busy cycles, scale down the footprint of the operation.

Gardner: Is there anything else about the HPE Vertica OnDemand platform that benefits your particular need for analysis? I'm thinking about the scale and the rows. You must have so many variables when it comes to a retail situation, a commercial situation, where you're trying to really understand that consumer?

Dyskant: I do everything I can to avoid aggregation. I want my analysts to be looking at the data at the interaction-by-interaction level. If it’s a website, I want them to be looking at clickstream data. If it's a retail organization, I want them to be looking at point-of-sale data. In order to do that, we build data sets that are very frequently in the billions of rows. They're also very frequently incredibly wide, because we don't just want to know every transaction with this dollar amount. We want to know things like what the variables were, and where that store was located.

Getting back to the idea that we want our queries to be dead-simple, that means that we very frequently append additional columns on to our transaction tables. We’re okay that the table is big, because in a columnar model, we can pick out just the columns that we want for that particular query.
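The reason a very wide table stays affordable in a columnar model can be sketched with a toy column store in Python. This is a simplified illustration under assumed names (`TABLE`, `scan`, `total_by`) and made-up data; a real columnar database such as Vertica adds compression, sorting, and distribution that are not modeled here.

```python
# A toy column store: each column is its own list, and row i is made up of
# element i from every list. All names and values are hypothetical.
TABLE = {
    "txn_id": [1, 2, 3],
    "amount": [20.0, 35.0, 15.0],
    "store_region": ["urban", "urban", "rural"],
    "store_city": ["Washington", "Washington", "Culpeper"],
    "customer_language": ["en", "es", "es"],
}


def scan(table, wanted):
    """Read only the requested columns and return them row by row.
    Columns that are not requested are never touched."""
    columns = [table[name] for name in wanted]
    return list(zip(*columns))


def total_by(table, key_col, value_col):
    """Sum value_col grouped by key_col, reading just those two columns."""
    totals = {}
    for key, value in scan(table, [key_col, value_col]):
        totals[key] = totals.get(key, 0.0) + value
    return totals


totals = total_by(TABLE, "store_region", "amount")
print(totals)
```

Adding a sixth or sixtieth column to `TABLE` would not change the work done by `total_by`, which is why appending columns to a transaction table costs little at query time in this layout.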

Then, moving into some of the in-database machine-learning algorithms allows us to perform higher-order computation within the database and have less data shipping.

Gardner: We're almost out of time, but I wanted to do some predictive analysis ourselves. Thinking about the next election cycle, midterms, only two years away, what might change between now and then? We hear so much about machine learning, bots, and advanced algorithms. How do you predict, Erek, the way that big data will come to bear on the next election cycle?

Behavioral targeting

Dyskant: I think that a big piece of the next election will be around moving even more away from demographic targeting, toward even more behavioral targeting. How is it that we reach every voter based on what they're telling us about them and what matters to them, how that matters to them? That will increasingly drive our models.

To do that involves probably another 10X scale in the data, because that type of data is generally at the clickstream level, generally at the interaction-by-interaction level, incorporating things like Twitter feeds, which adds an additional level of complexity and laying in computational necessity to the data.

Gardner: It almost sounds like you're shooting for sentiment analysis on an issue-by-issue basis, a very complex undertaking, but it could be very powerful.

Dyskant: I think that it's heading in that direction, yes.
