@BigDataExpo Authors: Liz McMillan, Elizabeth White, Yeshim Deniz, PagerDuty Blog, Pat Romanski

Related Topics: @BigDataExpo

@BigDataExpo: Article

Data Loss Prevention | @BigDataExpo #BigData #Security #Analytics

Swift and massive data classification advances score a win for better securing sensitive information

The next BriefingsDirect Voice of the Customer digital transformation case study explores how -- in an era when cybersecurity attacks are on the rise and enterprises and governments are increasingly vulnerable -- new data intelligence capabilities are being brought to the edge to provide better data loss prevention (DLP).

We'll learn how Digital Guardian in Waltham, Massachusetts analyzes both structured and unstructured data to predict and prevent loss of data and intellectual property (IP) with increased accuracy.

To learn how data recognition technology supports network and endpoint forensic insights for enhanced security and control, we're joined by Marcus Brown, Vice President of Corporate Business Development for Digital Guardian. The discussion is moderated by BriefingsDirect's Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: What are some of the major trends making DLP even more important, and even more effective?

Brown: Data protection has very much to come to the forefront in the last couple of years. Unfortunately, we wake up every morning and read in the newspapers, see on television, and hear on the radio a lot about data breaches. It’s pretty much every type of company, every type of organization, government organizations, etc., that’s being hit by this phenomenon at the moment.


So, awareness is very high, and apart from the frequency, a couple of key points are changing. First of all, you have a lot of very skilled adversaries coming into this, criminals, nation-state actors, hactivists, and many others. All these people are well-trained and very well resourced to come after your data. That means that companies have a pretty big challenge in front of them. The threat has never been bigger.

In terms of data protection, there are a couple of key trends at the cyber-security level. People have been aware of the so-called insider threat for a long time. This could be a disgruntled employee or it could be someone who has been recruited for monetary gain to help some organization get to your data. That’s a difficult one, because the insider has all the privilege and the visibility and knows where the data is. So, that’s not a good thing.

Then, you have employees, well-meaning employees, who just make mistakes. It happens to all of us. We touch something in Outlook, and we have a different email address than the one we were intending, and it goes out. The well-meaning employees, as well, are part of the insider threat.

Outside threats

What’s really escalated over the last couple of years are the advanced external attackers or the outside threat, as we call it. These are well-resourced, well-trained people from nation-states or criminal organizations trying to break in from the outside. They do that with malware or phishing campaigns.

About 70 percent of the attacks stop with the phishing campaign, when someone clicks on something that looked normal. Then, there's just general hacking, a lot of people getting in without malware at all. They just hack straight in using different techniques that don’t rely on malware.

People have become so good at developing malware and targeting malware at particular organizations, at particular types of data, that a lot of tools like antivirus and intrusion prevention just don’t work very well. The success rate is very low. So, there are new technologies that are better at detecting stuff at the perimeter and on the endpoint, but it’s a tough time.

There are internal and external attackers. A lot of people outside are ultimately after the two main types of data that companies have. One is a customer data, which is credit card numbers, healthcare information, and all that stuff. All of this can be sold on the black market per record for so-and-so many dollars. It’s a billion-dollar business. People are very motivated to do this.

Most companies don’t want to lose their customers’ data. That’s seen as a pretty bad thing, a bad breach of trust, and people don’t like that. Then, obviously, for any company that has a product where you have IP, you spent lots of money developing that, whether it’s the new model of a car or some piece of electronics. It could be a movie, some new clothing, or whatever. It’s something that you have developed and it’s a secret IP. You don’t want that to get out, as well as all of your other internal information, whether it’s your financials, your plans, or your pricing. There are a lot of people going after both of those things, and that’s really the challenge.

In general, the world has become more mobile and spread out. There is no more perimeter to stop people from getting in. Everyone is everywhere, private life and work life is mixed, and you can access anything from anywhere. It’s a pretty big challenge.

Gardner: Even though there are so many different types of threats, internal, external, and so forth, one of the common things that we can do nowadays is get data to learn more about what we have as part of our inventory of important assets.

While we might not be able to seal off that perimeter, maybe we can limit the damage that takes place by early detection of problems. The earlier that an organization can detect that something is going on that shouldn’t be, the quicker they can come to the rescue. How does the instant analysis of data play a role in limiting negative outcomes?

Can't protect everything

Brown: If you want to protect something, you have to know it’s sensitive and that you want to protect it. You can’t protect everything. You're going to find which data is sensitive, and we're able to do that on-the-fly to recognize sensitive data and nonsensitive data. That’s a key part of the DLP puzzle, the data protection puzzle.

We work for some pretty large organizations, some of the largest companies and government organizations in the world, as well as lot of medium- and smaller-sized customers. Whatever it is we're trying to protect, personal information or indeed the IP, we need to be in the right place to see what people are doing with that data.

Our solution consists of two main types of agents. Some agents are on endpoint computers, which could be desktops or servers, Windows, Linux, and Macintosh. It’s a good place to be on the endpoint computer, because that’s where people, particularly the insider, come into play and start doing something with data. That’s where people work. That’s how they come into the network and it’s how they handle a business process.

So the challenge in DLP is to support the business process. Let people do with data what they need to do, but don’t let that data get out. The way to do that is to be in the right place. I already mentioned the endpoint agent, but we also have network agents, sensors, and appliances in the network that can look at data moving around.

The endpoint is really in the middle of the business process. Someone is working, they're working with different applications, getting data out of those applications, and they're doing whatever they need to do in their daily work. That’s where we sit, right in the middle of that, and we can see who the user is and what application they're working with it. It could be an engineer working with the computer-aided design (CAD) or the product lifecycle management (PLM) system developing some new automobile or whatever, and that’s a great place to be.

We rely very heavily on the HPE IDOL technology for helping us classify data. We use it particularly for structured data, anything like a credit card number, or alphanumeric data. It could be also free text about healthcare, patient information, and all this sort of stuff.

We use IDOL to help us scan documents. We can recognize regular expressions, that’s a credit card number type of thing, or Social Security. We can also recognize terminology. We rely on the fact that IDOL supports hundreds of languages and many different subject areas. So, using IDOL, we're able to recognize a whole lot of anything that’s written in textual language.

Our endpoint agent also has some of its own intelligence built in that we put on top of what we call contextual recognition or contextual classification. As I said, we see the customer list coming out of Salesforce.com or we see the jet fighter design coming out of the PLM system and we then tag that as well. We're using IDOL, we're using some of our technology, and we're using our vantage point on the endpoint being in the business process to figure out what the data is.

We call that data-in-use monitoring and, once we see something is sensitive, we put a tag on it, and that tag travels with the data no matter where it goes.

An interesting thing is that if you have someone making a mistake, an unintentional, good-willed employee, accidentally attaching the wrong doc to something that it goes out, obviously it will warn the user of that.

We can stop that

If you have someone who is very, very malicious and is trying to obfuscate what they're doing, we can see that as well. For example, taking a screenshot of some top-secret diagram, embedding that in a PowerPoint and then encrypting the PowerPoint, we're tagging those docs. Anything that results from IP or top-secret information, we keep tagging that. When the guy then goes to put it on a thumb drive, put it on Dropbox, or whatever, we see that and stop that.

So that’s still a part of the problem, but the two points are classify it, that’s what we rely on IDOL a lot for, and then stop it from going out, that’s what our agent is responsible for.

Gardner: Let’s talk a little bit about the results here, when behaviors, people and the organization are brought to bear together with technology, because it’s people, process and technology. When it becomes known in the organization that you can do this, I should think that that must be a fairly important step. How do we measure effectiveness when you start using a technology like Digital Guardian? Where does that become explained and known in the organization and what impact does that have?

Brown: Our whole approach is a risk-based approach and it’s based on visibility. You’ve got to be able to see the problem and then you can take steps and exercise control to stop the problems.

When you deploy our solution, you immediately gain a lot of visibility. I mentioned the endpoints and I mentioned the network. Basically, you get a snapshot without deploying any rules or configuring in any complex way. You just turn this on and you suddenly get this rich visibility, which is manifested in reports, trends, and all this stuff. What you get, after a very short period of time, is a set of reports that tell you what your risks are, and some of those risks may be that your HR information is being put on Dropbox.

You have engineers putting the source code onto thumb drives. It could all be well-meaning, they want to work on it at home or whatever, or it could be some bad guy.

One the biggest points of risk in any company is when an employee resigns and decides to move on. A lot of our customers use the monitoring and the reporting we have at that time to actually sit down with the employee and say, "We noticed that you downloaded 2,000 files and put them on a thumb drive. We’d like you to sign this saying that you're going to give us that data back."

That’s a typical use case, and that’s the visibility you get. You turn it on and you suddenly see all these risks, hopefully, not too many, but a certain number of risks and then you decide what you're going to do about it. In some areas you might want to be very draconian and say, "I'm not going to allow this. I'm going to completely block this. There is no reason why you should put the jet fighter design up on Dropbox."

Gardner: That’s where the epoxy in the USB drives comes in.

Warning people

Brown: Pretty much. On the other hand, you don’t want to stop people using USB, because it’s about their productivity, etc. So, you might want to warn people, if you're putting some financial data on to a thumb drive, we're going to encrypt that so nothing can happen to it, but do you really want to do this? Is this approach appropriate? People get a feeling that they're being monitored and that the way they are acting maybe isn't according to company policy. So, they'll back out of it.

In a nutshell, you look at the status quo, you put some controls in place, and after those controls are in place, within the space of a week, you suddenly see the risk posture changing, getting better, and the incidence of these dangerous actions dropping dramatically.

Very quickly, you can measure the security return on investment (ROI) in terms of people’s behavior and what’s happening. Our customers use that a lot internally to justify what they're doing.

Generally, you can get rid of a very large amount of the risk, say 90 percent, with an initial pass, or initial first two passes of rules to say, we don’t want this, we don’t want that. Then, you're monitoring the status, and suddenly, new things will happen. People discover new ways of doing things, and then you’ve got to put some controls in place, but you're pretty quickly up into the 90 percent and then you fine-tuning to get those last little bits of risk out.

Gardner: Because organizations are becoming increasingly data-driven, they're getting information and insight across their systems and their applications. Now, you're providing them with another data set that they could use. Is there some way that organizations are beginning to assimilate and analyze multiple data sets including what Digital Guardian’s agents are providing them in order to have even better analytics on what’s going on or how to prevent unpleasant activities?

Brown: In this security world, you have the security operations center (SOC), which is kind of the nerve center where everything to do with security comes into play. The main piece of technology in that area is the security information and event management (SIEM) technology. The market leader is HPE’s ArcSight, and that’s really where all of the many tools that security organizations use come together in one console, where all of that information can be looked at in a central place and can also be correlated.

We provide a lot of really interesting information for the SIEM for the SOC. I already mentioned we're on the endpoint and the network, particularly on the endpoint. That’s a bit of a blind spot for a lot of security organizations. They're traditionally looking at firewalls, other network devices, and this kind of stuff.

We provide rich information about the user, about the data, what’s going on with the data, and what’s going on with the system on the endpoint. That’s key for detecting malware, etc. We have all this rich visibility on the endpoint and also from the network. We actually pre-correlate that. We have our own correlation rules. On the endpoint computer in real time, we're correlating stuff. All of that gets populated into ArcSight.

At the recent HPE Protect Show in National Harbor in September we showed the latest generation of our integration, which we're very excited about. We have a lot of ArcSight content, which helps people in the SOC leverage our data, and we gave a couple of presentations at the show on that.

Gardner: And is there a way to make this even more protected? I believe encryption could be brought to bear and it plays a role in how the SIEM can react and behave.

Seamless experience

Brown: We actually have a new partnership, related to HPE's acquisition of Voltage, which is a real leader in the e-mail security space. It’s all about applying encryption to messages and managing the keys and making that user experience very seamless and easy to use.

Adding to that, we're bundling up some of the classification functionality that we have in our network sensors. What we have is a combination between Digital Guardian Network, DOP, and the HPE Data Security Encryption solution, where an enterprise can define a whole bunch of rules based on templates.

We can say, "I need to comply with HIPAA," "I need to comply with PCI," or whatever standard it is. Digital Guardian on the network will automatically scan all the e-mail going out and automatically classify according to our rules which e-mails are sensitive and which attachments are sensitive. It then goes on to the HPE Data Security Solution where it gets encrypted automatically and then sent out.

It’s basically allowing corporations to apply standard set of policies, not relying on the user to say they need to encrypt this, not leaving it to the user’s judgment, but actually applying standard policies across the enterprise for all e-mail making sure they get encrypted. We are very excited about it.

Gardner: That sounds key -- using encryption to the best of its potential, being smart about it, not just across the waterfront, and then not depending on a voluntary encryption, but doing it based on need and intelligence.

Brown: Exactly.

Gardner: For those organizations that are increasingly trying to be data-driven, intelligent, taking advantage of the technologies and doing analysis in new interesting ways, what advice might you offer in the realm of security? Clearly, we’ve heard at various conferences and other places that security is, in a sense, the killer application of big-data analytics. If you're an organization seeking to be more data-driven, how can you best use that to improve your security posture?

Brown: The key, as far as we’re concerned, is that you have to watch your data, you have to understand your data, you need to collect information, and you need visibility of your data.

The other key point is that the security market has been shifting pretty dramatically from more of a network view much more toward the endpoint. I mentioned earlier that antivirus and some of these standard technologies on the endpoint aren't really cutting it anymore. So, it’s very important that you get visibility down at the endpoint and you need to see what users are doing, you need to understand what your systems are running, and you need to understand where your data is.

So collect that, get that visibility, and then leverage that visibility with analytics and tools so that you can profit from an automated kind of intelligence.

You may also be interested in:

More Stories By Dana Gardner

At Interarbor Solutions, we create the analysis and in-depth podcasts on enterprise software and cloud trends that help fuel the social media revolution. As a veteran IT analyst, Dana Gardner moderates discussions and interviews get to the meat of the hottest technology topics. We define and forecast the business productivity effects of enterprise infrastructure, SOA and cloud advances. Our social media vehicles become conversational platforms, powerfully distributed via the BriefingsDirect Network of online media partners like ZDNet and IT-Director.com. As founder and principal analyst at Interarbor Solutions, Dana Gardner created BriefingsDirect to give online readers and listeners in-depth and direct access to the brightest thought leaders on IT. Our twice-monthly BriefingsDirect Analyst Insights Edition podcasts examine the latest IT news with a panel of analysts and guests. Our sponsored discussions provide a unique, deep-dive focus on specific industry problems and the latest solutions. This podcast equivalent of an analyst briefing session -- made available as a podcast/transcript/blog to any interested viewer and search engine seeker -- breaks the mold on closed knowledge. These informational podcasts jump-start conversational evangelism, drive traffic to lead generation campaigns, and produce strong SEO returns. Interarbor Solutions provides fresh and creative thinking on IT, SOA, cloud and social media strategies based on the power of thoughtful content, made freely and easily available to proactive seekers of insights and information. As a result, marketers and branding professionals can communicate inexpensively with self-qualifiying readers/listeners in discreet market segments. BriefingsDirect podcasts hosted by Dana Gardner: Full turnkey planning, moderatiing, producing, hosting, and distribution via blogs and IT media partners of essential IT knowledge and understanding.

@BigDataExpo Stories
The essence of cloud computing is that all consumable IT resources are delivered as services. In his session at 15th Cloud Expo, Yung Chou, Technology Evangelist at Microsoft, demonstrated the concepts and implementations of two important cloud computing deliveries: Infrastructure as a Service (IaaS) and Platform as a Service (PaaS). He discussed from business and technical viewpoints what exactly they are, why we care, how they are different and in what ways, and the strategies for IT to transi...
The Internet of Things is clearly many things: data collection and analytics, wearables, Smart Grids and Smart Cities, the Industrial Internet, and more. Cool platforms like Arduino, Raspberry Pi, Intel's Galileo and Edison, and a diverse world of sensors are making the IoT a great toy box for developers in all these areas. In this Power Panel at @ThingsExpo, moderated by Conference Chair Roger Strukhoff, panelists discussed what things are the most important, which will have the most profound e...
Keeping pace with advancements in software delivery processes and tooling is taxing even for the most proficient organizations. Point tools, platforms, open source and the increasing adoption of private and public cloud services requires strong engineering rigor - all in the face of developer demands to use the tools of choice. As Agile has settled in as a mainstream practice, now DevOps has emerged as the next wave to improve software delivery speed and output. To make DevOps work, organization...
My team embarked on building a data lake for our sales and marketing data to better understand customer journeys. This required building a hybrid data pipeline to connect our cloud CRM with the new Hadoop Data Lake. One challenge is that IT was not in a position to provide support until we proved value and marketing did not have the experience, so we embarked on the journey ourselves within the product marketing team for our line of business within Progress. In his session at @BigDataExpo, Sum...
Extreme Computing is the ability to leverage highly performant infrastructure and software to accelerate Big Data, machine learning, HPC, and Enterprise applications. High IOPS Storage, low-latency networks, in-memory databases, GPUs and other parallel accelerators are being used to achieve faster results and help businesses make better decisions. In his session at 18th Cloud Expo, Michael O'Neill, Strategic Business Development at NVIDIA, focused on some of the unique ways extreme computing is...
Information technology (IT) advances are transforming the way we innovate in business, thereby disrupting the old guard and their predictable status-quo. It’s creating global market turbulence. Industries are converging, and new opportunities and threats are emerging, like never before. So, how are savvy chief information officers (CIOs) leading this transition? Back in 2015, the IBM Institute for Business Value conducted a market study that included the findings from over 1,800 CIO interviews ...
DevOps is often described as a combination of technology and culture. Without both, DevOps isn't complete. However, applying the culture to outdated technology is a recipe for disaster; as response times grow and connections between teams are delayed by technology, the culture will die. A Nutanix Enterprise Cloud has many benefits that provide the needed base for a true DevOps paradigm.
"We host and fully manage cloud data services, whether we store, the data, move the data, or run analytics on the data," stated Kamal Shannak, Senior Development Manager, Cloud Data Services, IBM, in this SYS-CON.tv interview at 18th Cloud Expo, held June 7-9, 2016, at the Javits Center in New York City, NY.
In his General Session at 17th Cloud Expo, Bruce Swann, Senior Product Marketing Manager for Adobe Campaign, explored the key ingredients of cross-channel marketing in a digital world. Learn how the Adobe Marketing Cloud can help marketers embrace opportunities for personalized, relevant and real-time customer engagement across offline (direct mail, point of sale, call center) and digital (email, website, SMS, mobile apps, social networks, connected objects).
SYS-CON Events announced today that Ocean9will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Ocean9 provides cloud services for Backup, Disaster Recovery (DRaaS) and instant Innovation, and redefines enterprise infrastructure with its cloud native subscription offerings for mission critical SAP workloads.
SYS-CON Events announced today that Auditwerx will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Auditwerx specializes in SOC 1, SOC 2, and SOC 3 attestation services throughout the U.S. and Canada. As a division of Carr, Riggs & Ingram (CRI), one of the top 20 largest CPA firms nationally, you can expect the resources, skills, and experience of a much larger firm combined with the accessibility and atten...
In his session at @ThingsExpo, Eric Lachapelle, CEO of the Professional Evaluation and Certification Board (PECB), will provide an overview of various initiatives to certifiy the security of connected devices and future trends in ensuring public trust of IoT. Eric Lachapelle is the Chief Executive Officer of the Professional Evaluation and Certification Board (PECB), an international certification body. His role is to help companies and individuals to achieve professional, accredited and worldw...
MongoDB Atlas leverages VPC peering for AWS, a service that allows multiple VPC networks to interact. This includes VPCs that belong to other AWS account holders. By performing cross account VPC peering, users ensure networks that host and communicate their data are secure. In his session at 20th Cloud Expo, Jay Gordon, a Developer Advocate at MongoDB, will explain how to properly architect your VPC using existing AWS tools and then peer with your MongoDB Atlas cluster. He'll discuss the secur...
Deep learning has been very successful in social sciences and specially areas where there is a lot of data. Trading is another field that can be viewed as social science with a lot of data. With the advent of Deep Learning and Big Data technologies for efficient computation, we are finally able to use the same methods in investment management as we would in face recognition or in making chat-bots. In his session at 20th Cloud Expo, Gaurav Chakravorty, co-founder and Head of Strategy Development ...
In his session at Cloud Expo, Alan Winters, an entertainment executive/TV producer turned serial entrepreneur, will present a success story of an entrepreneur who has both suffered through and benefited from offshore development across multiple businesses: The smart choice, or how to select the right offshore development partner Warning signs, or how to minimize chances of making the wrong choice Collaboration, or how to establish the most effective work processes Budget control, or how to m...
SYS-CON Events announced today that Linux Academy, the foremost online Linux and cloud training platform and community, will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Linux Academy was founded on the belief that providing high-quality, in-depth training should be available at an affordable price. Industry leaders in quality training, provided services, and student certification passes, its goal is to c...
SYS-CON Events announced today that SoftLayer, an IBM Company, has been named “Gold Sponsor” of SYS-CON's 18th Cloud Expo, which will take place on June 7-9, 2016, at the Javits Center in New York, New York. SoftLayer, an IBM Company, provides cloud infrastructure as a service from a growing number of data centers and network points of presence around the world. SoftLayer’s customers range from Web startups to global enterprises.
SYS-CON Events announced today that CA Technologies has been named “Platinum Sponsor” of SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY, and the 21st International Cloud Expo®, which will take place October 31-November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. CA Technologies helps customers succeed in a future where every business – from apparel to energy – is being rewritten by software. From ...
SYS-CON Events announced today that Technologic Systems Inc., an embedded systems solutions company, will exhibit at SYS-CON's @ThingsExpo, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Technologic Systems is an embedded systems company with headquarters in Fountain Hills, Arizona. They have been in business for 32 years, helping more than 8,000 OEM customers and building over a hundred COTS products that have never been discontinued. Technologic Systems’ pr...
In his keynote at @ThingsExpo, Chris Matthieu, Director of IoT Engineering at Citrix and co-founder and CTO of Octoblu, focused on building an IoT platform and company. He provided a behind-the-scenes look at Octoblu’s platform, business, and pivots along the way (including the Citrix acquisition of Octoblu).