Big Data – Big Help or Big Risk?

By Andy Thurai (Twitter: @AndyThurai)

(Original version of this blog appeared on ProgrammableWeb)

As promised in my last blog, “Big Data, API, and IoT …..Newer technologies protected by older security,” here is a deep dive into Big Data security and how to secure Big Data effectively.

Like many other open source projects, Hadoop has followed a path that hasn’t focused much on security. To use Big Data effectively, it needs to be secured properly. However, if you try to force-fit it into an older security model, you might end up compromising more than you think. But if you lock it down too tightly, security might interfere with performance.

In order to effectively secure Big Data, you must mitigate the following security risks that aren’t addressed by prior security models.

Issue #1:  Are the keys to the kingdom with you?
In a hosted environment, the provider holds the keys to your secure data. If a government agency legally demands access, the provider is obligated to hand over your data. While such access may be legally required, the onus should be on you to control when, what, and how much access you grant, and to keep track of the information released to support internal auditing.


Keep the keys to the kingdom with you. An encryption proxy can provide tighter control.

Issue #2: Encryption slows things down
If you encrypt all of your data, performance can slow down significantly. To avoid that, some Big Data, BI, and analytics programs choose to encrypt only the sensitive portions of the data. It is imperative to use a Big Data eco-system that is intelligent enough to encrypt data selectively.
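The selective-encryption idea can be sketched in a few lines of Python. This is a toy illustration only: the field names, key handling, and the hash-based keystream cipher are all hypothetical, and a real deployment would use a vetted cipher such as AES rather than this stand-in.

```python
import base64
import hashlib
import os

SENSITIVE_FIELDS = {"ssn", "credit_card"}  # hypothetical data classification

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream: hash key + nonce + counter until enough bytes exist.
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def encrypt_field(key: bytes, plaintext: str) -> str:
    nonce = os.urandom(16)
    data = plaintext.encode()
    cipher = bytes(a ^ b for a, b in zip(data, _keystream(key, nonce, len(data))))
    return base64.b64encode(nonce + cipher).decode()

def decrypt_field(key: bytes, token: str) -> str:
    raw = base64.b64decode(token)
    nonce, cipher = raw[:16], raw[16:]
    return bytes(a ^ b for a, b in zip(cipher, _keystream(key, nonce, len(cipher)))).decode()

def encrypt_record(key: bytes, record: dict) -> dict:
    # Encrypt only fields classified as sensitive; leave the rest in the
    # clear so analytics jobs are not slowed down by bulk decryption.
    return {
        k: encrypt_field(key, v) if k in SENSITIVE_FIELDS else v
        for k, v in record.items()
    }
```

The point is the policy shape, not the cipher: classification drives which fields pay the encryption cost.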

A separate and more desirable option is to run faster encryption/decryption. Solutions such as the Intel Hadoop Security Gateway use Intel chip-based encryption acceleration (the Intel AES-NI and SSE 4.2 instruction sets), which is several orders of magnitude faster than software-based encryption. It is not only faster but also more secure, as the data never leaves the processor for an on- or off-board crypto processor.


Issue #3: Identifiable, sensitive data is a big risk
Sensitive data generally falls into two groups: risk-related or compliance-related. Safeguarding it might involve one of the following:

  1. Completely redact the information so the original can never be recovered. While this is the most effective method, the original data is lost for good if you ever need it.
  2. Tokenize the sensitive data using a proxy tokenization solution. You can generate a completely random token made to look like the original data, fitting its format so it won’t break backend systems. The sensitive data is stored in a secure vault and only the associated tokens are distributed.
  3. Encrypt the sensitive data using mechanisms such as Format Preserving Encryption (FPE), so the encrypted output fits the format of the original data. Care should be exercised in selecting a solution to make sure it has strong key management and strong encryption capabilities.
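Options 2 and 3 can be illustrated with a toy vault that issues format-preserving random tokens. This is a hypothetical sketch, not a production tokenization product; real solutions add durable storage, access controls, and audited key management around the vault.

```python
import secrets
import string

class TokenVault:
    """Toy proxy-tokenization vault: sensitive values stay inside,
    only format-preserving random tokens leave (illustrative only)."""

    def __init__(self):
        self._vault = {}  # token -> original value

    def tokenize(self, value: str) -> str:
        # Preserve the format: digits stay digits, letters stay letters
        # (case not preserved in this toy), punctuation is kept as-is.
        while True:
            token = "".join(
                secrets.choice(string.digits) if c.isdigit()
                else secrets.choice(string.ascii_letters) if c.isalpha()
                else c
                for c in value
            )
            if token not in self._vault and token != value:
                self._vault[token] = value
                return token

    def detokenize(self, token: str) -> str:
        return self._vault[token]
```

Because the token matches the original’s shape, backend systems that validate formats (an SSN column, for example) keep working without ever seeing the real value.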

Issue #4: Keep data and access control properties together
If you let applications and services access the raw data directly, the results could be disastrous. Instead, enforce data access controls as close to the data as possible. You need to distribute the data together with its associated properties and classification levels, and enforce them where the data lives. One way to do this is to expose the data through an API that controls exposure locally, based on the data’s attributes.
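The attribute-based check next to the data can be sketched as follows. The classification levels and owner-group rule here are hypothetical; the point is that the decision is made from the data’s own attributes, not inside each calling application.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Attributes:
    classification: str   # e.g. "public", "internal", "restricted"
    owner_group: str

# Hypothetical clearance ordering
LEVELS = {"public": 0, "internal": 1, "restricted": 2}

def can_access(user_clearance: str, user_groups: set, attrs: Attributes) -> bool:
    # Enforce the decision next to the data, from its own attributes.
    if LEVELS[user_clearance] < LEVELS[attrs.classification]:
        return False
    return attrs.classification != "restricted" or attrs.owner_group in user_groups

def read_field(store: dict, key: str, clearance: str, groups: set):
    # The store keeps (value, attributes) together, so enforcement
    # travels with the data wherever it is distributed.
    value, attrs = store[key]
    if not can_access(clearance, groups, attrs):
        raise PermissionError(key)
    return value
```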

Issue #5: Protect the exposure APIs
Many Big Data components communicate via APIs (e.g., HDFS, HBase, and HCatalog). Exposing such powerful APIs with little or no protection can lead to disastrous results. The most effective way to protect your Big Data goldmine is to introduce a touchless API security gateway in front of the Hadoop clusters, so that the clusters trust calls ONLY from the secure gateway. By choosing a hardened Big Data security gateway, you can enforce all of the above using very rich authentication and authorization schemes.
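The gateway-trust pattern can be sketched with a shared-secret HMAC: the cluster accepts only requests carrying a valid gateway signature. This is an illustrative sketch, not any particular gateway’s actual mechanism; the header name and shared secret are hypothetical.

```python
import hashlib
import hmac

GATEWAY_SECRET = b"shared-secret"  # hypothetical key shared with the gateway

def sign_request(secret: bytes, method: str, path: str, body: bytes) -> str:
    # Bind the signature to the method, path, and body of the call.
    msg = method.encode() + b"\n" + path.encode() + b"\n" + body
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()

def cluster_accepts(headers: dict, method: str, path: str, body: bytes) -> bool:
    # The cluster trusts calls ONLY when the gateway's signature checks out.
    expected = sign_request(GATEWAY_SECRET, method, path, body)
    return hmac.compare_digest(headers.get("X-Gateway-Signature", ""), expected)
```

In practice this is combined with network isolation (only the gateway can reach the cluster ports) and mutual TLS, but the signature check captures the core idea.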

Issue #6: NameNode protection
This is important enough to call out as a separate issue. From an architectural perspective, if proper resource protection is not enforced, the NameNode can become a single point of failure, rendering the entire Hadoop cluster useless. It is as easy as someone launching a DoS attack against webHDFS: excessive activity alone can bring webHDFS down.
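One common defense a gateway can apply in front of webHDFS is per-client rate limiting. A minimal token-bucket sketch follows; the rates and injectable clock are illustrative, and a real deployment would track one bucket per client identity.

```python
import time

class TokenBucket:
    """Toy rate limiter a gateway could place in front of webHDFS so a
    single client cannot flood the NameNode with requests (illustrative)."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        # Refill tokens for the elapsed time, then spend one per request.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Requests beyond the budget are rejected at the gateway, so the excess load never reaches the NameNode.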


Issue #7: Identify, authenticate, authorize, and control data access
You need an effective identity management and access control system in place to make this happen. You also need to identify the user base and consistently control access to the data based on access control policies, without creating additional identity silos. Ideally, authentication and authorization for Hadoop should leverage existing identity management investments. Enforcement should also take time-based restrictions into account (for example, certain users may access certain data only during specific periods).
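Combining who (role), what (dataset), and when (time window) in one authorization decision can be sketched as below. The roles, datasets, and windows are hypothetical; in a real system these policies would come from the existing identity management platform rather than an in-code table.

```python
from datetime import datetime, time

# Hypothetical policy table: role -> (allowed datasets, allowed window)
POLICIES = {
    "analyst": ({"clickstream", "sales"}, (time(9, 0), time(17, 0))),
    "batch_job": ({"clickstream"}, (time(0, 0), time(23, 59))),
}

def is_allowed(role: str, dataset: str, when: datetime) -> bool:
    # Deny unless the role exists, the dataset is in its scope,
    # and the request falls inside the role's time window.
    if role not in POLICIES:
        return False
    datasets, (start, end) = POLICIES[role]
    return dataset in datasets and start <= when.time() <= end
```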

Issue #8: Monitor, log, and analyze usage patterns
Once you have implemented effective, classification-based data access controls, you also need to monitor and log usage patterns. You need to constantly analyze those patterns to make sure there is no unusual activity. It is crucial to catch unusual activity and access patterns early enough to prevent dumps of data from making it out of your repository to a hacker.
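A simple baseline check captures the idea: flag a day whose access volume sits far above the historical norm. The three-standard-deviation threshold is an illustrative choice, and real monitoring would segment by user, dataset, and time of day rather than a single global count.

```python
from statistics import mean, stdev

def unusual_activity(daily_counts, today, threshold=3.0):
    # Flag today's access volume when it sits more than `threshold`
    # standard deviations above the historical baseline.
    if len(daily_counts) < 2:
        return False  # not enough history to judge
    mu, sigma = mean(daily_counts), stdev(daily_counts)
    if sigma == 0:
        return today > mu
    return (today - mu) / sigma > threshold
```

An alert on `unusual_activity` is the early-warning hook: a sudden spike in reads against a sensitive dataset is exactly the signature of a bulk exfiltration attempt.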

Conclusion
As more and more organizations rush to implement and harness the power of Big Data, care should be exercised to secure it. Extending existing security models to fit Big Data may not solve the problem; in fact, it might introduce additional performance issues, as discussed above. A solid security framework needs to be thought out before organizations can adopt enterprise-grade Big Data.


More Stories By Andy Thurai

Andy Thurai is Program Director for API, IoT and Connected Cloud with IBM, where he is responsible for solutionizing, strategizing, evangelizing, and providing thought leadership for those technologies. Prior to this role, he has held technology, architecture leadership and executive positions with Intel, Nortel, BMC, CSC, and L-1 Identity Solutions. You can find more of his thoughts at www.thurai.net/blog or follow him on Twitter @AndyThurai.
