Welcome!

@BigDataExpo Authors: Elizabeth White, Pat Romanski, Ferhat Hatay, William Schmarzo, Anders Wallgren

Related Topics: @BigDataExpo, Java IoT, Microservices Expo, @CloudExpo, Apache, Cloud Security

@BigDataExpo: Blog Feed Post

How to Secure Hadoop Without Touching It

Combining API Security and Hadoop

It sounds like a parlor trick, but one of the benefits of API centric de-facto standards  such as REST and JSON is they allow relatively seamless communication between software systems.

This makes it possible to combine technologies to instantly bring out new capabilities. In particular I want to talk about how an API Gateway can improve the security posture of a Hadoop installation without having to actually modify Hadoop itself. Sounds too good to be true? Read on.

Hadoop and RESTful APIs
Hadoop is mostly a behind the firewall affair, and APIs are generally used for exposing data or capabilities for other systems, users or mobile devices. In the case of Hadoop there are three main RESTful APIs to talk about. This list isn’t exhaustive but it covers the main APIs.

  1. WebHDFS – Offers complete control over files and directories in HDFS
  2. HBase REST API – Offers access to insert, create, delete, single/multiple cell values
  3. HCatalog REST API – Provides job control for Map/Reduce, Pig and Hive as well as to access and manipulate HCatalog DDL data

These APIs are very useful because anyone with an HTTP client can potentially manipulate data in Hadoop. This, of course, is like using a knife all-blade – it’s very easy to cut yourself. To take an example, WebHDFS allows RESTful calls for directory listings, creating new directories and files, as well as file deletion. Worse,  the default security model requires nothing more than inserting “root” into the HTTP call.

To its credit, most distributions of Hadoop also offer Kerberos SPNEGO authentication, but additional work is needed to support other types of authentication and authorization schemes, and not all REST calls that expose sensitive data (such as a list of files) are secured. Here are some of the other challenges:

  • Fragmented Enforcement – Some REST calls leak information and require no credentials
  • Developer Centric Interfaces – Full Java stack traces are passed back to callers, leaking system details
  • Resource Protection – The Namenode is a single point of failure and excessive WebHDFS activity may threaten the cluster
  • Consistent Security Policy – All APIs in Hadoop must be independently configured, managed and audited over time

This list is just a start, and to be fair, Hadoop is still evolving. We expect things to get better over time, but for Enterprises to unlock value from their “Big Data” projects now, they can’t afford to wait until security is perfect.

One model used in other domains is an API Gateway or proxy that sits between the Hadoop cluster and the client. Using this model, the cluster only trusts calls from the gateway and all potential API callers are forced to use the gateway. Further, the gateway capabilities are rich enough and expressive enough to perform the full depth and breadth of security for REST calls from authentication to message level security, tokenization, throttling, denial of service protection, attack protection and data translation. Even better, this provides a safe and effective way to expose Hadoop to mobile devices without worrying about performance, scalability and security.  Here is the conceptual picture:

Intel Expressway API Manager and Intel Distribution of Apache Hadoop

In the previous diagram we are showing the Intel(R) Expressway API Manager acting as a proxy for WebHDFS, HBase and HCatalog APIs exposed from Intel’s Hadoop distribution. API Manager exposes RESTful APIs and also provides an out of the box subscription to Mashery to help evangelize APIs among a community of developers.

All of the policy enforcement is done at the HTTP layer by the gateway and the security administrator is free to rewrite the API to be more user friendly to the caller and the gateway will take care of mapping and rewriting the REST call to the format supported by Hadoop. In short, this model lets you provide instant Enterprise security for a good chunk of Hadoop capabilities without having to add a plug-in, additional code or a special distribution of Hadoop. So… just what can you do without touching Hadoop? To take WebHDFS as an example the following is possible with some configuration on the gateway itself:

  1. A gateway can lock-down the standard WebHDFS REST API and allow access only for specific users based on an Enterprise identity that may be stored in LDAP, Active Directory, Oracle, Siteminder, IBM or Relational Databases.
  2. A gateway provides additional authentication methods such as X.509 certificates with CRL and OCSP checking, OAuth token handling, API keys support, WS-Security and SSL termination & acceleration for WebHDFS API calls. The gateway can expose secure versions of the WebDHFS API for external access
  3. A gateway can improve on the security model used by WebHDFS which carries identities in HTTP query parameters, which are more susceptible to credential leakage compared to a security model based on HTTP headers. The gateway can expose a variant of the WebHDFS API that expects credentials in the HTTP header and seamlessly maps this to the WebHDFS internal format
  4. The gateway workflow engine can maps a single function REST call into multiple WebHDFS calls. For example, the WebHDFS REST API requires two separate HTTP calls for file creation and file upload. The gateway can expose a single API for this that handles the sequential execution and error handling, exposing a single function to the user
  5. The gateway can strip and redact Java exception traces carried in the WebHDFS REST API responses ( for instance, JSON responses may carry org.apache.hadoop.security.AccessControlException.* which can spill details beneficial to an attacker
  6. The gateway can throttle and rate shape WebHDFS REST requests which can protect the Hadoop cluster from resource consumption from excessive HDFS writes, open file handles and excessive  create, read, update and delete operations which might impact a running job.

This list is just the start, API manager can also perform selective encryption and data protection (such as PCI tokenization or PII format preserving encryption) on data as it is inserted or deleted from the Hadoop cluster, all by sitting in-between the caller and the cluster. So the parlor trick here is really moving the problem from trying to secure hadoop from the inside out to moving and centralizing security to the enforcement point. If you are looking for a way to expose “Big Data” outside the cluster, an the API Gateway model may be worth some investigation!

Blake

 

The post How to secure Hadoop without touching it – combining API Security and Hadoop appeared first on Security Gateways@Intel.

Read the original blog entry...

More Stories By Application Security

This blog references our expert posts on application and web services security.

@BigDataExpo Stories
With the proliferation of both SQL and NoSQL databases, organizations can now target specific fit-for-purpose database tools for their different application needs regarding scalability, ease of use, ACID support, etc. Platform as a Service offerings make this even easier now, enabling developers to roll out their own database infrastructure in minutes with minimal management overhead. However, this same amount of flexibility also comes with the challenges of picking the right tool, on the right ...
SYS-CON Events announced today that Catchpoint Systems, Inc., a provider of innovative web and infrastructure monitoring solutions, has been named “Silver Sponsor” of SYS-CON's DevOps Summit at 18th Cloud Expo New York, which will take place June 7-9, 2016, at the Javits Center in New York City, NY. Catchpoint is a leading Digital Performance Analytics company that provides unparalleled insight into customer-critical services to help consistently deliver an amazing customer experience. Designed...
SYS-CON Events announced today that Alert Logic, Inc., the leading provider of Security-as-a-Service solutions for the cloud, will exhibit at SYS-CON's 18th International Cloud Expo®, which will take place on June 7-9, 2016, at the Javits Center in New York City, NY. Alert Logic, Inc., provides Security-as-a-Service for on-premises, cloud, and hybrid infrastructures, delivering deep security insight and continuous protection for customers at a lower cost than traditional security solutions. Ful...
SYS-CON Events announced today that Interoute, owner-operator of one of Europe's largest networks and a global cloud services platform, has been named “Bronze Sponsor” of SYS-CON's 18th Cloud Expo, which will take place on June 7-9, 2015 at the Javits Center in New York, New York. Interoute is the owner-operator of one of Europe's largest networks and a global cloud services platform which encompasses 12 data centers, 14 virtual data centers and 31 colocation centers, with connections to 195 ad...
Recognizing the need to identify and validate information security professionals’ competency in securing cloud services, the two leading membership organizations focused on cloud and information security, the Cloud Security Alliance (CSA) and (ISC)^2, joined together to develop an international cloud security credential that reflects the most current and comprehensive best practices for securing and optimizing cloud computing environments.
Companies can harness IoT and predictive analytics to sustain business continuity; predict and manage site performance during emergencies; minimize expensive reactive maintenance; and forecast equipment and maintenance budgets and expenditures. Providing cost-effective, uninterrupted service is challenging, particularly for organizations with geographically dispersed operations.
SYS-CON Events announced today that Avere Systems, a leading provider of enterprise storage for the hybrid cloud, will exhibit at SYS-CON's 18th International Cloud Expo®, which will take place on June 7-9, 2016, at the Javits Center in New York City, NY. Avere delivers a more modern architectural approach to storage that doesn’t require the overprovisioning of storage capacity to achieve performance, overspending on expensive storage media for inactive data or the overbuilding of data centers ...
SYS-CON Events announced today that Commvault, a global leader in enterprise data protection and information management, has been named “Bronze Sponsor” of SYS-CON's 18th International Cloud Expo, which will take place on June 7–9, 2016, at the Javits Center in New York City, NY, and the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Commvault is a leading provider of data protection and information management...
SYS-CON Events announced today that VAI, a leading ERP software provider, will exhibit at SYS-CON's 18th International Cloud Expo®, which will take place on June 7-9, 2016, at the Javits Center in New York City, NY. VAI (Vormittag Associates, Inc.) is a leading independent mid-market ERP software developer renowned for its flexible solutions and ability to automate critical business functions for the distribution, manufacturing, specialty retail and service sectors. An IBM Premier Business Part...
In most cases, it is convenient to have some human interaction with a web (micro-)service, no matter how small it is. A traditional approach would be to create an HTTP interface, where user requests will be dispatched and HTML/CSS pages must be served. This approach is indeed very traditional for a web site, but not really convenient for a web service, which is not intended to be good looking, 24x7 up and running and UX-optimized. Instead, talking to a web service in a chat-bot mode would be muc...
It's easy to assume that your app will run on a fast and reliable network. The reality for your app's users, though, is often a slow, unreliable network with spotty coverage. What happens when the network doesn't work, or when the device is in airplane mode? You get unhappy, frustrated users. An offline-first app is an app that works, without error, when there is no network connection.
With an estimated 50 billion devices connected to the Internet by 2020, several industries will begin to expand their capabilities for retaining end point data at the edge to better utilize the range of data types and sheer volume of M2M data generated by the Internet of Things. In his session at @ThingsExpo, Don DeLoach, CEO and President of Infobright, will discuss the infrastructures businesses will need to implement to handle this explosion of data by providing specific use cases for filte...
Cognitive Computing is becoming the foundation for a new generation of solutions that have the potential to transform business. Unlike traditional approaches to building solutions, a cognitive computing approach allows the data to help determine the way applications are designed. This contrasts with conventional software development that begins with defining logic based on the current way a business operates. In her session at 18th Cloud Expo, Judith S. Hurwitz, President and CEO of Hurwitz & ...
Fortunately, meaningful and tangible business cases for IoT are plentiful in a broad array of industries and vertical markets. These range from simple warranty cost reduction for capital intensive assets, to minimizing downtime for vital business tools, to creating feedback loops improving product design, to improving and enhancing enterprise customer experiences. All of these business cases, which will be briefly explored in this session, hinge on cost effectively extracting relevant data from ...
SYS-CON Events announced today that Pythian, a global IT services company specializing in helping companies adopt disruptive technologies to optimize revenue-generating systems, has been named “Bronze Sponsor” of SYS-CON's 18th Cloud Expo, which will take place on June 7-9, 2015 at the Javits Center in New York, New York. Founded in 1997, Pythian is a global IT services company that helps companies compete by adopting disruptive technologies such as cloud, Big Data, advanced analytics, and DevO...
With the Apple Watch making its way onto wrists all over the world, it’s only a matter of time before it becomes a staple in the workplace. In fact, Forrester reported that 68 percent of technology and business decision-makers characterize wearables as a top priority for 2015. Recognizing their business value early on, FinancialForce.com was the first to bring ERP to wearables, helping streamline communication across front and back office functions. In his session at @ThingsExpo, Kevin Roberts...
As enterprises work to take advantage of Big Data technologies, they frequently become distracted by product-level decisions. In most new Big Data builds this approach is completely counter-productive: it presupposes tools that may not be a fit for development teams, forces IT to take on the burden of evaluating and maintaining unfamiliar technology, and represents a major up-front expense. In his session at @BigDataExpo at @ThingsExpo, Andrew Warfield, CTO and Co-Founder of Coho Data, will dis...
SYS-CON Events announced today that iDevices®, the preeminent brand in the connected home industry, will exhibit at SYS-CON's 18th International Cloud Expo®, which will take place on June 7-9, 2016, at the Javits Center in New York City, NY. iDevices, the preeminent brand in the connected home industry, has a growing line of HomeKit-enabled products available at the largest retailers worldwide. Through the “Designed with iDevices” co-development program and its custom-built IoT Cloud Infrastruc...
Eighty percent of a data scientist’s time is spent gathering and cleaning up data, and 80% of all data is unstructured and almost never analyzed. Cognitive computing, in combination with Big Data, is changing the equation by creating data reservoirs and using natural language processing to enable analysis of unstructured data sources. This is impacting every aspect of the analytics profession from how data is mined (and by whom) to how it is delivered. This is not some futuristic vision: it's ha...
Silver Spring Networks, Inc. (NYSE: SSNI) extended its Internet of Things technology platform with performance enhancements to Gen5 – its fifth generation critical infrastructure networking platform. Already delivering nearly 23 million devices on five continents as one of the leading networking providers in the market, Silver Spring announced it is doubling the maximum speed of its Gen5 network to up to 2.4 Mbps, increasing computational performance by 10x, supporting simultaneous mesh communic...