Click here to close now.

Welcome!

Big Data Journal Authors: Cloud Best Practices Network, Carmen Gonzalez, Roger Strukhoff, Elizabeth White, Esmeralda Swartz

Related Topics: Big Data Journal, Java, MICROSERVICES, Cloud Expo, Apache, Security

Big Data Journal: Blog Feed Post

How to Secure Hadoop Without Touching It

Combining API Security and Hadoop

It sounds like a parlor trick, but one of the benefits of API centric de-facto standards  such as REST and JSON is they allow relatively seamless communication between software systems.

This makes it possible to combine technologies to instantly bring out new capabilities. In particular I want to talk about how an API Gateway can improve the security posture of a Hadoop installation without having to actually modify Hadoop itself. Sounds too good to be true? Read on.

Hadoop and RESTful APIs
Hadoop is mostly a behind the firewall affair, and APIs are generally used for exposing data or capabilities for other systems, users or mobile devices. In the case of Hadoop there are three main RESTful APIs to talk about. This list isn’t exhaustive but it covers the main APIs.

  1. WebHDFS – Offers complete control over files and directories in HDFS
  2. HBase REST API – Offers access to insert, create, delete, single/multiple cell values
  3. HCatalog REST API – Provides job control for Map/Reduce, Pig and Hive as well as to access and manipulate HCatalog DDL data

These APIs are very useful because anyone with an HTTP client can potentially manipulate data in Hadoop. This, of course, is like using a knife all-blade – it’s very easy to cut yourself. To take an example, WebHDFS allows RESTful calls for directory listings, creating new directories and files, as well as file deletion. Worse,  the default security model requires nothing more than inserting “root” into the HTTP call.

To its credit, most distributions of Hadoop also offer Kerberos SPNEGO authentication, but additional work is needed to support other types of authentication and authorization schemes, and not all REST calls that expose sensitive data (such as a list of files) are secured. Here are some of the other challenges:

  • Fragmented Enforcement – Some REST calls leak information and require no credentials
  • Developer Centric Interfaces – Full Java stack traces are passed back to callers, leaking system details
  • Resource Protection – The Namenode is a single point of failure and excessive WebHDFS activity may threaten the cluster
  • Consistent Security Policy – All APIs in Hadoop must be independently configured, managed and audited over time

This list is just a start, and to be fair, Hadoop is still evolving. We expect things to get better over time, but for Enterprises to unlock value from their “Big Data” projects now, they can’t afford to wait until security is perfect.

One model used in other domains is an API Gateway or proxy that sits between the Hadoop cluster and the client. Using this model, the cluster only trusts calls from the gateway and all potential API callers are forced to use the gateway. Further, the gateway capabilities are rich enough and expressive enough to perform the full depth and breadth of security for REST calls from authentication to message level security, tokenization, throttling, denial of service protection, attack protection and data translation. Even better, this provides a safe and effective way to expose Hadoop to mobile devices without worrying about performance, scalability and security.  Here is the conceptual picture:

Intel Expressway API Manager and Intel Distribution of Apache Hadoop

In the previous diagram we are showing the Intel(R) Expressway API Manager acting as a proxy for WebHDFS, HBase and HCatalog APIs exposed from Intel’s Hadoop distribution. API Manager exposes RESTful APIs and also provides an out of the box subscription to Mashery to help evangelize APIs among a community of developers.

All of the policy enforcement is done at the HTTP layer by the gateway and the security administrator is free to rewrite the API to be more user friendly to the caller and the gateway will take care of mapping and rewriting the REST call to the format supported by Hadoop. In short, this model lets you provide instant Enterprise security for a good chunk of Hadoop capabilities without having to add a plug-in, additional code or a special distribution of Hadoop. So… just what can you do without touching Hadoop? To take WebHDFS as an example the following is possible with some configuration on the gateway itself:

  1. A gateway can lock-down the standard WebHDFS REST API and allow access only for specific users based on an Enterprise identity that may be stored in LDAP, Active Directory, Oracle, Siteminder, IBM or Relational Databases.
  2. A gateway provides additional authentication methods such as X.509 certificates with CRL and OCSP checking, OAuth token handling, API keys support, WS-Security and SSL termination & acceleration for WebHDFS API calls. The gateway can expose secure versions of the WebDHFS API for external access
  3. A gateway can improve on the security model used by WebHDFS which carries identities in HTTP query parameters, which are more susceptible to credential leakage compared to a security model based on HTTP headers. The gateway can expose a variant of the WebHDFS API that expects credentials in the HTTP header and seamlessly maps this to the WebHDFS internal format
  4. The gateway workflow engine can maps a single function REST call into multiple WebHDFS calls. For example, the WebHDFS REST API requires two separate HTTP calls for file creation and file upload. The gateway can expose a single API for this that handles the sequential execution and error handling, exposing a single function to the user
  5. The gateway can strip and redact Java exception traces carried in the WebHDFS REST API responses ( for instance, JSON responses may carry org.apache.hadoop.security.AccessControlException.* which can spill details beneficial to an attacker
  6. The gateway can throttle and rate shape WebHDFS REST requests which can protect the Hadoop cluster from resource consumption from excessive HDFS writes, open file handles and excessive  create, read, update and delete operations which might impact a running job.

This list is just the start, API manager can also perform selective encryption and data protection (such as PCI tokenization or PII format preserving encryption) on data as it is inserted or deleted from the Hadoop cluster, all by sitting in-between the caller and the cluster. So the parlor trick here is really moving the problem from trying to secure hadoop from the inside out to moving and centralizing security to the enforcement point. If you are looking for a way to expose “Big Data” outside the cluster, an the API Gateway model may be worth some investigation!

Blake

 

The post How to secure Hadoop without touching it – combining API Security and Hadoop appeared first on Security Gateways@Intel.

Read the original blog entry...

More Stories By Application Security

This blog references our expert posts on application and web services security.

@BigDataExpo Stories
SOA Software has changed its name to Akana. With roots in Web Services and SOA Governance, Akana has established itself as a leader in API Management and is expanding into cloud integration as an alternative to the traditional heavyweight enterprise service bus (ESB). The company recently announced that it achieved more than 90% year-over-year growth. As Akana, the company now addresses the evolution and diversification of SOA, unifying security, management, and DevOps across SOA, APIs, microser...
SYS-CON Events announced today that Creative Business Solutions will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. Creative Business Solutions is the top stocking authorized HP Renew Distributor in the U.S. Based out of Long Island, NY, Creative Business Solutions offers a one-stop shop for a diverse range of products including Proliant, Blade and Industry Standard Servers, Networking, Server Options and...
SYS-CON Events announced today that Cisco, the worldwide leader in IT that transforms how people connect, communicate and collaborate, has been named “Gold Sponsor” of SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. Cisco makes amazing things happen by connecting the unconnected. Cisco has shaped the future of the Internet by becoming the worldwide leader in transforming how people connect, communicate and collaborat...
Docker is an excellent platform for organizations interested in running microservices. It offers portability and consistency between development and production environments, quick provisioning times, and a simple way to isolate services. In his session at DevOps Summit at 16th Cloud Expo, Shannon Williams, co-founder of Rancher Labs, will walk through these and other benefits of using Docker to run microservices, and provide an overview of RancherOS, a minimalist distribution of Linux designed...
SYS-CON Events announced today that Akana, formerly SOA Software, has been named “Bronze Sponsor” of SYS-CON's 16th International Cloud Expo® New York, which will take place June 9-11, 2015, at the Javits Center in New York City, NY. Akana’s comprehensive suite of API Management, API Security, Integrated SOA Governance, and Cloud Integration solutions helps businesses accelerate digital transformation by securely extending their reach across multiple channels – mobile, cloud and Internet of Thi...
Businesses are looking to empower employees and departments to do more, go faster, and streamline their processes. For all workers – but mobile workers especially – utilizing the cloud to reconnect documents and improve processes without destructing existing workflows can have a dramatic impact on productivity. In his session at 16th Cloud Expo, Mark Grilli, vice president of Acrobat Solutions marketing at Adobe Systems Incorporated, will outline new ways that the cloud is changing the way peo...
SYS-CON Events announced today that Vitria Technology, Inc. will exhibit at SYS-CON’s @ThingsExpo, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. Vitria will showcase the company’s new IoT Analytics Platform through live demonstrations at booth #330. Vitria’s IoT Analytics Platform, fully integrated and powered by an operational intelligence engine, enables customers to rapidly build and operationalize advanced analytics to deliver timely business outcomes ...
Are your applications getting in the way of your business strategy? It’s time to rethink your IT approach. In his session at 16th Cloud Expo, Madhukar Kumar, Vice President, Product Management at Liaison Technologies, will discuss a new data-centric approach to IT that allows your data, not applications, to inform business strategy. By moving away from an application-centric IT model where data integration and analysis are subservient to the constraints of applications, your organization will b...
SYS-CON Events announced today that QTS Realty Trust, one of the nation’s largest and fastest-growing providers of data center facilities and cloud services and a leader in security and compliance, will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. QTS Realty Trust, Inc. (NYSE: QTS) is a leading national provider of data center solutions and fully managed services, and a leader in security and compliance...
SYS-CON Events announced today that Solgenia will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY, and the 17th International Cloud Expo®, which will take place on November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. Solgenia is the global market leader in Cloud Collaboration and Cloud Infrastructure software solutions. Designed to “Bridge the Gap” between Personal and Professional S...
SYS-CON Events announced today that Liaison Technologies, a leading provider of data management and integration cloud services and solutions, has been named "Silver Sponsor" of SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York, NY. Liaison Technologies is a recognized market leader in providing cloud-enabled data integration and data management solutions to break down complex information barriers, enabling enterprises to make sm...
Technology today seems to be moving at breakneck speeds. This speed of change is creating tectonic shifts in how businesses operate and leverage technology to achieve their goals. The convergence of key disruptive technologies (i.e., social, mobile, analytics, and cloud) is what Gartner refers to as the nexus of forces. Cloud is an underpinning of this nexus. How is this disruption impacting the CIO? Does the role change in the face of all these forces, or is it just a continuation of what the C...
Cloud is not a commodity. And no matter what you call it, computing doesn’t come out of the sky. It comes from physical hardware inside brick and mortar facilities connected by hundreds of miles of networking cable. And no two clouds are built the same way. SoftLayer gives you the highest performing cloud infrastructure available. One platform that takes data centers around the world that are full of the widest range of cloud computing options, and then integrates and automates everything. J...
In the midst of the widespread popularity and adoption of cloud computing, it seems like everything is being offered “as a Service” these days: Infrastructure? Check. Platform? You bet. Software? Absolutely. Toaster? It’s only a matter of time. With service providers positioning vastly differing offerings under a generic “cloud” umbrella, it’s all too easy to get confused about what’s actually being offered. In his session at 16th Cloud Expo, Kevin Hazard, Director of Digital Content for SoftL...
Countless business models have spawned from the IaaS industry. Resell Web hosting, blogs, public cloud, and on and on. With the overwhelming amount of tools available to us, it's sometimes easy to overlook that many of them are just new skins of resources we've had for a long time. In his General Session at 16th Cloud Expo, Phil Jackson, Lead Developer Advocate at SoftLayer, will break down what we've got to work with and discuss the benefits and pitfalls to discover how we can best use them t...
The world's leading Cloud event, Cloud Expo has launched Microservices Journal on the SYS-CON.com portal, featuring over 19,000 original articles, news stories, features, and blog entries. DevOps Journal is focused on this critical enterprise IT topic in the world of cloud computing. Microservices Journal offers top articles, news stories, and blog posts from the world's well-known experts and guarantees better exposure for its authors than any other publication. Follow new article posts on T...
SYS-CON Media announced that IBM, which offers the world’s deepest portfolio of technologies and expertise that are transforming the future of work, has launched ad campaigns on SYS-CON’s numerous online magazines such as Cloud Computing Journal, Virtualization Journal, SOA World Magazine, and IoT Journal. IBM’s campaigns focus on vendors in the technology marketplace, the future of testing, Big Data and analytics, and mobile platforms.
SYS-CON Events announced today that SafeLogic has been named “Bag Sponsor” of SYS-CON's 16th International Cloud Expo® New York, which will take place June 9-11, 2015, at the Javits Center in New York City, NY. SafeLogic provides security products for applications in mobile and server/appliance environments. SafeLogic’s flagship product CryptoComply is a FIPS 140-2 validated cryptographic engine designed to secure data on servers, workstations, appliances, mobile devices, and in the Cloud....
SYS-CON Events announced today the IoT Bootcamp – Jumpstart Your IoT Strategy, being held June 9–10, 2015, in conjunction with 16th Cloud Expo and Internet of @ThingsExpo at the Javits Center in New York City. This is your chance to jumpstart your IoT strategy. Combined with real-world scenarios and use cases, the IoT Bootcamp is not just based on presentations but includes hands-on demos and walkthroughs. We will introduce you to a variety of Do-It-Yourself IoT platforms including Arduino, Ras...
When it comes to building applications, one database definitely does not fit all. Traditional SQL databases are great for storing highly structured, normalized data and performing analytics and reporting. NoSQL has attracted developers with its awesome flexibility, and JSON-centric document stores like Cloudant make web developers incredibly productive by offering a JavaScript environment from end-to-end. Recent Big Data challenges have driven the need for a distributed approach to analytics e...