Big Data Journal: Blog Feed Post

Big Data: New Ways to Hadoop with R

Today, there are two main ways to use Hadoop with R and big data:

1. Use the open-source rmr package to write map-reduce tasks in R (running within the Hadoop cluster - great for data distillation!)

2. Import data from Hadoop to a server running Revolution R Enterprise, via HBase, ODBC (for high-performance Hadoop/SQL interfaces), or by streaming data directly from HDFS to ScaleR's big-data predictive algorithms.
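
For readers new to the map-reduce pattern mentioned in the first option, here is a minimal, in-process sketch of the idea in Python. It is not the rmr API and does not touch Hadoop; the function names (`map_phase`, `shuffle`, `reduce_phase`) and the toy carrier/delay records are hypothetical, chosen only to show how a large set of records can be distilled to one summary value per key.

```python
from collections import defaultdict

# Illustrative only: a single-process imitation of the map-reduce pattern.
# None of these names come from rmr or Hadoop; they are assumptions for this sketch.

def map_phase(records, mapper):
    """Apply the mapper to every record and yield the (key, value) pairs it emits."""
    for record in records:
        yield from mapper(record)

def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Collapse each key's list of values to a single summary value."""
    return {key: reducer(key, values) for key, values in groups.items()}

def delay_mapper(record):
    """Emit (carrier, delay) for one toy flight record."""
    carrier, delay = record
    yield carrier, delay

def mean_reducer(key, values):
    """Average all delays seen for one carrier."""
    return sum(values) / len(values)

# Toy input: (carrier, delay in minutes). Real input would live in HDFS.
records = [("AA", 10), ("UA", 30), ("AA", 20), ("UA", 50), ("DL", 5)]

summary = reduce_phase(shuffle(map_phase(records, delay_mapper)), mean_reducer)
print(summary)
```

On a real cluster the map and reduce functions run in parallel across many nodes, and only the small per-key summary needs to leave Hadoop, which is what makes the approach useful for data distillation.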


And now, there are even more Hadoop platforms supported for use with Revolution R Enterprise. You can use:

  • Cloudera CDH3 or CDH4
  • IBM BigInsights 2
  • New! Hortonworks Data Platform 1.2
  • New! Intel's Distribution for Hadoop (announced today)

And by the end of the year, there will be a third way to use Hadoop with R:

3. Leave the data in Hadoop, and use ScaleR's "in-Hadoop predictive analytics"

We announced today that we are jointly developing in-Hadoop predictive analytics with Hortonworks, and our first demonstrations are taking place now at the Strata conference. It's in the prototype stage right now, but we expect it to be generally available by the end of the year. In the meantime, check out the video below, which explains the three ways of using R and Hadoop together and includes an early demo of our in-Hadoop predictive analytics.

For more details, check out the press release below.

Revolution Analytics press releases: Revolution Analytics Expands Support for Hadoop and Pioneers In-Hadoop Predictive Analytics with Hortonworks

More Stories By David Smith

David Smith is Vice President of Marketing and Community at Revolution Analytics. He has a long history with the R and statistics communities. After graduating with a degree in Statistics from the University of Adelaide, South Australia, he spent four years researching statistical methodology at Lancaster University in the United Kingdom, where he also developed a number of packages for the S-PLUS statistical modeling environment. He continued his association with S-PLUS at Insightful (now TIBCO Spotfire) overseeing the product management of S-PLUS and other statistical and data mining products.

David Smith is the co-author (with Bill Venables) of the popular tutorial manual, An Introduction to R, and one of the originating developers of the ESS: Emacs Speaks Statistics project. Today, he leads marketing for REvolution R, supports R communities worldwide, and is responsible for the Revolutions blog. Prior to joining Revolution Analytics, he served as vice president of product management at Zynchros, Inc. Follow him on Twitter at @RevoDavid.
