Cloud enables SMBs to access new, scalable resources – previously only available to enterprises – in flexible and cost-effective ways. McKinsey’s SMB Cloud Report projects the public cloud market to reach $40-$50 billion by 2015, with SMBs comprising 65% of public cloud spending in 2015. But selling cloud to SMBs raises the questions of who, what and how.
In this session Manjula Talreja, VP of Cisco’s Global Cloud Business Development Team, will discuss the importance of knowing who SMB...| By Damon Young | Article Rating: |
|
| January 15, 2013 12:00 AM EST | Reads: |
367 |
At ProjectLocker, we operate a polyglot environment with a heavy Ruby bias. While we love Ruby and Rails, one of the drawbacks of Ruby is its Global VM Lock. In a nutshell, the Global VM Lock makes it harder to write Ruby code that can fully utilize a modern multi-core server. For Web applications, this isn’t a problem because the web server manages multiple processes for you (e.g. via Passenger). However, for offline processes, parallelism doesn’t come for free.
I was recently working on a project that involved the offline batch processing of lots of data. This project has been operating successfully for some time, but the data set has grown, causing the process to need more more time to complete than we’d like. So I dove in to see what we could do to speed it up. Fortunately, the process was still single-threaded, so we knew we’d be able to inject concurrency to increase throughput without adding hardware.
The job in question runs on a fairly well-equipped server, but the server was underutilized due to the process being serial. Here’s an outline of the initial code:
def main_job
for retrieve_giant_dataset().each do |item|
long_process(item)
end
summarize_results(retrieve_all_results())
end
def long_process(item)
# Do some work on item that uses a lot of CPU time.
item.save
end
That approach gets the job done, but I wanted to parallelize it. Conceptually, I wanted to transform the main_job method so that it looked something like this:
def main_job
threads = []
for retrieve_giant_dataset().each do |item|
threads << Thread.new(item) do
long_process(item)
end
end
threads.each { |t| t.join }
summarize_results(retrieve_all_results())
end
Unfortunately, it’s not that easy due to the aforementioned Global VM Lock. What I needed was a way to get my threads running on a bunch of independent processes. This is a problem tailor-made for a job queueing system. Enter beanstalkd, a simple & fast work queue. We paired beanstalkd with Stalker, a DSL that makes it easy to queue and process jobs from Ruby. Integrating these two was a cinch. Here’s what the restructured code looks like now:
def main_job
for retrieve_giant_dataset().each do |item|
Stalker.enqueue(JOB_NAME, :id => item.id)
end
beaneater = Beaneater::Pool.new(['localhost:11300'])
tube = beaneater.tubes.find TUBE_NAME
while tube.peek(:ready)
sleep(5)
end
summarize_results(retrieve_all_results())
end
So instead of processing each item during the loop, now we just add each to the beanstalkd queue. Once we finish queueing all of the items, we wait until all of our entries have been processed by the worker processes. The workers are initiated via a jobs.rb file that looks something like this:
include Stalker
job JOB_NAME do |args|
item = ItemClass.find(args['id'])
Worker::long_process(item)
end
We then start beanstalkd and a few worker processes and we’re off to the races. Now our job runs in parallel via multiple processes, and we can tune the number of worker processes we run to consume as much of the machine’s resources as we like. As a bonus, we can also run Stalker workers on other machines in our cluster for added parallelism. With just a few minor tweaks to our code, we’ve gone from single-threaded to a solution that is limited only by the capacity of the shared database used. Sweet!
What about the MapReduce reference in the title of this post? The MapReduce algorithm basically has two steps. In the Map step, you divide the work and assign it to worker nodes. The Reduce step simply combines the results of each individual node’s computation into an aggregate result. In our solution here, the Map step is done by us enqueuing our jobs into beanstalkd and then beanstalkd making the jobs available for consumption by our nodes. Our database serves to communicate the details of the jobs, and stands in for a shared filesystem like the HDFS used by Hadoop. I didn’t go into detail about this step, but our Reduce is also assisted by database aggregates; we’re able to construct a few simple queries that get us what we want from the database.
So there it is, distributed MapReduce for Ruby using beanstalkd, Stalker, and a healthy database. This is probably not the best solution if you need to scale to thousands or tens of thousands of workers. But if you just need to get tens of workers running in parallel quickly, you may be able to adapt this approach to fit your needs.
Read the original blog entry...
Published January 15, 2013 Reads 367
Copyright © 2013 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
More Stories By Damon Young
Damon Young is Director of Sales at ProjectLocker.com. ProjectLocker was founded in 2003 to provide on-demand tools for software developers. Guided by the simple mission of helping companies build better software, ProjectLocker's services have expanded to include services for the complete lifecycle of software projects, from requirements documentation to build and test automation. ProjectLocker serves companies from startups to Fortune 1000 multinationals.
Cloud enables SMBs to access new, scalable resources – previously only available to enterprises – in flexible and cost-effective ways. McKinsey’s SMB Cloud Report projects the public cloud market to reach $40-$50 billion by 2015, with SMBs comprising 65% of public cloud spending in 2015. But selling cloud to SMBs raises the questions of who, what and how.
In this session Manjula Talreja, VP of Cisco’s Global Cloud Business Development Team, will discuss the importance of knowing who SMB...May. 19, 2013 06:00 AM EDT Reads: 951 |
By Jeremy Geelan The economics of business are radically changing due to the way in which software and services are being delivered thanks to cloud computing. In his session at 12th Cloud Expo | Cloud Expo New York [10-13 June, 2013], Mike Kavis will cover six reasons for the disruption.May. 19, 2013 02:15 AM EDT Reads: 4,051 |
By Jeremy Geelan Our more interconnected planet is accelerating the adoption and convergence of next-generation architectures, in the form of cloud, mobile and instrumented physical assets. Organizations that can effectively balance optimization and innovation, will be in a position to leverage new systems of engagement, out maneuver their peers and achieve desired outcomes. In the Opening Keynote at 12th Cloud Expo | Cloud Expo New York, IBM GM & Next Generation Platform CTO Dr Danny Sabbah will detail the crit...May. 19, 2013 01:00 AM EDT Reads: 2,794 |
By Jeremy Geelan The massive computing and storage resources that are needed to support big data applications make cloud environments an ideal fit. In Nati Shalom's upcoming session at 12th Cloud Expo | Cloud Expo New York [June 10-13, 2013], you'll learn how to build your big data "database on-demand" using MongoDB, Cassandra, Solr, MySQL, or any other big data solution, as well as manage your big data application using a new open source framework called “Cloudify.” All this, on top of the OpenStack cloud. May. 18, 2013 08:00 PM EDT Reads: 2,343 |
By Pat Romanski SYS-CON Events announced today that MetraTech Corp., the leading provider of agreements-based billing™, commerce and compensation solutions, has been named “Bronze Sponsor” of SYS-CON's 12th International Cloud Expo, which will take place on June 10–13, 2013, at the Javits Center in New York City, New York.
MetraTech Corp. is the leading provider of commerce, billing and compensation solutions enabling customers to monetize relationships with customers, partners, and suppliers. Its unique Agree...May. 18, 2013 04:00 PM EDT Reads: 1,374 |
By Liz McMillan “Trust is an ongoing journey and sits at the foundation of any vendor relationship – the companies that don’t consistently earn trust won’t be around long,” noted Henrik Rosendahl, Senior VP of Cloud Solutions at Quantum, in this exclusive Q&A with Cloud Expo Conference Chair Jeremy Geelan. “As they do more with cloud, trust will organically grow – maybe it’s just about meeting SLAs or seeing firsthand that data is there when you need it,” Rosendahl continued.
Cloud Computing Journal: The move ...May. 18, 2013 04:00 PM EDT Reads: 1,414 |
By Jeremy Geelan May. 18, 2013 04:00 PM EDT Reads: 3,119 |
By Liz McMillan Cloud computing is more than a buzz-phrase it’s a transformative IT paradigm shift. The emphasis in the cloud is on elasticity, scalability, agility and open. Not just open standards but open APIs and open source. The delivery of software is also going through a paradigm shift. Open source software was often a commoditization of a market leader; Unix to Linux or Oracle to MySQL what’s changing is that the iterative nature, user context and the motto of releasing early and often are driving real ...May. 18, 2013 03:15 PM EDT Reads: 1,400 |
By Elizabeth White In an ideal developer/systems administrator’s world, most applications would deploy seamlessly to multiple platforms and scale elastically with minimal effort bringing the unprecedented agility of the cloud within immediate reach of developer teams and IT organizations.
OpenStack, a RackSpace and NASA initiative, is now managed by an independent foundation and is supported by multiple vendors. It defines APIs for compute, storage, networking, services, monitoring, and additional infrastructure...May. 18, 2013 02:00 PM EDT Reads: 1,315 |
By Elizabeth White Storage and Archive offerings are now exploding on the market. From end-user mobile devices to company tactical level, the cloud has become a black hole for every kind of data. But what are the risks, and what are the real needs?
In his session at the 12th International Cloud Expo, Alexandre Morel, Cloud Product Manager & Evangelist at OVH.com, will answer questions such as:
How to develop a strategy to use those offers as a base to develop mid and long-term value?
Should companies trust th...May. 18, 2013 01:00 PM EDT Reads: 1,350 |
- Cloud Expo New York: Cloud Is Changing the Economics of Business
- Cloud Expo New York Speaker Profile: Nicos Vekiarides – TwinStrata
- AMD and Adobe Collaborate on Upcoming Version of Adobe Premiere Pro Software to Enable Breakthrough Video Editing Performance Through Open Standards
- Windows Azure IaaS Reaches General Availability
- Cloud Expo New York: Deploying Hybrid Cloud for Performance and Uptime
- Big Data Isn’t About the Database, It’s About the Application
- Basho Announces Open Source Riak CS and General Availability of Riak CS Enterprise v1.3
- Cloudant to Exhibit at Cloud Expo & Big Data Expo New York
- Cloud Expo New York: Rethink IT and Reinvent Business with IBM SmartCloud
- Predixion Software Announces General Availability of the Latest Version of its Predictive Analytics Platform
- The Accessibility of the Cloud
- Cloud Expo New York | Danger Ahead: Why File Sync Is NOT Endpoint Backup
- Cloud Expo New York: Best CIO Practices Shared from SHI’s Customers
- Examining the True Cost of Big Data
- Cloud Expo New York: Cloud Is Changing the Economics of Business
- Cloud Expo New York: How to Use Google Apps Script
- Cloud Expo New York Speaker Profile: Nicos Vekiarides – TwinStrata
- AMD and Adobe Collaborate on Upcoming Version of Adobe Premiere Pro Software to Enable Breakthrough Video Editing Performance Through Open Standards
- Windows Azure IaaS Reaches General Availability
- Rackspace Hosting Named “Platinum Plus Sponsor” of Cloud Expo New York
- The Cover and the Epilogue of the Upcoming Book
- Cloud Expo New York: Why Big Data Is Really About Small Data
- Scripps Networks Interactive’s Popular Lifestyle Shows from HGTV, DIY Network, Food Network, Cooking Channel and Travel Channel Coming to Prime Instant Video and Amazon Instant Video
- Cloud Expo New York: Deploying Hybrid Cloud for Performance and Uptime
- Cloud Expo New York: Best CIO Practices Shared from SHI’s Customers
- Cloud Computing and Big Data in 2013: What's Coming Next?
- Think You Heard It All About The Best of the Best from CES? Well, Think Again ... My eHome® -- the Gotta-Have-It Multi-Play Solution -- Targeted for Launch in First Quarter 2014
- Examining the True Cost of Big Data
- Cloud Expo New York: Cloud Is Changing the Economics of Business
- Best Practices: The Role of API Management
- OpenFeint Co-Founder Peter Relan Launches OpenKit: A Backend-as-a-Service for Cross Platform Mobile Developers Seeking Cloud Data Storage, Leaderboards, Social Network Integration and More
- Cloud Expo New York: How to Use Google Apps Script
- MapR Technologies' Senior Principal Technologist to Present at the Upcoming Telecom Analytics Conference
- Cloud Expo New York Speaker Profile: Nicos Vekiarides – TwinStrata
- DataStax Announces Community Edition 1.2 -- Latest Version of Apache Cassandra(TM) Includes Free Version of OpsCenter, the #1 Visual Management and Monitoring Solution for Cassandra
- AMD and Adobe Collaborate on Upcoming Version of Adobe Premiere Pro Software to Enable Breakthrough Video Editing Performance Through Open Standards








The economics of business are radically changing due to the way in which software and services are being delivered thanks to cloud computing. In his session at 12th Cloud Expo | Cloud Expo New York [10-13 June, 2013], Mike Kavis will cover six reasons for the disruption.
Our more interconnected planet is accelerating the adoption and convergence of next-generation architectures, in the form of cloud, mobile and instrumented physical assets. Organizations that can effectively balance optimization and innovation, will be in a position to leverage new systems of engagement, out maneuver their peers and achieve desired outcomes. In the Opening Keynote at 12th Cloud Expo | Cloud Expo New York, IBM GM & Next Generation Platform CTO Dr Danny Sabbah will detail the crit...
The massive computing and storage resources that are needed to support big data applications make cloud environments an ideal fit. In Nati Shalom's upcoming session at 12th Cloud Expo | Cloud Expo New York [June 10-13, 2013], you'll learn how to build your big data "database on-demand" using MongoDB, Cassandra, Solr, MySQL, or any other big data solution, as well as manage your big data application using a new open source framework called “Cloudify.” All this, on top of the OpenStack cloud.
SYS-CON Events announced today that MetraTech Corp., the leading provider of agreements-based billing™, commerce and compensation solutions, has been named “Bronze Sponsor” of SYS-CON's 12th International Cloud Expo, which will take place on June 10–13, 2013, at the Javits Center in New York City, New York.
MetraTech Corp. is the leading provider of commerce, billing and compensation solutions enabling customers to monetize relationships with customers, partners, and suppliers. Its unique Agree...
“Trust is an ongoing journey and sits at the foundation of any vendor relationship – the companies that don’t consistently earn trust won’t be around long,” noted Henrik Rosendahl, Senior VP of Cloud Solutions at Quantum, in this exclusive Q&A with Cloud Expo Conference Chair Jeremy Geelan. “As they do more with cloud, trust will organically grow – maybe it’s just about meeting SLAs or seeing firsthand that data is there when you need it,” Rosendahl continued.
Cloud Computing Journal: The move ...
Cloud computing is more than a buzz-phrase it’s a transformative IT paradigm shift. The emphasis in the cloud is on elasticity, scalability, agility and open. Not just open standards but open APIs and open source. The delivery of software is also going through a paradigm shift. Open source software was often a commoditization of a market leader; Unix to Linux or Oracle to MySQL what’s changing is that the iterative nature, user context and the motto of releasing early and often are driving real ...
In an ideal developer/systems administrator’s world, most applications would deploy seamlessly to multiple platforms and scale elastically with minimal effort bringing the unprecedented agility of the cloud within immediate reach of developer teams and IT organizations.
OpenStack, a RackSpace and NASA initiative, is now managed by an independent foundation and is supported by multiple vendors. It defines APIs for compute, storage, networking, services, monitoring, and additional infrastructure...
Storage and Archive offerings are now exploding on the market. From end-user mobile devices to company tactical level, the cloud has become a black hole for every kind of data. But what are the risks, and what are the real needs?
In his session at the 12th International Cloud Expo, Alexandre Morel, Cloud Product Manager & Evangelist at OVH.com, will answer questions such as:
How to develop a strategy to use those offers as a base to develop mid and long-term value?
Should companies trust th...
We all talk about cloud differently, but is there a way we should be speaking about this tech?
Cloud computing is now a widely reported, if not accepted, IT movement that, depending on who you talk to, has changed or is changing the way businesses utilize infrastructure.
A recent Gartner study states that the function of the modern CIO is in flux and that his or her future focus must incorporate digital assets (aka cloud-based data and applications) to remain relevant. Towards the goal of riding the sea change a compiler of stacks to a broker of business needs, secu...
New technologies allow schools, colleges and universities to analyze absolutely everything that happens. From student behavior, testing results, career development of students as well as educational needs based on changing societies. A lot of this data has already been stored and is used for statist...
In the coming years, big data will change the way organisations and societies are operated and managed. Big data however, is not the only trend that will impact significantly how organisations operate. Another major trend at the moment is gamification. Gamification will change the way organisations ...
The age of data center automation is upon us. Whether it's cloud or SDN or devops in general, automation as a means to achieve efficiency and, one hopes, free up resources that can be then redirected to focus on innovation.
As is always the case when we begin to move further upwards, abstracting ...
Windows Azure Virtual Networks offers the power to open up several cross-premises use case scenarios, including Active Directory Disaster Recovery, SQL Database Replication, Windows Server 2012 DFS-R File Replication, Accelerated Cloud File Services with BranchCache, Hybrid Web Applications and MORE...
As the infrastructure cloud market (IaaS and PaaS) continues to grow rapidly, we are seeing quite a few customers who are delivering an application – whether it is a mission-critical or SaaS application – and basing their solution on VMware.
VMware Security Cloud Encryption cloud keyboard Cloud Enc...
Have you heard of products like IBM’s InfoSphere Streams, Tibco’s Event Processing product, or Oracle’s CEP product? All good examples of commercially available stream processing technologies which help you process events in real-time.
I’ve been asked what I consider as “Big Data” versus “Small Dat...
My fellow Technical Evangelists and I have authored a content series that steps through building your very own Private Cloud by leveraging Windows Server 2012, our FREE Hyper-V Server 2012, Windows Azure Infrastructure Services ( IaaS ) and System Center 2012 Service Pack 1.
Week-by-week, we walk ...













