By Jason Bloomberg | February 28, 2013 08:00 AM EST | Reads: 1,579
Scenario #1: out of the blue, your boss calls, looking for some long-forgotten entry in a spreadsheet from 1989. Where do you look? Or consider scenario #2: said boss calls again, only this time she wants you to analyze customer purchasing behavior...going back to 1980. Similar problem, only instead of finding a single datum, you must find years of ancient information and prepare it for analysis with a modern business intelligence tool.
The answer, of course, is archiving. Fortunately, you (or your predecessor, or your predecessor's predecessor) have been archiving important, or potentially important, corporate data since your organization first started using computers back in the 1960s. So all you have to do to keep your boss happy is find the appropriate archives, recover the necessary data, and you're good to go, right?

Not so fast. There are a number of gotchas to this story, some more obvious than others. Cloud to the rescue? Perhaps, but many archiving challenges remain, and the Cloud actually introduces some new speed bumps as well. Now factor in Big Data. Sure, Big Data are big, so archiving Big Data requires a big archive. Lucky you: vendors have already been knocking on your door peddling Big Data archiving solutions. Now can you finally breathe easy? Maybe, maybe not. Here's why.
Archiving: The Long View
So much of our digital lives has taken place over the last twenty years or so that we forget that digital computing dates back to the 1940s. Furthermore, we forget that this sixty-odd-year lifetime of the Information Age is really only the first act of perhaps centuries of computing before humankind either evolves past zeroes and ones altogether or kills itself off in the process. Our technologies for archiving information, however, are woefully shortsighted, for several reasons:
- Hardware obsolescence (three to five years) - Using a hard drive or tape drive for archiving? It won't be long till the hardware is obsolete. You may get more life out of the gear you own, but once it wears out, you'll be stuck. Anyone who archived to laser disk in the 1980s has been down this road.
- File format obsolescence (five to ten years) - True, today's Office products can probably read that file originally saved in the Microsoft Excel version 1 file format back in the day, but what about those VisiCalc or Lotus 1-2-3 files? Tools that convert such files to their modern equivalents will grow increasingly scarce, and you always risk the possibility that they won't handle the conversion properly, leading to data corruption. If your data are encrypted, then your encryption format falls into the file format obsolescence bucket as well. And what about the programs themselves? From simple spreadsheet formulas to complex legacy spaghetti code, how do you archive algorithms in an obsolescence-proof format?
- Media obsolescence (ten to fifteen years) - CD-ROMs and digital backup tapes have an expected lifetime. Keeping them cool and dry can extend their life, but actually using them will shorten it. Do you really want to rely upon a fifteen-year-old backup tape for critical information?
- Computing paradigm obsolescence (fifty years perhaps; it's anybody's guess) - Will quantum computing or biological processors or some other futuristic gear drive binary digital technologies into the Stone Age? Only time will tell. But if you are forward-thinking enough to archive information for the 22nd century, there's no telling what you'll need to do to maintain the viability of your archives in a post-binary world.
Cloud to the Rescue?
On the surface, letting your Cloud Service Provider (CSP) archive your data solves many of these issues. Not only are the new archiving services like Amazon Glacier impressively cost-effective, but we can feel reasonably comfortable counting on today's CSPs to migrate our data from one hardware/media platform to the next over time as technology advances. So, can Cloud solve all your archiving issues?
At some point the answer may be yes, but Cloud Computing is still far too immature to jump to such a conclusion. Will your CSP still be in business decades from now? As the CSP market undergoes its inevitable consolidation phase, will the new CSP who bought out your old CSP handle your archive properly? Only time will tell.
But even if the CSPs rise to the archiving challenge, you may still have the file format challenge. Sure, archiving those old Lotus 1-2-3 files in the Cloud is a piece of cake, but that doesn't mean that your CSP will return them in Excel version 21.3 format ten years hence: an unfortunate and unintentional example of garbage in the Cloud.
The Big Data Old Tail
You might think that the challenges inherent in archiving Big Data are simply a matter of degree: bigger storage for bigger data sets, right? But thinking of Big Data as little more than extra-large data sets misses the big picture of the importance of Big Data.
The point of Big Data is that the data sets in question keep growing, continually pushing the limits of existing technology. The more capacity available for storage and processing, the larger the data sets we end up with. In other words, Big Data are by definition a moving target.
One familiar estimate states that the quantity of data in the world doubles every two years. Your organization's Big Data may grow somewhat faster or slower than this convenient benchmark, but in any case, the point is that Big Data growth is exponential. So, taking the two-year doubling factor as a rule of thumb, we can safely say that at any point in time, half of your Big Data are less than two years old, while the other half of your Big Data are more than two years old. And of course, this ZapFlash is concerned with the older half.
The Big Data archiving challenge, therefore, is breaking down the more-than-two-years-old Big Data sets. Remember that this two-year window is true at any point in time. Thinking about the problem mathematically, then, you can conclude that a quarter of your Big Data are more than four years old, an eighth are more than six years old, etc.
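The arithmetic above can be checked with a short sketch. It assumes idealized growth: total volume doubles every fixed period and nothing is ever deleted, so the data older than a given age is simply the total that existed that many years ago. The function name `fraction_older_than` and its parameters are illustrative, not taken from any tool or from the article.

```python
def fraction_older_than(age_years: float, doubling_period_years: float = 2.0) -> float:
    """Share of today's data that is older than `age_years`.

    Assumption (idealized): total volume doubles every
    `doubling_period_years` and no data is ever deleted, so the volume
    that existed `age_years` ago is today's volume divided by
    2 ** (age_years / doubling_period_years).
    """
    if age_years < 0:
        raise ValueError("age_years must be non-negative")
    if doubling_period_years <= 0:
        raise ValueError("doubling_period_years must be positive")
    return 0.5 ** (age_years / doubling_period_years)


if __name__ == "__main__":
    # Print the share of data older than each age under the two-year rule of thumb.
    for age in (2, 4, 6, 10, 20):
        print(f"older than {age:>2} years: {fraction_older_than(age):.4%}")
```

With the two-year rule of thumb, ages of two, four, and six years give one half, one quarter, and one eighth respectively, matching the figures in the text. A different doubling period only shifts how quickly the older shares shrink.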
Combine this math with the lesson of the first part of this ZapFlash, and a critical point emerges: byte for byte, the cost of maintaining usable archives increases the older those archives become. And yet, the size of those archives is vanishingly small relative to today's and tomorrow's Big Data. Furthermore, this problem will only get worse over time, because the size of the Old Tail continues to grow exponentially.
We call this Big Data archiving problem the Big Data Old Tail. Similar to the Long Tail argument, which focuses on the value inherent in summing up the Long Tail of customer demand for niche products, the Big Data Old Tail focuses on the costs inherent in maintaining archives of increasingly small, yet increasingly costly data as we struggle to deal with older and older information. True, perhaps the fact that the Old Tail data sets from a particular time period are small will compensate for the fact that they are costly to archive, but remember that the Old Tail continues to grow over time. Unless we deal with the Old Tail, it threatens to overwhelm us.
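To see how the Old Tail can be small in relative terms yet still grow without bound, here is a rough sketch under the same idealized assumptions (steady doubling every two years, no deletions). The starting volume of one unit and the function names `total_volume` and `old_tail_volume` are hypothetical, chosen only for illustration.

```python
def total_volume(year: float, initial_volume: float = 1.0,
                 doubling_period_years: float = 2.0) -> float:
    """Total data held at `year`, starting from `initial_volume` at year 0."""
    return initial_volume * 2 ** (year / doubling_period_years)


def old_tail_volume(year: float, min_age_years: float,
                    initial_volume: float = 1.0,
                    doubling_period_years: float = 2.0) -> float:
    """Volume of data that is at least `min_age_years` old at `year`.

    With no deletions, this equals everything that already existed
    `min_age_years` earlier.
    """
    return total_volume(year - min_age_years, initial_volume, doubling_period_years)


if __name__ == "__main__":
    # Track data that is at least six years old, decade by decade.
    for year in (10, 20, 30):
        tail = old_tail_volume(year, min_age_years=6)
        share = tail / total_volume(year)
        print(f"year {year}: old tail = {tail:g} units, share of total = {share:.1%}")
```

In this model the share of data older than six years stays fixed at one eighth, while its absolute volume doubles every two years along with everything else. The cost side of the argument (older bytes being more expensive to keep usable) is not modeled here.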
The ZapThink Take
The obvious question that comes to mind is whether we need to save all those old data sets anyway. After all, who cares about, say, purchasing data from 1982? And of course, you may have a business reason for deleting old information. Since information you preserve may be subject to lawsuits or other unpleasantness, you may wish to delete data once it's legal to do so.
Fair enough. But there are perhaps far more examples of Big Data sets that your organization will wish to preserve indefinitely than data sets you're happy to delete. From scientific data to information on market behavior to social trends, the richness of our Big Data does not simply depend on the information from the last year or two or even ten. After all, if we forget the mistakes of the past then we are doomed to repeat them. Crunching today's Big Data can give us business intelligence, but only by crunching yesterday's Big Data as well can we ever expect to glean wisdom from our information.
Copyright © 2013 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
More Stories By Jason Bloomberg
Jason Bloomberg is President of ZapThink, a Dovèl Technologies Company. He is a thought leader in the areas of Enterprise Architecture, Service-Oriented Architecture, and Cloud Computing, and helps organizations around the world better leverage their IT resources to meet changing business needs. He is a frequent speaker, prolific writer, and pundit.
Mr. Bloomberg is one of the original Managing Partners of ZapThink LLC, the leading SOA advisory and analysis firm, which was acquired by Dovèl Technologies in August 2011. His book, Service Orient or Be Doomed! How Service Orientation Will Change Your Business (John Wiley & Sons, 2006, coauthored with Ron Schmelzer), is recognized as the leading business book on Service Orientation.
Mr. Bloomberg has a diverse background in eBusiness technology management and industry analysis, including serving as a senior analyst in IDC’s eBusiness Advisory group, as well as holding eBusiness management positions at USWeb/CKS (later marchFIRST) and WaveBend Solutions (now Hitachi Consulting). He also co-authored the books XML and Web Services Unleashed (SAMS Publishing, 2002), and Web Page Scripting Techniques (Hayden Books, 1996).