|By Jason Bloomberg||
|February 28, 2013 08:00 AM EST||
Scenario #1: out of the blue, your boss calls, looking for some long-forgotten entry in a spreadsheet from 1989. Where do you look? Or consider scenario #2: said boss calls again, only this time she wants you to analyze customer purchasing behavior...going back to 1980. Similar problem, only instead of finding a single datum, you must find years of ancient information and prepare it for analysis with a modern business intelligence tool.
The answer, of course, is archiving. Fortunately, you (or your predecessor, or predecessor's predecessor) have been archiving important-or potentially important-corporate data since your organization first started using computers back in the 1960s. So all you have to do to keep your boss happy is find the appropriate archives, recover the necessary data, and you're good to go, right?
Not so fast. There are a number of gotchas to this story, some more obvious than others. Cloud to the rescue? Perhaps, but many archiving challenges remain, and the Cloud actually introduces some new speed bumps as well. Now factor in Big Data. Sure, Big Data are big, so archiving Big Data requires a big archive. Lucky you-vendors have already been knocking on your door peddling Big Data archiving solutions. Now can you finally breathe easy? Maybe, maybe not. Here's why.
Archiving: The Long View
So much of our digital lives have taken place over the last twenty years or so that we forget that digital computing dates back to the 1940s-and furthermore, we forget that this sixty-odd year lifetime of the Information Age is really only the first act of perhaps centuries of computing before humankind either evolves past zeroes and ones altogether or kills itself off in the process. Our technologies for archiving information, however, are woefully shortsighted, for several reasons:
- Hardware obsolescence (three to five years) - Using a hard drive or tape drive for archiving? It won't be long till the hardware is obsolete. You may get more life out of the gear you own, but one it wears out, you'll be stuck. Anyone who archived to laser disk in the 1980s has been down this road.
- File format obsolescence (five to ten years) - True, today's Office products can probably read that file originally saved in the Microsoft Excel version 1 file format back in the day, but what about those VisiCalc or Lotus 123 files? Tools that will convert such files to their modern equivalents will eventually grow increasingly scarce, and you always risk the possibility that they won't handle the conversion properly, leading to data corruption. If your data are encrypted, then your encryption format falls into the file format obsolescence bucket as well. And what about the programs themselves? From simple spreadsheet formulas to complex legacy spaghetti code, how do you archive algorithms in an obsolescence-proof format?
- Media obsolescence (ten to fifteen years) - CD-ROMs and digital backup tapes have an expected lifetime. Keeping them cool and dry can extend their life, but actually using them will shorten it. Do you really want to rely upon a fifteen-year-old backup tape for critical information?
- Computing paradigm obsolescence (fifty years perhaps; it's anybody's guess) - will quantum computing or biological processors or some other futuristic gear drive binary digital technologies into the Stone Age? Only time will tell. But if you are forward thinking enough to archive information for the 22nd century, there's no telling what you'll need to do to maintain the viability of your archives in a post-binary world.
Cloud to the Rescue?
On the surface, letting your Cloud Service Provider (CSP) archive your data solves many of these issues. Not only are the new archiving services like Amazon Glacier impressively cost-effective, but we can feel reasonably comfortable counting on today's CSPs to migrate our data from one hardware/media platform to the next over time as technology advances. So, can Cloud solve all your archiving issues?
At some point the answer may be yes, but Cloud Computing is still far too immature to jump to such a conclusion. Will your CSP still be in business decades from now? As the CSP market undergoes its inevitable consolidation phase, will the new CSP who bought out your old CSP handle your archive properly? Only time will tell.
But even if the CSPs rise to the archiving challenge, you may still have the file format challenge. Sure, archiving those old Lotus 123 files in the Cloud is a piece of cake, but that doesn't mean that your CSP will return them in Excel version 21.3 format ten years hence-an unfortunate and unintentional example of garbage in the Cloud.
The Big Data Old Tail
You might think that the challenges inherent in archiving Big Data are simply a matter of degree: bigger storage for bigger data sets, right? But thinking of Big Data as little more than extra-large data sets misses the big picture of the importance of Big Data.
The point to Big Data is that the indicated data sets continue to grow in size on an ongoing basis, continually pushing the limits of existing technology. The more capacity available for storage and processing, the larger the data sets we end up with. In other words, Big Data are by definition a moving target.
One familiar estimate states that the quantity of data in the world doubles every two years. Your organization's Big Data may grow somewhat faster or slower than this convenient benchmark, but in any case, the point is that Big Data growth is exponential. So, taking the two-year doubling factor as a rule of thumb, we can safely say that at any point in time, half of your Big Data are less than two years old, while the other half of your Big Data are more than two years old. And of course, this ZapFlash is concerned with the older half.
The Big Data archiving challenge, therefore, is breaking down the more-than-two-years-old Big Data sets. Remember that this two-year window is true at any point in time. Thinking about the problem mathematically, then, you can conclude that a quarter of your Big Data are more than four years old, an eighth are more than six years old, etc.
Combine this math with the lesson of the first part of this ZapFlash, and a critical point emerges: byte for byte, the cost of maintaining usable archives increases the older those archives become. And yet, the relative size of those archives is vanishingly small relative to today's and tomorrow's Big Data. Furthermore, this problem will only get worse over time, because the size of the Old Tail continues to grow exponentially.
We call this Big Data archiving problem the Big Data Old Tail. Similar to the Long Tail argument, which focuses on the value inherent in summing up the Long Tail of customer demand for niche products, the Big Data Old Tail focuses on the costs inherent in maintaining archives of increasingly small, yet increasingly costly data as we struggle to deal with older and older information. True, perhaps the fact that the Old Tail data sets from a particular time period are small will compensate for the fact that they are costly to archive, but remember that the Old Tail continues to grow over time. Unless we deal with the Old Tail, it threatens to overwhelm us.
The ZapThink Take
The obvious question that comes to mind is whether we need to save all those old data sets anyway. After all, who cares about, say, purchasing data from 1982? And of course, you may have a business reason for deleting old information. Since information you preserve may be subject to lawsuits or other unpleasantness, you may wish to delete data once it's legal to do so.
Fair enough. But there are perhaps far more examples of Big Data sets that your organization will wish to preserve indefinitely than data sets you're happy to delete. From scientific data to information on market behavior to social trends, the richness of our Big Data do not simply depend on the information from the last year or two or even ten. After all, if we forget the mistakes of the past then we are doomed to repeat them. Crunching today's Big Data can give us business intelligence, but only by crunching yesterday's Big Data as well can we ever expect to glean wisdom from our information.
One of the hottest areas in cloud right now is DRaaS and related offerings. In his session at 16th Cloud Expo, Dale Levesque, Disaster Recovery Product Manager with Windstream's Cloud and Data Center Marketing team, will discuss the benefits of the cloud model, which far outweigh the traditional approach, and how enterprises need to ensure that their needs are properly being met.
Jul. 1, 2015 12:15 PM EDT Reads: 2,008
To many people, IoT is a buzzword whose value is not understood. Many people think IoT is all about wearables and home automation. In his session at @ThingsExpo, Mike Kavis, Vice President & Principal Cloud Architect at Cloud Technology Partners, discussed some incredible game-changing use cases and how they are transforming industries like agriculture, manufacturing, health care, and smart cities. He will discuss cool technologies like smart dust, robotics, smart labels, and much more. Prepare...
Jul. 1, 2015 11:30 AM EDT Reads: 694
It is one thing to build single industrial IoT applications, but what will it take to build the Smart Cities and truly society-changing applications of the future? The technology won’t be the problem, it will be the number of parties that need to work together and be aligned in their motivation to succeed. In his session at @ThingsExpo, Jason Mondanaro, Director, Product Management at Metanga, discussed how you can plan to cooperate, partner, and form lasting all-star teams to change the world...
Jul. 1, 2015 11:30 AM EDT Reads: 2,213
In his session at 16th Cloud Expo, Simone Brunozzi, VP and Chief Technologist of Cloud Services at VMware, reviewed the changes that the cloud computing industry has gone through over the last five years and shared insights into what the next five will bring. He also chronicled the challenges enterprise companies are facing as they move to the public cloud. He delved into the "Hybrid Cloud" space and explained why every CIO should consider ‘hybrid cloud' as part of their future strategy to achi...
Jul. 1, 2015 11:10 AM EDT Reads: 390
Internet of Things is moving from being a hype to a reality. Experts estimate that internet connected cars will grow to 152 million, while over 100 million internet connected wireless light bulbs and lamps will be operational by 2020. These and many other intriguing statistics highlight the importance of Internet powered devices and how market penetration is going to multiply many times over in the next few years.
Jul. 1, 2015 10:30 AM EDT Reads: 2,097
Malicious agents are moving faster than the speed of business. Even more worrisome, most companies are relying on legacy approaches to security that are no longer capable of meeting current threats. In the modern cloud, threat diversity is rapidly expanding, necessitating more sophisticated security protocols than those used in the past or in desktop environments. Yet companies are falling for cloud security myths that were truths at one time but have evolved out of existence.
Jul. 1, 2015 09:15 AM EDT Reads: 2,123
Today air travel is a minefield of delays, hassles and customer disappointment. Airlines struggle to revitalize the experience. GE and M2Mi will demonstrate practical examples of how IoT solutions are helping airlines bring back personalization, reduce trip time and improve reliability. In their session at @ThingsExpo, Shyam Varan Nath, Principal Architect with GE, and Dr. Sarah Cooper, M2Mi’s VP Business Development and Engineering, will explore the IoT cloud-based platform technologies drivi...
Jul. 1, 2015 08:00 AM EDT Reads: 830
In the midst of the widespread popularity and adoption of cloud computing, it seems like everything is being offered “as a Service” these days: Infrastructure? Check. Platform? You bet. Software? Absolutely. Toaster? It’s only a matter of time. With service providers positioning vastly differing offerings under a generic “cloud” umbrella, it’s all too easy to get confused about what’s actually being offered. In his session at 16th Cloud Expo, Kevin Hazard, Director of Digital Content for SoftL...
Jun. 30, 2015 05:00 PM EDT Reads: 2,111
Discussions about cloud computing are evolving into discussions about enterprise IT in general. As enterprises increasingly migrate toward their own unique clouds, new issues such as the use of containers and microservices emerge to keep things interesting. In this Power Panel at 16th Cloud Expo, moderated by Conference Chair Roger Strukhoff, panelists addressed the state of cloud computing today, and what enterprise IT professionals need to know about how the latest topics and trends affect t...
Jun. 30, 2015 08:30 AM EDT Reads: 1,106
Public Cloud IaaS started its life in the developer and startup communities and has grown rapidly to a $20B+ industry, but it still pales in comparison to how much is spent worldwide on IT: $3.6 trillion. In fact, there are 8.6 million data centers worldwide, the reality is many small and medium sized business have server closets and colocation footprints filled with servers and storage gear. While on-premise environment virtualization may have peaked at 75%, the Public Cloud has lagged in adop...
Jun. 29, 2015 03:00 PM EDT Reads: 2,339
SYS-CON Events announced today that BMC will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. BMC delivers software solutions that help IT transform digital enterprises for the ultimate competitive business advantage. BMC has worked with thousands of leading companies to create and deliver powerful IT management services. From mainframe to cloud to mobile, BMC pairs high-speed digital innovation with robust...
Jun. 29, 2015 12:15 PM EDT Reads: 2,679
Even as cloud and managed services grow increasingly central to business strategy and performance, challenges remain. The biggest sticking point for companies seeking to capitalize on the cloud is data security. Keeping data safe is an issue in any computing environment, and it has been a focus since the earliest days of the cloud revolution. Understandably so: a lot can go wrong when you allow valuable information to live outside the firewall. Recent revelations about government snooping, along...
Jun. 29, 2015 12:00 PM EDT Reads: 2,190
The Internet of Things is not only adding billions of sensors and billions of terabytes to the Internet. It is also forcing a fundamental change in the way we envision Information Technology. For the first time, more data is being created by devices at the edge of the Internet rather than from centralized systems. What does this mean for today's IT professional? In this Power Panel at @ThingsExpo, moderated by Conference Chair Roger Strukhoff, panelists will addresses this very serious issue o...
Jun. 29, 2015 09:45 AM EDT Reads: 2,524
The web app is Agile. The REST API is Agile. The testing and planning are Agile. But alas, Data infrastructures certainly are not. Once an application matures, changing the shape or indexing scheme of data often forces at best a top down planning exercise and at worst includes schema changes which force downtime. The time has come for a new approach that fundamentally advances the agility of distributed data infrastructures. Come learn about a new solution to the problems faced by software orga...
Jun. 28, 2015 04:00 PM EDT Reads: 2,112
Business as usual for IT is evolving into a "Make or Buy" decision on a service-by-service conversation with input from the LOBs. How does your organization move forward with cloud? In his general session at 16th Cloud Expo, Paul Maravei, Regional Sales Manager, Hybrid Cloud and Managed Services at Cisco, discusses how Cisco and its partners offer a market-leading portfolio and ecosystem of cloud infrastructure and application services that allow you to uniquely and securely combine cloud busin...
Jun. 28, 2015 11:00 AM EDT Reads: 2,226
In their general session at 16th Cloud Expo, Michael Piccininni, Global Account Manager - Cloud SP at EMC Corporation, and Mike Dietze, Regional Director at Windstream Hosted Solutions, reviewed next generation cloud services, including the Windstream-EMC Tier Storage solutions, and discussed how to increase efficiencies, improve service delivery and enhance corporate cloud solution development. Michael Piccininni is Global Account Manager – Cloud SP at EMC Corporation. He has been engaged in t...
Jun. 27, 2015 04:00 PM EDT Reads: 2,177
Announcing @SoftLayer "Gold Sponsor" of @CloudExpo Silicon Valley | #IoT #API #DevOps #Microservices
SYS-CON Events announced today that SoftLayer, an IBM company, has been named “Gold Sponsor” of SYS-CON's 17th International Cloud Expo®, which will take place November 3–5, 2015 at the Santa Clara Convention Center in Santa Clara, CA. SoftLayer operates a global cloud infrastructure platform built for Internet scale. With a global footprint of data centers and network points of presence, SoftLayer provides infrastructure as a service to leading-edge customers ranging from Web startups to globa...
Jun. 26, 2015 05:00 PM EDT Reads: 2,020
Announcing @Kintone_Global "Bronze Sponsor" of @CloudExpo Silicon Valley | #IoT #API #DevOps #Microservices
SYS-CON Events announced today that kintone has been named “Bronze Sponsor” of SYS-CON's 17th International Cloud Expo®, which will take place on November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. kintone promotes cloud-based workgroup productivity, transparency and profitability with a seamless collaboration space, build your own business application (BYOA) platform, and workflow automation system.
Jun. 26, 2015 04:00 PM EDT Reads: 1,914
SYS-CON Events announced today that Alert Logic, the leading provider of Security-as-a-Service solutions for the cloud, has been named “Bronze Sponsor” of SYS-CON's 17th International Cloud Expo® and DevOps Summit 2015 Silicon Valley, which will take place November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. Alert Logic provides Security-as-a-Service for on-premises, cloud, and hybrid IT infrastructures, delivering deep security insight and continuous protection for cust...
Jun. 26, 2015 04:00 PM EDT Reads: 1,971
Announcing @Solgenia_Corp to Exhibit at @CloudExpo Silicon Valley | #IoT #API #DevOps #Microservices
SYS-CON Events announced today that Solgenia will exhibit at SYS-CON's 17th International Cloud Expo®, which will take place on November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. Solgenia is the global market leader in Cloud Collaboration and Cloud Infrastructure software solutions. Designed to “Bridge the Gap” between Personal and Professional Social, Mobile and Cloud user experiences, our solutions help large and medium-sized organizations dramatically improve produc...
Jun. 26, 2015 03:00 PM EDT Reads: 1,968