Big Data Journal Authors: Scott Bampton, Carmen Gonzalez, Liz McMillan, Roger Strukhoff, Pat Romanski

Big Data Journal: Article

Simplified Data Retention on a Massive Scale Speeds Access to Big Data

Organizations can gain competitive advantages when able to rely on data retention for improved decision making & trend analysis

There are numerous applications for cost-effective data retention. Organizations can gain substantial competitive advantages when they can rely on retained data for improved decision making and trend analysis. Research enterprises can use large-scale data sets to study information more completely than ever before.

Simplified data retention on a massive scale speeds up access to Big Data. Big Data refers to data sets that are too large to analyze and manage using ordinary methods. This data, in both structured and unstructured form, is valuable and comes from sources such as trading systems.

In many cases, existing systems cannot process data of this variety and volume. Some organizations store such data in file systems so as not to overburden their databases. That may work as a temporary stopgap, but because Big Data is growing at an exponential rate, it will not suffice in the long run. Machine-generated data is likely to exceed the processing capability of conventional systems, and the cost of extracting it can be so high that many organizations simply shy away from it.

Today, technology is just beginning to address Big Data issues. Many organizations try to apply existing strategies to manage this data effectively. Standard methods, from relational database queries to complex analysis tools, are being used. Data retention software is also being applied to extract relevant information from Big Data sources.

Big Data retention technology that is scalable and easy to implement is now available. With it, Big Data can be accessed online using SQL together with business intelligence software. A system of this type combines storage platforms running specialized software with a massive-scale data repository developed for online data retention. It is designed to process machine-generated data at 40:11 compression ratios while keeping that data available online.
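To make the idea of SQL access to retained data concrete, here is a minimal sketch in Python. It uses the standard-library sqlite3 module purely as a stand-in for the retention repository, which the article does not name. The trades table, its columns, and the sample rows are all hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small, made-up trades table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trades (symbol TEXT, price REAL, shares INTEGER)")
    # Hypothetical sample rows; a real archive would hold billions of records.
    rows = [
        ("AAA", 10.0, 500),
        ("BBB", 20.0, 300),
        ("AAA", 10.5, 200),
        ("CCC", 5.0, 900),
        ("DDD", 7.0, 100),
        ("BBB", 19.5, 50),
    ]
    conn.executemany("INSERT INTO trades VALUES (?, ?, ?)", rows)
    return conn


def top_symbols_by_volume(conn, limit=3):
    """Return (symbol, total_shares, trade_count) for the highest-volume symbols."""
    query = """
        SELECT symbol, SUM(shares) AS total_shares, COUNT(*) AS trade_count
        FROM trades
        GROUP BY symbol
        ORDER BY total_shares DESC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    for symbol, total_shares, trade_count in top_symbols_by_volume(demo):
        print(symbol, total_shares, trade_count)
```

The point of the sketch is only that ordinary SQL aggregation (GROUP BY, ORDER BY, LIMIT) is the interface. A business intelligence tool would issue much the same kind of statement against the retained data.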

Organizations that need to process Big Data may benefit from databases designed specifically for this purpose. Such databases can prove cost-effective and are already in use at numerous organizations internationally. They work in parallel, allowing tens of billions of records to be processed each day, while their retention capacity is practically limitless. They can be deployed on content-addressable storage (CAS), direct-attached storage (DAS), and storage area network (SAN) systems. Benefits of this kind of storage and retrieval system include a smaller infrastructure footprint, through reduced demand for physical storage, and effective, configurable record management.

One Big Data retention solution has three components. The first is a pair of server-level service managers that share metadata and provide import and query capability. The second is a data archive residing on a cluster services node as well as on storage nodes; it is designed to scale to billions of objects. The third is shared storage, which can be local direct access storage, a network file system, or a comprehensive clustered file system.

This type of system was recently tested on 508 GB of artificially generated stock trading test data modeled after NASDAQ. In the data import test, close to 12 billion records were imported within an hour. Compression reduced the data by 476.1 GB, so the archived data was only about 6.3% of its original size. A SQL query was then executed to select the three highest-volume stocks, which had well over 4 million trades per day. This query, run against 11.6 billion records, took approximately 5.5 seconds to execute.
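The figures above can be cross-checked with some simple arithmetic. The sketch below takes the reported numbers as inputs and derives the archive size, the compression ratio and the effective throughput. Two assumptions are made: "close to 12 billion" is rounded to exactly 12 billion, and the import is treated as taking a full hour. The derived rates are therefore rough estimates, not measured results.

```python
# Figures reported for the test (import count rounded, see assumptions above).
ORIGINAL_GB = 508.0        # size of the generated test data
REDUCTION_GB = 476.1       # reported reduction from compression
RECORDS_IMPORTED = 12e9    # "close to 12 billion", rounded up (assumption)
IMPORT_SECONDS = 3600      # "within an hour", taken as a full hour (assumption)
RECORDS_QUERIED = 11.6e9   # records the SQL query ran against
QUERY_SECONDS = 5.5        # approximate query time


def summarize():
    """Derive archive size, compression ratio and rough throughput figures."""
    archive_gb = ORIGINAL_GB - REDUCTION_GB
    return {
        "archive_gb": archive_gb,
        "archive_fraction": archive_gb / ORIGINAL_GB,
        "compression_ratio": ORIGINAL_GB / archive_gb,
        "import_records_per_second": RECORDS_IMPORTED / IMPORT_SECONDS,
        # Effective rate only: the engine need not touch every record.
        "query_records_per_second": RECORDS_QUERIED / QUERY_SECONDS,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:,.3f}")
```

Under these assumptions the archive comes to about 31.9 GB, or roughly 6.3% of the original, which matches the reported percentage and corresponds to a compression ratio of about 16:1 for this data set. The import works out to roughly 3.3 million records per second, and the query covers an effective 2.1 billion records per second.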

Big Data is high-volume, high-velocity and perhaps highly variable as well. Big Data retention solutions can lead to better decision making, new discoveries and even process optimization. Science is a major area that can benefit from Big Data solutions, and meteorology is just one field that can reap rewards from new technological advances in data retention on a massive scale. The ability to do research and analysis with extremely large data sets gives greater understanding to those who model weather, oceanographic conditions, the economy or social trends. With new cost-effective technology available, many more organizations will consider the possibilities of Big Data retention in their enterprise.

More Stories By Alan McMahon

Alan McMahon works for Dell. He has worked for Dell for the past 13 years and is involved in enterprise solution design across a range of products, from servers and storage to virtualization. He now focuses his attention on marketing for Dell. He is based in Ireland and enjoys sailing as a pastime.
