By Jnan Dash
September 30, 2015 11:00 PM EDT
Yesterday I attended a session in Palo Alto on the subject of the Data Refinery; the speaker was Will Gorman of Pentaho. I had not realized that Pentaho was acquired by Hitachi Data Systems a couple of months ago. The term “data lake” was coined by James Dixon of Pentaho, and I wrote a blog post on the subject last year. As soon as the term entered the data lexicon, other interesting terms such as “data swamp” began to appear.
The term data lake was coined to convey the concept of a centralized repository containing virtually inexhaustible amounts of raw (or minimally curated) data that is readily available, at any time, to anyone authorized to perform analytical activities. The often unstated premise of a data lake is that it relieves users from dealing with data acquisition and maintenance issues, and guarantees fast access to local, accurate and updated data without incurring the development costs (in terms of time and money) typically associated with structured data warehouses. According to IBM, “However appealing this premise, practically speaking, it is our experience, and that of our customers, that ‘raw’ data is logistically difficult to obtain, quite challenging to interpret and describe, and tedious to maintain. Furthermore, these challenges multiply as the number of sources grows, thus increasing the need to thoroughly describe and curate the data in order to make it consumable.” I completely agree.
During the early days of data warehousing, the term ETL covered all the data preparation stages: extracting, transforming, and loading the curated data for query and reporting. I used to jokingly call this the “answer to 25 years of sin.” In my understanding, Pentaho’s SDR (Streamlined Data Refinery) is a modern form of ETL that deals with both internal structured data and external unstructured data, including machine-generated data. In Pentaho’s own words, “The big data stakes are higher than ever before. No longer just about quantifying ‘virtual’ assets like sentiment and preference, analytics are starting to inform how we manage physical assets like inventory, machines and energy. This means companies must turn their focus to the traditional ETL processes that result in safe, clean and trustworthy data. However, for the types of ROI use cases we’re talking about today, this traditional IT process needs to be made fast, easy, highly scalable, cloud-friendly and accessible to business. And this has been a stumbling block – until now. Streamlined Data Refinery, a market-disrupting innovation that effectively brings the power of governed data delivery to ‘the people’ unlocks big data’s full operational potential.”
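For readers who have not worked with ETL, the three stages can be illustrated with a minimal sketch. This is not Pentaho’s SDR or any vendor’s implementation; the sample data, the `sales` table, and the curation rule (drop rows with no amount) are all assumptions made up for the example, and it uses only the Python standard library.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: a CSV export with inconsistent casing, stray
# whitespace, and one row that is missing its amount.
RAW_CSV = """customer,region,amount
 Alice ,west,120.50
BOB,East,
carol,WEST,75
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim and normalize fields, drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = (row["amount"] or "").strip()
        if not amount:
            continue  # assumed curation rule: skip incomplete records
        cleaned.append(
            (row["customer"].strip().title(),
             row["region"].strip().lower(),
             float(amount))
        )
    return cleaned


def load(records, conn):
    """Load: write the curated records into a queryable table."""
    conn.execute("CREATE TABLE sales (customer TEXT, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(text):
    """Run extract, transform, and load against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(text)), conn)
    return conn


conn = run_pipeline(RAW_CSV)
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```

The point of the sketch is the separation of concerns: the raw input stays untouched, the curation rules live in one place, and only cleaned records reach the table that analysts query.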
Earlier I wrote about data curation and how new companies such as Tamr are addressing the issue. Pentaho’s SDR is another form of data curation. IBM calls it the data wrangling process.
As usual, we love to confuse people with a variety of terms that describe the same thing.