


Five Big Data Features in SQL Server

Traditional RDBMS and Big Data

Traditional RDBMS & New Data Processing
Over the past two decades, relational databases have been highly successful in serving large-scale OLTP and OLAP applications across enterprises. In the past couple of years, however, the advent of Big Data processing, especially the processing of unstructured data, coupled with the need to process massive quantities of data, has made the industry look into non-RDBMS solutions. This has led to the popularity of NoSQL databases as well as massively parallel processing frameworks.

Traditional RDBMS vendors were quick to react, however, and added several Big Data features to their offerings, so that enterprises with a heavy investment in traditional RDBMS can have the best of both worlds by properly leveraging these new features.

The following sections outline the Big Data features in the popular SQL Server database; a similar analysis of Oracle will follow in a later article.

1. Column Store Indexes
Column-oriented databases differ from row-oriented RDBMS like SQL Server in that they store data tables as sections of columns of data rather than rows of data. This has proven advantageous in situations where aggregates are computed over large numbers of similar data items.

The columnstore index, a feature in SQL Server 2012, groups and stores data for each column and then joins all the columns to complete the whole index. This differs from traditional indexes, which group and store data for each row and then join all the rows to complete the whole index. For some types of queries, the SQL Server query processor can take advantage of the columnstore layout to significantly improve query execution times.
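
To make the difference in layout concrete, here is a minimal Python sketch. It is not SQL Server code; the sample table and the helper to_column_store are hypothetical and only illustrate why an aggregate over one column touches less data in a column-oriented layout.

```python
from typing import Dict, List

# Hypothetical sample table in a row-oriented layout: one dict per row.
rows: List[Dict[str, object]] = [
    {"order_id": 1, "region": "East", "amount": 120.0},
    {"order_id": 2, "region": "West", "amount": 80.0},
    {"order_id": 3, "region": "East", "amount": 200.0},
]


def to_column_store(table: List[Dict[str, object]]) -> Dict[str, List[object]]:
    """Pivot a list of rows into one list of values per column."""
    columns: Dict[str, List[object]] = {}
    for row in table:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_column_store(rows)

# Row layout: every row object is visited just to read one field.
total_from_rows = sum(row["amount"] for row in rows)

# Column layout: only the "amount" values are read.
total_from_columns = sum(columns["amount"])
```

Both totals are the same; the point of the sketch is that the second sum never looks at order_id or region, which is roughly the saving a columnstore layout offers for aggregate queries.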

2. Hadoop Connectors
The SQL Server-Hadoop Connector is a Sqoop-based connector that facilitates efficient data transfer between SQL Server and Hadoop. Sqoop transfers data between relational databases and Hadoop storage such as HDFS. The connector is bidirectional: you can import data as well as export it.

With the SQL Server-Hadoop Connector, you can export data from:

  • Delimited text files on HDFS to SQL Server
  • SequenceFiles on HDFS to SQL Server
  • Hive tables to tables in SQL Server

3. Full Text Search
There is no question about SQL Server's ability to process relational data and perform queries using JOINs and other traditional means. However, much of the enterprise's Big Data processing need lies in processing unstructured data, which is generally textual.

Full-Text Search in SQL Server lets users and applications run full-text queries against character-based data in SQL Server tables. Before you can run full-text queries on a table, the database administrator must create a full-text index on the table. The full-text index includes one or more character-based columns in the table.

After columns have been added to a full-text index, users and applications can run full-text queries on the text in the columns. These queries can search for any of the following:

  • One or more specific words or phrases (simple term)
  • A word or a phrase where the words begin with specified text (prefix term)
  • Inflectional forms of a specific word (generation term)
  • A word or phrase close to another word or phrase (proximity term)
  • Synonymous forms of a specific word (thesaurus)
  • Words or phrases using weighted values (weighted term)
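
As a rough illustration of the first two query types in the list above, the following Python sketch builds a tiny inverted index and answers simple-term and prefix-term lookups. This is an assumption-laden toy, not how SQL Server implements full-text indexes; the function names and sample documents are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Set


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each lowercased word to the ids of the documents containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search_simple(index: Dict[str, Set[int]], word: str) -> Set[int]:
    """Simple term: documents containing exactly this word."""
    return set(index.get(word.lower(), set()))


def search_prefix(index: Dict[str, Set[int]], prefix: str) -> Set[int]:
    """Prefix term: documents containing any word starting with the prefix."""
    prefix = prefix.lower()
    hits: Set[int] = set()
    for word, doc_ids in index.items():
        if word.startswith(prefix):
            hits |= doc_ids
    return hits


# Hypothetical character data standing in for an indexed text column.
docs = {
    1: "Columnstore indexes speed up aggregate queries",
    2: "Full-text search indexes character columns",
    3: "Hadoop stores unstructured text",
}
index = build_index(docs)
```

Calling search_simple(index, "hadoop") finds only document 3, while search_prefix(index, "col") matches both "columnstore" and "columns". The other query types (inflectional, proximity, thesaurus, weighted) need linguistic data and position information that this sketch leaves out.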

4. Windows Azure SQL Federation / Distributed Partitioned Views
Though not directly part of SQL Server, Windows Azure SQL Database is essentially the cloud version of SQL Server, and this feature provides massively parallel processing capabilities for database workloads in the cloud.

SQL Database with federation is a way to achieve greater scalability and performance from the database tier of your application through horizontal partitioning. One or more tables within a database with federation are split by row and partitioned across multiple databases known as federation members. A federation is defined by a federation distribution scheme, or federation scheme. The federation scheme defines a federation distribution key, which determines the distribution of data to partitions within the federation.
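
The idea of routing rows to federation members by a distribution key can be sketched in a few lines of Python. This is only an illustration of range-based horizontal partitioning, not the Azure API; the boundary values, the customer_id key, and the helper names are all hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List

# Hypothetical range boundaries on the distribution key (customer_id):
# member 0 holds keys below 1000, member 1 holds 1000-1999,
# member 2 holds 2000 and above.
BOUNDARIES = [1000, 2000]


def member_for(key: int) -> int:
    """Return the index of the federation member that owns this key."""
    return bisect_right(BOUNDARIES, key)


def distribute(rows: List[Dict[str, int]], key_name: str) -> List[List[Dict[str, int]]]:
    """Split rows across members according to their distribution key."""
    members: List[List[Dict[str, int]]] = [[] for _ in range(len(BOUNDARIES) + 1)]
    for row in rows:
        members[member_for(row[key_name])].append(row)
    return members


orders = [
    {"customer_id": 42, "amount": 10},
    {"customer_id": 1000, "amount": 20},
    {"customer_id": 1999, "amount": 30},
    {"customer_id": 2500, "amount": 40},
]
members = distribute(orders, "customer_id")
```

Each row lands in exactly one member, so a query that filters on the distribution key only needs to visit one partition.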

The equivalent feature in SQL Server is the distributed partitioned view (DPV), sometimes referred to as a federated view, which is the primary SQL Server feature that allows transparent scale-out. In some cases, DPVs are used to help manage very large databases (VLDBs): instead of creating and maintaining a multi-terabyte database, several smaller databases within the same instance are created.
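
The shape of a partitioned view can be sketched with Python's built-in sqlite3 module. SQLite is used here only as a convenient stand-in, so this is not SQL Server syntax or behavior; the table and view names are hypothetical, and the member tables live in one in-memory database rather than in several databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each member table holds one slice of the data, enforced by a CHECK
# constraint on the partitioning column. The view stitches them together.
conn.executescript(
    """
    CREATE TABLE orders_2011 (
        order_id   INTEGER PRIMARY KEY,
        order_year INTEGER NOT NULL CHECK (order_year = 2011),
        amount     REAL NOT NULL
    );
    CREATE TABLE orders_2012 (
        order_id   INTEGER PRIMARY KEY,
        order_year INTEGER NOT NULL CHECK (order_year = 2012),
        amount     REAL NOT NULL
    );
    CREATE VIEW all_orders AS
        SELECT order_id, order_year, amount FROM orders_2011
        UNION ALL
        SELECT order_id, order_year, amount FROM orders_2012;
    """
)

conn.executemany(
    "INSERT INTO orders_2011 VALUES (?, ?, ?)",
    [(1, 2011, 50.0), (2, 2011, 75.0)],
)
conn.executemany(
    "INSERT INTO orders_2012 VALUES (?, ?, ?)",
    [(3, 2012, 100.0)],
)

# Callers query the view as if it were a single large table.
totals = conn.execute(
    "SELECT order_year, SUM(amount) FROM all_orders "
    "GROUP BY order_year ORDER BY order_year"
).fetchall()
```

The application sees one logical table, all_orders, while the data is physically split into smaller pieces that are easier to manage.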

5. MapReduce Integration / PolyBase
PolyBase is a new technology in the data processing engine of SQL Server 2012 Parallel Data Warehouse, designed to simplify combining non-relational data and traditional relational data in your analysis.

PolyBase unifies the relational and non-relational worlds at the query level. It provides the following Big Data processing features:

  • Integrated query: Accepts a standard T-SQL query that joins tables from a relational source with tables in a Hadoop cluster, without the user needing to learn MapReduce.
  • Advanced query options: Apart from simple SELECT queries, users can perform JOINs and GROUP BYs on data in the Hadoop cluster.

PolyBase is an exciting new technology that deserves separate coverage; the information above serves only as an introduction.
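
Conceptually, an integrated query joins relational rows with records that live in files on a Hadoop cluster and then aggregates the result. The Python sketch below imitates that shape only; it does not use PolyBase or Hadoop. The customers mapping, the CLICK_LOG text, and the function clicks_per_customer are hypothetical stand-ins.

```python
import csv
import io
from collections import defaultdict
from typing import Dict

# Relational side: a hypothetical customers table (customer_id -> name).
customers: Dict[int, str] = {1: "Contoso", 2: "Fabrikam"}

# Non-relational side: a stand-in for a delimited click log stored on a
# Hadoop cluster, with columns customer_id,page.
CLICK_LOG = "1,home\n1,cart\n2,home\n3,home\n"


def clicks_per_customer(customer_table: Dict[int, str], log_text: str) -> Dict[str, int]:
    """Join log records to customers by id, then count clicks per name."""
    counts: Dict[str, int] = defaultdict(int)
    for customer_id, _page in csv.reader(io.StringIO(log_text)):
        name = customer_table.get(int(customer_id))
        if name is not None:  # inner join: skip ids with no customer row
            counts[name] += 1
    return dict(counts)


result = clicks_per_customer(customers, CLICK_LOG)
```

In this sketch the join and the GROUP BY are written by hand; the appeal of the approach described above is that a single T-SQL statement expresses both, and the engine decides how to reach the Hadoop data.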

Summary
Traditional high-performance RDBMS like SQL Server have their strengths. They are very strong in maintaining data integrity and quality through constraints, foreign keys, and other validation mechanisms. They are also strong in transactional integrity, providing a superior locking model, automatic deadlock resolution, and so on. Initially, however, they did not adapt well to the Big Data processing needs of enterprises.

With the enhancements made by their respective vendors, databases like SQL Server now have Big Data processing features. This makes them strong candidates for enterprises looking for best-of-breed features across traditional RDBMS and Big Data processing systems, while leveraging their existing investments.

More Stories By Srinivasan Sundara Rajan

Highly passionate about utilizing Digital Technologies to enable next generation enterprise. Believes in enterprise transformation through the Natives (Cloud Native & Mobile Native).
