Welcome!

@BigDataExpo Authors: Kevin Benedict, Jnan Dash, Elizabeth White, Yeshim Deniz, Pat Romanski

Related Topics: @BigDataExpo, Java IoT, Industrial IoT, Microsoft Cloud, IoT User Interface, @CloudExpo

@BigDataExpo: Article

Five Big Data Features in SQL Server

Traditional RDBMS and Big Data

Traditional RDBMS & New Data Processing
Over the past two decades relational databases have been most successful in serving large scale OLTP and OLAP applications across enterprises. However, in the past couple of years with the advent of Big Data processing, especially for processing unstructured data coupled with the need for processing massive quantities of data, made the industry to look into Non RDBMS solutions. This has lead into the popularity of NOSQL databases as well as massively parallel processing frameworks.

However the traditional RDBMS were quick to react and added several Big Data features as part of their offering so that the enterprises with a heavy investment of traditional RDBMS can have the best of both worlds by properly leveraging these new features.

The following sections provide ideas about Big Data features in the popular SQL Server databases; a similar analysis will be performed against Oracle also in a later article.

1. Column Store Indexes
Column oriented databases differs from RDBMS like SQL Server such that they store data tables as sections of columns of data rather than rows of data. This has proven to be advantageous in certain situations where aggregates are computed over large number of similar data items.

Columnstore index, a feature in SQL Server 2012, groups and stores data for each column and then joins all the columns to complete the whole index. This differs from traditional indexes that group and store data for each row and then join all the rows to complete the whole index. For some types of queries, the SQL Server query processor can take advantage of the columnstore layout to significantly improve query execution times.

2. Hadoop Connectors
The SQL Server-Hadoop Connector is a Sqoop-based connector that facilitates efficient data transfer between SQL Server and Hadoop. Sqoop supports several databases including HDFS. This connector is bidirectional. You can import as well as export the data.

With SQL Server-Hadoop Connector, you can export data from:

  • delimited text files on HDFS to SQL Server
  • sequenceFiles on HDFS to SQL Server
  • hive Tables to tables in SQL Server

3. Full Text Search
There is no question about Sql Server's ability to process relational data and perform queries using JOINs and other traditional means. However much of the Big Data processing needs of the enterprise goes towards processing unstructured data which is generally in the form of textual data.

Full-Text Search in SQL Server lets users and applications run full-text queries against character-based data in SQL Server tables. Before you can run full-text queries on a table, the database administrator must create a full-text index on the table. The full-text index includes one or more character-based columns in the table.

After columns have been added to a full-text index, users and applications can run full-text queries on the text in the columns. These queries can search for any of the following:

  • One or more specific words or phrases (simple term)
  • A word or a phrase where the words begin with specified text (prefix term)
  • Inflectional forms of a specific word (generation term)
  • A word or phrase close to another word or phrase (proximity term)
  • Synonymous forms of a specific word (thesaurus)
  • Words or phrases using weighted values (weighted term)

4. Windows Azure SQL Federation/ Distributed Partition Views
Though not directly related to SQL Server, Windows Azure SQL databases are just the cloud version of the SQL server and this feature provides massively parallel processing capabilities on the database work load on cloud.

SQL Database with federation is a way to achieve greater scalability and performance from the database tier of your application through horizontal partition. One or more tables within a database with federation are split by row and portioned across multiple databases known as federation members. A federation is defined by a federation distribution scheme, or federation scheme. The federation scheme defines a federation distribution key, which determines the distribution of data to partitions within the federation.

The equivalent feature on the SQL Server database is known as DPV (Distributed Partition View). The primary SQL Server feature that allows transparent scale-out is the distributed partitioned view (DPV), sometimes referred to as federated view. In some cases, DPVs are used to help manage very large databases (VLDBs). Instead of creating and maintaining a multi-terabyte database, several smaller databases within the same instance are created.

5. Map Reduce Integration / Polybase
PolyBase is a breakthrough new technology on the data processing engine in SQL Server 2012 Parallel Data Warehouse designed as the simplest way to combine non-relational data and traditional relational data in your analysis.

Polybase unifies the relational and non-relational worlds at the query level. It provides the following Big Data processing features:

  • Integrated Query: Accepts a standard T-SQL query that joins tables containing a relational source with tables in a Hadoop cluster without needing to learn MapReduce.
  • Advanced query options: Apart from simple SELECT queries, users can perform JOINs and GROUP BYs on data in the Hadoop cluster.
  • Polybase is an exciting new technology and requires a separate coverage, but the above information just provides an introduction.

Summary
Traditional high performance RDBMS like SQL Server have their strengths. They are very strong in maintaining the data integrity and quality in the form of constraints, foreign keys and other validation mechanisms. They are also strong in transactional integrity by providing superior locking model, automatic dead lock resolution, etc. However initially they are not found to adjust to Big Data processing needs of enterprises.

With the enhancements in the products made by respective vendors , now databases like Sql Server have been enhanced with big data processing features and makes them the best candidate for enterprises looking for best of the breed features between traditional RDBMS and Big Data processing systems, and to leverage the best of existing investments.

More Stories By Srinivasan Sundara Rajan

Highly passionate about utilizing Digital Technologies to enable next generation enterprise. Believes in enterprise transformation through the Natives (Cloud Native & Mobile Native).

@BigDataExpo Stories
In this strange new world where more and more power is drawn from business technology, companies are effectively straddling two paths on the road to innovation and transformation into digital enterprises. The first path is the heritage trail – with “legacy” technology forming the background. Here, extant technologies are transformed by core IT teams to provide more API-driven approaches. Legacy systems can restrict companies that are transitioning into digital enterprises. To truly become a lea...
Internet of @ThingsExpo, taking place November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 19th Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The Internet of Things (IoT) is the most profound change in personal and enterprise IT since the creation of the Worldwide Web more than 20 years ago. All major researchers estimate there will be tens of billions devices - comp...
Why do your mobile transformations need to happen today? Mobile is the strategy that enterprise transformation centers on to drive customer engagement. In his general session at @ThingsExpo, Roger Woods, Director, Mobile Product & Strategy – Adobe Marketing Cloud, covered key IoT and mobile trends that are forcing mobile transformation, key components of a solid mobile strategy and explored how brands are effectively driving mobile change throughout the enterprise.
What are the new priorities for the connected business? First: businesses need to think differently about the types of connections they will need to make – these span well beyond the traditional app to app into more modern forms of integration including SaaS integrations, mobile integrations, APIs, device integration and Big Data integration. It’s important these are unified together vs. doing them all piecemeal. Second, these types of connections need to be simple to design, adapt and configure...
SYS-CON Events announced today the Enterprise IoT Bootcamp, being held November 1-2, 2016, in conjunction with 19th Cloud Expo | @ThingsExpo at the Santa Clara Convention Center in Santa Clara, CA. Combined with real-world scenarios and use cases, the Enterprise IoT Bootcamp is not just based on presentations but with hands-on demos and detailed walkthroughs. We will introduce you to a variety of real world use cases prototyped using Arduino, Raspberry Pi, BeagleBone, Spark, and Intel Edison. Y...
Adobe is changing the world though digital experiences. Adobe helps customers develop and deliver high-impact experiences that differentiate brands, build loyalty, and drive revenue across every screen, including smartphones, computers, tablets and TVs. Adobe content solutions are used daily by millions of companies worldwide-from publishers and broadcasters, to enterprises, marketing agencies and household-name brands. Building on its established design leadership, Adobe enables customers not o...
“We're a global managed hosting provider. Our core customer set is a U.S.-based customer that is looking to go global,” explained Adam Rogers, Managing Director at ANEXIA, in this SYS-CON.tv interview at 18th Cloud Expo, held June 7-9, 2016, at the Javits Center in New York City, NY.
Ask someone to architect an Internet of Things (IoT) solution and you are guaranteed to see a reference to the cloud. This would lead you to believe that IoT requires the cloud to exist. However, there are many IoT use cases where the cloud is not feasible or desirable. In his session at @ThingsExpo, Dave McCarthy, Director of Products at Bsquare Corporation, will discuss the strategies that exist to extend intelligence directly to IoT devices and sensors, freeing them from the constraints of ...
SYS-CON Events announced today that Sheng Liang to Keynote at SYS-CON's 19th Cloud Expo, which will take place on November 1-3, 2016 at the Santa Clara Convention Center in Santa Clara, California.
Technology vendors and analysts are eager to paint a rosy picture of how wonderful IoT is and why your deployment will be great with the use of their products and services. While it is easy to showcase successful IoT solutions, identifying IoT systems that missed the mark or failed can often provide more in the way of key lessons learned. In his session at @ThingsExpo, Peter Vanderminden, Principal Industry Analyst for IoT & Digital Supply Chain to Flatiron Strategies, will focus on how IoT de...
24Notion is full-service global creative digital marketing, technology and lifestyle agency that combines strategic ideas with customized tactical execution. With a broad understand of the art of traditional marketing, new media, communications and social influence, 24Notion uniquely understands how to connect your brand strategy with the right consumer. 24Notion ranked #12 on Corporate Social Responsibility - Book of List.
Businesses are struggling to manage the information flow and interactions between all of these new devices and things jumping on their network, and the apps and IT systems they control. The data businesses gather is only helpful if they can do something with it. In his session at @ThingsExpo, Chris Witeck, Principal Technology Strategist at Citrix, will discuss how different the impact of IoT will be for large businesses, expanding how IoT will allow large organizations to make their legacy ap...
What does it look like when you have access to cloud infrastructure and platform under the same roof? Let’s talk about the different layers of Technology as a Service: who cares, what runs where, and how does it all fit together. In his session at 18th Cloud Expo, Phil Jackson, Lead Technology Evangelist at SoftLayer, an IBM company, spoke about the picture being painted by IBM Cloud and how the tools being crafted can help fill the gaps in your IT infrastructure.
SYS-CON Events announced today that Niagara Networks will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Niagara Networks offers the highest port-density systems, and the most complete Next-Generation Network Visibility systems including Network Packet Brokers, Bypass Switches, and Network TAPs.
What happens when the different parts of a vehicle become smarter than the vehicle itself? As we move toward the era of smart everything, hundreds of entities in a vehicle that communicate with each other, the vehicle and external systems create a need for identity orchestration so that all entities work as a conglomerate. Much like an orchestra without a conductor, without the ability to secure, control, and connect the link between a vehicle’s head unit, devices, and systems and to manage the ...
DevOps at Cloud Expo – being held November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA – announces that its Call for Papers is open. Born out of proven success in agile development, cloud computing, and process automation, DevOps is a macro trend you cannot afford to miss. From showcase success stories from early adopters and web-scale businesses, DevOps is expanding to organizations of all sizes, including the world's largest enterprises – and delivering real results. Am...
In his session at @ThingsExpo, Kausik Sridharabalan, founder and CTO of Pulzze Systems, Inc., will focus on key challenges in building an Internet of Things solution infrastructure. He will shed light on efficient ways of defining interactions within IoT solutions, leading to cost and time reduction. He will also introduce ways to handle data and how one can develop IoT solutions that are lean, flexible and configurable, thus making IoT infrastructure agile and scalable.
So, you bought into the current machine learning craze and went on to collect millions/billions of records from this promising new data source. Now, what do you do with them? Too often, the abundance of data quickly turns into an abundance of problems. How do you extract that "magic essence" from your data without falling into the common pitfalls? In her session at @ThingsExpo, Natalia Ponomareva, Software Engineer at Google, provided tips on how to be successful in large scale machine learning...
Cognitive Computing is becoming the foundation for a new generation of solutions that have the potential to transform business. Unlike traditional approaches to building solutions, a cognitive computing approach allows the data to help determine the way applications are designed. This contrasts with conventional software development that begins with defining logic based on the current way a business operates. In her session at 18th Cloud Expo, Judith S. Hurwitz, President and CEO of Hurwitz & ...
An IoT product’s log files speak volumes about what’s happening with your products in the field, pinpointing current and potential issues, and enabling you to predict failures and save millions of dollars in inventory. But until recently, no one knew how to listen. In his session at @ThingsExpo, Dan Gettens, Chief Research Officer at OnProcess, will discuss recent research by Massachusetts Institute of Technology and OnProcess Technology, where MIT created a new, breakthrough analytics model f...