Click here to close now.




















Welcome!

@BigDataExpo Authors: Elizabeth White, Liz McMillan, Pat Romanski, Jason Bloomberg, Dana Gardner

Related Topics: @BigDataExpo, Java IoT, Industrial IoT, Microsoft Cloud, IoT User Interface, @CloudExpo

@BigDataExpo: Article

Five Big Data Features in SQL Server

Traditional RDBMS and Big Data

Traditional RDBMS & New Data Processing
Over the past two decades relational databases have been most successful in serving large scale OLTP and OLAP applications across enterprises. However, in the past couple of years with the advent of Big Data processing, especially for processing unstructured data coupled with the need for processing massive quantities of data, made the industry to look into Non RDBMS solutions. This has lead into the popularity of NOSQL databases as well as massively parallel processing frameworks.

However the traditional RDBMS were quick to react and added several Big Data features as part of their offering so that the enterprises with a heavy investment of traditional RDBMS can have the best of both worlds by properly leveraging these new features.

The following sections provide ideas about Big Data features in the popular SQL Server databases; a similar analysis will be performed against Oracle also in a later article.

1. Column Store Indexes
Column oriented databases differs from RDBMS like SQL Server such that they store data tables as sections of columns of data rather than rows of data. This has proven to be advantageous in certain situations where aggregates are computed over large number of similar data items.

Columnstore index, a feature in SQL Server 2012, groups and stores data for each column and then joins all the columns to complete the whole index. This differs from traditional indexes that group and store data for each row and then join all the rows to complete the whole index. For some types of queries, the SQL Server query processor can take advantage of the columnstore layout to significantly improve query execution times.

2. Hadoop Connectors
The SQL Server-Hadoop Connector is a Sqoop-based connector that facilitates efficient data transfer between SQL Server and Hadoop. Sqoop supports several databases including HDFS. This connector is bidirectional. You can import as well as export the data.

With SQL Server-Hadoop Connector, you can export data from:

  • delimited text files on HDFS to SQL Server
  • sequenceFiles on HDFS to SQL Server
  • hive Tables to tables in SQL Server

3. Full Text Search
There is no question about Sql Server's ability to process relational data and perform queries using JOINs and other traditional means. However much of the Big Data processing needs of the enterprise goes towards processing unstructured data which is generally in the form of textual data.

Full-Text Search in SQL Server lets users and applications run full-text queries against character-based data in SQL Server tables. Before you can run full-text queries on a table, the database administrator must create a full-text index on the table. The full-text index includes one or more character-based columns in the table.

After columns have been added to a full-text index, users and applications can run full-text queries on the text in the columns. These queries can search for any of the following:

  • One or more specific words or phrases (simple term)
  • A word or a phrase where the words begin with specified text (prefix term)
  • Inflectional forms of a specific word (generation term)
  • A word or phrase close to another word or phrase (proximity term)
  • Synonymous forms of a specific word (thesaurus)
  • Words or phrases using weighted values (weighted term)

4. Windows Azure SQL Federation/ Distributed Partition Views
Though not directly related to SQL Server, Windows Azure SQL databases are just the cloud version of the SQL server and this feature provides massively parallel processing capabilities on the database work load on cloud.

SQL Database with federation is a way to achieve greater scalability and performance from the database tier of your application through horizontal partition. One or more tables within a database with federation are split by row and portioned across multiple databases known as federation members. A federation is defined by a federation distribution scheme, or federation scheme. The federation scheme defines a federation distribution key, which determines the distribution of data to partitions within the federation.

The equivalent feature on the SQL Server database is known as DPV (Distributed Partition View). The primary SQL Server feature that allows transparent scale-out is the distributed partitioned view (DPV), sometimes referred to as federated view. In some cases, DPVs are used to help manage very large databases (VLDBs). Instead of creating and maintaining a multi-terabyte database, several smaller databases within the same instance are created.

5. Map Reduce Integration / Polybase
PolyBase is a breakthrough new technology on the data processing engine in SQL Server 2012 Parallel Data Warehouse designed as the simplest way to combine non-relational data and traditional relational data in your analysis.

Polybase unifies the relational and non-relational worlds at the query level. It provides the following Big Data processing features:

  • Integrated Query: Accepts a standard T-SQL query that joins tables containing a relational source with tables in a Hadoop cluster without needing to learn MapReduce.
  • Advanced query options: Apart from simple SELECT queries, users can perform JOINs and GROUP BYs on data in the Hadoop cluster.
  • Polybase is an exciting new technology and requires a separate coverage, but the above information just provides an introduction.

Summary
Traditional high performance RDBMS like SQL Server have their strengths. They are very strong in maintaining the data integrity and quality in the form of constraints, foreign keys and other validation mechanisms. They are also strong in transactional integrity by providing superior locking model, automatic dead lock resolution, etc. However initially they are not found to adjust to Big Data processing needs of enterprises.

With the enhancements in the products made by respective vendors , now databases like Sql Server have been enhanced with big data processing features and makes them the best candidate for enterprises looking for best of the breed features between traditional RDBMS and Big Data processing systems, and to leverage the best of existing investments.

More Stories By Srinivasan Sundara Rajan

Srinivasan is passionate about ownership and driving things on his own, with his breadth and depth on Enterprise Technology he could run any aspect of IT Industry and make it a success.

He is a seasoned Enterprise IT Expert, mainly in the areas of Solution, Integration and Architecture, across Structured, Unstructured data sources, especially in manufacturing domain.

He currently works as Technology Head For GAVS Technologies.

@BigDataExpo Stories
Growth hacking is common for startups to make unheard-of progress in building their business. Career Hacks can help Geek Girls and those who support them (yes, that's you too, Dad!) to excel in this typically male-dominated world. Get ready to learn the facts: Is there a bias against women in the tech / developer communities? Why are women 50% of the workforce, but hold only 24% of the STEM or IT positions? Some beginnings of what to do about it! In her Opening Keynote at 16th Cloud Expo, S...
With SaaS use rampant across organizations, how can IT departments track company data and maintain security? More and more departments are commissioning their own solutions and bypassing IT. A cloud environment is amorphous and powerful, allowing you to set up solutions for all of your user needs: document sharing and collaboration, mobile access, e-mail, even industry-specific applications. In his session at 16th Cloud Expo, Shawn Mills, President and a founder of Green House Data, discussed h...
"Our biggest growth area has been the security services, the managed services - the things that differentiate us in the market that there is no client that's too small and there's no client that's too big," explained Paul Mazzucco, Chief Security Officer at TierPoint, in this SYS-CON.tv interview at 16th Cloud Expo, held June 9-11, 2015, at the Javits Center in New York City.
The Cloud industry has moved from being more than just being able to provide infrastructure and management services on the Cloud. Enter a new era of Cloud computing where monetization’s services through the Cloud are an essential piece of strategy to feed your organizations bottom-line, your revenue and Profitability. In their session at 16th Cloud Expo, Ermanno Bonifazi, CEO & Founder of Solgenia, and Ian Khan, Global Strategic Positioning & Brand Manager at Solgenia, discussed how to easily o...
"We've just seen a huge influx of new partners coming into our ecosystem, and partners building unique offerings on top of our API set," explained Seth Bostock, Chief Executive Officer at IndependenceIT, in this SYS-CON.tv interview at 16th Cloud Expo, held June 9-11, 2015, at the Javits Center in New York City.
"We do data integration for B2B also application to application, and we do data management and enable Big Data," explained Pat Adamiak, Vice President, Product Marketing at Liaison Technologies, in this SYS-CON.tv interview at 16th Cloud Expo, held June 9-11, 2015, at the Javits Center in New York City.
Discussions about cloud computing are evolving into discussions about enterprise IT in general. As enterprises increasingly migrate toward their own unique clouds, new issues such as the use of containers and microservices emerge to keep things interesting. In this Power Panel at 16th Cloud Expo, moderated by Conference Chair Roger Strukhoff, panelists addressed the state of cloud computing today, and what enterprise IT professionals need to know about how the latest topics and trends affect t...
For IoT to grow as quickly as analyst firms’ project, a lot is going to fall on developers to quickly bring applications to market. But the lack of a standard development platform threatens to slow growth and make application development more time consuming and costly, much like we’ve seen in the mobile space. In his session at @ThingsExpo, Mike Weiner, Product Manager of the Omega DevCloud with KORE Telematics Inc., discussed the evolving requirements for developers as IoT matures and conducte...
In the midst of the widespread popularity and adoption of cloud computing, it seems like everything is being offered “as a Service” these days: Infrastructure? Check. Platform? You bet. Software? Absolutely. Toaster? It’s only a matter of time. With service providers positioning vastly differing offerings under a generic “cloud” umbrella, it’s all too easy to get confused about what’s actually being offered. In his session at 16th Cloud Expo, Kevin Hazard, Director of Digital Content for SoftL...
The essence of cloud computing is that all consumable IT resources are delivered as services. In his session at 15th Cloud Expo, Yung Chou, Technology Evangelist at Microsoft, demonstrated the concepts and implementations of two important cloud computing deliveries: Infrastructure as a Service (IaaS) and Platform as a Service (PaaS). He discussed from business and technical viewpoints what exactly they are, why we care, how they are different and in what ways, and the strategies for IT to tran...
SYS-CON Events announced today that HPM Networks will exhibit at the 17th International Cloud Expo®, which will take place on November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. For 20 years, HPM Networks has been integrating technology solutions that solve complex business challenges. HPM Networks has designed solutions for both SMB and enterprise customers throughout the San Francisco Bay Area.
Business as usual for IT is evolving into a "Make or Buy" decision on a service-by-service conversation with input from the LOBs. How does your organization move forward with cloud? In his general session at 16th Cloud Expo, Paul Maravei, Regional Sales Manager, Hybrid Cloud and Managed Services at Cisco, discusses how Cisco and its partners offer a market-leading portfolio and ecosystem of cloud infrastructure and application services that allow you to uniquely and securely combine cloud busine...
One of the hottest areas in cloud right now is DRaaS and related offerings. In his session at 16th Cloud Expo, Dale Levesque, Disaster Recovery Product Manager with Windstream's Cloud and Data Center Marketing team, will discuss the benefits of the cloud model, which far outweigh the traditional approach, and how enterprises need to ensure that their needs are properly being met.
It is one thing to build single industrial IoT applications, but what will it take to build the Smart Cities and truly society-changing applications of the future? The technology won’t be the problem, it will be the number of parties that need to work together and be aligned in their motivation to succeed. In his session at @ThingsExpo, Jason Mondanaro, Director, Product Management at Metanga, discussed how you can plan to cooperate, partner, and form lasting all-star teams to change the world...
Even as cloud and managed services grow increasingly central to business strategy and performance, challenges remain. The biggest sticking point for companies seeking to capitalize on the cloud is data security. Keeping data safe is an issue in any computing environment, and it has been a focus since the earliest days of the cloud revolution. Understandably so: a lot can go wrong when you allow valuable information to live outside the firewall. Recent revelations about government snooping, along...
"CenturyLink brings a full suite of services to the table and that enables us to be an IT service provider," explained Jeff Katzen, Director of the Cloud Practice at CenturyLink, in this SYS-CON.tv interview at 16th Cloud Expo, held June 9-11, 2015, at the Javits Center in New York City.
Converging digital disruptions is creating a major sea change - Cisco calls this the Internet of Everything (IoE). IoE is the network connection of People, Process, Data and Things, fueled by Cloud, Mobile, Social, Analytics and Security, and it represents a $19Trillion value-at-stake over the next 10 years. In her keynote at @ThingsExpo, Manjula Talreja, VP of Cisco Consulting Services, discussed IoE and the enormous opportunities it provides to public and private firms alike. She will share w...
In his keynote at 16th Cloud Expo, Rodney Rogers, CEO of Virtustream, discussed the evolution of the company from inception to its recent acquisition by EMC – including personal insights, lessons learned (and some WTF moments) along the way. Learn how Virtustream’s unique approach of combining the economics and elasticity of the consumer cloud model with proper performance, application automation and security into a platform became a breakout success with enterprise customers and a natural fit f...
Public Cloud IaaS started its life in the developer and startup communities and has grown rapidly to a $20B+ industry, but it still pales in comparison to how much is spent worldwide on IT: $3.6 trillion. In fact, there are 8.6 million data centers worldwide, the reality is many small and medium sized business have server closets and colocation footprints filled with servers and storage gear. While on-premise environment virtualization may have peaked at 75%, the Public Cloud has lagged in adop...
Malicious agents are moving faster than the speed of business. Even more worrisome, most companies are relying on legacy approaches to security that are no longer capable of meeting current threats. In the modern cloud, threat diversity is rapidly expanding, necessitating more sophisticated security protocols than those used in the past or in desktop environments. Yet companies are falling for cloud security myths that were truths at one time but have evolved out of existence.