Five Big Data Features in Oracle

Traditional RDBMS and new data processing

Over the past two decades, relational databases have been highly successful at serving large-scale OLTP and OLAP applications across enterprises. In the past few years, however, the advent of Big Data, in particular the need to process unstructured data and massive data volumes, has led the industry to look at non-RDBMS solutions. This has driven the popularity of NoSQL databases as well as massively parallel processing frameworks.

However, traditional RDBMS vendors have been quick to react and have added several Big Data features to their offerings, so that enterprises with a heavy investment in a traditional RDBMS can have the best of both worlds by properly leveraging these new features.

The following sections give an overview of the Big Data features in the popular Oracle database. Please also refer to my earlier articles, Five Big Data Features in SQL Server and Five Big Data Features in DB2.

1. External Tables: As the name suggests, an external table accesses data in an external source as if that data were in a table in the database. In earlier releases, external tables were mainly used to access CSV files and Oracle Loader files. To support Big Data, however, Oracle has released a direct connector to the Hadoop Distributed File System (HDFS) on which an external table can be built. With SQL-like CREATE TABLE syntax, the external table feature allows easy access to data in HDFS. Oracle SQL Connector for HDFS creates the external table definition from a Hive table by contacting the Hive metastore client to retrieve information about the table columns and the location of the table data. In addition, the Hive table data paths are published to the location files of the Oracle external table.
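The external-table idea can be illustrated with a minimal Python sketch: the rows stay in an outside source and are parsed only when a query runs. This is not Oracle's access driver or the HDFS connector. The `ExternalTable` class, the sample data, and the column names are all hypothetical, and an in-memory CSV stands in for a file on an external file system.

```python
import csv
import io
from typing import Callable, Dict, Iterator, TextIO


class ExternalTable:
    """Hypothetical stand-in for an external table: data stays in the source."""

    def __init__(self, open_source: Callable[[], TextIO]):
        # A callable that opens the external source (a file, in practice).
        self._open_source = open_source

    def select(self, where: Callable[[Dict[str, str]], bool]) -> Iterator[Dict[str, str]]:
        # Rows are parsed lazily on every query; nothing is loaded up front.
        with self._open_source() as handle:
            for row in csv.DictReader(handle):
                if where(row):
                    yield row


# An in-memory CSV stands in for a file on an external file system.
SAMPLE = "order_id,region,amount\n1,EAST,100\n2,WEST,250\n3,EAST,75\n"
orders = ExternalTable(lambda: io.StringIO(SAMPLE))

east_orders = list(orders.select(lambda r: r["region"] == "EAST"))
print([r["order_id"] for r in east_orders])  # ['1', '3']
```

Because the source is reopened on every `select`, changes to the underlying file are visible to the next query without any reload step, which is the main appeal of the approach.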

Considering the distinct advantages of columnar databases for certain types of workloads, Oracle also supports column-oriented storage. With Hybrid Columnar Compression, the database stores the same column for a group of rows together. The data block does not store data in row-major format, but uses a combination of both row and columnar methods.

Storing column data together, with the same data type and similar characteristics, dramatically increases the storage savings achieved from compression. The database compresses data manipulated by any SQL operation, although compression levels are higher for direct path loads. Database operations work transparently against compressed objects, so no application changes are required.
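A small Python sketch shows why grouping a column's values helps compression. It pivots one group of rows into columns and run-length encodes each column. Run-length encoding is only a stand-in chosen for illustration; the compression algorithms Oracle actually applies are not described here, and the row data is hypothetical.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def to_columns(rows: Sequence[Tuple]) -> List[List]:
    """Pivot a group of rows so each column's values are stored together."""
    return [list(column) for column in zip(*rows)]


def run_length_encode(values: Sequence) -> List[Tuple[object, int]]:
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# One hypothetical group of rows: (order_id, country, status).
rows = [
    (1, "US", "OPEN"),
    (2, "US", "OPEN"),
    (3, "US", "CLOSED"),
    (4, "UK", "CLOSED"),
]

columns = to_columns(rows)
encoded = [run_length_encode(column) for column in columns]

for name, runs in zip(("order_id", "country", "status"), encoded):
    print(name, runs)
```

The `country` and `status` columns each collapse from four values to two runs, while the unique `order_id` column does not shrink at all. Stored row by row, the same values would never sit next to each other, so no runs would form.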

As a complementary option, Oracle also provides a Loader for Hadoop. Oracle Loader for Hadoop is a MapReduce application that is invoked as a command-line utility. It provides an efficient, high-performance loader for fast movement of data from a Hadoop cluster into a table in an Oracle database.

2. Oracle Text: Oracle Text enables you to build text query applications and document classification applications. It provides indexing, word and theme searching, and viewing capabilities for text. Oracle Text indexes text by converting all words into tokens. The general structure of an Oracle Text CONTEXT index is an inverted index, where each token is associated with the list of documents (rows) that contain that token. The lexer breaks the text into tokens according to your language; these tokens are usually words. Oracle Text can index most document formats, including HTML, PDF, Microsoft Word, and plain text, and you can load any supported type into the text column. Oracle Text can also index most languages. Use the BASIC_LEXER preference type to index whitespace-delimited languages such as English, French, German, and Spanish. Use the MULTI_LEXER preference type to index tables containing documents in different languages, such as English, German, and Japanese.
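The inverted-index structure described above can be sketched in a few lines of Python. This is a toy illustration of the data structure, not Oracle Text's implementation: the `tokenize` function is a hypothetical lexer that simply lowercases and splits on non-alphanumeric characters, and the documents are made up.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Toy lexer: lowercase the text and split on non-alphanumeric characters."""
    return [token for token in re.split(r"[^a-z0-9]+", text.lower()) if token]


def build_inverted_index(documents: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each token to the set of document (row) ids that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


documents = {
    1: "Oracle partitions large tables",
    2: "Large tables need indexes",
    3: "Oracle Text indexes documents",
}
index = build_inverted_index(documents)

print(sorted(index["oracle"]))  # [1, 3]
print(sorted(index["tables"]))  # [1, 2]
```

A lookup touches only the entry for the query token, so the cost of a search depends on how many documents contain the word rather than on the total size of the collection.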

The basic Oracle Text query takes a query expression, usually a word with or without operators, as input. Oracle Text returns all documents (previously indexed) that satisfy the expression along with a relevance score for each document. Defining a custom thesaurus enables you to process queries more intelligently. Because users of your application might not know which words represent a topic, you can define synonyms or narrower terms for likely query terms. You can use the thesaurus operators to expand your query into your thesaurus terms.
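Thesaurus expansion and scoring can be sketched as follows. Everything here is an assumption made for illustration: the `THESAURUS` mapping is hypothetical, the score is a plain count of matching words rather than Oracle Text's relevance algorithm, and the sketch scans the documents directly instead of consulting a prebuilt index.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical thesaurus: query term -> synonyms or narrower terms.
THESAURUS: Dict[str, List[str]] = {
    "car": ["automobile", "sedan"],
}


def expand(term: str) -> List[str]:
    """Return the term plus any thesaurus entries defined for it."""
    return [term] + THESAURUS.get(term, [])


def search(documents: Dict[int, str], term: str) -> List[Tuple[int, int]]:
    """Return (doc_id, score) pairs, best first; score = matching-word count."""
    wanted = set(expand(term.lower()))
    results = []
    for doc_id, text in documents.items():
        counts = Counter(text.lower().split())
        score = sum(counts[word] for word in wanted)
        if score > 0:
            results.append((doc_id, score))
    # Highest score first; ties broken by document id for a stable order.
    return sorted(results, key=lambda pair: (-pair[1], pair[0]))


documents = {
    1: "used car for sale",
    2: "sedan and automobile parts",
    3: "bicycle repair shop",
}

print(search(documents, "car"))  # [(2, 2), (1, 1)]
```

Document 2 never mentions "car", yet it ranks first because both of its thesaurus terms match. That is the behavior a custom thesaurus is meant to provide for users who do not know the exact vocabulary of the documents.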

With its extensive support for processing unstructured text documents, Oracle Text can play a major role in Big Data processing.

3. VLDB Partitioning: If one of the appealing features of Big Data frameworks is the ability to split large quantities of data across multiple nodes, then Oracle's partitioning feature provides similar functionality within the database, and it has existed for some time. Partitioning addresses key issues in supporting very large tables and indexes by decomposing them into smaller, more manageable pieces called partitions, which are entirely transparent to an application. SQL queries and Data Manipulation Language (DML) statements do not need to be modified to access partitioned tables.

Partitioning is a critical feature for managing very large databases. Growth is the basic challenge that partitioning addresses for very large databases, and partitioning enables a divide and conquer technique for managing the tables and indexes in the database, especially as those tables and indexes grow.

Oracle also supports several types of partitioning, to be chosen according to the nature of the application.

  • Range Partitioning: Range partitioning maps data to partitions based on ranges of values of the partitioning key that you establish for each partition.
  • Hash Partitioning: Hash partitioning maps data to partitions based on a hashing algorithm that Oracle applies to the partitioning key that you identify.
  • List Partitioning: List partitioning enables you to explicitly control how rows map to partitions by specifying a list of discrete values for the partitioning key in the description for each partition.
  • Composite Partitioning: Composite partitioning is a combination of the basic data distribution methods; a table is partitioned by one data distribution method and then each partition is further subdivided into subpartitions using a second data distribution method.
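The four methods above differ only in how a key is mapped to a partition, which a short Python sketch can make concrete. The function names, bounds, and value lists are hypothetical. The sketch treats range bounds as exclusive upper limits and uses a modulo as a stand-in for the hashing algorithm that Oracle applies internally.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple


def range_partition(key: int, upper_bounds: List[int]) -> int:
    """Partition i holds keys below upper_bounds[i]; bounds must be sorted."""
    position = bisect_right(upper_bounds, key)
    if position == len(upper_bounds):
        raise ValueError(f"key {key} is above the highest partition bound")
    return position


def hash_partition(key: int, partition_count: int) -> int:
    """Modulo is a simple stand-in for the database's internal hash function."""
    return key % partition_count


def list_partition(key: str, value_lists: Dict[str, List[str]]) -> str:
    """Return the name of the partition whose value list contains the key."""
    for name, values in value_lists.items():
        if key in values:
            return name
    raise ValueError(f"no partition lists the value {key!r}")


def composite_partition(year: int, customer_id: int) -> Tuple[int, int]:
    """Range-partition by year, then hash-subpartition by customer id."""
    return range_partition(year, [2011, 2012, 2013]), hash_partition(customer_id, 4)


print(range_partition(2011, [2011, 2012, 2013]))                         # 1
print(hash_partition(10, 4))                                             # 2
print(list_partition("CA", {"west": ["CA", "OR"], "east": ["NY", "NJ"]}))  # west
print(composite_partition(2012, 7))                                      # (2, 3)
```

Range and list mappings keep related rows together, which suits queries that filter on the key, while the hash mapping spreads rows evenly when no natural grouping exists.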

4. Native Parallelism & Grid Computing: While MPP (massively parallel processing) forms the basis of Big Data processing, that does not mean the complementary model of SMP (symmetric multiprocessing) cannot be used to process large quantities of data. Many of the Big Data features in Oracle, such as VLDB partitioning, fully utilize the SMP power of servers.

Parallel execution enables the application of multiple CPU and I/O resources to the execution of a single database operation. It dramatically reduces response time for data-intensive operations on large databases. You can use parallel queries and parallel subqueries in SELECT statements and execute in parallel the query portions of DDL statements and DML statements (INSERT, UPDATE, and DELETE). You can also query external tables in parallel.
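The split-and-combine shape of parallel execution can be sketched in Python. This is an illustration of the pattern only, not of Oracle's parallel execution servers. The partition data and the `degree` parameter are hypothetical, and Python threads show the structure of the work rather than a real CPU speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def scan_partition(rows: List[int]) -> int:
    """Work done by one worker: aggregate a single partition."""
    return sum(rows)


def parallel_sum(partitions: List[List[int]], degree: int) -> int:
    """Aggregate each partition on its own worker, then combine the partials."""
    with ThreadPoolExecutor(max_workers=degree) as pool:
        partial_sums = list(pool.map(scan_partition, partitions))
    return sum(partial_sums)


# Three hypothetical partitions of an "amount" column.
partitions = [[100, 250], [75, 25], [300]]
print(parallel_sum(partitions, degree=3))  # 750
```

The pattern works for sums because partial results can be combined in any order. An operation such as a median would need a different combining step.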

Oracle also supports MPP-style parallelism through Oracle Real Application Clusters (Oracle RAC). Oracle RAC enables you to cluster an Oracle database, using Oracle Clusterware as the infrastructure that binds multiple servers so they operate as a single system. While RAC on its own may not be suited to analytical workloads, in conjunction with other features it may help with real-time analytics.

5. XML DB: Oracle XML DB is a set of Oracle Database technologies related to high-performance handling of XML data: storing, generating, accessing, searching, validating, transforming, evolving, and indexing. It provides native XML support by encompassing both the SQL and XML data models in an interoperable way. Oracle XML DB is included as part of Oracle Database.

XMLType is an abstract data type for native handling of XML data in the database. This data type is integrated with regular RDBMS tables, so an XMLType value can be just another column in a table. A table with an XMLType column can be partitioned using the VLDB partitioning techniques described above, making it a good candidate for Big Data processing. Another component of XML DB is the Oracle XML DB Repository, which can store any kind of document, including XML documents that are associated with an XML schema.
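The idea of an XML document living in an ordinary column next to relational columns can be sketched in Python with the standard library's ElementTree module. This is not the XMLType API. The `purchase_orders` rows, the document layout, and the `xml_value` helper are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical table: each row has a relational key and an XML column.
purchase_orders: List[Dict[str, object]] = [
    {"id": 1, "doc": "<order><customer>Acme</customer><total>120</total></order>"},
    {"id": 2, "doc": "<order><customer>Globex</customer><total>80</total></order>"},
]


def xml_value(document: str, path: str) -> str:
    """Parse the XML text and return the text of the first element at path."""
    root = ET.fromstring(document)
    value = root.findtext(path)
    if value is None:
        raise KeyError(f"path {path!r} not found")
    return value


# Filter on a value inside the XML column while selecting a relational column.
large_orders = [
    row["id"] for row in purchase_orders
    if int(xml_value(str(row["doc"]), "total")) > 100
]
print(large_orders)  # [1]
```

The query mixes both data models in one expression: the predicate reaches into the XML document, and the result is a plain relational value.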

Traditional high-performance RDBMSs like Oracle have their strengths. They are very strong at maintaining data integrity and quality through constraints, foreign keys, and other validation mechanisms. They are also strong in transactional integrity, providing a superior locking model, automatic deadlock resolution, and so on. However, they were not initially perceived as able to adjust to the Big Data processing needs of enterprises.

With the enhancements made by their respective vendors, databases like Oracle now offer Big Data processing features that make them strong candidates for enterprises that want best-of-breed features from both traditional RDBMS and Big Data processing systems while making the most of their existing investments.

More Stories By Srinivasan Sundara Rajan

Highly passionate about utilizing Digital Technologies to enable next generation enterprise. Believes in enterprise transformation through the Natives (Cloud Native & Mobile Native).
