By Srinivasan Sundara Rajan | January 30, 2013 07:00 AM EST
Big Data & Text Analytics: As the analysis of large amounts of unstructured data gains a major place in enterprise computing, more use cases are emerging. Although the "Big" in Big Data makes the term nearly synonymous with Massively Parallel Processing frameworks like Hadoop, the success of Big Data ultimately depends on effective content analytics over the underlying unstructured data. I highlighted this idea in my earlier article, Big Data Analytics Thinking Outside Of Hadoop.
Unstructured Content Analytics is the process of gaining new insights from unstructured data by employing text mining, image recognition, voice recognition and other related analytical techniques.
Big Data Journal was launched on SYS-CON.com in 2012
The material below explains one such use case of Big Data & Text Analytics: getting meaningful insights from financial reports.
Financial Reports & Analytics: Publicly traded companies in the USA and elsewhere are required to disclose their corporate information to their shareholders. Their annual financial statements are available as downloadable reports on their corporate websites. Apart from the annual report, other documents such as investor newsletters, quarterly earnings presentations, CFO conference calls and other investor relations material also shape an organization's financial standing in the eyes of the investor.
Most investors and investment analyst firms currently use their specialized knowledge to understand these financial statements and derive meaningful insights from them. However, this analysis is mostly limited to the structured portions of the financial statements and pays much less attention to the unstructured side.
To explain this further:
- An annual report may contain statements such as the Balance Sheet, Income, Equity and Cash Flows statements. These are highly structured and organized according to accounting principles, so that any qualified financial analyst can understand them.
- At the same time, a typical financial statement also contains a lot of unstructured information about the organization's growth strategies, road map, optimism, future vision, how the business model is aligned to changing times, and so on.
So an effective analysis of a financial statement pertains not only to the structured information but also to the unstructured data it contains.
BigData, UIMA & Financial Report Analytics: The following Big Data aligned technologies can be used to analyse financial reports and derive meaningful insights from their large volumes of unstructured data.
- UIMA : UIMA (Unstructured Information Management Architecture) is the major industry standard for content analytics.
- Annotators : UIMA™ annotators do the real work of extracting structured information from unstructured data. You can write your own annotators. Although annotators are part of the UIMA framework, a lot of custom development goes into creating annotators specific to the needs of the finance industry. When documents are processed through the document processing pipeline, the annotators extract concepts, words, phrases, classifications, and named entities from unstructured content and mark these extractions as annotations. The annotations are added to the index as tokens or facets and are used as the source for content analysis.
- Taxonomies : Taxonomies play a major role in identifying the topics of interest within a document using UIMA. In UIMA, a type system defines the various types of objects that may be discovered in the document. Types in a UIMA type system may be organized into a taxonomy. For example, Company may be defined as a subtype of Organization.
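To make the relationship between annotators and a type taxonomy more concrete, here is a minimal sketch in plain Python. It does not use the UIMA API; the type names other than Company and Organization, the regular expressions and the helper functions are all assumptions made for illustration. It shows an annotator marking spans of text with typed annotations, and a taxonomy lookup that lets a query for Organization also return Company annotations.

```python
import re
from dataclasses import dataclass

# Hypothetical type system: each type maps to its parent (None marks the root).
TYPE_PARENTS = {
    "Annotation": None,
    "Organization": "Annotation",
    "Company": "Organization",
    "MonetaryAmount": "Annotation",
}

# Illustrative patterns only; a real finance annotator would need far richer rules.
PATTERNS = {
    "Company": re.compile(r"(?:[A-Z][A-Za-z]+ )+(?:Inc|Corp|Ltd|LLC)\b"),
    "MonetaryAmount": re.compile(r"\$\d+(?:\.\d+)?\s(?:million|billion)"),
}


@dataclass(frozen=True)
class Annotation:
    type_name: str  # a key of TYPE_PARENTS
    begin: int      # start offset in the document text
    end: int        # end offset (exclusive)
    text: str       # the covered text


def is_subtype(type_name, ancestor):
    """Walk up the taxonomy to check whether type_name inherits from ancestor."""
    current = type_name
    while current is not None:
        if current == ancestor:
            return True
        current = TYPE_PARENTS[current]
    return False


def annotate(text):
    """Run every pattern over the text and return annotations in document order."""
    found = []
    for type_name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append(Annotation(type_name, match.start(), match.end(), match.group()))
    return sorted(found, key=lambda a: a.begin)


sample = "Last year Acme Holdings Inc reported revenue of $4.2 billion."
annotations = annotate(sample)
# Asking for Organization also returns Company annotations via the taxonomy.
organizations = [a.text for a in annotations if is_subtype(a.type_name, "Organization")]
```

Because the query is made against the parent type, new subtypes can be added to the taxonomy later without changing the code that consumes the annotations.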
Realizing Financial Statement Analytics & Role of XBRL: There are not many UIMA annotators or text extraction implementations specific to financial statements. However, the Apache UIMA community does provide one relevant component: the AlchemyAPI Annotator, a set of annotators that wrap the AlchemyAPI.
AlchemyAPI's (http://www.alchemyapi.com/api/) Categorization service can be used to categorize text, HTML, or web-based content, assigning the most likely topic category (news, sports, business, etc.). The business categories include topics like, Business and Finance News, SEC filings, etc.
Several text analytics concepts, such as the following, can be applied to financial statements:
- Named Entity Extraction : Identify people, companies, organizations, cities, geographic features, and other typed entities within HTML pages and text documents/content.
- Concept Tagging : Automatically tag documents and text in a manner similar to human-based tagging.
- Keyword / Term Extraction : Extract important terms and "topic" keywords from HTML pages and text documents/content. Advanced statistical and linguistic algorithms analyze your content, "tagging" it with the most important words and phrases.
- Sentiment Analysis : Identify positive, negative and neutral sentiment within HTML pages and text documents/content.
- Relation Extraction : Identify facts and Subject-Action-Object relations within HTML pages and text documents/content.
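Two of the concepts above, sentiment analysis and keyword extraction, can be sketched in a few lines of Python. This is a deliberately naive illustration and not how AlchemyAPI works: the word lists, the stopword set and the function names are assumptions, and a production system would rely on trained statistical and linguistic models rather than fixed lexicons.

```python
import re
from collections import Counter

# Tiny hand-made lexicons, assumed for illustration only.
POSITIVE = {"growth", "strong", "improved", "optimistic", "record"}
NEGATIVE = {"decline", "weak", "loss", "risk", "uncertain"}
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "we", "our",
             "is", "are", "was", "for", "this", "but", "on", "with"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def sentiment(text):
    """Classify text by counting positive and negative lexicon hits."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def keywords(text, top_n=3):
    """Return the most frequent non-stopword tokens as candidate keywords."""
    counts = Counter(t for t in tokenize(text) if t not in STOPWORDS and len(t) > 2)
    return [word for word, _ in counts.most_common(top_n)]


outlook = "We delivered record growth and strong margins."
label = sentiment(outlook)
```

Applied to the narrative sections of an annual report, even a crude score like this gives a rough signal of management's tone that the structured statements alone do not capture.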
Apart from the already developed and community-supported annotators, we could develop new annotators that make the best use of taxonomies already established for the financial industry in the form of XBRL.
XBRL stands for eXtensible Business Reporting Language. It is a language for the electronic communication of business information, providing major benefits in the preparation, analysis and communication of business information. It is one of a family of "XML" languages which is a standard means of communicating information between businesses and on the internet.
XBRL taxonomies are the dictionaries which the language uses. These are the categorization schemes which define the specific tags for individual items of data (such as "net profit"). National jurisdictions have different accounting regulations, so each may have its own taxonomy. There are already well-established, approved taxonomies for financial reporting, such as XBRL US GAAP, as listed on the site http://www.xbrl.org/FRTApproved.
As is evident from the architecture of UIMA and the annotator entity extraction process, these established taxonomies can play a major role in areas like concept tagging, which can help in getting meaningful insights from the large amounts of textual and other unstructured content in financial statements.
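As a rough sketch of how a taxonomy could drive concept tagging, the Python example below looks up phrases from a small label dictionary and tags each occurrence with a taxonomy element. The dictionary is a hypothetical hand-written excerpt, not loaded from a real XBRL taxonomy file, and the element names shown should be verified against the published taxonomy before any real use; the function name is likewise an assumption.

```python
import re

# Hypothetical excerpt mapping human-readable labels to taxonomy elements.
# Element names are illustrative and must be checked against the real taxonomy.
TAXONOMY_LABELS = {
    "net income": "us-gaap:NetIncomeLoss",
    "net profit": "us-gaap:NetIncomeLoss",
    "revenue": "us-gaap:Revenues",
    "total assets": "us-gaap:Assets",
}


def tag_concepts(text):
    """Return (matched label, taxonomy element, offset) tuples in document order."""
    hits = []
    for label, element in TAXONOMY_LABELS.items():
        # Whole-phrase, case-insensitive match for each known label.
        pattern = re.compile(r"\b" + re.escape(label) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            hits.append((match.group().lower(), element, match.start()))
    return sorted(hits, key=lambda hit: hit[2])


sentence = "Net profit rose while revenue was flat."
tags = tag_concepts(sentence)
```

Note that two different labels ("net income" and "net profit") map to the same element, which is the main benefit of tagging against a shared taxonomy: differently worded reports become comparable.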
Summary: As enterprises and analytics vendors adopt Big Data as part of the mainstream, this adoption becomes more meaningful when the technology supports new business use cases. Financial analytics is one such important area, and with the support of frameworks like UIMA coupled with industry-established taxonomies, such analytics are quite feasible and worth implementing.
Copyright © 2013 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
More Stories By Srinivasan Sundara Rajan
Srinivasan Sundara Rajan (also known as Sundar) is an Enterprise Technology Enabler for realizing business capabilities. His primary focus is enabling Agile Enterprises by facilitating the adoption of the Everything As A Service model, with particular concentration on BpaaS (Business Process As A Service). He also helps enterprises get meaningful insights from their structured, unstructured and real-time data sources. All the views expressed are Srinivasan's independent analysis of industry and solutions and are not necessarily those of his current or past organizations. Srinivasan would like to thank everyone who augmented his architectural skills with analytical ideas.