Big Data Journal Authors: Liz McMillan, Pat Romanski, Jeremy Geelan, Greg Schulz, Elad Yoran

Big Data Journal: Blog Post

Big Data in Financial Analytics

Big Data and UIMA use cases

Big Data & Text Analytics: As the analysis of large amounts of unstructured data gains a major place in enterprise computing, we are seeing more use cases emerge in this area. While the term "Big" in Big Data makes it almost synonymous with massively parallel processing frameworks like Hadoop, the success of Big Data actually relies on effective content analytics of the underlying unstructured data. I highlighted this thought process in my earlier article, Big Data Analytics Thinking Outside Of Hadoop.

Unstructured content analytics is defined as the process of gaining new insights from unstructured data by employing text mining, image recognition, voice recognition and other related analytical techniques.


The material below explains one such use case of Big Data & Text Analytics: getting meaningful insights from financial reports.

Financial Reports & Analytics: All publicly traded companies in the USA and elsewhere are required to disclose their corporate information to their shareholders. These annual financial statements are available as downloadable reports on the corporate websites of public companies. Apart from the annual report, other forms of financial communication, such as investor newsletters, quarterly earnings presentations, conference calls by the CFO and other investor relations documents, shape an organization's financial standing in the eyes of the investor.

Most investors and investment analyst firms currently use their specialized knowledge to understand these financial statements and derive meaningful insights from them. However, these analytics are mostly limited to the structured portions of the financial statements and much less to the unstructured side.

To explain this further:

  • For example, an annual report may contain statements such as the Balance Sheet, Income Statement, Statement of Equity and Cash Flow Statement. These statements are highly structured and organized according to accounting principles, so that any qualified financial analyst can understand them.
  • At the same time, a typical financial statement also contains a lot of unstructured information about the organization's growth strategies, road map, optimism, future vision, how the business model is aligned to changing times, etc.

So an effective analysis of a financial statement covers not only the structured information but also the unstructured data it contains.

Big Data, UIMA & Financial Report Analytics: The following Big Data aligned technologies can be used effectively in analysing financial reports to derive meaningful insights from large volumes of unstructured data.

  • UIMA : UIMA, which stands for Unstructured Information Management Architecture, is the major industry standard for content analytics.

  • Annotators : UIMA annotators do the real work of extracting structured information from unstructured data, and you can write your own. Although annotators form part of the UIMA framework, a lot of custom development goes into creating annotators specific to the needs of the finance industry. When documents are processed through the document processing pipeline, the annotators extract concepts, words, phrases, classifications and named entities from unstructured content and mark these extractions as annotations. The annotations are added to the index as tokens or facets and are used as the source for content analysis.

  • Taxonomies : Taxonomies play a major role in identifying the topics of interest within a document using UIMA. In UIMA, a type system defines the various types of objects that may be discovered in a document. Types in a UIMA type system may be organized into a taxonomy. For example, Company may be defined as a subtype of Organization.
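To make the annotator and type-system ideas concrete, here is a minimal Python sketch. It is not the Apache UIMA API (which is Java-based and stores annotations in a CAS); it only mimics the concepts. The type names, the `TYPE_PARENTS` taxonomy, the regex-based annotators and the `run_pipeline`/`select` helpers are all hypothetical. Each annotator marks character spans with a type, the pipeline indexes the results by type, and a query for `Organization` also returns `Company` annotations because Company is declared a subtype.

```python
import re
from dataclasses import dataclass

# Hypothetical type system: each type maps to its parent type (None = root).
# Mirrors the example above, where Company is a subtype of Organization.
TYPE_PARENTS = {
    "Annotation": None,
    "Organization": "Annotation",
    "Company": "Organization",
    "MonetaryAmount": "Annotation",
}


def is_subtype(type_name: str, ancestor: str) -> bool:
    """Walk up the taxonomy until the ancestor or the root is reached."""
    current = type_name
    while current is not None:
        if current == ancestor:
            return True
        current = TYPE_PARENTS[current]
    return False


@dataclass(frozen=True)
class Annotation:
    type_name: str
    begin: int          # character offset where the span starts
    end: int            # character offset just past the span
    covered_text: str


def company_annotator(text: str) -> list[Annotation]:
    """Toy annotator: capitalized names ending in 'Inc.' or 'Corp.' become Company."""
    pattern = re.compile(r"\b[A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)* (?:Inc\.|Corp\.)")
    return [Annotation("Company", m.start(), m.end(), m.group())
            for m in pattern.finditer(text)]


def money_annotator(text: str) -> list[Annotation]:
    """Toy annotator: dollar amounts such as '$4.2 billion' become MonetaryAmount."""
    pattern = re.compile(r"\$\d+(?:\.\d+)?(?: (?:million|billion))?")
    return [Annotation("MonetaryAmount", m.start(), m.end(), m.group())
            for m in pattern.finditer(text)]


def run_pipeline(text: str, annotators) -> dict[str, list[Annotation]]:
    """Run every annotator over the same text and index the results by type."""
    index: dict[str, list[Annotation]] = {}
    for annotator in annotators:
        for ann in annotator(text):
            index.setdefault(ann.type_name, []).append(ann)
    return index


def select(index: dict[str, list[Annotation]], ancestor: str) -> list[Annotation]:
    """Return annotations whose type is the ancestor or any of its subtypes."""
    return [ann for type_name, anns in index.items()
            if is_subtype(type_name, ancestor) for ann in anns]
```

For example, running both annotators over "Acme Holdings Inc. reported revenue of $4.2 billion" and selecting `Organization` returns the single `Company` span "Acme Holdings Inc.".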

Realizing Financial Statement Analytics & the Role of XBRL: There are not many UIMA annotators or text extraction implementations specific to financial statements. However, the Apache UIMA community does offer one such annotator: the AlchemyAPI Annotator, a set of annotators that wrap the AlchemyAPI.

AlchemyAPI's (http://www.alchemyapi.com/api/) Categorization service can be used to categorize text, HTML or web-based content, assigning the most likely topic category (news, sports, business, etc.). The business categories include topics like Business and Finance News, SEC filings, etc.
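AlchemyAPI's categorizer is a hosted, model-based service, and its request format is not reproduced here. As a rough stand-in that shows the shape of the task (text in, most likely category out), the sketch below scores each category by counting keyword hits. The category names and keyword lists are hypothetical, and plain keyword matching is a much simpler technique than a commercial service would use.

```python
import re
from collections import Counter

# Hypothetical keyword lists per category; a real service uses trained models.
CATEGORY_KEYWORDS = {
    "business_and_finance": {"revenue", "earnings", "dividend", "shareholders", "profit", "quarter"},
    "sec_filings": {"10-k", "10-q", "sec", "filing", "form"},
    "sports": {"match", "season", "league", "goal", "coach"},
}


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; hyphenated tokens such as '10-k' are kept whole."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def categorize(text: str) -> tuple[str, float]:
    """Return the best-scoring category and its share of all keyword hits."""
    counts = Counter(tokenize(text))
    scores = {category: sum(counts[word] for word in words)
              for category, words in CATEGORY_KEYWORDS.items()}
    total = sum(scores.values())
    if total == 0:
        return ("unknown", 0.0)
    best = max(scores, key=scores.get)
    return (best, scores[best] / total)
```

The returned share is only a crude confidence signal: it says how dominant the winning category's keywords were, not how likely the label is to be correct.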

Several text analytics concepts, such as the following, can be applied to financial statements:

  • Named Entity Extraction : Identify people, companies, organizations, cities, geographic features, and other typed entities within HTML pages and text documents/content.
  • Concept Tagging : Automatically tag documents and text in a manner similar to human-based tagging.
  • Keyword / Term Extraction : Extract important terms and "topic" keywords from HTML pages and text documents/content. Advanced statistical and linguistic algorithms analyze your content, "tagging" it with the most important words and phrases.
  • Sentiment Analysis : Identify positive, negative and neutral sentiment within HTML pages and text documents/content.
  • Relation Extraction : Identify facts and Subject-Action-Object relations within HTML pages and text documents/content.
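Two of the techniques listed above, keyword/term extraction and sentiment analysis, can be sketched in their simplest forms: frequency-ranked terms after stopword removal, and a count of hits against positive and negative word lists. The stopword list and both sentiment lexicons are illustrative assumptions; production systems rely on the statistical and linguistic models described above, and a finance-specific lexicon would be needed to judge words like "loss" or "risk" in context.

```python
import re
from collections import Counter

# Illustrative stopword list and sentiment lexicons (assumptions, not a standard).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "we", "our",
             "is", "are", "was", "will", "with", "on", "by", "as", "its", "this",
             "that", "be"}
POSITIVE = {"growth", "strong", "increase", "improved", "optimistic", "record"}
NEGATIVE = {"decline", "loss", "weak", "risk", "impairment", "uncertain"}


def words(text: str) -> list[str]:
    """Lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def extract_keywords(text: str, top_n: int = 3) -> list[str]:
    """Rank non-stopword terms by frequency, breaking ties alphabetically."""
    counts = Counter(w for w in words(text) if w not in STOPWORDS)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]


def sentiment(text: str) -> str:
    """Label text positive, negative or neutral from net lexicon hits."""
    tokens = words(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

On the sentence "We expect strong growth in cloud revenue. Cloud revenue growth offsets the decline in hardware.", the top three keywords are "cloud", "growth" and "revenue", and the net sentiment is positive (three positive hits against one negative).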

Apart from the already developed, community-supported annotators, we could develop new annotators that make the best use of the taxonomies already established for the financial industry in the form of XBRL.

XBRL stands for eXtensible Business Reporting Language. It is a language for the electronic communication of business information, providing major benefits in the preparation, analysis and communication of business information. It is one of a family of "XML" languages, XML being a standard means of communicating information between businesses and on the internet.

XBRL taxonomies are the dictionaries that the language uses. They are the categorization schemes that define the specific tags for individual items of data (such as "net profit"). National jurisdictions have different accounting regulations, so each may have its own taxonomy. There are already well-established, approved taxonomies for financial reporting, such as XBRL US GAAP, as listed on the site http://www.xbrl.org/FRTApproved.
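Because XBRL is XML, the tagged facts in a filing can be read with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` on a trimmed, hand-written instance fragment. The values and the `FY2012` context ID are made up, the `us-gaap` namespace URI is one taxonomy year (it changes with each release), and a real instance would also declare the `xbrli:context` and `xbrli:unit` elements that `contextRef` and `unitRef` point to.

```python
import xml.etree.ElementTree as ET

XBRLI = "http://www.xbrl.org/2003/instance"
# Assumption: one US GAAP taxonomy year; real filings reference their own version.
US_GAAP = "http://fasb.org/us-gaap/2012-01-31"

# Trimmed, hypothetical instance: context and unit declarations are omitted.
SAMPLE = f"""<xbrli:xbrl xmlns:xbrli="{XBRLI}" xmlns:us-gaap="{US_GAAP}">
  <us-gaap:Revenues contextRef="FY2012" unitRef="USD" decimals="-6">4200000000</us-gaap:Revenues>
  <us-gaap:NetIncomeLoss contextRef="FY2012" unitRef="USD" decimals="-6">350000000</us-gaap:NetIncomeLoss>
</xbrli:xbrl>"""


def extract_facts(xml_text: str, concept_ns: str) -> dict[str, float]:
    """Map each fact's concept local name to its numeric value for one namespace."""
    root = ET.fromstring(xml_text)
    prefix = "{" + concept_ns + "}"   # ElementTree stores tags as {namespace}localname
    facts: dict[str, float] = {}
    for elem in root:
        # Facts are top-level elements in the taxonomy namespace with a contextRef.
        if elem.tag.startswith(prefix) and elem.get("contextRef"):
            facts[elem.tag[len(prefix):]] = float(elem.text)
    return facts
```

Here `extract_facts(SAMPLE, US_GAAP)` yields the two structured facts keyed by their taxonomy concept names, `Revenues` and `NetIncomeLoss`.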

As is evident from the architecture of UIMA and the annotator entity extraction process, these established taxonomies can play a major role in areas like concept tagging, which can help extract meaningful insights from large amounts of textual and other unstructured content in financial statements.
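One way to read the concept-tagging idea is to use a taxonomy's human-readable labels as a dictionary and tag narrative text with the concepts whose labels it mentions. The sketch below assumes a tiny, hypothetical label map; in a real setup the labels would be loaded from the taxonomy's label linkbase, and matching would need stemming and disambiguation rather than exact phrase search.

```python
import re

# Hypothetical excerpt: taxonomy concept -> labels that may appear in narrative text.
CONCEPT_LABELS = {
    "us-gaap:Revenues": ["revenue", "revenues", "sales"],
    "us-gaap:NetIncomeLoss": ["net income", "net loss", "net profit"],
    "us-gaap:ResearchAndDevelopmentExpense": ["research and development"],
}


def tag_concepts(text: str) -> dict[str, list[str]]:
    """Return each concept whose labels appear in the text, with the matched labels."""
    lowered = text.lower()
    tags: dict[str, list[str]] = {}
    for concept, labels in CONCEPT_LABELS.items():
        # Whole-phrase match on word boundaries, so "sales" does not match "salesforce".
        hits = [label for label in labels
                if re.search(r"\b" + re.escape(label) + r"\b", lowered)]
        if hits:
            tags[concept] = hits
    return tags
```

For example, the sentence "Sales grew while net profit remained flat" is tagged with `us-gaap:Revenues` (via "sales") and `us-gaap:NetIncomeLoss` (via "net profit"), linking unstructured commentary to the same concepts used in the structured XBRL facts.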

Summary: As enterprises and analytics vendors adopt Big Data into the mainstream, this adoption will be more meaningful when the technology supports new business use cases. Financial analytics is one such important area, and with the support of frameworks like UIMA coupled with industry-established taxonomies, such analytics are quite feasible and worth implementing.

More Stories By Srinivasan Sundara Rajan

Srinivasan Sundara Rajan (also known as Sundar) is an Enterprise Technology Enabler for realizing business capabilities. His primary focus is enabling Agile Enterprises by facilitating the adoption of the Everything as a Service model, with particular concentration on BPaaS (Business Process as a Service). He also helps enterprises gain meaningful insights from their structured, unstructured and real-time data sources. All the views expressed are Srinivasan's independent analysis of industry and solutions and are not necessarily those of his current or past organizations. Srinivasan would like to thank everyone who augmented his architectural skills with analytical ideas.
