Click here to close now.

Welcome!

Big Data Journal Authors: Carmen Gonzalez, Elizabeth White, William Schmarzo, Pat Romanski, Harry Trott

Related Topics: Big Data Journal

Big Data Journal: Blog Post

Big Data in Financial Analytics

Big Data and UIMA use cases

Big Data & Text Analytics: As  the analysis  of  large amounts  of unstructured  data is gaining a major space in enterprise  computing,  we  are seeing the emergence of more use cases in this regard.  While  the  term   "Big"  in Big Data   makes it more synonymous  with  Massively Parallel Processing frameworks like Hadoop,  however  the  underlying the success of  Big Data  relies  on effective usage of  content analytics  of the underlying  unstructured data.  I have high lighted  this thought process in my earlier  article, Big Data Analytics Thinking Outside Of Hadoop.

Unstructured Content Analytics is  defined  as the   process of  gaining  new insights  from  the  unstructured data, by  employing   text mining, image recognition, voice recognition and other related analytical techniques.


Big Data Journal was launched on SYS-CON.com in 2012

The below  material  explains   one such use case of  Big Data &  Text Analytics in  getting meaningful insights  from the Financial  Reports.

Financial Reports  & Analytics: All the  publicly  traded  companies in USA & else where  mandatorily  disclose  their corporate information to their  shareholders.  These annual financial statements   are available  as  downloadable reports  on the corporate websites  of  public  companies.   Apart  from the  annual report , there are other forms  of financial statements  like,  investor news letters, Quarterly earning presentation, conference calls by CFO  and other investor relationship documents form part of  an  organization's  financial standing in the eyes  of the  investor.

Most  of the  investors  and  investment analyst  firms  currently  uses  their specialized   knowledge to understand  these financial  statements  and  create meaningful  insights  out of them.  However  these analytics  are mostly limited to the structured  portions  of  the financial statements and not so much  on the unstructured  side of it.

To explain this more :

  • For example An annual report may contain statements like Balance Sheet, Income, Equity, Cash Flows etc.. these statements are highly structured and organized as per accounting principles so that any of the qualified financial analysts can understand them
  • At the same a typical financial statement also contains lot of unstructured information about growth strategies of the organization, road map, optimism, future vision, how the business model is aligned to the changing times etc...

So   an effective  analysis of  a financial statement  not only pertains  to the structured information but also to the unstructured  data available in the  financial statements.

BigData, UIMA  & Financial Report Analytics: The following   Big Data aligned  technologies  can be effectively used  in analysing the  financial  reports  to derive meaningful insights into the  large volumes  of unstructured data.

  • UIMA : UIMA stands for Unstructured Information Management Architecture is the major industry standard for content analytics.

 

  • Annotators : UIMATM Annotators do the real work of extracting structured information from unstructured data. You can write your own annotators. Though Annotators form part of UIMA framework lot of custom development is written is creating Annotators specific to the needs of the Finance industry. When documents are processed through the document processing pipeline, the annotators extract concepts, words, phrases, classifications, and named entities from unstructured content and mark these extractions as annotations. The annotations are added to the index as tokens or facets and are used as the source for content analysis.

  • Taxonomies : Taxonomies play a major role in identifying the topics of interest within a document using UIMA. In UIMA a type system defines the various types of objects that may be discovered in the document. Types in a UIMA type system may be organized into a taxonomy. For example, Company may be defined as a subtype of Organization

 

Realizing Financial Statement Analytics & Role of  XBRL: There  are not very many  UIMA  annotators  and  implementation of   text extraction specific to financial statements.  However  we find that,  under  APACHE UIMA community  there is  one such annotator,   The AlchemyAPI Annotator is a set of annotators that wrap the AlchemyAPI.

AlchemyAPI's  (http://www.alchemyapi.com/api/)  Categorization service can be used to categorize text, HTML, or web-based content, assigning the most likely topic category (news, sports, business, etc.).  The business categories  include  topics like, Business and Finance News, SEC filings, etc.

There  are  several  of  the   text analytics concepts  like  the below,  can be applied on the financial statements

  • Named Entity Extraction : Identify people, companies, organizations, cities, geographic features, and other typed entities within HTML pages and text documents/content.
  • Concept Tagging : Automatically tag documents and text in a manner similar to human-based tagging.
  • Keyword / Term Extraction : Extract important terms and "topic" keywords from HTML pages and text documents/content. Advanced statistical and linguistic algorithms analyze your content, "tagging" it with the most important words and phrases.
  • Sentiment Analysis : Identify positive, negative and neutral sentiment within HTML pages and text documents/content.
  • Relation Extraction : Identify facts and Subject-Action-Object relations within HTML pages and text documents/content.

Apart  from  the  already  developed  and community  supported  annotators,  we could   develop  new annotators  which  can take the best use of already  established  taxonomies  for the financial industry   in the form of  XBRL.

XBRL stands for eXtensible Business Reporting Language. It is a language for the electronic communication of business information, providing major benefits in the preparation, analysis and communication of business information. It is one of a family of "XML" languages which is a standard means of communicating information between businesses and on the internet.

XBRL Taxonomies,  are the dictionaries which the language uses. These are the categorization schemes which define the specific tags for individual items of data (such as "net profit").  National jurisdictions have different accounting regulations, so each may have its own.  There are already well established  approved taxonomies  for  financial reporting  like  XBRL  US  GAAP  as listed in the  site, http://www.xbrl.org/FRTApproved.

As  evident  from  the  architecture  of UIMA  and annotator  entity extraction process, these established  taxonomies  can play a major role in areas like concept tagging,  which  can help in  getting the  meaningful insights  from    large  amounts of  textual and other  unstructured content in the financial statements.

Summary: As  enterprises  and analytics vendors  adopt  Big Data  as part of the mainstream ,  this  adoption will be  more meaningful  to  enable   the technology  to support new  business use cases.  Financial  Analytics  is  one such important area  ,  and with the support of    frameworks like UIMA  coupled  with  industry established taxonomies,  such  analytics  are quite possible  and worth to be implemented.

More Stories By Srinivasan Sundara Rajan

Srinivasan is passionate about ownership and driving things on his own, with his breadth and depth on Enterprise Technology he could run any aspect of IT Industry and make it a success.

He is a seasoned Enterprise IT Expert, mainly in the areas of Solution, Integration and Architecture, across Structured, Unstructured data sources, especially in manufacturing domain.

He currently works as Technology Head For GAVS Technologies.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


@BigDataExpo Stories
17th Cloud Expo, taking place Nov 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA, will feature technical sessions from a rock star conference faculty and the leading industry players in the world. Cloud computing is now being embraced by a majority of enterprises of all sizes. Yesterday's debate about public vs. private has transformed into the reality of hybrid cloud: a recent survey shows that 74% of enterprises have a hybrid cloud strategy. Meanwhile, 94% of enterprises a...
As cloud gives an opportunity to businesses to buy services externally – how is cloud impacting your customers? In his General Session at 15th Cloud Expo, Fabio Gori, Director of Worldwide Cloud Marketing at Cisco, provided answers to big questions: Do you see hybrid cloud as where the world is going? What benefits does it bring? And how does Cisco connect all of these clouds? He also discussed Intercloud and Cisco’s investment on it.
The 17th International Cloud Expo has announced that its Call for Papers is open. 17th International Cloud Expo, to be held November 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA, brings together Cloud Computing, APM, APIs, Microservices, Security, Big Data, Internet of Things, DevOps and WebRTC to one location. With cloud computing driving a higher percentage of enterprise IT budgets every year, it becomes increasingly important to plant your flag in this fast-expanding bu...
With major technology companies and startups seriously embracing IoT strategies, now is the perfect time to attend @ThingsExpo in Silicon Valley. Learn what is going on, contribute to the discussions, and ensure that your enterprise is as "IoT-Ready" as it can be! Internet of @ThingsExpo, taking place Nov 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 17th Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading in...
Software is eating the world. Companies that were not previously in the technology space now find themselves competing with Google and Amazon on speed of innovation. As the innovation cycle accelerates, companies must embrace rapid and constant change to both applications and their infrastructure, and find a way to deliver speed and agility of development without sacrificing reliability or efficiency of operations. In her Day 2 Keynote DevOps Summit, Victoria Livschitz, CEO of Qubell, discussed...
Working with Big Data is challenging, especially when decision makers depend on market insights and intelligence from your data but don't have quick access to it or find it unusable. In their session at 6th Big Data Expo, Ian Khan, Global Strategic Positioning & Brand Manager at Solgenia; Zel Bianco, President, CEO and Co-Founder of Interactive Edge of Solgenia; and Ermanno Bonifazi, CEO & Founder at Solgenia, discussed how a revolutionary cloud-based BI along with mobile analytics is already c...
Gartner predicts that the bulk of new IT spending by 2016 will be for cloud platforms and applications and that nearly half of large enterprises will have cloud deployments by the end of 2017. The benefits of the cloud may be clear for applications that can tolerate brief periods of downtime, but for critical applications like SQL Server, Oracle and SAP, companies need a strategy for HA and DR protection. While traditional SAN-based clusters are not possible in these environments, SANless cluste...
Hardware will never be more valuable than on the day it hits your loading dock. Each day new servers are not deployed to production the business is losing money. While Moore's Law is typically cited to explain the exponential density growth of chips, a critical consequence of this is rapid depreciation of servers. The hardware for clustered systems (e.g., Hadoop, OpenStack) tends to be significant capital expenses. In his session at Big Data Expo, Mason Katz, CTO and co-founder of StackIQ, disc...
All major researchers estimate there will be tens of billions devices - computers, smartphones, tablets, and sensors - connected to the Internet by 2020. This number will continue to grow at a rapid pace for the next several decades. With major technology companies and startups seriously embracing IoT strategies, now is the perfect time to attend @ThingsExpo, June 9-11, 2015, at the Javits Center in New York City. Learn what is going on, contribute to the discussions, and ensure that your enter...
In their general session at 16th Cloud Expo, Michael Piccininni, Global Account Manager – Cloud SP at EMC Corporation, and Mike Dietze, Regional Director at Windstream Hosted Solutions, will review next generation cloud services, including the Windstream-EMC Tier Storage solutions, and discuss how to increase efficiencies, improve service delivery and enhance corporate cloud solution development. Speaker Bios Michael Piccininni is Global Account Manager – Cloud SP at EMC Corporation. He has b...
SYS-CON Events announced today that DragonGlass, an enterprise search platform, will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. After eleven years of designing and building custom applications, OpenCrowd has launched DragonGlass, a cloud-based platform that enables the development of search-based applications. These are a new breed of applications that utilize a search index as their backbone for data...
As the Internet of Things unfolds, mobile and wearable devices are blurring the line between physical and digital, integrating ever more closely with our interests, our routines, our daily lives. Contextual computing and smart, sensor-equipped spaces bring the potential to walk through a world that recognizes us and responds accordingly. We become continuous transmitters and receivers of data. In his session at @ThingsExpo, Andrew Bolwell, Director of Innovation for HP's Printing and Personal S...
There is no doubt that Big Data is here and getting bigger every day. Building a Big Data infrastructure today is no easy task. There are an enormous number of choices for database engines and technologies. To make things even more challenging, requirements are getting more sophisticated, and the standard paradigm of supporting historical analytics queries is often just one facet of what is needed. As Big Data growth continues, organizations are demanding real-time access to data, allowing immed...
The OpenStack cloud operating system includes Trove, a database abstraction layer. Rather than applications connecting directly to a specific type of database, they connect to Trove, which in turn connects to one or more specific databases. One target database is Postgres Plus Cloud Database, which includes its own RESTful API. Trove was originally developed around MySQL, whose interfaces are significantly less complicated than those of the Postgres cloud database. In his session at 16th Cloud...
SYS-CON Events announced today that EnterpriseDB (EDB), the leading worldwide provider of enterprise-class Postgres products and database compatibility solutions, will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. EDB is the largest provider of Postgres software and services that provides enterprise-class performance and scalability and the open source freedom to divert budget from more costly traditiona...
Data-intensive companies that strive to gain insights from data using Big Data analytics tools can gain tremendous competitive advantage by deploying data-centric storage. Organizations generate large volumes of data, the vast majority of which is unstructured. As the volume and velocity of this unstructured data increases, the costs, risks and usability challenges associated with managing the unstructured data (regardless of file type, size or device) increases simultaneously, including end-to-...
SYS-CON Events announced today that the "First Containers & Microservices Conference" will take place June 9-11, 2015, at the Javits Center in New York City. The “Second Containers & Microservices Conference” will take place November 3-5, 2015, at Santa Clara Convention Center, Santa Clara, CA. Containers and microservices have become topics of intense interest throughout the cloud developer and enterprise IT communities.
Can the spatial component of your Big Data be harnessed and visualized, adding another dimension of power and analytics to your data? In his session at Big Data Expo®, John Meza, Product Engineer and Performance Engineering Team Lead at Esri, discussed the spatial queries that can be used within the Hadoop ecosystem and their integration with GeoSpatial applications. The GIS Tools for Hadoop project was also discussed and its implementation to discover location-based patterns and relationships...
There is little doubt that Big Data solutions will have an increasing role in the Enterprise IT mainstream over time. 8th International Big Data Expo, co-located with 17th International Cloud Expo - to be held November 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA - has announced its Call for Papers is open. As advanced data storage, access and analytics technologies aimed at handling high-volume and/or fast moving data all move center stage, aided by the cloud computing bo...
"People are a lot more knowledgeable about APIs now. There are two types of people who work with APIs - IT people who want to use APIs for something internal and the product managers who want to do something outside APIs for people to connect to them," explained Roberto Medrano, Executive Vice President at SOA Software, in this SYS-CON.tv interview at Cloud Expo, held Nov 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA.