Welcome!

Big Data Journal Authors: Pat Romanski, Elizabeth White, Liz McMillan, Roger Strukhoff, Dana Gardner

Related Topics: Big Data Journal

Big Data Journal: Blog Post

Big Data in Financial Analytics

Big Data and UIMA use cases

Big Data & Text Analytics: As  the analysis  of  large amounts  of unstructured  data is gaining a major space in enterprise  computing,  we  are seeing the emergence of more use cases in this regard.  While  the  term   "Big"  in Big Data   makes it more synonymous  with  Massively Parallel Processing frameworks like Hadoop,  however  the  underlying the success of  Big Data  relies  on effective usage of  content analytics  of the underlying  unstructured data.  I have high lighted  this thought process in my earlier  article, Big Data Analytics Thinking Outside Of Hadoop.

Unstructured Content Analytics is  defined  as the   process of  gaining  new insights  from  the  unstructured data, by  employing   text mining, image recognition, voice recognition and other related analytical techniques.


Big Data Journal was launched on SYS-CON.com in 2012

The below  material  explains   one such use case of  Big Data &  Text Analytics in  getting meaningful insights  from the Financial  Reports.

Financial Reports  & Analytics: All the  publicly  traded  companies in USA & else where  mandatorily  disclose  their corporate information to their  shareholders.  These annual financial statements   are available  as  downloadable reports  on the corporate websites  of  public  companies.   Apart  from the  annual report , there are other forms  of financial statements  like,  investor news letters, Quarterly earning presentation, conference calls by CFO  and other investor relationship documents form part of  an  organization's  financial standing in the eyes  of the  investor.

Most  of the  investors  and  investment analyst  firms  currently  uses  their specialized   knowledge to understand  these financial  statements  and  create meaningful  insights  out of them.  However  these analytics  are mostly limited to the structured  portions  of  the financial statements and not so much  on the unstructured  side of it.

To explain this more :

  • For example An annual report may contain statements like Balance Sheet, Income, Equity, Cash Flows etc.. these statements are highly structured and organized as per accounting principles so that any of the qualified financial analysts can understand them
  • At the same a typical financial statement also contains lot of unstructured information about growth strategies of the organization, road map, optimism, future vision, how the business model is aligned to the changing times etc...

So   an effective  analysis of  a financial statement  not only pertains  to the structured information but also to the unstructured  data available in the  financial statements.

BigData, UIMA  & Financial Report Analytics: The following   Big Data aligned  technologies  can be effectively used  in analysing the  financial  reports  to derive meaningful insights into the  large volumes  of unstructured data.

  • UIMA : UIMA stands for Unstructured Information Management Architecture is the major industry standard for content analytics.

 

  • Annotators : UIMATM Annotators do the real work of extracting structured information from unstructured data. You can write your own annotators. Though Annotators form part of UIMA framework lot of custom development is written is creating Annotators specific to the needs of the Finance industry. When documents are processed through the document processing pipeline, the annotators extract concepts, words, phrases, classifications, and named entities from unstructured content and mark these extractions as annotations. The annotations are added to the index as tokens or facets and are used as the source for content analysis.

  • Taxonomies : Taxonomies play a major role in identifying the topics of interest within a document using UIMA. In UIMA a type system defines the various types of objects that may be discovered in the document. Types in a UIMA type system may be organized into a taxonomy. For example, Company may be defined as a subtype of Organization

 

Realizing Financial Statement Analytics & Role of  XBRL: There  are not very many  UIMA  annotators  and  implementation of   text extraction specific to financial statements.  However  we find that,  under  APACHE UIMA community  there is  one such annotator,   The AlchemyAPI Annotator is a set of annotators that wrap the AlchemyAPI.

AlchemyAPI's  (http://www.alchemyapi.com/api/)  Categorization service can be used to categorize text, HTML, or web-based content, assigning the most likely topic category (news, sports, business, etc.).  The business categories  include  topics like, Business and Finance News, SEC filings, etc.

There  are  several  of  the   text analytics concepts  like  the below,  can be applied on the financial statements

  • Named Entity Extraction : Identify people, companies, organizations, cities, geographic features, and other typed entities within HTML pages and text documents/content.
  • Concept Tagging : Automatically tag documents and text in a manner similar to human-based tagging.
  • Keyword / Term Extraction : Extract important terms and "topic" keywords from HTML pages and text documents/content. Advanced statistical and linguistic algorithms analyze your content, "tagging" it with the most important words and phrases.
  • Sentiment Analysis : Identify positive, negative and neutral sentiment within HTML pages and text documents/content.
  • Relation Extraction : Identify facts and Subject-Action-Object relations within HTML pages and text documents/content.

Apart  from  the  already  developed  and community  supported  annotators,  we could   develop  new annotators  which  can take the best use of already  established  taxonomies  for the financial industry   in the form of  XBRL.

XBRL stands for eXtensible Business Reporting Language. It is a language for the electronic communication of business information, providing major benefits in the preparation, analysis and communication of business information. It is one of a family of "XML" languages which is a standard means of communicating information between businesses and on the internet.

XBRL Taxonomies,  are the dictionaries which the language uses. These are the categorization schemes which define the specific tags for individual items of data (such as "net profit").  National jurisdictions have different accounting regulations, so each may have its own.  There are already well established  approved taxonomies  for  financial reporting  like  XBRL  US  GAAP  as listed in the  site, http://www.xbrl.org/FRTApproved.

As  evident  from  the  architecture  of UIMA  and annotator  entity extraction process, these established  taxonomies  can play a major role in areas like concept tagging,  which  can help in  getting the  meaningful insights  from    large  amounts of  textual and other  unstructured content in the financial statements.

Summary: As  enterprises  and analytics vendors  adopt  Big Data  as part of the mainstream ,  this  adoption will be  more meaningful  to  enable   the technology  to support new  business use cases.  Financial  Analytics  is  one such important area  ,  and with the support of    frameworks like UIMA  coupled  with  industry established taxonomies,  such  analytics  are quite possible  and worth to be implemented.

More Stories By Srinivasan Sundara Rajan

I am passionate about ownership and driving things on my own, with my breadth and depth on Enterprise Technology I could run any aspect of IT Industry and make it a success. I am a Seasoned Enterprise IT Expert, mainly in the areas of Solution,Integration and Architecture, across Structured, Unstructured data sources, especially in manufacturing domain. My recent work is on Natural Language Processing, Semantic Enrichment of Unstructured Data, Data Mining and Predictive Analytics. However I have a strong footing across all tiers of Enterprise IT spectrum. I am geared to handle the massive flow of data by Internet Of Things with appropriate platform, tools and processes.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


@BigDataExpo Stories
"SAP had made a big transition into the cloud as we believe it has significant value for our customers, drives innovation and is easy to consume. When you look at the SAP portfolio, SAP HANA is the underlying platform and it powers all of our platforms and all of our analytics," explained Thorsten Leiduck, VP ISVs & Digital Commerce at SAP, in this SYS-CON.tv interview at 15th Cloud Expo, held Nov 4-6, 2014, at the Santa Clara Convention Center in Santa Clara, CA.
SAP is delivering break-through innovation combined with fantastic user experience powered by the market-leading in-memory technology, SAP HANA. In his General Session at 15th Cloud Expo, Thorsten Leiduck, VP ISVs & Digital Commerce, SAP, discussed how SAP and partners provide cloud and hybrid cloud solutions as well as real-time Big Data offerings that help companies of all sizes and industries run better. SAP launched an application challenge to award the most innovative SAP HANA and SAP HANA...
Enthusiasm for the Internet of Things has reached an all-time high. In 2013 alone, venture capitalists spent more than $1 billion dollars investing in the IoT space. With "smart" appliances and devices, IoT covers wearable smart devices, cloud services to hardware companies. Nest, a Google company, detects temperatures inside homes and automatically adjusts it by tracking its user's habit. These technologies are quickly developing and with it come challenges such as bridging infrastructure gaps,...
"Cloud consumption is something we envision at Solgenia. That is trying to let the cloud spread to the user as a consumption, as utility computing. We want to allow the people to just pay for what they use, not a subscription model," explained Ermanno Bonifazi, CEO & Founder of Solgenia, in this SYS-CON.tv interview at Cloud Expo, held Nov 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA.
How do APIs and IoT relate? The answer is not as simple as merely adding an API on top of a dumb device, but rather about understanding the architectural patterns for implementing an IoT fabric. There are typically two or three trends: Exposing the device to a management framework Exposing that management framework to a business centric logic Exposing that business layer and data to end users. This last trend is the IoT stack, which involves a new shift in the separation of what stuff happe...
Cultural, regulatory, environmental, political and economic (CREPE) conditions over the past decade are creating cross-industry solution spaces that require processes and technologies from both the Internet of Things (IoT), and Data Management and Analytics (DMA). These solution spaces are evolving into Sensor Analytics Ecosystems (SAE) that represent significant new opportunities for organizations of all types. Public Utilities throughout the world, providing electricity, natural gas and water,...
The Internet of Things will greatly expand the opportunities for data collection and new business models driven off of that data. In her session at @ThingsExpo, Esmeralda Swartz, CMO of MetraTech, discussed how for this to be effective you not only need to have infrastructure and operational models capable of utilizing this new phenomenon, but increasingly service providers will need to convince a skeptical public to participate. Get ready to show them the money!
The Internet of Things is tied together with a thin strand that is known as time. Coincidentally, at the core of nearly all data analytics is a timestamp. When working with time series data there are a few core principles that everyone should consider, especially across datasets where time is the common boundary. In his session at Internet of @ThingsExpo, Jim Scott, Director of Enterprise Strategy & Architecture at MapR Technologies, discussed single-value, geo-spatial, and log time series dat...
Explosive growth in connected devices. Enormous amounts of data for collection and analysis. Critical use of data for split-second decision making and actionable information. All three are factors in making the Internet of Things a reality. Yet, any one factor would have an IT organization pondering its infrastructure strategy. How should your organization enhance its IT framework to enable an Internet of Things implementation? In his session at Internet of @ThingsExpo, James Kirkland, Chief Ar...
SAP is delivering break-through innovation combined with fantastic user experience powered by the market-leading in-memory technology, SAP HANA. In his General Session at 15th Cloud Expo, Thorsten Leiduck, VP ISVs & Digital Commerce, SAP, discussed how SAP and partners provide cloud and hybrid cloud solutions as well as real-time Big Data offerings that help companies of all sizes and industries run better. SAP launched an application challenge to award the most innovative SAP HANA and SAP HANA...
Vichara Technologies in Hoboken, New Jersey is expanding its capabilities in big data from origins on Wall Street into other areas, and thereby demonstrating the growing marketplace for advanced big-data analytics services. The next BriefingsDirect deep-dive big data benefits case study interview explores how Vichara Technologies in Hoboken, New Jersey is expanding its capabilities in big data from origins on Wall Street into other areas, and thereby demonstrating the growing marketplace for ad...
The definition of IoT is not new, in fact it’s been around for over a decade. What has changed is the public's awareness that the technology we use on a daily basis has caught up on the vision of an always on, always connected world. If you look into the details of what comprises the IoT, you’ll see that it includes everything from cloud computing, Big Data analytics, “Things,” Web communication, applications, network, storage, etc. It is essentially including everything connected online from ha...
Scene scenario: 10 am in a boardroom somewhere, second round of coffees served, Danish and donuts untouched, a quiet hush settles. “Well you know what guys? (and, by the use of the term guys I mean to include both sexes here assembled) – the trouble that we have as a company is that we are, to put it bluntly, just a little analytics poor,” said the newly appointed Chief Analytics Officer. That we should consider a firm to be analytically deficient or poor is a profound comment on our modern ag...
Cloud Expo 2014 TV commercials will feature @ThingsExpo, which was launched in June, 2014 at New York City's Javits Center as the largest 'Internet of Things' event in the world.
SYS-CON Events announced today that Windstream, a leading provider of advanced network and cloud communications, has been named “Silver Sponsor” of SYS-CON's 16th International Cloud Expo®, which will take place on June 9–11, 2015, at the Javits Center in New York, NY. Windstream (Nasdaq: WIN), a FORTUNE 500 and S&P 500 company, is a leading provider of advanced network communications, including cloud computing and managed services, to businesses nationwide. The company also offers broadband, p...
The 4th International DevOps Summit, co-located with16th International Cloud Expo – being held June 9-11, 2015, at the Javits Center in New York City, NY – announces that its Call for Papers is now open. Born out of proven success in agile development, cloud computing, and process automation, DevOps is a macro trend you cannot afford to miss. From showcase success stories from early adopters and web-scale businesses, DevOps is expanding to organizations of all sizes, including the world's large...
The major cloud platforms defy a simple, side-by-side analysis. Each of the major IaaS public-cloud platforms offers their own unique strengths and functionality. Options for on-site private cloud are diverse as well, and must be designed and deployed while taking existing legacy architecture and infrastructure into account. Then the reality is that most enterprises are embarking on a hybrid cloud strategy and programs. In this Power Panel at 15th Cloud Expo (http://www.CloudComputingExpo.com...

ARMONK, N.Y., Nov. 20, 2014 /PRNewswire/ --  IBM (NYSE: IBM) today announced that it is bringing a greater level of control, security and flexibility to cloud-based application development and delivery with a single-tenant version of Bluemix, IBM's

An entirely new security model is needed for the Internet of Things, or is it? Can we save some old and tested controls for this new and different environment? In his session at @ThingsExpo, New York's at the Javits Center, Davi Ottenheimer, EMC Senior Director of Trust, reviewed hands-on lessons with IoT devices and reveal a new risk balance you might not expect. Davi Ottenheimer, EMC Senior Director of Trust, has more than nineteen years' experience managing global security operations and asse...
Technology is enabling a new approach to collecting and using data. This approach, commonly referred to as the "Internet of Things" (IoT), enables businesses to use real-time data from all sorts of things including machines, devices and sensors to make better decisions, improve customer service, and lower the risk in the creation of new revenue opportunities. In his General Session at Internet of @ThingsExpo, Dave Wagstaff, Vice President and Chief Architect at BSQUARE Corporation, discuss the ...