Big Data in Financial Analytics

Big Data and UIMA use cases

Big Data & Text Analytics: As the analysis of large amounts of unstructured data gains a major place in enterprise computing, more use cases are emerging. The "Big" in Big Data makes the term nearly synonymous with massively parallel processing frameworks like Hadoop, but the success of Big Data ultimately relies on effective content analytics of the underlying unstructured data. I highlighted this line of thought in my earlier article, Big Data Analytics Thinking Outside Of Hadoop.

Unstructured content analytics is the process of gaining new insights from unstructured data by employing text mining, image recognition, voice recognition and other related analytical techniques.


Big Data Journal was launched on SYS-CON.com in 2012

The material below explains one such use case of Big Data and text analytics: deriving meaningful insights from financial reports.

Financial Reports & Analytics: All publicly traded companies in the USA and elsewhere are required to disclose corporate information to their shareholders. Their annual financial statements are available as downloadable reports on their corporate websites. Apart from the annual report, other documents such as investor newsletters, quarterly earnings presentations, conference calls by the CFO and other investor relations material also shape an organization's financial standing in the eyes of investors.

Most investors and investment analyst firms currently use their specialized knowledge to understand these financial statements and draw meaningful insights from them. However, this analysis is mostly limited to the structured portions of the financial statements and pays much less attention to the unstructured portions.

To explain this further:

  • An annual report contains statements such as the balance sheet, income statement, statement of equity and cash flow statement. These statements are highly structured and organized according to accounting principles, so any qualified financial analyst can understand them.
  • At the same time, a typical financial report also contains a lot of unstructured information about the organization's growth strategies, road map, optimism, future vision, how the business model is aligned to changing times, and so on.

So an effective analysis of a financial statement covers not only the structured information but also the unstructured data in it.

Big Data, UIMA & Financial Report Analytics: The following Big Data aligned technologies can be used to analyze financial reports and derive meaningful insights from their large volumes of unstructured data.

  • UIMA : UIMA stands for Unstructured Information Management Architecture. It is a major industry standard for content analytics.

  • Annotators : UIMA annotators do the real work of extracting structured information from unstructured data. You can write your own annotators. Although annotators are part of the UIMA framework, a lot of custom development goes into creating annotators specific to the needs of the finance industry. When documents are processed through the document processing pipeline, the annotators extract concepts, words, phrases, classifications, and named entities from unstructured content and mark these extractions as annotations. The annotations are added to the index as tokens or facets and are used as the source for content analysis.

  • Taxonomies : Taxonomies play a major role in identifying the topics of interest within a document using UIMA. In UIMA, a type system defines the various types of objects that may be discovered in the document. Types in a UIMA type system may be organized into a taxonomy. For example, Company may be defined as a subtype of Organization.
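
The annotator and type system ideas can be illustrated with a minimal Python sketch. It is not the UIMA API (which is Java based); the type names, the `Annotation` class, the regular expression and the sample sentence are all assumptions made for illustration. It shows an annotator marking character spans with a type, and a type hierarchy in which Company is a subtype of Organization.

```python
import re
from dataclasses import dataclass

# Hypothetical type hierarchy: child type -> parent type.
# It mirrors the idea of a type system but is not the UIMA API.
TYPE_PARENTS = {
    "Company": "Organization",
    "Organization": "NamedEntity",
    "NamedEntity": None,
}


def is_subtype(type_name, ancestor):
    """Return True if type_name equals ancestor or inherits from it."""
    current = type_name
    while current is not None:
        if current == ancestor:
            return True
        current = TYPE_PARENTS.get(current)
    return False


@dataclass(frozen=True)
class Annotation:
    begin: int      # character offset where the span starts
    end: int        # character offset just past the span
    type_name: str  # a type from TYPE_PARENTS

    def covered_text(self, document):
        return document[self.begin:self.end]


# Toy rule (an assumption): capitalized words followed by a corporate suffix.
COMPANY_PATTERN = re.compile(r"\b(?:[A-Z][A-Za-z]+ )+(?:Inc|Corp|Ltd)\b")


def annotate_companies(document):
    """Mark every span matching the toy rule as a Company annotation."""
    return [Annotation(m.start(), m.end(), "Company")
            for m in COMPANY_PATTERN.finditer(document)]


text = "Shares of Acme Widgets Inc rose after Globex Corp cut guidance."
annotations = annotate_companies(text)

# A query for Organization also returns Company annotations via the hierarchy.
organizations = [a.covered_text(text) for a in annotations
                 if is_subtype(a.type_name, "Organization")]
print(organizations)
```

Because the annotations carry only offsets and a type, downstream steps can index or filter them without re-reading the text, which is the role the annotations play in the pipeline described above.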

 

Realizing Financial Statement Analytics & Role of XBRL: There are not many UIMA annotators or text extraction implementations specific to financial statements. However, the Apache UIMA community offers one relevant component: the AlchemyAPI Annotator, a set of annotators that wrap the AlchemyAPI.

AlchemyAPI's (http://www.alchemyapi.com/api/) Categorization service can be used to categorize text, HTML, or web-based content, assigning the most likely topic category (news, sports, business, etc.). The business categories include topics such as Business and Finance News, SEC filings, etc.

Several text analytics concepts, such as those below, can be applied to financial statements:

  • Named Entity Extraction : Identify people, companies, organizations, cities, geographic features, and other typed entities within HTML pages and text documents/content.
  • Concept Tagging : Automatically tag documents and text in a manner similar to human-based tagging.
  • Keyword / Term Extraction : Extract important terms and "topic" keywords from HTML pages and text documents/content. Advanced statistical and linguistic algorithms analyze your content, "tagging" it with the most important words and phrases.
  • Sentiment Analysis : Identify positive, negative and neutral sentiment within HTML pages and text documents/content.
  • Relation Extraction : Identify facts and Subject-Action-Object relations within HTML pages and text documents/content.
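
Two of the concepts above, keyword extraction and sentiment analysis, can be sketched in a few lines of Python. This is a deliberately simple stand-in (term frequency and a word list), not the statistical and linguistic algorithms a service such as AlchemyAPI would use; the stopword list, the sentiment lexicons and the sample text are assumptions.

```python
import re
from collections import Counter

# Tiny illustrative word lists; real lexicons are far larger.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "we", "are", "about"}
POSITIVE = {"growth", "strong", "improved", "optimistic"}
NEGATIVE = {"decline", "weak", "loss", "risk"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_keywords(text, n=3):
    """Rank non-stopword tokens by how often they occur."""
    counts = Counter(t for t in tokenize(text) if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(n)]


def sentiment_score(text):
    """Positive-word count minus negative-word count."""
    tokens = tokenize(text)
    positives = sum(t in POSITIVE for t in tokens)
    negatives = sum(t in NEGATIVE for t in tokens)
    return positives - negatives


sample = ("We are optimistic about growth. Strong demand improved margins, "
          "and growth in services offset the risk of a weak hardware cycle.")

print(top_keywords(sample, 1))   # most frequent keyword
print(sentiment_score(sample))   # > 0 leans positive, < 0 leans negative
```

Applied to the narrative sections of an annual report, scores like these give a rough, comparable signal for the optimism and outlook language that structured statements do not capture.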

Apart from the already developed and community supported annotators, we could develop new annotators that make the best use of the taxonomies already established for the financial industry in the form of XBRL.

XBRL stands for eXtensible Business Reporting Language. It is a language for the electronic communication of business information, providing major benefits in the preparation, analysis and communication of business information. It is one of a family of XML languages, which are a standard means of communicating information between businesses and on the internet.
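
Because XBRL is XML, tagged values can be read with any XML parser. The sketch below uses Python's standard library on a stripped-down fragment. The fragment is not a complete, valid XBRL instance (it omits the context and unit definitions), and the `ex` namespace, element names and figures are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment in the style of an XBRL instance.
INSTANCE = """<xbrl xmlns="http://www.xbrl.org/2003/instance"
      xmlns:ex="http://example.com/taxonomy">
  <ex:NetProfit contextRef="FY2012" unitRef="USD" decimals="0">1250000</ex:NetProfit>
  <ex:Revenue contextRef="FY2012" unitRef="USD" decimals="0">9800000</ex:Revenue>
</xbrl>"""

# ElementTree exposes namespaced tags as "{namespace-uri}LocalName".
EX_NS = "{http://example.com/taxonomy}"


def read_facts(xml_text):
    """Map (concept name, context) to the reported integer value."""
    root = ET.fromstring(xml_text)
    facts = {}
    for element in root:
        if element.tag.startswith(EX_NS):
            name = element.tag[len(EX_NS):]
            facts[(name, element.get("contextRef"))] = int(element.text)
    return facts


facts = read_facts(INSTANCE)
for (name, context), value in facts.items():
    print(name, context, value)
```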

XBRL taxonomies are the dictionaries that the language uses. These are the categorization schemes that define the specific tags for individual items of data (such as "net profit"). National jurisdictions have different accounting regulations, so each may have its own taxonomy. There are already well established, approved taxonomies for financial reporting, such as XBRL US GAAP, as listed at http://www.xbrl.org/FRTApproved.

As is evident from the UIMA architecture and the annotators' entity extraction process, these established taxonomies can play a major role in areas like concept tagging, which helps in deriving meaningful insights from large amounts of textual and other unstructured content in financial statements.

Summary: As enterprises and analytics vendors adopt Big Data as part of the mainstream, this adoption will be more meaningful if the technology enables new business use cases. Financial analytics is one such important area, and with the support of frameworks like UIMA coupled with industry established taxonomies, such analytics are quite feasible and worth implementing.
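
As a rough illustration of taxonomy-driven concept tagging, the Python sketch below looks up taxonomy labels in narrative text and emits the matching concept for each span. The label-to-concept table and the `ex:` concept names are hypothetical; in practice the labels would presumably be loaded from a published taxonomy rather than hard-coded.

```python
import re

# Hypothetical label -> concept mapping standing in for a real taxonomy.
TAXONOMY_LABELS = {
    "net profit": "ex:NetProfit",
    "net income": "ex:NetProfit",
    "revenue": "ex:Revenue",
    "operating cash flow": "ex:OperatingCashFlow",
}


def tag_concepts(text):
    """Return (begin, end, concept) for each taxonomy label found in text."""
    tags = []
    for label, concept in TAXONOMY_LABELS.items():
        pattern = re.compile(r"\b" + re.escape(label) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            tags.append((match.start(), match.end(), concept))
    return sorted(tags)  # order by position in the text


narrative = ("Net profit grew on higher revenue, "
             "while operating cash flow was flat.")
tags = tag_concepts(narrative)
for begin, end, concept in tags:
    print(concept, "<-", narrative[begin:end])
```

Mapping synonyms such as "net income" and "net profit" to one concept is where an established taxonomy helps: different companies' wording resolves to the same tag, so their narrative text becomes comparable.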

More Stories By Srinivasan Sundara Rajan

Highly passionate about utilizing Digital Technologies to enable next generation enterprise. Believes in enterprise transformation through the Natives (Cloud Native & Mobile Native).
