By Srinivasan Sundara Rajan
January 30, 2013 07:00 AM EST
Big Data & Text Analytics: As the analysis of large amounts of unstructured data gains a major place in enterprise computing, more use cases are emerging. The word "Big" in Big Data makes it almost synonymous with massively parallel processing frameworks like Hadoop, but the success of Big Data ultimately relies on effective content analytics of the underlying unstructured data. I highlighted this line of thought in my earlier article, Big Data Analytics Thinking Outside Of Hadoop.
Unstructured Content Analytics is defined as the process of gaining new insights from the unstructured data, by employing text mining, image recognition, voice recognition and other related analytical techniques.
Big Data Journal was launched on SYS-CON.com in 2012
The material below explains one such use case of Big Data & Text Analytics: getting meaningful insights from financial reports.
Financial Reports & Analytics: Publicly traded companies in the USA and elsewhere are required to disclose corporate information to their shareholders. Their annual financial statements are available as downloadable reports on their corporate websites. Apart from the annual report, other documents such as investor newsletters, quarterly earnings presentations, CFO conference calls and other investor relations material shape an organization's financial standing in the eyes of the investor.
Most investors and investment analyst firms currently use their specialized knowledge to understand these financial statements and derive meaningful insights from them. However, this analysis is mostly limited to the structured portions of the financial statements and pays much less attention to the unstructured portions.
To explain this further:
- An annual report contains statements such as the Balance Sheet, Income, Equity and Cash Flows. These statements are highly structured and organized according to accounting principles, so any qualified financial analyst can understand them.
- At the same time, a typical financial statement also contains a lot of unstructured information about the organization's growth strategies, road map, optimism, future vision, how the business model is aligned to changing times, and so on.
So an effective analysis of a financial statement covers not only the structured information but also the unstructured data it contains.
Big Data, UIMA & Financial Report Analytics: The following Big Data aligned technologies can be used to analyse financial reports and derive meaningful insights from large volumes of unstructured data.
- UIMA : UIMA, which stands for Unstructured Information Management Architecture, is the major industry standard for content analytics.
- Annotators : UIMA annotators do the real work of extracting structured information from unstructured data. You can write your own annotators. Although annotators are part of the UIMA framework, a lot of custom development goes into creating annotators specific to the needs of the finance industry. When documents are processed through the document processing pipeline, the annotators extract concepts, words, phrases, classifications, and named entities from unstructured content and mark these extractions as annotations. The annotations are added to the index as tokens or facets and are used as the source for content analysis.
- Taxonomies : Taxonomies play a major role in identifying the topics of interest within a document using UIMA. In UIMA, a type system defines the various types of objects that may be discovered in the document. Types in a UIMA type system may be organized into a taxonomy. For example, Company may be defined as a subtype of Organization.
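The relationship between annotators, annotations and a type taxonomy can be illustrated with a small Python sketch. This is not the Apache UIMA API (which is Java-based); the `Annotation` class, the `TYPE_PARENTS` table and the suffix-based company rule are all assumptions made purely for illustration.

```python
import re
from dataclasses import dataclass

# Toy type system: each type names its parent, forming a taxonomy.
# (Illustrative only; this does not mirror Apache UIMA's real classes.)
TYPE_PARENTS = {
    "Annotation": None,
    "Organization": "Annotation",
    "Company": "Organization",
}


@dataclass(frozen=True)
class Annotation:
    type_name: str      # a key of TYPE_PARENTS
    begin: int          # character offset where the span starts
    end: int            # character offset just past the span
    covered_text: str   # the text between begin and end


def is_subtype(type_name, ancestor):
    """Walk up the taxonomy to check whether type_name descends from ancestor."""
    current = type_name
    while current is not None:
        if current == ancestor:
            return True
        current = TYPE_PARENTS[current]
    return False


# Hypothetical rule: capitalized words followed by a corporate suffix.
COMPANY_PATTERN = re.compile(r"\b(?:[A-Z][A-Za-z&]*\s)+(?:Inc|Corp|Ltd|LLC)\b\.?")


def annotate_companies(text):
    """A toy 'annotator': return Company annotations with offsets into text."""
    return [
        Annotation("Company", m.start(), m.end(), m.group())
        for m in COMPANY_PATTERN.finditer(text)
    ]


def select(annotations, type_name):
    """Return annotations whose type is type_name or one of its subtypes."""
    return [a for a in annotations if is_subtype(a.type_name, type_name)]
```

Because Company is declared as a subtype of Organization, calling `select(annotations, "Organization")` also returns the Company annotations, which is the practical benefit of organizing types into a taxonomy.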
Realizing Financial Statement Analytics & Role of XBRL: There are not many UIMA annotators or text extraction implementations specific to financial statements. However, the Apache UIMA community does offer one relevant option: the AlchemyAPI Annotator, a set of annotators that wrap the AlchemyAPI.
AlchemyAPI's (http://www.alchemyapi.com/api/) Categorization service can be used to categorize text, HTML, or web-based content, assigning the most likely topic category (news, sports, business, etc.). The business categories include topics such as Business and Finance News, SEC filings, etc.
Several text analytics concepts, such as the following, can be applied to financial statements:
- Named Entity Extraction : Identify people, companies, organizations, cities, geographic features, and other typed entities within HTML pages and text documents/content.
- Concept Tagging : Automatically tag documents and text in a manner similar to human-based tagging.
- Keyword / Term Extraction : Extract important terms and "topic" keywords from HTML pages and text documents/content. Advanced statistical and linguistic algorithms analyze your content, "tagging" it with the most important words and phrases.
- Sentiment Analysis : Identify positive, negative and neutral sentiment within HTML pages and text documents/content.
- Relation Extraction : Identify facts and Subject-Action-Object relations within HTML pages and text documents/content.
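To make one of these concepts concrete, here is a deliberately simple sketch of sentence-level sentiment analysis using word lists. It does not call AlchemyAPI or any other service; the `POSITIVE` and `NEGATIVE` lexicons and the scoring rule are assumptions chosen for illustration, and a real system would rely on a curated finance lexicon or a trained model.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons, not taken from any published resource.
POSITIVE = {"growth", "strong", "improved", "optimistic", "record"}
NEGATIVE = {"decline", "weak", "loss", "risk", "uncertain"}


def sentence_sentiment(sentence):
    """Label one sentence by counting positive and negative lexicon words."""
    words = re.findall(r"[a-z]+", sentence.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def summarize_sentiment(text):
    """Split text into sentences and count how many carry each label."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return Counter(sentence_sentiment(s) for s in sentences)
```

Running `summarize_sentiment` over the narrative section of an annual report would give a rough tally of optimistic versus cautious sentences, which is the kind of signal the structured statements alone do not provide.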
Apart from the annotators already developed and supported by the community, we could develop new annotators that make the best use of taxonomies already established for the financial industry in the form of XBRL.
XBRL stands for eXtensible Business Reporting Language. It is a language for the electronic communication of business information, providing major benefits in the preparation, analysis and communication of business information. It is one of a family of XML languages, which are a standard means of communicating information between businesses and on the internet.
XBRL taxonomies are the dictionaries that the language uses. They are the categorization schemes that define the specific tags for individual items of data (such as "net profit"). National jurisdictions have different accounting regulations, so each may have its own taxonomy. There are already well-established, approved taxonomies for financial reporting, such as XBRL US GAAP, as listed at http://www.xbrl.org/FRTApproved.
As is evident from the architecture of UIMA and the annotator entity extraction process, these established taxonomies can play a major role in areas like concept tagging, which can help in extracting meaningful insights from large amounts of textual and other unstructured content in financial statements.
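A minimal sketch of taxonomy-driven concept tagging follows. The label-to-tag dictionary is hypothetical: the `ex:` identifiers are invented for illustration and are not taken from the XBRL US GAAP taxonomy or any other published taxonomy. A real annotator would load labels from the taxonomy files themselves.

```python
import re

# Hypothetical label-to-tag dictionary in the spirit of an XBRL taxonomy.
# The "ex:" tag names are made up for this example.
TAXONOMY_LABELS = {
    "net profit": "ex:NetProfit",
    "operating cash flow": "ex:OperatingCashFlow",
    "total revenue": "ex:TotalRevenue",
}


def tag_concepts(text):
    """Return (tag, begin, end) for each taxonomy label found in text,
    ordered by where the label appears."""
    hits = []
    for label, tag in TAXONOMY_LABELS.items():
        # Whole-phrase, case-insensitive match against the original text.
        pattern = re.compile(r"\b" + re.escape(label) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            hits.append((tag, m.start(), m.end()))
    return sorted(hits, key=lambda hit: hit[1])
```

Each hit carries character offsets, so the result can be stored in the same span-based form as other annotations and then indexed as facets for content analysis.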
Summary: As enterprises and analytics vendors adopt Big Data as part of the mainstream, that adoption becomes more meaningful when the technology supports new business use cases. Financial analytics is one such important area, and with the support of frameworks like UIMA coupled with industry-established taxonomies, such analytics is quite feasible and worth implementing.