By Srinivasan Sundara Rajan
January 30, 2013 07:00 AM EST
Big Data & Text Analytics: As the analysis of large volumes of unstructured data gains a major place in enterprise computing, more use cases are emerging. Although the "Big" in Big Data makes the term nearly synonymous with Massively Parallel Processing frameworks like Hadoop, the success of Big Data ultimately relies on effective content analytics of the underlying unstructured data. I highlighted this line of thought in my earlier article, Big Data Analytics Thinking Outside Of Hadoop.
Unstructured Content Analytics is defined as the process of gaining new insights from unstructured data by employing text mining, image recognition, voice recognition and other related analytical techniques.
Big Data Journal was launched on SYS-CON.com in 2012
The material below explains one such use case of Big Data & Text Analytics: getting meaningful insights from financial reports.
Financial Reports & Analytics: Publicly traded companies in the USA and elsewhere are required to disclose their corporate information to their shareholders. Their annual financial statements are available as downloadable reports on their corporate websites. Apart from the annual report, other documents such as investor newsletters, quarterly earnings presentations, conference calls by the CFO and other investor relations material also shape an organization's financial standing in the eyes of the investor.
Most investors and investment analyst firms currently use their specialized knowledge to understand these financial statements and derive meaningful insights from them. However, this analysis is mostly limited to the structured portions of the financial statements and pays far less attention to the unstructured portions.
To explain this further:
- An annual report contains statements such as the Balance Sheet, Income, Equity and Cash Flows. These statements are highly structured and organized according to accounting principles, so any qualified financial analyst can understand them.
- At the same time, a typical financial statement also contains a lot of unstructured information about the organization's growth strategies, road map, optimism, future vision, how the business model is aligned to changing times, and so on.
So an effective analysis of a financial statement covers not only the structured information but also the unstructured data it contains.
Big Data, UIMA & Financial Report Analytics: The following Big Data aligned technologies can be used effectively in analysing financial reports to derive meaningful insights from their large volumes of unstructured data.
- UIMA : UIMA (Unstructured Information Management Architecture) is a major industry standard for content analytics.
- Annotators : UIMA annotators do the real work of extracting structured information from unstructured data. Although annotators are part of the UIMA framework, you can also write your own, and a lot of custom development goes into creating annotators specific to the needs of the finance industry. When documents are processed through the document processing pipeline, the annotators extract concepts, words, phrases, classifications, and named entities from unstructured content and mark these extractions as annotations. The annotations are added to the index as tokens or facets and are used as the source for content analysis.
- Taxonomies : Taxonomies play a major role in identifying the topics of interest within a document using UIMA. In UIMA, a type system defines the various types of objects that may be discovered in the document. Types in a UIMA type system may be organized into a taxonomy. For example, Company may be defined as a subtype of Organization.
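The relationship between annotators, annotations and a type taxonomy can be illustrated with a minimal sketch in plain Python. This is not the UIMA API: the `TYPE_PARENTS` table, the `Annotation` class, the `annotate_companies` function and the toy company-name rule are all hypothetical names invented for this illustration, and the company names are fictional.

```python
import re
from dataclasses import dataclass

# Hypothetical type system: each type names its parent (None for the root).
TYPE_PARENTS = {
    "Annotation": None,
    "Organization": "Annotation",
    "Company": "Organization",
}


def is_subtype(type_name, ancestor):
    """Return True if type_name equals ancestor or inherits from it."""
    current = type_name
    while current is not None:
        if current == ancestor:
            return True
        current = TYPE_PARENTS[current]
    return False


@dataclass(frozen=True)
class Annotation:
    type_name: str
    begin: int  # character offset where the covered text starts
    end: int    # character offset just past the covered text

    def covered_text(self, document):
        return document[self.begin:self.end]


# Toy rule (an assumption for this sketch): one or more capitalized words
# followed by a corporate suffix is treated as a Company.
COMPANY_PATTERN = re.compile(r"\b(?:[A-Z][a-z]+ )+(?:Inc|Corp|Ltd)\b")


def annotate_companies(document):
    """Mark each match of the toy rule as a Company annotation."""
    return [Annotation("Company", m.start(), m.end())
            for m in COMPANY_PATTERN.finditer(document)]


text = "Acme Widgets Inc reported higher revenue than Globex Corp this year."
annotations = annotate_companies(text)

# Because Company is a subtype of Organization, a query for Organization
# annotations also returns the Company annotations.
organizations = [a.covered_text(text) for a in annotations
                 if is_subtype(a.type_name, "Organization")]
print(organizations)
```

Storing offsets rather than copied strings keeps the original document untouched, which is the same idea behind marking extractions as annotations instead of rewriting the content.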
Realizing Financial Statement Analytics & Role of XBRL: There are not many UIMA annotators or text extraction implementations specific to financial statements. However, the Apache UIMA community does offer one relevant component: the AlchemyAPI Annotator, a set of annotators that wrap the AlchemyAPI.
AlchemyAPI's (http://www.alchemyapi.com/api/) Categorization service can be used to categorize text, HTML, or web-based content, assigning the most likely topic category (news, sports, business, etc.). The business categories include topics such as Business and Finance News, SEC filings, etc.
Several text analytics concepts, such as the following, can be applied to financial statements:
- Named Entity Extraction : Identify people, companies, organizations, cities, geographic features, and other typed entities within HTML pages and text documents/content.
- Concept Tagging : Automatically tag documents and text in a manner similar to human-based tagging.
- Keyword / Term Extraction : Extract important terms and "topic" keywords from HTML pages and text documents/content. Advanced statistical and linguistic algorithms analyze your content, "tagging" it with the most important words and phrases.
- Sentiment Analysis : Identify positive, negative and neutral sentiment within HTML pages and text documents/content.
- Relation Extraction : Identify facts and Subject-Action-Object relations within HTML pages and text documents/content.
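Two of the concepts above, keyword extraction and sentiment analysis, can be sketched in a few lines of Python. This is a deliberately simple frequency and word-list approach, not the statistical and linguistic algorithms a service such as AlchemyAPI uses; the word lists, the function names and the sample sentence are assumptions made for illustration.

```python
import re
from collections import Counter

# Illustrative word lists (assumptions); a real system would use far
# richer linguistic resources than these.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "our", "we", "is",
             "was", "for", "this", "by", "with", "but"}
POSITIVE = {"growth", "strong", "improved", "record", "optimistic"}
NEGATIVE = {"decline", "loss", "weak", "risk", "impairment"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_keywords(text, limit=3):
    """Return the most frequent non-stopword tokens."""
    counts = Counter(t for t in tokenize(text) if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(limit)]


def sentiment(text):
    """Classify text by counting positive and negative lexicon hits."""
    tokens = tokenize(text)
    score = (sum(t in POSITIVE for t in tokens)
             - sum(t in NEGATIVE for t in tokens))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


passage = ("We delivered record revenue and strong growth this year. "
           "Revenue growth was strong in cloud services, "
           "but currency risk remains.")
print(top_keywords(passage))
print(sentiment(passage))
```

A word-list score like this ignores negation and context ("no growth" still counts as positive), which is one reason production systems rely on more advanced models.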
Apart from the annotators already developed and supported by the community, we could develop new annotators that make the best use of the taxonomies already established for the financial industry in the form of XBRL.
XBRL stands for eXtensible Business Reporting Language. It is a language for the electronic communication of business information, providing major benefits in the preparation, analysis and communication of business information. It is one of the family of XML languages, a standard means of communicating information between businesses and on the internet.
XBRL taxonomies are the dictionaries that the language uses. These are the categorization schemes that define the specific tags for individual items of data (such as "net profit"). National jurisdictions have different accounting regulations, so each may have its own taxonomy. There are already well-established, approved taxonomies for financial reporting, such as XBRL US GAAP, listed at http://www.xbrl.org/FRTApproved.
As is evident from the UIMA architecture and the annotator entity extraction process, these established taxonomies can play a major role in areas like concept tagging, which helps in getting meaningful insights from the large amounts of textual and other unstructured content in financial statements.
Summary: As enterprises and analytics vendors adopt Big Data as part of the mainstream, the adoption will be more meaningful if the technology supports new business use cases. Financial analytics is one such important area, and with the support of frameworks like UIMA coupled with industry-established taxonomies, such analytics are quite feasible and worth implementing.
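As a rough illustration of taxonomy-driven concept tagging, the sketch below maps phrases found in free text to concept tags from a small dictionary. The `TAXONOMY` table and its `ex:` tags are hypothetical placeholders, not elements of any published XBRL taxonomy; a real annotator would load the labels from an approved taxonomy instead.

```python
import re

# Hypothetical mini-taxonomy: concept tag -> labels that may appear in prose.
# The "ex:" tags are placeholders, not elements of a published XBRL taxonomy.
TAXONOMY = {
    "ex:NetProfit": ["net profit", "net income"],
    "ex:Revenue": ["revenue", "net sales"],
    "ex:CashFlow": ["cash flow", "cash flows"],
}


def tag_concepts(text):
    """Return (offset, tag, label) for every taxonomy label found in text,
    sorted by position in the text."""
    lowered = text.lower()
    hits = []
    for tag, labels in TAXONOMY.items():
        for label in labels:
            # Word boundaries stop "cash flow" from matching inside
            # "cash flows"; the plural is listed as its own label.
            pattern = r"\b" + re.escape(label) + r"\b"
            for match in re.finditer(pattern, lowered):
                hits.append((match.start(), tag, label))
    return sorted(hits)


sentence = ("Net profit rose on higher revenue, "
            "while operating cash flows stayed flat.")
hits = tag_concepts(sentence)
tags = [tag for _, tag, _ in hits]
print(tags)
```

Because the tags come from a shared dictionary rather than from each document, the same concept receives the same tag across reports from different companies, which is what makes the extracted content comparable.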