By Srinivasan Sundara Rajan
January 30, 2013 07:00 AM EST
Big Data & Text Analytics: As the analysis of large amounts of unstructured data gains a major place in enterprise computing, more use cases are emerging. The "Big" in Big Data makes the term almost synonymous with Massively Parallel Processing frameworks like Hadoop, but the success of Big Data ultimately relies on effective content analytics of the underlying unstructured data. I highlighted this line of thinking in my earlier article, Big Data Analytics Thinking Outside Of Hadoop.
Unstructured Content Analytics is defined as the process of gaining new insights from unstructured data by employing text mining, image recognition, voice recognition and other related analytical techniques.
Big Data Journal was launched on SYS-CON.com in 2012
The material below explains one such use case of Big Data & Text Analytics: getting meaningful insights from financial reports.
Financial Reports & Analytics: All publicly traded companies in the USA and elsewhere are required to disclose their corporate information to their shareholders. These annual financial statements are available as downloadable reports on the corporate websites of public companies. Apart from the annual report, other documents such as investor newsletters, quarterly earnings presentations, conference calls by the CFO and other investor relations documents also shape an organization's financial standing in the eyes of the investor.
Most investors and investment analyst firms currently use their specialized knowledge to understand these financial statements and create meaningful insights out of them. However, this analysis is mostly limited to the structured portions of the financial statements and pays much less attention to the unstructured portions.
To explain this further:
- An annual report may contain statements such as the Balance Sheet, Income, Equity and Cash Flows. These statements are highly structured and organized according to accounting principles, so that any qualified financial analyst can understand them.
- At the same time, a typical financial statement also contains a lot of unstructured information about the organization's growth strategies, road map, optimism, future vision, how the business model is aligned to changing times, and so on.
So an effective analysis of a financial statement covers not only the structured information but also the unstructured data available in it.
BigData, UIMA & Financial Report Analytics: The following Big Data aligned technologies can be used effectively to analyse financial reports and derive meaningful insights from their large volumes of unstructured data.
- UIMA : UIMA, which stands for Unstructured Information Management Architecture, is the major industry standard for content analytics.
- Annotators : UIMA™ annotators do the real work of extracting structured information from unstructured data. You can write your own annotators. Although annotators are part of the UIMA framework, a lot of custom development goes into creating annotators specific to the needs of the finance industry. When documents are processed through the document processing pipeline, the annotators extract concepts, words, phrases, classifications, and named entities from unstructured content and mark these extractions as annotations. The annotations are added to the index as tokens or facets and are used as the source for content analysis.
- Taxonomies : Taxonomies play a major role in identifying the topics of interest within a document using UIMA. In UIMA, a type system defines the various types of objects that may be discovered in the document. Types in a UIMA type system may be organized into a taxonomy. For example, Company may be defined as a subtype of Organization.
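To make the annotator and type system ideas concrete, here is a minimal sketch in plain Python. It does not use the UIMA API (UIMA itself is a Java framework); the `TYPE_PARENTS` table, the `Annotation` class, the `company_annotator` function and the regular expression are all illustrative assumptions. Only the Company/Organization relationship comes from the example above.

```python
import re
from dataclasses import dataclass

# Toy type system: each type names its parent, which forms a taxonomy.
# "Company" as a subtype of "Organization" follows the example in the text;
# the root type name "Annotation" is an assumption for this sketch.
TYPE_PARENTS = {
    "Annotation": None,
    "Organization": "Annotation",
    "Company": "Organization",
}


def is_subtype(type_name, ancestor):
    """Return True if type_name equals ancestor or descends from it."""
    current = type_name
    while current is not None:
        if current == ancestor:
            return True
        current = TYPE_PARENTS[current]
    return False


@dataclass
class Annotation:
    type_name: str
    begin: int  # character offset where the covered text starts
    end: int    # character offset just past the covered text
    text: str


# Hypothetical pattern: capitalized words followed by a corporate suffix.
COMPANY_PATTERN = re.compile(r"\b(?:[A-Z][A-Za-z]+ )+(?:Inc|Corp|Ltd|LLC)\b")


def company_annotator(document):
    """Scan the document and emit one Company annotation per pattern match."""
    return [
        Annotation("Company", m.start(), m.end(), m.group())
        for m in COMPANY_PATTERN.finditer(document)
    ]


doc = "Acme Widgets Inc reported growth, while Globex Corp expects flat revenue."
annotations = company_annotator(doc)

# A query for Organization also returns Company annotations via the taxonomy.
organizations = [a.text for a in annotations if is_subtype(a.type_name, "Organization")]
print(organizations)
```

Because each annotation keeps its type and character offsets, a consumer can ask for a general type such as Organization and still receive the more specific Company annotations, which is the practical benefit of organizing types into a taxonomy.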
Realizing Financial Statement Analytics & Role of XBRL: There are not many UIMA annotators or text extraction implementations specific to financial statements. However, the Apache UIMA community does offer one relevant option: the AlchemyAPI Annotator, a set of annotators that wrap the AlchemyAPI.
AlchemyAPI's (http://www.alchemyapi.com/api/) Categorization service can be used to categorize text, HTML, or web-based content, assigning the most likely topic category (news, sports, business, etc.). The business categories include topics such as Business and Finance News, SEC filings, etc.
Several text analytics concepts, such as those listed below, can be applied to financial statements:
- Named Entity Extraction : Identify people, companies, organizations, cities, geographic features, and other typed entities within HTML pages and text documents/content.
- Concept Tagging : Automatically tag documents and text in a manner similar to human-based tagging.
- Keyword / Term Extraction : Extract important terms and "topic" keywords from HTML pages and text documents/content. Advanced statistical and linguistic algorithms analyze your content, "tagging" it with the most important words and phrases.
- Sentiment Analysis : Identify positive, negative and neutral sentiment within HTML pages and text documents/content.
- Relation Extraction : Identify facts and Subject-Action-Object relations within HTML pages and text documents/content.
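Two of the concepts listed above, sentiment analysis and keyword extraction, can be illustrated with a very small sketch. This is not how AlchemyAPI works internally; it is a simple lexicon and word-count approach, and the word lists, stop words and function names are hypothetical. A production system would use a finance-specific lexicon and proper linguistic processing.

```python
import re
from collections import Counter

# Hypothetical word lists, chosen only for this illustration.
POSITIVE = {"growth", "strong", "improved", "optimistic", "record"}
NEGATIVE = {"decline", "loss", "weak", "risk", "impairment"}
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "we", "our",
             "is", "are", "for", "with", "by", "as"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def sentiment(text):
    """Classify text by counting positive words minus negative words."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def keywords(text, top_n=3):
    """Return the most frequent non-stop-word tokens."""
    counts = Counter(t for t in tokenize(text) if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


outlook = "We are optimistic about strong growth despite currency risk."
print(sentiment(outlook))

segment = "Cloud revenue grew. Cloud margins improved as cloud demand and revenue rose."
print(keywords(segment, top_n=2))
```

Applied to the narrative sections of an annual report, even a crude score like this hints at how management's tone could be tracked from year to year alongside the structured figures.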
Apart from the already developed and community-supported annotators, we could develop new annotators that make the best use of taxonomies already established for the financial industry in the form of XBRL.
XBRL stands for eXtensible Business Reporting Language. It is a language for the electronic communication of business information, providing major benefits in the preparation, analysis and communication of business information. It is one of a family of "XML" languages, which are a standard means of communicating information between businesses and on the internet.
XBRL taxonomies are the dictionaries that the language uses. These are the categorization schemes that define the specific tags for individual items of data (such as "net profit"). National jurisdictions have different accounting regulations, so each may have its own taxonomy. There are already well-established, approved taxonomies for financial reporting, such as XBRL US GAAP, listed at http://www.xbrl.org/FRTApproved.
As is evident from the UIMA architecture and the annotator entity extraction process, these established taxonomies can play a major role in areas like concept tagging, which helps in getting meaningful insights from the large amounts of textual and other unstructured content in financial statements.
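As a rough illustration of taxonomy-driven concept tagging, the sketch below maps phrases found in report text to taxonomy elements. The `TAXONOMY_LABELS` dictionary is a hand-written, hypothetical stand-in for labels that would be loaded from a real XBRL taxonomy; the element names are modeled on the US GAAP taxonomy and should be verified against the published version before any real use.

```python
import re

# Hypothetical phrase-to-element mapping. In practice these labels would be
# read from the taxonomy's label files rather than typed in by hand.
TAXONOMY_LABELS = {
    "net income": "us-gaap:NetIncomeLoss",
    "net profit": "us-gaap:NetIncomeLoss",
    "revenues": "us-gaap:Revenues",
    "cash and cash equivalents": "us-gaap:CashAndCashEquivalentsAtCarryingValue",
}


def tag_concepts(text):
    """Return (phrase, element, begin, end) for each taxonomy label found.

    Matching is case-insensitive and restricted to whole words; results are
    sorted by their position in the text.
    """
    lowered = text.lower()
    tags = []
    for phrase, element in TAXONOMY_LABELS.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, lowered):
            tags.append((phrase, element, match.start(), match.end()))
    return sorted(tags, key=lambda tag: tag[2])


sentence = "Net profit rose 12% while cash and cash equivalents fell."
for phrase, element, begin, end in tag_concepts(sentence):
    print(f"{sentence[begin:end]!r} -> {element}")
```

Inside a UIMA pipeline, the same lookup would live in a custom annotator, with each match stored as an annotation whose type is derived from the taxonomy element.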
Summary: As enterprises and analytics vendors adopt Big Data as part of the mainstream, this adoption will be more meaningful if the technology supports new business use cases. Financial analytics is one such important area, and with the support of frameworks like UIMA coupled with industry-established taxonomies, such analytics are quite feasible and worth implementing.