By Srinivasan Sundara Rajan
January 30, 2013 07:00 AM EST
Big Data & Text Analytics: As the analysis of large amounts of unstructured data gains a major place in enterprise computing, more use cases are emerging in this area. Although the "Big" in Big Data makes the term nearly synonymous with Massively Parallel Processing frameworks like Hadoop, the underlying success of Big Data relies on effective content analytics of the unstructured data. I highlighted this line of thought in my earlier article, Big Data Analytics Thinking Outside Of Hadoop.
Unstructured Content Analytics is defined as the process of gaining new insights from unstructured data by employing text mining, image recognition, voice recognition and other related analytical techniques.
Big Data Journal was launched on SYS-CON.com in 2012
The material below explains one such use case of Big Data & Text Analytics: getting meaningful insights from financial reports.
Financial Reports & Analytics: Publicly traded companies in the USA and elsewhere are required to disclose their corporate information to their shareholders. Their annual financial statements are available as downloadable reports on their corporate websites. Apart from the annual report, other disclosures such as investor newsletters, quarterly earnings presentations, conference calls by the CFO and other investor relations documents also shape an organization's financial standing in the eyes of the investor.
Most investors and investment analyst firms currently use their specialized knowledge to understand these financial statements and create meaningful insights out of them. However, these analytics are mostly limited to the structured portions of the financial statements and pay far less attention to the unstructured portions.
To explain this further:
- An annual report may contain statements such as the Balance Sheet, Income, Equity and Cash Flows. These statements are highly structured and organized according to accounting principles, so any qualified financial analyst can understand them.
- At the same time, a typical financial statement also contains a lot of unstructured information about the growth strategies of the organization, its road map, optimism, future vision, how the business model is aligned to the changing times, etc.
So an effective analysis of a financial statement covers not only its structured information but also the unstructured data it contains.
BigData, UIMA & Financial Report Analytics: The following Big Data aligned technologies can be used effectively in analysing financial reports to derive meaningful insights from their large volumes of unstructured data.
- UIMA : UIMA stands for Unstructured Information Management Architecture. It is a major industry standard for content analytics.
- Annotators : UIMA annotators do the real work of extracting structured information from unstructured data. You can write your own annotators. Although annotators are part of the UIMA framework, a lot of custom development goes into creating annotators specific to the needs of the finance industry. When documents are processed through the document processing pipeline, the annotators extract concepts, words, phrases, classifications, and named entities from unstructured content and mark these extractions as annotations. The annotations are added to the index as tokens or facets and are used as the source for content analysis.
- Taxonomies : Taxonomies play a major role in identifying the topics of interest within a document using UIMA. In UIMA, a type system defines the various types of objects that may be discovered in the document. Types in a UIMA type system may be organized into a taxonomy. For example, Company may be defined as a subtype of Organization.
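The annotator and type system ideas above can be sketched in a few lines. This is a simplified Python analogy, not the UIMA API (which is primarily Java based): the `Annotation` class, the `TYPE_PARENTS` table, the `COMPANY_PATTERN` regular expression and the function names are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Toy type system (assumed for this example): each type names its parent,
# so Company is a subtype of Organization.
TYPE_PARENTS = {
    "Annotation": None,
    "Organization": "Annotation",
    "Company": "Organization",
}


def is_subtype(type_name, ancestor):
    """Walk up the parent chain to test whether type_name descends from ancestor."""
    while type_name is not None:
        if type_name == ancestor:
            return True
        type_name = TYPE_PARENTS[type_name]
    return False


@dataclass(frozen=True)
class Annotation:
    type_name: str     # a type from the toy type system
    begin: int         # character offset where the span starts
    end: int           # character offset just past the span
    covered_text: str  # the text the annotation marks


# Hypothetical pattern: capitalized words followed by a corporate suffix.
COMPANY_PATTERN = re.compile(r"(?:[A-Z][A-Za-z]+ )+(?:Inc|Corp|Ltd)\.")


def annotate_companies(text):
    """Return one Company annotation per pattern match, with character offsets."""
    return [
        Annotation("Company", m.start(), m.end(), m.group())
        for m in COMPANY_PATTERN.finditer(text)
    ]
```

Because each annotation records offsets rather than a copy of the document, several annotators can mark the same text independently, and a consumer that asks for Organization annotations can also accept Company annotations through `is_subtype`.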
Realizing Financial Statement Analytics & Role of XBRL: There are not many UIMA annotators or text extraction implementations specific to financial statements. However, the Apache UIMA community offers one relevant component: the AlchemyAPI Annotator, a set of annotators that wrap the AlchemyAPI.
AlchemyAPI's (http://www.alchemyapi.com/api/) Categorization service can be used to categorize text, HTML, or web-based content, assigning the most likely topic category (news, sports, business, etc.). The business categories include topics like Business and Finance News, SEC filings, etc.
Several text analytics concepts, such as the following, can be applied to financial statements:
- Named Entity Extraction : Identify people, companies, organizations, cities, geographic features, and other typed entities within HTML pages and text documents/content.
- Concept Tagging : Automatically tag documents and text in a manner similar to human-based tagging.
- Keyword / Term Extraction : Extract important terms and "topic" keywords from HTML pages and text documents/content. Advanced statistical and linguistic algorithms analyze your content, "tagging" it with the most important words and phrases.
- Sentiment Analysis : Identify positive, negative and neutral sentiment within HTML pages and text documents/content.
- Relation Extraction : Identify facts and Subject-Action-Object relations within HTML pages and text documents/content.
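To make one of these concepts concrete, here is a minimal sketch of sentiment analysis using word lists. It does not reflect how AlchemyAPI works internally; the `POSITIVE` and `NEGATIVE` lexicons and the `sentence_sentiment` function are assumptions invented for this example, and a real system would rely on far richer linguistic and statistical models.

```python
import re

# Toy lexicons invented for this example; a real system would use a much
# larger list tuned to financial language.
POSITIVE = {"growth", "strong", "improved", "optimistic", "record"}
NEGATIVE = {"decline", "weak", "loss", "risk", "uncertain"}


def sentence_sentiment(sentence):
    """Label a sentence by counting positive and negative lexicon hits."""
    words = re.findall(r"[a-z]+", sentence.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Applied sentence by sentence to the narrative sections of an annual report, even a crude score like this gives a rough view of how optimistic the management commentary is.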
Apart from the annotators already developed and supported by the community, we could develop new annotators that make the best use of the taxonomies already established for the financial industry in the form of XBRL.
XBRL stands for eXtensible Business Reporting Language. It is a language for the electronic communication of business information, providing major benefits in the preparation, analysis and communication of business information. It is one of a family of "XML" languages, which are a standard means of communicating information between businesses and on the internet.
XBRL taxonomies are the dictionaries which the language uses. These are the categorization schemes which define the specific tags for individual items of data (such as "net profit"). National jurisdictions have different accounting regulations, so each may have its own taxonomy. There are already well established, approved taxonomies for financial reporting, like XBRL US GAAP, as listed on the site http://www.xbrl.org/FRTApproved.
As is evident from the architecture of UIMA and the annotator entity extraction process, these established taxonomies can play a major role in areas like concept tagging, which can help in getting meaningful insights from the large amounts of textual and other unstructured content in financial statements.
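As a rough illustration of taxonomy-driven concept tagging, the sketch below looks up plain-language labels in report text and returns the matching taxonomy elements. The `TAXONOMY_LABELS` table and the `tag_concepts` function are assumptions for this example: the element names follow the style of the US GAAP taxonomy, but the mapping is hand-written here, whereas a real annotator would load labels from the published taxonomy files.

```python
# Illustrative label-to-element mapping, hand-written for this example.
# A real annotator would read labels from the taxonomy itself.
TAXONOMY_LABELS = {
    "net profit": "us-gaap:NetIncomeLoss",
    "net income": "us-gaap:NetIncomeLoss",
    "revenue": "us-gaap:Revenues",
    "total assets": "us-gaap:Assets",
}


def tag_concepts(text):
    """Return the sorted taxonomy elements whose labels occur in the text."""
    lowered = text.lower()
    # Naive substring matching: enough for a sketch, too loose for production.
    return sorted(
        {element for label, element in TAXONOMY_LABELS.items() if label in lowered}
    )
```

Tagging narrative text with the same elements used in the structured statements would let an analyst link a sentence about profit in the management discussion to the corresponding reported figure.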
Summary: As enterprises and analytics vendors adopt Big Data as part of the mainstream, this adoption will be more meaningful if the technology supports new business use cases. Financial analytics is one such important area, and with the support of frameworks like UIMA coupled with industry-established taxonomies, such analytics are quite feasible and worth implementing.