Semantic Interoperability: The Pot of Gold Under the Rainbow

The problem with semantic interoperability is that human communication is inherently vague, ambiguous, and relative

Our ZapThink 2020 poster lays out our complex web of predictions for enterprise IT in the year 2020. You might think that semantic interoperability is an important part of this story; after all, several groups have been heads-down for years now, working on the problem of how to teach computers to agree on the meaning of the information they exchange. But look again: we relegate semantics to the lower right corner, where we point out that we don’t believe there will be much progress in this area by 2020. Eventually, maybe, but even though semantic interoperability appears to be within our grasp, it behaves more like the pot of gold under the rainbow. The closer you get to the rainbow, the farther away it appears.

What gives? ZapThink usually takes an optimistic perspective about the future of technology, but we’re decidedly pessimistic about the prospects for semantic interoperability. The problem as we see it comes down to the human understanding of language. All efforts to standardize meanings in order to facilitate semantic interoperability strip out vagueness and ambiguity from data, and presume a single, universal underlying grammar. After all, isn’t the goal to foster precise, unambiguous, and consistent communication between systems? The problem is, human communication is inherently vague, ambiguous, and relative. The way humans understand the world, the way we think, and the way we put our thoughts into language require both vagueness and ambiguity. Without them, we lose important aspects of meaning. Furthermore, how we structure our language is culturally and linguistically relative. As a result, current semantic interoperability efforts will be able to address a certain class of problems, but in the grand scheme of things, that class of problems is a relatively small subset of the types of communication we would prefer to automate between systems.

The Importance of Vagueness
Ironically, to discuss semantics we must first define our terms. A term is vague when it’s impossible to say whether it applies in certain circumstances. Take “my face is red”: just how red does it have to be before we’re sure it’s red? In contrast, a term is ambiguous when it’s possible to interpret it in more than one way. For example, “I’m going to a bank” might mean that I’m going to a financial institution or to the side of a river.

Vagueness leads to knotty problems in philosophy that impact our ability to provide semantic interoperability. So, let’s go back to philosophy class, and study the sorites paradox. If you have a heap of sand and you take away a single grain, do you still have a heap of sand? Certainly. OK, repeat the process. Clearly, when you get down to a single grain of sand remaining, you no longer have a heap. So, when did the heap cease to be a heap?

Philosophers and linguists have been arguing over how to solve the sorites paradox for over a century now (yes, I know, they should find something more useful to do with their time). One answer: put your foot down and establish a precise boundary. 1,000 or more grains of sand are a heap, but 999 or fewer are not. Our computers will have no problem with such a resolution to the paradox, but it doesn’t accurately represent what we really mean by a heap. After all, if 1,000 grains constitute a heap, wouldn’t 999? Central to the meaning of the term “heap” is its inherent vagueness.

Another solution: yes, there is some number of grains of sand where a heap ceases to be a heap, but we can’t know what it is. This resolution might satisfy some philosophers, but it doesn’t help our computers make sense out of our language. A third approach: instead of considering “is a heap” and “isn’t a heap” as the only two possible values, define a spectrum of intermediary values, or perhaps a continuum of values. The computer scientists are likely to be happy with this answer, as it lends itself to fuzzy logic: the statement “this pile of sand is a heap” might be, say, 40% true. Yes, we can do our fuzzy logic math now, but we’ve still lost some fundamental elements of meaning.
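To make the contrast between the precise-boundary answer and the fuzzy-logic answer concrete, here is a minimal Python sketch. The 1,000-grain cutoff is the one used above; the lower bound of 100 grains and the straight-line ramp between the two are assumptions chosen purely for illustration, and fuzzy logic does not prescribe any particular membership function.

```python
def is_heap_crisp(grains: int, threshold: int = 1000) -> bool:
    """Precise-boundary answer: a pile is a heap only at or above the threshold."""
    return grains >= threshold


def heap_membership(grains: int, low: int = 100, high: int = 1000) -> float:
    """Fuzzy answer: degree (0.0 to 1.0) to which a pile counts as a heap.

    The bounds `low` and `high` and the linear ramp between them are
    hypothetical choices made for this example only.
    """
    if grains <= low:
        return 0.0
    if grains >= high:
        return 1.0
    return (grains - low) / (high - low)


if __name__ == "__main__":
    for grains in (1, 460, 999, 1000):
        print(grains, is_heap_crisp(grains), round(heap_membership(grains), 3))
```

Under these assumed bounds, a pile of 460 grains comes out as a heap to a degree of 0.4, while the crisp rule flips from "not a heap" to "heap" between 999 and 1,000 grains. Either way, the numbers are something we imposed on the word, not something we found in it.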

To bring back our natural language-based understanding of the sorites paradox, let’s step away from an overly analytical approach to the problem and try to look at the paradox from a human perspective. How, for example, would a seven-year-old describe the heap of sand as you take away a grain of sand at a time? They might answer, “well, it’s a smaller heap” or “it’s kinda a heap” or “it’s a little heap” or “it’s not really a heap,” etc. Such expressions are clearly not precise. Our computers wouldn’t be able to make much sense out of them. But these simple, even childish expressions are how people really speak and how people truly understand vagueness.

The important takeaway here is that vagueness isn’t a property confined to heaps and blushing faces. It’s a ubiquitous property of virtually all human communication, even within the business context. Take for example an insurance policy. Insurance policies have a number of properties (policy holder, underwriter, insured property, deductible, etc.) and relationships to other business entities (policy application, underwriting documentation, claims forms, etc.). Now let’s add or take away individual properties and relationships from our canonical understanding of an insurance policy one at a time. Is it still a policy? Clearly, if we take away everything that makes a policy a policy then it’s no longer a policy. But if we take away a single property, we’re likely to say it’s still a policy. So where do we draw the line? If philosophers and linguists haven’t solved this problem in over a century, don’t expect your semantic interoperability tool to make much headway either.

The Problem of Linguistic Relativity
Another century-long battle in the world of linguistics is the fray over linguistic relativity vs. linguistic universality. Linguistic relativity is the position that language affects how speakers see their world, and by extension, how they think. In the other corner is Noam Chomsky’s universal grammar, the linguistic theory that grammar is hardwired into the brain, and hence universal across all peoples regardless of their language or their culture. Theoretical work on a universal grammar has led to dramatic advances in natural language translation, and we all get to use and appreciate Google Translate and its brethren as a result. But while Google Translate is a miraculous tool indeed (especially for us Star Trek fans who marveled at the Universal Translator), it doesn’t take a polyglot to realize that the state of the art for such technology still leaves much to be desired.

Linguistic relativity, however, goes to the heart of the semantic interoperability challenge. Take, for example, one of today’s most useful semantic standards: the Resource Description Framework (RDF). RDF is a metadata data model intended for making statements about resources (in particular, Web-based resources) in the form of subject-predicate-object expressions. For example, you might be able to express the statement “ZapThink wrote this ZapFlash” as the triple consisting of “ZapThink” (the subject); “wrote” (the predicate); and “this ZapFlash” (the object). Take this basic triple as a building block and you can build semantic webs of arbitrary complexity, with the eventual goal of describing the relationships among all business entities within a particular business context.
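The subject-predicate-object structure can be sketched in a few lines of Python. This is a simplified illustration, not an RDF implementation: real RDF identifies resources with IRIs rather than bare strings, and the `Triple` class, the `match` helper, and the second statement in the graph are all hypothetical additions for this example.

```python
from typing import Iterable, List, NamedTuple, Optional


class Triple(NamedTuple):
    """One subject-predicate-object statement (plain strings, not IRIs)."""
    subject: str
    predicate: str
    obj: str


# The first statement is the example from the text; the second is made up
# here only to show how statements link together into a small graph.
graph = [
    Triple("ZapThink", "wrote", "this ZapFlash"),
    Triple("this ZapFlash", "discusses", "semantic interoperability"),
]


def match(triples: Iterable[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the statements that agree with every field that is not None."""
    return sorted(
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    )


if __name__ == "__main__":
    # "Who wrote what?" leaves subject and object open and fixes the predicate.
    print(match(graph, predicate="wrote"))
```

Note that every statement has to be forced into exactly three slots before it can enter the graph, which is the structural assumption the rest of this section calls into question.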

The problem with the approach RDF takes, however, is that the subject-predicate-object structure is Eurocentric. Speakers of non-European languages don’t necessarily think in sentences that follow this structure. And furthermore, this problem isn’t new. In fact, the research into this phenomenon dates back to the 1940s, with the work of linguist Benjamin Lee Whorf. Whorf conducted linguistic research among the Hopi and other Native American peoples, and thus established an empirical basis for linguistic relativity. One of his seminal papers on the subject includes a telling illustration.

Whorf’s illustration compares a simple sentence, “I clean it with a ramrod,” where “it” refers to a gun, in English and Shawnee. The English sentence predictably follows the subject-predicate-object format that RDF leverages. The Shawnee translation, however, translates literally to “dry space/interior of hole/by motion of tool or instrument.” Not only is there no one-to-one correspondence between parts of speech across the two sentences, but the entire context of the expression is different. If you were in the unenviable position of establishing RDF-based semantic interoperability between, say, a British business and a Shawnee business, you’d find RDF far too culturally specific to rise to the challenge.

The ZapThink Take
We have tools for semantic interoperability today, of course – but all such tools require the human step of configuring or training the tool to understand the properties and relationships among entities. Once you’ve trained the tool, it’s possible to automate many semantic interactions. But to get this process started, we must get together in a room with the people we want to communicate with and hammer out the meanings of the terms we’d like to use.
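As a rough sketch of what that human step produces, consider two organizations that have sat down and agreed how their terms line up. Everything in the Python below is hypothetical: the field names, the `AGREED_TERMS` table, and the `translate` helper are invented for illustration and do not come from any particular tool.

```python
from typing import Dict, List, Tuple

# The outcome of the meeting: a hand-built table mapping one party's terms
# to the other's. Both vocabularies are made up for this example.
AGREED_TERMS: Dict[str, str] = {
    "policy_holder": "insured_party",
    "deductible": "excess",
}


def translate(record: Dict[str, object],
              mapping: Dict[str, str]) -> Tuple[Dict[str, object], List[str]]:
    """Rename the fields people have agreed on; report the rest as unresolved.

    The automated part only applies decisions that humans already made.
    Any term missing from the mapping is handed back for discussion.
    """
    translated: Dict[str, object] = {}
    unresolved: List[str] = []
    for term, value in record.items():
        if term in mapping:
            translated[mapping[term]] = value
        else:
            unresolved.append(term)
    return translated, unresolved


if __name__ == "__main__":
    record = {"policy_holder": "A. Smith", "deductible": 500, "rider": "flood"}
    print(translate(record, AGREED_TERMS))
```

In this sketch the "rider" field comes back as unresolved: the code can repeat an agreement indefinitely, but it cannot reach one on its own.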

This human component to semantic interoperability actually dates to the Stone Age. How did we do business in the Stone Age? Say your tribe was on the coast, so you had fish. You were getting tired of fish, so you and your tribemates decided to pack up some fish and bring the bundle to the next village where they had fruit. You showed up at the village market, only you had no common language. So what did you do? You held up some fish, pointed to some fruit, grunted, and waved your hands. If you established a basis of communication, you conducted business, and went home with some fruit. If not, then you went home empty-handed (or you pulled out your clubs and attacked, but that’s another story). Cut to the 21st century, and little has changed. People still have to get together and establish a basis of communication as human beings in order to facilitate semantic interoperability. But fully automating such interoperability is as close as the next rainbow.

More Stories By Jason Bloomberg

Jason Bloomberg is the leading expert on architecting agility for the enterprise. As president of Intellyx, Mr. Bloomberg brings his years of thought leadership in the areas of Cloud Computing, Enterprise Architecture, and Service-Oriented Architecture to a global clientele of business executives, architects, software vendors, and Cloud service providers looking to achieve technology-enabled business agility across their organizations and for their customers. His latest book, The Agile Architecture Revolution (John Wiley & Sons, 2013), sets the stage for Mr. Bloomberg’s groundbreaking Agile Architecture vision.

Mr. Bloomberg is perhaps best known for his twelve years at ZapThink, where he created and delivered the Licensed ZapThink Architect (LZA) SOA course and associated credential, certifying over 1,700 professionals worldwide. He is one of the original Managing Partners of ZapThink LLC, the leading SOA advisory and analysis firm, which was acquired by Dovel Technologies in 2011. He now runs the successor to the LZA program, the Bloomberg Agile Architecture Course, around the world.

Mr. Bloomberg is a frequent conference speaker and prolific writer. He has published over 500 articles, spoken at over 300 conferences, Webinars, and other events, and has been quoted in the press over 1,400 times as the leading expert on agile approaches to architecture in the enterprise.

Mr. Bloomberg’s previous book, Service Orient or Be Doomed! How Service Orientation Will Change Your Business (John Wiley & Sons, 2006, coauthored with Ron Schmelzer), is recognized as the leading business book on Service Orientation. He also co-authored the books XML and Web Services Unleashed (SAMS Publishing, 2002), and Web Page Scripting Techniques (Hayden Books, 1996).

Prior to ZapThink, Mr. Bloomberg built a diverse background in eBusiness technology management and industry analysis, including serving as a senior analyst in IDC’s eBusiness Advisory group, as well as holding eBusiness management positions at USWeb/CKS (later marchFIRST) and WaveBend Solutions (now Hitachi Consulting).
