|By Ed Featherston||
|January 7, 2017 06:00 AM EST||
Over the years, one of my favorite pastimes has been working on the family genealogy. I first started work on it in the early 1990s. At that time, records were not digitized and research involved going to libraries, newspapers, and various local, state, and federal archives. There one would have to sift through reams of old records, documents and microfiche. If you were lucky, someone had created printed indices of the information contained in those documents. These indices, when they existed, were based on manual transcription of the data, and prone to the data quality. This is inherent in transcribing information from what frequently were old handwritten documents. The challenge was then trying to find those nuggets of information that would relate and connect to individuals you were trying to locate in your research.
For anyone operating in the Big Data space of today, this may all sound very familiar. Genealogists, amateur and otherwise, have been dealing with the challenges of Big Data, unstructured data, and data quality long before the terms became technology buzzwords. There are tools and products today to help; who hasn't seen the commercials for ancestry.com? However, even with technology, there are still challenges. These challenges are not unique to genealogy, and make a good lens for viewing and discussing the business needs in general for Big Data. Let's take a closer look at some of them.
Even with the great strides technology has taken, data quality remains a tremendous challenge for genealogy researchers. Let's take U.S. Census data as an example. Every 10 years, the U.S. government conducts a census of the population, and the results of these censuses become public 72 years after they are taken (the 1940 Census material just recently became available). U.S. Census data is a gold mine for genealogy research.
Below is a sample from the 1930 Federal census. Census forms were filled out by individuals going door to door and asking the residents questions. However, there were a number of data quality factors you must take into consideration. The sample here has fairly good quality handwriting, although that's not always the case. Also, you are constrained by the census takers interpretation of the person's answers and pronunciation. For example, this could result in different variations on the spelling of names.
When this document gets transcribed, you could still have multiple sources of problems with the data quality. The original census taker could have written it down incorrectly or the person transcribing it could have made a transcription error.
This challenge is not unique to genealogy research. Data quality has been an issue in IT systems since the first IT system. In the world of Big Data, unstructured data (such as social media), and things like crowd-sourced data, can become a daunting challenge. As with any challenge, we must understand the impact of those issues, the risk, and what can be done to mitigate that risk. In my above example, Ancestry.com takes an interesting approach to the mitigation. Given they have millions of records based on scanned documents, checking each one is beyond reasonable expectations. Given that, they crowd-source corrections. As a customer, I locate a particular record for someone I am looking for, that little data in all the Big Data. If I notice there is some type of error I can flag that record, categorize the error and provide what I believe is the correct information. Ancestry will then look at my correction and, if appropriate, cleanse the transcribed data.
Even though we are discussing genealogy, data pedigree is not about the family tree. Data pedigree is ‘Where did the data comes from?' and ‘How reliable is that source?' If, as an organization, you own the data, that is not an issue. In today's Big Data world, many sources are outside of your direct control (unstructured social media data, crowd-sourced data). For genealogy research, data pedigree has always been an issue and concern. A date of birth is a lot more reliable from a town birth record than from say the census example above, where the information is ‘the age on last birthday, as provided in the interview' (I have seen variations of multiple years from sequential census forms for an individual). In my Ancestry.com example again, as well as source records, Ancestry members can make their research available for online search and sharing. When using others' data (i.e., crowd sourcing research), one must always feel comfortable with the reliability of the source. Ancestry allows you to identify what your source of information was, and can identify multiple sources (for example, I may source data of birth based on both a birth record, a marriage record, and a death record). That information is more reliable than a date of birth with no source cited. When I find a potential match (again, that little data I am truly looking for), I can determine if it truly is a match or possibly a false correlation.
Similarly, in any Big Data implementation, we must understand the pedigree of our data sources. This impacts any analytics we perform and the resulting correlations. If you don't, you run the risk of potentially false correlations and assumptions. For some entertaining examples of false correlations check out www.tylervigen.com.
Finding That Gem of Little Data in the Huge Oceans of Big Data
The ultimate value of Big Data is not the huge ocean of data. It's being able to find the gems of little data that provide the information you seek. In genealogy, it is wonderful that I have millions of public records, documents, and other genealogy research available to sift through, but that's not the value. The value is when I find that record for that one individual in the family tree I have been trying to find. Doing the analysis, and the matching of the data is very dependent on the challenges we have been discussing, data quality and data pedigree. The same is true for any Big Data implementation. Big Data without good understanding of the data is just a big pile of data taking up space.
No technology negates the need for good planning and design. Big Data not just about storing structured and unstructured data. It's not just providing the latest and greatest analytic tools. As technologists we must work with the business plan and design how to leverage and balance the data and its analysis. Work with the business to ensure there is the correct understanding of the data that is available, its quality, its pedigree and the impact of those. Then the true value of the Big Data will shine through as all the gems of little data are found.
The age of Digital Disruption is evolving into the next era – Digital Cohesion, an age in which applications securely self-assemble and deliver predictive services that continuously adapt to user behavior. Information from devices, sensors and applications around us will drive services seamlessly across mobile and fixed devices/infrastructure. This evolution is happening now in software defined services and secure networking. Four key drivers – Performance, Economics, Interoperability and Trust ...
May. 1, 2017 02:15 AM EDT Reads: 1,187
In his keynote at 19th Cloud Expo, Sheng Liang, co-founder and CEO of Rancher Labs, discussed the technological advances and new business opportunities created by the rapid adoption of containers. With the success of Amazon Web Services (AWS) and various open source technologies used to build private clouds, cloud computing has become an essential component of IT strategy. However, users continue to face challenges in implementing clouds, as older technologies evolve and newer ones like Docker c...
May. 1, 2017 01:15 AM EDT Reads: 1,454
With billions of sensors deployed worldwide, the amount of machine-generated data will soon exceed what our networks can handle. But consumers and businesses will expect seamless experiences and real-time responsiveness. What does this mean for IoT devices and the infrastructure that supports them? More of the data will need to be handled at - or closer to - the devices themselves.
May. 1, 2017 12:30 AM EDT Reads: 1,288
SYS-CON Events announced today that CollabNet, a global leader in enterprise software development, release automation and DevOps solutions, will be a Bronze Sponsor of SYS-CON's 20th International Cloud Expo®, taking place from June 6-8, 2017, at the Javits Center in New York City, NY. CollabNet offers a broad range of solutions with the mission of helping modern organizations deliver quality software at speed. The company’s latest innovation, the DevOps Lifecycle Manager (DLM), supports Value S...
May. 1, 2017 12:15 AM EDT Reads: 1,553
Multiple data types are pouring into IoT deployments. Data is coming in small packages as well as enormous files and data streams of many sizes. Widespread use of mobile devices adds to the total. In this power panel at @ThingsExpo, moderated by Conference Chair Roger Strukhoff, panelists will look at the tools and environments that are being put to use in IoT deployments, as well as the team skills a modern enterprise IT shop needs to keep things running, get a handle on all this data, and deli...
Apr. 30, 2017 11:45 PM EDT Reads: 2,863
Automation is enabling enterprises to design, deploy, and manage more complex, hybrid cloud environments. Yet the people who manage these environments must be trained in and understanding these environments better than ever before. A new era of analytics and cognitive computing is adding intelligence, but also more complexity, to these cloud environments. How smart is your cloud? How smart should it be? In this power panel at 20th Cloud Expo, moderated by Conference Chair Roger Strukhoff, pane...
Apr. 30, 2017 10:30 PM EDT Reads: 2,651
Grape Up is a software company, specialized in cloud native application development and professional services related to Cloud Foundry PaaS. With five expert teams that operate in various sectors of the market across the USA and Europe, we work with a variety of customers from emerging startups to Fortune 1000 companies.
Apr. 30, 2017 10:00 PM EDT Reads: 2,687
SYS-CON Events announced today that Grape Up will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct. 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Grape Up is a software company specializing in cloud native application development and professional services related to Cloud Foundry PaaS. With five expert teams that operate in various sectors of the market across the U.S. and Europe, Grape Up works with a variety of customers from emergi...
Apr. 30, 2017 09:45 PM EDT Reads: 2,529
Financial Technology has become a topic of intense interest throughout the cloud developer and enterprise IT communities. Accordingly, attendees at the upcoming 20th Cloud Expo at the Javits Center in New York, June 6-8, 2017, will find fresh new content in a new track called FinTech.
Apr. 30, 2017 09:45 PM EDT Reads: 2,726
@ThingsExpo has been named the Most Influential ‘Smart Cities - IIoT' Account and @BigDataExpo has been named fourteenth by Right Relevance (RR), which provides curated information and intelligence on approximately 50,000 topics. In addition, Right Relevance provides an Insights offering that combines the above Topics and Influencers information with real time conversations to provide actionable intelligence with visualizations to enable decision making. The Insights service is applicable to eve...
Apr. 30, 2017 09:30 PM EDT Reads: 3,220
SYS-CON Events announced today that Interoute, owner-operator of one of Europe's largest networks and a global cloud services platform, has been named “Bronze Sponsor” of SYS-CON's 20th Cloud Expo, which will take place on June 6-8, 2017 at the Javits Center in New York, New York. Interoute is the owner-operator of one of Europe's largest networks and a global cloud services platform which encompasses 12 data centers, 14 virtual data centers and 31 colocation centers, with connections to 195 add...
Apr. 30, 2017 09:15 PM EDT Reads: 2,358
DevOps is often described as a combination of technology and culture. Without both, DevOps isn't complete. However, applying the culture to outdated technology is a recipe for disaster; as response times grow and connections between teams are delayed by technology, the culture will die. A Nutanix Enterprise Cloud has many benefits that provide the needed base for a true DevOps paradigm.
Apr. 30, 2017 08:30 PM EDT Reads: 1,462
20th Cloud Expo, taking place June 6-8, 2017, at the Javits Center in New York City, NY, will feature technical sessions from a rock star conference faculty and the leading industry players in the world. Cloud computing is now being embraced by a majority of enterprises of all sizes. Yesterday's debate about public vs. private has transformed into the reality of hybrid cloud: a recent survey shows that 74% of enterprises have a hybrid cloud strategy.
Apr. 30, 2017 08:15 PM EDT Reads: 3,577
With major technology companies and startups seriously embracing IoT strategies, now is the perfect time to attend @ThingsExpo 2016 in New York. Learn what is going on, contribute to the discussions, and ensure that your enterprise is as "IoT-Ready" as it can be! Internet of @ThingsExpo, taking place June 6-8, 2017, at the Javits Center in New York City, New York, is co-located with 20th Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry p...
Apr. 30, 2017 08:00 PM EDT Reads: 1,775
Today we can collect lots and lots of performance data. We build beautiful dashboards and even have fancy query languages to access and transform the data. Still performance data is a secret language only a couple of people understand. The more business becomes digital the more stakeholders are interested in this data including how it relates to business. Some of these people have never used a monitoring tool before. They have a question on their mind like “How is my application doing” but no id...
Apr. 30, 2017 07:45 PM EDT Reads: 7,482
With major technology companies and startups seriously embracing Cloud strategies, now is the perfect time to attend @CloudExpo | @ThingsExpo, June 6-8, 2017, at the Javits Center in New York City, NY and October 31 - November 2, 2017, Santa Clara Convention Center, CA. Learn what is going on, contribute to the discussions, and ensure that your enterprise is on the right path to Digital Transformation.
Apr. 30, 2017 07:30 PM EDT Reads: 1,803
@GonzalezCarmen has been ranked the Number One Influencer and @ThingsExpo has been named the Number One Brand in the “M2M 2016: Top 100 Influencers and Brands” by Analytic. Onalytica analyzed tweets over the last 6 months mentioning the keywords M2M OR “Machine to Machine.” They then identified the top 100 most influential brands and individuals leading the discussion on Twitter.
Apr. 30, 2017 07:15 PM EDT Reads: 1,645
[session] The IoT Evolution Will Be Led by Novel ‘Abstraction’ By @InteractorTeam | @ThingsExpo #IoT #M2M
Most technology leaders, contemporary and from the hardware era, are reshaping their businesses to do software in the hope of capturing value in IoT. Although IoT is relatively new in the market, it has already gone through many promotional terms such as IoE, IoX, SDX, Edge/Fog, Mist Compute, etc. Ultimately, irrespective of the name, it is about deriving value from independent software assets participating in an ecosystem as one comprehensive solution.
Apr. 30, 2017 06:45 PM EDT Reads: 799
SYS-CON Events announced today that T-Mobile will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. As America's Un-carrier, T-Mobile US, Inc., is redefining the way consumers and businesses buy wireless services through leading product and service innovation. The Company's advanced nationwide 4G LTE network delivers outstanding wireless experiences to 67.4 million customers who are unwilling to compromise on ...
Apr. 30, 2017 06:15 PM EDT Reads: 1,677
The 20th International Cloud Expo has announced that its Call for Papers is open. Cloud Expo, to be held June 6-8, 2017, at the Javits Center in New York City, brings together Cloud Computing, Big Data, Internet of Things, DevOps, Containers, Microservices and WebRTC to one location. With cloud computing driving a higher percentage of enterprise IT budgets every year, it becomes increasingly important to plant your flag in this fast-expanding business opportunity. Submit your speaking proposal ...
Apr. 30, 2017 05:30 PM EDT Reads: 1,855