There's no mistake in the barley data

Statistics has many canonical data sets. For classification statistics, we have Fisher's iris data. For Big Data statistics, the canonical data set used in many examples is the Airlines data. And for dotplots, we have the barley data, first popularized by Bill Cleveland in the landmark 1993 text Visualizing Data. Cleveland's innovations in data visualization were hugely influential in the S language and (later) R's lattice and ggplot2 packages, and his panel chart of the barley data is one of the best known.

That chart shows the yields for several different varieties of barley (Trebi, Glabron and so on) planted at each of six different sites in Minnesota (Duluth, Grand Rapids, etc.) in the years 1931 (pink) and 1932 (blue). The reason this data set has become legendary appears in the "Morris" panel, where, unlike at all other sites, the yields in 1931 exceeded those in 1932 for all barley varieties. This is a great demonstration of the power of dotplots and panel graphics. In his book, Cleveland said that "either an extraordinary natural event, such as disease or a local weather anomaly, produced a strange coincidence, or the years for Morris were inadvertently reversed", and "on the basis of the evidence, the mistake hypothesis would appear to be the more likely."

But it now looks as though, despite Cleveland's suggestion, the data are correct after all. In a paper in the American Statistician published last year, Kevin Wright notes that in that time period the local effects of weather (especially drought), insects and disease had a greater impact on barley yields than any overall year-to-year effects, and that the results at Morris were not surprising. Kevin offers as evidence extended barley yield data (available in his R package agridat) covering 10 years and 18 varieties. A panel chart of the extended data shows significant variation across years and within sites. Take 1934, for example: a bounty of barley in Duluth, but a meagre crop in St Paul.

So it goes to show that in Cleveland's original example, it wasn't a data error that led to the "unusual" results at the Morris site. Rather, it's an expected consequence of the year-to-year variation of yields at each of the growing sites. But it's no less interesting a data set for showing off the power of dot plots and panel charts, as you can see from several other examples included in Kevin Wright's paper, cited below.

(With thanks to Kevin for describing this example to me at the useR! 2014 poster session. You can see a version of his poster here.)

Reference: Kevin Wright, Revisiting Immer's Barley Data. The American Statistician, 67(3), 129–133.

More Stories By David Smith

David Smith is Vice President of Marketing and Community at Revolution Analytics. He has a long history with the R and statistics communities. After graduating with a degree in Statistics from the University of Adelaide, South Australia, he spent four years researching statistical methodology at Lancaster University in the United Kingdom, where he also developed a number of packages for the S-PLUS statistical modeling environment. He continued his association with S-PLUS at Insightful (now TIBCO Spotfire), overseeing the product management of S-PLUS and other statistical and data mining products.

David Smith is the co-author (with Bill Venables) of the popular tutorial manual, An Introduction to R, and one of the originating developers of the ESS: Emacs Speaks Statistics project. Today, he leads marketing for REvolution R, supports R communities worldwide, and is responsible for the Revolutions blog. Prior to joining Revolution Analytics, he served as vice president of product management at Zynchros, Inc. Follow him on Twitter at @RevoDavid.
