A Tale of the SRG: Math Wins the Day
As this year draws to a close, I find myself wondering if some of the ideas and methods that have been used to create vaccines and to push back against COVID-19 will find broader applicability in the years ahead, perhaps in areas reaching well beyond health care.
Capital Thinking • Issue #760
During World War II, a Statistical Research Group was formed to assist the war effort.
W. Allen Wallis, who was Director of Research, tells the story in “The Statistical Research Group, 1942-1945” (Journal of the American Statistical Association, 75:370, June 1980, pp. 320-330, available via JSTOR): “The Statistical Research Group (SRG) was based at Columbia University during the Second World War and supported by the Applied Mathematics Panel (AMP) of the National Defense Research Committee (NDRC), which was part of the Office of Scientific Research and Development (OSRD).”
Prominent members of the group included Milton Friedman, Harold Hotelling, Leonard Savage, and Abraham Wald.
Indeed, Wallis writes: “SRG was composed of what surely must be the most extraordinary group of statisticians ever organized, taking into account both number and quality.”
Lessons from World War II Statisticians: Survivorship Bias and Sequential Analysis
The backstory goes like this.
On behalf of himself and some other Stanford statistics professors, Wallis wrote to the US government in 1942, offering to help in some way with the war effort.
He got a letter back from W. Edwards Deming, the engineer who later became a guru of industrial quality control, but who at this time was working in the US Census Bureau.
Deming wrote back “with four single-spaced pages on the letterhead of the Chief of Ordnance, War Department,” and suggested that the statisticians prepare a short course for engineers and firms on how statistical methods could be used for quality control.
As Wallis dryly noted in 1980:
“The program that resulted from Deming’s suggestion eventually made a major contribution to the war effort. Its aftermath, in fact, continues to make major contributions not only to the American economy but also to the Japanese economy.”
By mid-1942, Wallis had moved to Columbia to run the Statistical Research Group.
One bit of back-story is that, in those pre-computer days, “the computing … was done by about 30 young women, mostly mathematics graduates of Hunter or Vassar. Some of the basic statistical tables published in Techniques of Statistical Analysis (SRG 1948) were computed as backlog projects when there was slack in the computing load.”
The SRG carried out literally hundreds of analyses:
how the ammunition in aircraft machine guns should be mixed;
quality examination methods for rocket fuel;
“the best settings on proximity fuzes for air bursts of artillery shells against ground troops”;
“to evaluate the comparative effectiveness of four 20 millimeter guns on the one hand and eight 50 caliber guns on the other as the armament of a fighter aircraft”;
calculating “pursuit curves” for targeting missiles and torpedoes.
“Statistical studies were also made of stereoscopic range finders, food storage data, high temperature alloys, the diffusion of mustard gas, and clothing tests.”
Several of the insights from the SRG have had a lasting effect in terms of statistical analysis. Here, I’ll focus on two of them: survivorship bias and sequential sampling.
“Survivorship bias” refers to a problem that emerges when you draw conclusions from a data set without realizing that some data points have dropped out over time.
For example, suppose you look at the average rate of return from stock market mutual funds. If you just look at the universe of current funds, you will be leaving out funds that did badly and were closed or merged for lack of interest.
Or suppose you argue in favor of borrowing money to attend a four-year college by citing evidence about higher salaries earned by college graduates, but you leave out the experience of those who borrowed money and did not end up graduating.
In health care, survivorship bias can come up quite literally in studies of trauma care. Before drawing conclusions, such studies must account for the fact that people who suffered an injury but never reached the trauma care unit, or who died of the injury before arriving, will not appear in the data.
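The mutual fund example above can be made concrete with a small simulation. This is only a sketch under invented assumptions: the return distribution (mean 5%, standard deviation 10%), the closure rule, and the function name `simulate_fund_returns` are all hypothetical and do not come from any real fund data.

```python
import random
import statistics


def simulate_fund_returns(n_funds=1000, closure_threshold=-0.05, seed=0):
    """Return (mean return of all funds, mean return of surviving funds)."""
    rng = random.Random(seed)
    # Hypothetical annual returns: mean 5%, standard deviation 10%.
    returns = [rng.gauss(0.05, 0.10) for _ in range(n_funds)]
    # Assumed rule: funds returning less than the threshold are closed or merged,
    # so they vanish from the "universe of current funds".
    survivors = [r for r in returns if r >= closure_threshold]
    return statistics.mean(returns), statistics.mean(survivors)


all_mean, survivor_mean = simulate_fund_returns()
print(f"all funds: {all_mean:.3f}, survivors only: {survivor_mean:.3f}")
```

Because the closed funds are exactly the ones with the worst returns, the average over the survivors comes out higher than the average over every fund that ever existed, which is the bias described above.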
In a follow-up comment on the main article, appearing in the same issue, Wallis describes the origins of the idea of survivorship bias:
In the course of reviewing the history of SRG, I was reminded of some ingenious work by Wald that has never seen the light of day. Arrangements have now been made for its publication, although the form and place are yet undecided. Wald wrote a series of memoranda on estimating the vulnerability of various parts of an airplane from data showing the number of hits on the respective parts of planes returning from combat. The vulnerability of a part (engine, aileron, pilot, stabilizer, elevator, etc.) is defined as the probability that a hit on that part will result in destruction of the plane (fire, explosion, loss of power, loss of control, etc.). The military was inclined to provide protection for those parts that on returning planes showed the most hits. Wald assumed, on good evidence, that hits in combat were uniformly distributed over the planes. It follows that hits on the more vulnerable parts were less likely to be found on returning planes than hits on the less vulnerable parts, since planes receiving hits on the more vulnerable parts were less likely to return to provide data. From these premises, he devised methods for estimating the vulnerability of various parts.
In other words, simply counting the damage on the planes that return would be misleading, but after adjusting for the fact that the returning planes are the ones that survived, the same data can offer real insights.
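The logic of the adjustment can be illustrated with a deliberately simplified sketch. It is not Wald's actual method, which dealt with planes taking multiple hits. Here every plane is assumed to take exactly one hit, hits land on each part in proportion to its share of surface area (the uniformity assumption), and the part names, areas, and vulnerabilities are invented for illustration.

```python
import random

# Hypothetical parts: share of surface area and true vulnerability
# (probability that a single hit on the part destroys the plane).
PARTS = {
    "engine":   {"area": 0.2, "vulnerability": 0.6},
    "fuselage": {"area": 0.5, "vulnerability": 0.1},
    "wings":    {"area": 0.3, "vulnerability": 0.2},
}


def simulate_returning_hits(n_planes, seed=0):
    """Count hits per part, but only on planes that make it back."""
    rng = random.Random(seed)
    names = list(PARTS)
    weights = [PARTS[p]["area"] for p in names]
    hits_on_returners = {p: 0 for p in names}
    for _ in range(n_planes):
        # Simplifying assumption: each plane takes exactly one hit.
        part = rng.choices(names, weights=weights)[0]
        # The plane returns only if the hit does not destroy it.
        if rng.random() >= PARTS[part]["vulnerability"]:
            hits_on_returners[part] += 1
    return hits_on_returners


def estimate_vulnerability(hits_on_returners, n_planes):
    """Compare observed hits on returners with hits expected on all planes."""
    estimates = {}
    for part, observed_hits in hits_on_returners.items():
        expected_hits = n_planes * PARTS[part]["area"]
        estimates[part] = 1 - observed_hits / expected_hits
    return estimates


n = 100_000
observed = simulate_returning_hits(n)
estimates = estimate_vulnerability(observed, n)
for part in PARTS:
    print(part, observed[part], round(estimates[part], 3))
```

In this toy setup the fuselage shows the most hits on returning planes, yet the engine has the highest estimated vulnerability: the hits that are missing from the returning planes are the informative ones.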
But clearly the most prominent statistical insight from the SRG was the idea of sequential analysis, which Wallis calls “one of the most powerful and seminal statistical ideas of the past third of a century.”