Description vs simulated prediction
By Rick Korzekwa, 22 April 2020
During our investigation into discontinuous progress[note]By ‘discontinuity’, I mean a jump in technological progress that happened at least ten years earlier than it would have taken had the preceding trend held. For example, from 1826 to 1929, the record for the longest suspension bridge span increased from 176 meters to 564 meters, for an average rate of 3.8 meters per year. In 1931, the George Washington Bridge broke the record by more than 500 meters, which would have taken more than a century at 3.8 meters per year. For more details, see our methodology page and Katja's blog post on the project.[/note], we had some discussions about what exactly it is that we’re trying to do when we analyze discontinuities in technological progress. There was some disagreement and (at least on my part) some confusion about what we were trying to learn from all of this. I think this confusion comes from conflating two separate, more general questions that can come up when forecasting future progress. These questions are closely related and both can be answered in part by analyzing discontinuities. First I will describe these questions generically, then I will explain how they can become confused in the context of analyzing discontinuous progress.
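The arithmetic in the footnote's bridge example can be sketched in a few lines. This is only an illustration using the figures quoted in the footnote, not our full methodology: the function names are made up for this sketch, 500 meters is used as a lower bound for the size of the jump, and subtracting the two years that passed between 1929 and 1931 is my own assumption about how to turn "years of progress" into "years ahead of trend".

```python
def trend_rate(start_year, start_value, end_year, end_value):
    """Average linear rate of progress (units per year) between two records."""
    return (end_value - start_value) / (end_year - start_year)


def years_ahead_of_trend(jump, rate, years_elapsed):
    """Years by which a jump beat the previous trend.

    jump / rate is how long the old trend would have needed to cover the
    jump; years_elapsed is how long it actually took (assumption: we
    subtract it so the result reads as "years early").
    """
    return jump / rate - years_elapsed


# Figures from the footnote: suspension bridge spans in meters.
rate = trend_rate(1826, 176, 1929, 564)

# "More than 500 meters", so 500 is a lower bound on the jump.
years_ahead = years_ahead_of_trend(jump=500, rate=rate, years_elapsed=1931 - 1929)

# Threshold of ten years, from the definition in the footnote.
is_discontinuity = years_ahead >= 10

print(f"previous trend: {rate:.1f} m/year")
print(f"jump arrived at least {years_ahead:.0f} years ahead of trend")
print(f"counts as a discontinuity: {is_discontinuity}")
```

Everything here depends on treating the preceding trend as linear, which is itself a judgment call for many metrics.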
Question 1: How did tech progress happen in the past?
Knowing something about how historical progress happened is crucial to making good forecasts now, both from the standpoint of understanding the underlying mechanisms and establishing base rates. Merely being able to describe what happened and why can help us make sense of what’s happening now and should enable us to make better predictions. Such descriptions vary in scope and detail. For example:
The number of transistors per chip doubled roughly every two years from 1970 to 2020
Wartime funding during WWII led to the rapid development and scaling of penicillin production, which contributed to a 90% decrease in US syphilis mortality from 1945 to 1967
Large jumps in metrics for technological progress were not always accompanied by fundamental scientific breakthroughs
This sort of analysis may be used for other work, in addition to forecasting rates of progress. In the context of our work on discontinuities, answering this question mostly consists of describing quantitatively how metrics for progress evolve over time.
Question 2: How would we have fared making predictions in the past?
This is actually a family of questions aimed at developing and calibrating prediction methods based on historical data. These are questions like:
If, in the past, we’d predicted that the current trend would hold, how often and by how much would we have been wrong?
Are there domains in which we’d have fared better than others?
Are there heuristics we can use to make better predictions?
Which methods for characterizing trends in progress would have performed the best?
How often would we have seen hints that a discontinuity or change in rate of progress was about to happen?
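As a rough sketch of what the first of these questions looks like in practice, the code below replays a record history one point at a time, fits a straight line to everything known so far, and checks how far off the naive "the trend will hold" forecast would have been for the next point. The data series is invented for illustration (it is not one of our metrics), the function names are hypothetical, and a least-squares line is only one of many ways to characterize a trend.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def backtest_naive_trend(years, values, min_points=3):
    """Simulate predicting each record from only the records before it.

    Returns a list of (year, error, error_in_years) tuples, where error is
    actual minus predicted, and error_in_years converts that gap into years
    of progress at the fitted rate (assumes a nonzero slope).
    """
    results = []
    for i in range(min_points, len(years)):
        slope, intercept = fit_line(years[:i], values[:i])
        predicted = slope * years[i] + intercept
        error = values[i] - predicted
        results.append((years[i], error, error / slope))
    return results


# Invented example: steady progress, then a large jump in 1950.
years = [1900, 1910, 1920, 1930, 1940, 1950]
values = [10, 20, 30, 40, 50, 120]

for year, error, error_years in backtest_naive_trend(years, values):
    print(f"{year}: off by {error:.0f} units ({error_years:.0f} years of progress)")
```

Note that this only uses data recorded before each prediction date, but it still assumes the simulated forecaster had all of that data, which may not have been true historically.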
These questions often, but not always, require the same approach
In the context of our work on discontinuous progress, these questions converge on the same methods most of the time. For many of our metrics, there was a clear trend leading up to a discontinuity, and describing what happened is essentially the same as attempting to (naively) predict what would have happened if the discontinuity had not happened. But there are times when they differ. In particular, this can happen when we have different information now than we would have had at the time the discontinuity happened, or when the naive approach is clearly missing something important. Three cases of this that come to mind are:
The trend leading up to the discontinuity was ambiguous, but later data made it less ambiguous. For example, advances in steamships improved times for crossing the Atlantic, but at the time that flight and telecommunications were invented, it was not clear whether this progress was exponential or linear. If we look at the progress that transatlantic ship voyages made after flight, however, we can see that the overall trend was linear. If we want to answer the question “What happened?”, we might say that progress in steamships was linear, so that at the steamships' rate of advancement it would have taken 500 years to bring crossing time down to that of the first transatlantic flight. If we want to answer the question “How much would this discontinuity have affected our forecasts at the time?”, we might say that the trend looked exponential, so that our forecasts would have been wrong by a substantially shorter amount of time.
We now have access to information from before the discontinuity that nobody (or no one person) had access to at the time. In the past, the world was much less connected, and it is not clear who knew about what at the time. For example, building heights, altitude records, bridge spans, and military capabilities all showed progress across different parts of the world, and it seems likely that nobody had access to all the information that we have now, so that forecasting may have been much harder or yielded different results. Information that is actively kept secret may have made this problem worse. It seems plausible that nobody knew the state of both the Manhattan Project and the German nuclear weapons program at the time that the first nuclear weapon was tested in 1945.
The inside view overwhelms the outside view. For example, the second transatlantic telegraph cable was much, much better than the first. Using our methodology, it was nearly as large an advancement over the first cable as the first cable was over mailing by ship. But we lose a lot by viewing these advances only in terms of their deviation from the previous trend. The first cable had extremely poor performance, while the second performed about as well as a typical high-performance telegraph cable did at the time. If we were trying to predict future progress at the time, we’d focus on questions like “How long do we think it will take to get a normal cable working?” or “How long will it be until someone is willing to fund the next cable-laying expedition?”, not “If we draw a line through the points on this graph, where does that take us?” (Though that outside view may still be worth considering.) However, if we’re just trying to describe how the metric evolved over time, then the correct thing to do is to draw a line through the points as best we can and calculate how far off-trend the new advancement is.
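To make the first case above more concrete, here is a small sketch of how the same pre-discontinuity data can give very different answers depending on whether we read the trend as linear or exponential. The numbers are invented (think of an increasing metric such as average crossing speed, with a much higher target value set by the new technology), and the names are hypothetical. It is not a reconstruction of our steamship analysis.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def year_linear_trend_reaches(years, values, target):
    """Year at which a linear fit to the data reaches the target value."""
    slope, intercept = fit_line(years, values)
    return (target - intercept) / slope


def year_exponential_trend_reaches(years, values, target):
    """Year at which an exponential fit (a line through log values) reaches the target."""
    slope, intercept = fit_line(years, [math.log(v) for v in values])
    return (math.log(target) - intercept) / slope


# Invented data: an increasing metric recorded before the discontinuity.
years = [1840, 1860, 1880, 1900]
values = [8, 12, 17, 24]
target = 100  # invented performance of the new technology

lin_year = year_linear_trend_reaches(years, values, target)
exp_year = year_exponential_trend_reaches(years, values, target)

print(f"linear trend reaches the target around {lin_year:.0f}")
print(f"exponential trend reaches the target around {exp_year:.0f}")
```

With only four points, both fits can look reasonable, which is the sense in which the trend was ambiguous at the time. Later data is what lets us choose between them for the descriptive question.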
Reasons for focusing more on the descriptive approach for now
Both of these questions are important, and we can’t really answer one while totally ignoring the other. But for now, we have focused more on describing what happened (that is, answering question 1).
There are several reasons for this, but first I’ll describe some advantages to focusing on simulating historical predictions:
It mitigates what may be some misleading results from the descriptive approach. See, for example, the description of the transatlantic telegraph above.
We’re trying to do forecasting (or enable others to do it), and really good answers to these questions might be more directly valuable for that purpose.
But, for now, I think the advantages to focusing on description are greater:
The results are more readily reusable for other projects, either by us or by others. For example, they can help answer a question like “How much of an improvement is a typical major advancement over previous technology?”
It does not require us to model a hypothetical forecaster. It’s hard to predict what we (or someone else) would have predicted if asked about future progress in weapons technology just before the invention of nuclear weapons. To me, this process feels like it has a lot of moving parts, or at least a lot of subjectivity, which leaves room for error and makes it harder for other people to evaluate our methods.
It is easier to build from question 1 to question 2 than the other way around. A description of what happened is a pretty reasonable starting point for figuring out which forecasting methods would have worked.
It is easier to compare across technologies using question 1. Question 2 requires taking a more inside view, which makes comparisons harder.