Sep 15, 2023·edited Sep 15, 2023

I think a lot of what is meant by "overhang" is inference. GPT-4 was probably trained by about 10,000 gpus, but it can run on probably about 8 gpus. If an AI can escape to other computers, then constraining training runs without constraining the broader availability of compute would imply a bigger such mismatch (although maybe this mismatch isn't very relevant because it's so big already and is growing). (The reason the mismatch is growing is increasing the size of a model by 3 costs 9 times as much (squared), but only increases inference cost by 3)

Expand full comment

Yeah, "overhang" has been used to mean 'inference is cheap,' but that usage seems less common recently. See https://www.lesswrong.com/posts/icR53xeAkeuzgzsWP/taboo-compute-overhang. This "mismatch" seems worth noticing because it implies that training run X could cause us to go from _nobody having access to X-level capabilities_ to _anyone with the model weights being able to run lots of X-level inference_, but marginal changes in the "mismatch" don't seem very decision-relevant to me (and cheapness-of-inference in absolute terms seems more important than the mismatch).

Expand full comment