I think a lot of what is meant by "overhang" is inference. GPT-4 was probably trained on about 10,000 GPUs, but it can probably run on about 8 GPUs. If an AI can escape to other computers, then constraining training runs without constraining the broader availability of compute would imply a bigger such mismatch (although maybe this mismatch isn't very relevant because it's so big already and is growing). (The reason the mismatch is growing: increasing the size of a model by 3x increases training cost by roughly 9x, since training cost scales with the square of model size, but only increases inference cost by roughly 3x.)
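A rough sketch of the arithmetic behind that last parenthetical, assuming the standard approximations (training FLOPs ≈ 6ND for N parameters and D training tokens, inference FLOPs ≈ 2N per token, and D scaled in proportion to N, Chinchilla-style):

$$\text{training compute} \approx 6ND \propto N^2 \quad\Rightarrow\quad N \to 3N \text{ gives } \approx 9\times \text{ the training cost}$$

$$\text{inference compute} \approx 2N \text{ FLOPs/token} \propto N \quad\Rightarrow\quad N \to 3N \text{ gives } \approx 3\times \text{ the inference cost per token}$$

(The square only falls out if you also scale training data with model size; if D were held fixed, training cost would scale linearly in N too.)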
Yeah, "overhang" has been used to mean 'inference is cheap,' but that usage seems less common recently. See https://www.lesswrong.com/posts/icR53xeAkeuzgzsWP/taboo-compute-overhang. This "mismatch" seems worth noticing because it implies that training run X could cause us to go from _nobody having access to X-level capabilities_ to _anyone with the model weights being able to run lots of X-level inference_, but marginal changes in the "mismatch" don't seem very decision-relevant to me (and cheapness-of-inference in absolute terms seems more important than the mismatch).