
Edit: Since writing this, I have watched the video State of GPT | BRK216HFS by Andrej Karpathy (OpenAI). In 42 minutes, it not only covers the high-level basics of how autoregressive decoder-only models (e.g., GPT models) work, but also gives the clearest explanation I have seen of why prompt engineering works.