What I learned from watching 'Intro to Large Language Models' by Andrej Karpathy

I'm currently delving into how generative AI works. In my research, I stumbled upon an excellent one-hour talk, 'Intro to Large Language Models', by Andrej Karpathy. He's an AI expert who led the AI and Autopilot efforts at Tesla and now works at OpenAI. The talk is easy to follow, even for those without a tech background. I highly recommend watching it, but here's what I learned:

LLMs are a Lossy Compression of the Internet

LLMs are trained on huge amounts of internet data, but they only keep the important bits during training. They essentially hold onto the outline (the 'shape') of the data and discard the rest. Andrej says they use this outline to 'dream up' answers. This is why LLMs can sometimes be wrong and can't cite sources for the information they generate.
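To make the 'dreaming' idea concrete, here's a minimal sketch of my own (not from the talk): a tiny bigram model that keeps only word-to-word statistics from its training text, then 'dreams up' new text from that compressed outline.

```python
import random
from collections import defaultdict

# Toy training corpus standing in for "the internet".
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# "Training": keep only which word tends to follow which -- a lossy,
# compressed outline of the data, not the data itself.
following = defaultdict(list)
for word, nxt in zip(corpus, corpus[1:]):
    following[word].append(nxt)

# "Dreaming": generate plausible-looking text from the outline. The
# output can contain sequences that never appeared in the training data
# (e.g. "the cat sat on the rug"), which is why such models can be
# confidently wrong and can't point back to a source.
word = "the"
dream = [word]
for _ in range(8):
    word = random.choice(following[word])
    dream.append(word)
print(" ".join(dream))
```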

LLMs Use Tools to Find Answers

I mentioned earlier that LLMs can't cite sources for the information they generate. But ChatGPT (with GPT-4) and some other LLMs can link to sources for some questions. In these cases, their responses don't come from their training data; instead, they use search engines like Bing to find and summarize information. Similarly, LLMs are not great at math, but they can recognize when a user wants a math problem solved and then use a calculator to get the answer. It's likely that future LLMs will use not only tools like search engines and calculators but also other LLMs as tools.
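As a rough sketch of how tool use can work under the hood (the output format and tool names here are my own invention for illustration): the model emits a structured request, the surrounding application runs the tool, and the result flows back into the conversation.

```python
# A hypothetical tool-dispatch loop. Real systems (e.g. OpenAI's
# function calling) use richer schemas, but the control flow is similar.

def run_calculator(expression: str) -> str:
    # A real system would use a safe math parser instead of eval.
    return str(eval(expression, {"__builtins__": {}}))

def run_web_search(query: str) -> str:
    return f"(summarized search results for: {query})"  # stub

TOOLS = {"CALCULATOR": run_calculator, "SEARCH": run_web_search}

def handle_model_output(output: str) -> str:
    """If the model asked for a tool, run it; otherwise return the text."""
    for name, tool in TOOLS.items():
        prefix = name + ":"
        if output.startswith(prefix):
            return tool(output[len(prefix):].strip())
    return output

# Imagine the model emitted these instead of guessing the answers:
print(handle_model_output("CALCULATOR: 1234 * 5678"))
print(handle_model_output("SEARCH: latest Nvidia GPU prices"))
```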

Training LLMs Happens in Three Steps

Step 1: Pre-training

Here, the model learns from a huge amount of internet data. The more data, the better (we're talking tens to hundreds of terabytes). This part is costly because it needs lots of computing power and those hard-to-get Nvidia GPUs, so it's usually done only about once a year. After this, the model can generate 'internet documents' like Wikipedia articles, Amazon pages, and code.
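For the technically curious, here's a minimal sketch of the pre-training objective, assuming PyTorch is installed. The toy model below is orders of magnitude smaller than the real thing, but the objective is the same: predict the next token.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim = 100, 32
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),  # logits over the next token
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

tokens = torch.randint(0, vocab_size, (1000,))  # stand-in for web text
for step in range(100):
    i = torch.randint(0, len(tokens) - 1, (32,))  # random positions
    inputs, targets = tokens[i], tokens[i + 1]    # predict what comes next
    logits = model(inputs)
    loss = nn.functional.cross_entropy(logits, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```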

Step 2: Fine-tuning

Now the goal is to turn the base model into a chat model. Humans write a large set of Q&A pairs (~100k) for the model to learn from. This stage uses much less data, but it's high quality. It's also cheaper, so it's done more often, like weekly or even daily. After this, the model can chat in Q&A style while still drawing on the knowledge it absorbed during pre-training.
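Mechanically, fine-tuning uses the same next-token objective as pre-training; what changes is the data. Here's a sketch of how Q&A pairs might be turned into training documents (the chat template is illustrative, not any lab's actual format).

```python
# Illustrative chat template; real labs each use their own format.
TEMPLATE = "<|user|>\n{question}\n<|assistant|>\n{answer}\n<|end|>\n"

qa_pairs = [
    {"question": "What is lossy compression?",
     "answer": "Compression that discards detail to save space."},
    {"question": "Name one LLM training stage.",
     "answer": "Pre-training on large amounts of internet text."},
]

# Each pair becomes one training document; the model is then trained
# on these exactly as in pre-training (predict the next token).
documents = [TEMPLATE.format(**pair) for pair in qa_pairs]
print(documents[0])
```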

Step 3: Learning from Human Feedback

In this step, the model generates several answers to a question, and humans pick the best ones. These choices are used to make the model better. At OpenAI, this process is called 'Reinforcement Learning from Human Feedback' (RLHF). Andrej says this step is optional and doesn't go into much detail in the video, but I think it was key in making GPT-3 good enough for OpenAI to launch ChatGPT and kick off the Generative AI wave.
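A common way to turn these human choices into a training signal, which the talk doesn't cover in detail, is a reward model trained on pairwise comparisons. Here's a sketch of the usual Bradley-Terry-style loss.

```python
import math

def reward_model_loss(score_chosen: float, score_rejected: float) -> float:
    """Pairwise preference loss: push the score of the answer humans
    chose above the score of the answer they rejected."""
    return -math.log(1 / (1 + math.exp(-(score_chosen - score_rejected))))

# The loss shrinks as the reward model learns to rank the chosen answer
# higher; that reward model then steers the LLM via reinforcement learning.
print(reward_model_loss(2.0, 0.5))  # small: ranking already correct
print(reward_model_loss(0.5, 2.0))  # large: ranking wrong
```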

We Don't Really Know How Large Language Models Work

Here's the tricky part: researchers can make models better by iteratively tweaking their parameters, but they don't really know how it all works inside. It's a black box. This is why it's so hard to solve problems like the 'reversal curse'. Andrej explains it with the following example: ask GPT-4 'Who is Tom Cruise's mother?' and it correctly answers 'Mary Lee Pfeiffer.' But ask 'Who is Mary Lee Pfeiffer's son?' and it fails, even though it evidently has the information. There is an entire field called mechanistic interpretability, in which researchers try to make sense of the LLM black box.

Bigger Models Guarantee Better Predictions

This explains the rush for GPUs and AI startups: Andrej says the quality of a model's predictions reliably improves if you train on more data or make the model bigger (by increasing the number of parameters); these remarkably predictable trends are known as scaling laws. The bigger the model and the data, the more computing you need. That's why everyone wants more GPUs: to train their ever-growing models faster. It's like investing in a hedge fund that guarantees a high return and has no limit on how much money you can put in.
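As a sketch of what these scaling laws look like: published work models the loss as a smooth function of parameter count and training tokens. The functional form and constants below follow the Chinchilla paper (Hoffmann et al., 2022); the point is only the shape of the curve.

```python
def loss(n_params: float, n_tokens: float) -> float:
    """Approximate LLM loss as a function of model size and data size,
    using the form and fitted constants from Hoffmann et al. (2022)."""
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

# Loss falls smoothly and predictably as you scale either axis:
for n in [1e9, 1e10, 1e11]:  # parameters, at a fixed 1T training tokens
    print(f"{n:.0e} params -> loss {loss(n, 1e12):.3f}")
```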

Future LLMs Will Self-Improve in Narrow Domains

Here's something cool: in machine learning, there's the idea of 'reward functions' that score the model's output. These are hard to create for open-ended tasks like writing prose, but easier for domains with clear rules, like chess or coding. There's ongoing research into letting LLMs improve themselves in these narrow domains by iteratively using such reward functions as feedback.
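For a domain like coding, a reward function can be as simple as 'what fraction of tests pass'. A hypothetical sketch (the task, a function called `add`, is made up for illustration):

```python
def coding_reward(candidate_source: str, tests: list) -> float:
    """Score a model-written function by the fraction of tests it passes.
    (Running untrusted model code like this needs sandboxing in practice.)"""
    namespace: dict = {}
    try:
        exec(candidate_source, namespace)  # define the candidate function
    except Exception:
        return 0.0
    passed = 0
    for args, expected in tests:
        try:
            if namespace["add"](*args) == expected:
                passed += 1
        except Exception:
            pass
    return passed / len(tests)

# A model-generated attempt, scored automatically -- a signal like this
# could drive self-improvement without a human in the loop.
attempt = "def add(a, b):\n    return a + b"
print(coding_reward(attempt, [((1, 2), 3), ((0, 0), 0)]))  # 1.0
```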

Conclusion

After watching the video, I understand LLMs better and feel better equipped to navigate all the buzz around Generative AI. Here's how I see their future:

  • They may not reliably generate factually correct answers any time soon, but they can search the web and summarize citable information for you.
  • Only big companies or well-funded startups can afford the first stage of model training. If more companies share their models like Meta does, we could see lots of AI startups creating a diverse array of models. If proprietary models remain the norm, only a handful of big tech companies might end up controlling AI models.
  • Getting data could get tougher. Rights holders want to be paid when their data is used for training or accessed while the model is in use. This could mean lawsuits (like The New York Times suing OpenAI & Microsoft) or deals (like Axel Springer's agreement with OpenAI). It could make it harder for AI companies to get new training data, but on the other hand, it could also become a new revenue stream for content creators, beyond ads and subscriptions.

What are your thoughts on LLMs and their future? Let me know at feedback@vidugloeck.com. I'm here to learn.