People are talking about the new Llama 3.3 70b release, which has generally better performance than Llama 3.1 (approaching 3.1’s 405b performance): https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_3
However, something to note:
Llama 3.3 70B is provided only as an instruction-tuned model; a pretrained version is not available.
Is this the end of open-weight pretrained models from Meta, or is Llama 3.3 70b instruct just a better-instruction-tuned version of a 3.1 pretrained model?
Comparing the model cards: 3.1: https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md 3.3: https://github.com/meta-llama/llama-models/blob/main/models/llama3_3/MODEL_CARD.md
The same knowledge cutoff, same amount of training data, and same training time give me hope that it’s just a better finetune of maybe Llama 3.1 405b.
This is making me realize that I don’t fully understand the relationship between “instruction-tuned” and “pre-trained”. I thought instruction tuning was a form of fine-tuning, and that fine-tuning comes after the primary training of the model.
A base-model / pre-trained is fed with a large dataset of random text files. Books, Wikipedia etc. After that the model can autocomplete text. And it has learned language and concepts about the world. But it won’t answer your questions. It’ll refine them, or think you’re writing an email or long list of unanswered questions and write some more questions underneath, instead of engaging with you. Or think it’s writing a novel and autocomplete “…that’s what character asked while rolling their eyes.” Or something completely arbitrary like that.
After that major first step it’ll get fine-tuned to some task. The procedure is the same, it’ll get fed different text in almost the same way. And this just continues the training. But now it’s text that tunes it to it’s role. For example be a Chatbot. It’ll get lots of text that is a question, then a special character/token and then an answer to the question. And it’ll learn to reply with an (correct) answer if you put in a question and that token. It’ll probably also be fine-tuned to write dialogue as a Chatbot. And follow instructions. (And refuse some things and speak more unbiased, be nice…)
You can also put in domain-specific data, make it learn/focus on medicine… I think that’s also called fine-tuning. But as far as I understand teaching knowledge with arbitrary data comes before teaching/tuning it to follow instructions, or it might forget that.
I think instruction tuning is a form of fine-tuning. It’s just called that to distinguish it from other forms of fine-tuning. But I’m not really an expert on any of this.
I was also not sure what this meant, so I asked Google’s Gemini, and I think this clears it up for me:
This means that the creators of Llama 3.3 have chosen to release only the version of the model that has been fine-tuned for following instructions. They are not making the original, “pretrained” version available.
Here’s a breakdown of why this is significant:
- Pretrained models: These are large language models (LLMs) trained on a massive dataset of text and code. They have learned to predict the next word in a sequence, and in doing so, have developed a broad understanding of language and a wide range of general knowledge. However, they may not be very good at following instructions or performing specific tasks.
- Instruction-tuned models: These models are further trained on a dataset of instructions and desired outputs. This fine-tuning process teaches them to follow instructions more effectively, generate more relevant and helpful responses, and perform specific tasks with greater accuracy.
In the case of Llama 3.3 70B, you only have access to the model that has already been optimized for following instructions and engaging in dialogue. You cannot access the initial pretrained model that was used as the foundation for this instruction-tuned version.
Possible reasons why Meta (the creators of Llama) might have made this decision:
- Focus on specific use cases: By releasing only the instruction-tuned model, Meta might be encouraging developers to use Llama 3.3 for assistant-like chat applications and other tasks where following instructions is crucial.
- Competitive advantage: The pretrained model might be considered more valuable intellectual property, and Meta may want to keep it private to maintain a competitive advantage.
- Safety and responsibility: Releasing the pretrained model could potentially lead to its misuse for generating harmful or misleading content. By releasing only the instruction-tuned version, Meta might be trying to mitigate these risks.
Ultimately, the decision to release only the instruction-tuned model reflects Meta’s strategic goals for Llama 3.3 and their approach to responsible AI development.
On Huggingface, someone said it’s still the same base model: https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct/discussions/10
And I remember watching some interview with Zuckerberg this year, where he said releasing the models to the public, including base models, is what he wants and part of their strategy.
Thank you so much, that exactly answers my question with the official response (that guy works at Meta) that confirms it’s the same base model!
I was concerned primarily because in the release notes it strangely didn’t mention it anywhere, and I thought it would have been important enough to mention.