How to Build an LLM from Scratch | An Overview

Want to learn more? I'm launching a 6-week live BootCamp for AI Builders.
👉 Learn more: https://maven.com/s/course/13437a45a7
Save 50% at checkout with the code FOUNDER50

This is the 6th video in a series on using large language models (LLMs) in practice. Here, I review the key aspects of developing a foundation LLM, drawing on the development of models such as GPT-3, Llama, and Falcon.

More Resources:
ā–¶ļø Series Playlist: https://www.youtube.com/playli....st?list=PLz-ep5RbHos Read more: https://towardsdatascience.com..../how-to-build-an-llm

[1] BloombergGPT: https://arxiv.org/pdf/2303.17564.pdf
[2] Llama 2: https://ai.meta.com/research/p....ublications/llama-2-
[3] LLM Energy Costs: https://www.statista.com/stati....stics/1384401/energy
[4] arXiv:2005.14165 [cs.CL]
[5] Falcon 180b Blog: https://huggingface.co/blog/falcon-180b
[6] arXiv:2101.00027 [cs.CL]
[7] Alpaca Repo: https://github.com/gururise/AlpacaDataCleaned
[8] arXiv:2303.18223 [cs.CL]
[9] arXiv:2112.11446 [cs.CL]
[10] arXiv:1508.07909 [cs.CL]
[11] SentencePiece: https://github.com/google/sent....encepiece/tree/maste
[12] Tokenizers Doc: https://huggingface.co/docs/tokenizers/quicktour
[13] arXiv:1706.03762 [cs.CL]
[14] Andrej Karpathy Lecture: https://www.youtube.com/watch?v=kCc8FmEb1nY&t=5307s
[15] Hugging Face NLP Course: https://huggingface.co/learn/n....lp-course/chapter1/7
[16] arXiv:1810.04805 [cs.CL]
[17] arXiv:1910.13461 [cs.CL]
[18] arXiv:1603.05027 [cs.CV]
[19] arXiv:1607.06450 [stat.ML]
[20] arXiv:1803.02155 [cs.CL]
[21] arXiv:2203.15556 [cs.CL]
[22] Train With Mixed Precision (NVIDIA Docs): https://docs.nvidia.com/deeple....arning/performance/m
[23] DeepSpeed Doc: https://www.deepspeed.ai/training/
[24] Weight Decay (Papers with Code): https://paperswithcode.com/method/weight-decay
[25] Gradient Clipping (Towards Data Science): https://towardsdatascience.com..../what-is-gradient-cl
[26] arXiv:2001.08361 [cs.LG]
[27] arXiv:1803.05457 [cs.AI]
[28] arXiv:1905.07830 [cs.CL]
[29] arXiv:2009.03300 [cs.CY]
[30] arXiv:2109.07958 [cs.CL]
[31] Evaluating MMLU (Hugging Face Blog): https://huggingface.co/blog/ev....aluating-mmlu-leader
[32] Dropout (Srivastava et al., JMLR): https://www.cs.toronto.edu/~hi....nton/absps/JMLRdropo

--
Homepage: https://shawhintalebi.com/
Book a call: https://calendly.com/shawhintalebi

Intro - 0:00
How much does it cost? - 1:30
4 Key Steps - 3:55
Step 1: Data Curation - 4:19
1.1: Data Sources - 5:31
1.2: Data Diversity - 7:45
1.3: Data Preparation - 9:06
Step 2: Model Architecture (Transformers) - 13:17
2.1: 3 Types of Transformers - 15:13
2.2: Other Design Choices - 18:27
2.3: How big do I make it? - 22:45
Step 3: Training at Scale - 24:20
3.1: Training Stability - 26:52
3.2: Hyperparameters - 28:06
Step 4: Evaluation - 29:14
4.1: Multiple-choice Tasks - 30:22
4.2: Open-ended Tasks - 32:59
What's next? - 34:31
