All New Mistakes

Trading Compute for Memory: Using Activation Recomputation with GPT2
One of the first patterns you notice when profiling transformer training is that activations take up most of the memory.
Jun 20 • 
Ganesh Ravichandran
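To see why activations dominate, here is a back-of-envelope sketch comparing parameter memory to saved-activation memory for a GPT-2-small-like configuration. The batch size, the per-layer count of saved `(batch, seq, d_model)` tensors, and the fp32 assumption are illustrative guesses, not measurements from the post:

```python
# Rough comparison of parameter vs. activation memory for a
# GPT-2-small-like config. All sizes below are assumptions.
n_layer, n_head, d_model, seq_len, batch = 12, 12, 768, 1024, 8
vocab = 50257
bytes_per = 4  # fp32

# Parameters: token + position embeddings, then per-layer
# attention (qkv + output proj) and MLP (4x-wide up + down) weights.
params = vocab * d_model + seq_len * d_model
params += n_layer * (4 * d_model * d_model   # attention weights
                     + 8 * d_model * d_model)  # MLP weights
param_mb = params * bytes_per / 2**20

# Activations saved for backward, per layer: assume ~10 tensors of
# shape (batch, seq, d_model), plus the attention score matrix of
# shape (batch, n_head, seq, seq).
act_per_layer = 10 * batch * seq_len * d_model + batch * n_head * seq_len**2
act_mb = n_layer * act_per_layer * bytes_per / 2**20

print(f"parameters ≈ {param_mb:.0f} MiB, activations ≈ {act_mb:.0f} MiB")
```

Even with these rough tensor counts, saved activations come out an order of magnitude larger than the weights, which is what makes recomputing them during the backward pass an attractive trade.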
© 2025 Ganesh Ravichandran