All New Mistakes
Trading Compute for Memory: Using Activation Recomputation with GPT2
One of the most obvious patterns we notice when training transformers is that activations take up the most memory.
Jun 20 • Ganesh Ravichandran
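The core trade that post's title describes can be sketched in a few lines of plain Python (this is an illustrative toy, not code from the post): instead of caching every intermediate activation during the forward pass, keep only the layer stack's input and recompute the activations when the backward pass needs them, trading extra forward compute for a much smaller activation footprint.

```python
def relu(x):
    return [max(0.0, v) for v in x]

def relu_grad(layer_input, upstream):
    # Gradient of ReLU: pass upstream gradient where the input was positive.
    return [g if v > 0 else 0.0 for v, g in zip(layer_input, upstream)]

class CheckpointedReLUStack:
    """Toy stack of ReLU layers that saves only its input, not every activation."""

    def __init__(self, n_layers):
        self.n_layers = n_layers

    def forward(self, x):
        self.saved_input = x  # O(1) saved tensors instead of O(n_layers)
        for _ in range(self.n_layers):
            x = relu(x)
        return x

    def backward(self, upstream):
        # Recompute the forward activations we chose not to store.
        acts = [self.saved_input]
        for _ in range(self.n_layers):
            acts.append(relu(acts[-1]))
        # Then backpropagate through them as usual.
        grad = upstream
        for layer_input in reversed(acts[:-1]):
            grad = relu_grad(layer_input, grad)
        return grad
```

Real frameworks wrap this pattern for you (e.g. PyTorch's `torch.utils.checkpoint`), but the memory/compute trade is exactly the one shown here: the backward pass pays for a second forward pass through the checkpointed region.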
Building Deep Learning Intuition on a consumer GPU: Memory Profiling GPT-2 Training
The Craigslist post
Jun 11 • Ganesh Ravichandran