AI Courses

Here is a treasure trove of high-quality, publicly available AI courses with lecture videos and assignments. I've completed these courses and posted my solutions on my GitHub.

Stanford CS231n: Deep Learning for Computer Vision

This course, while taught through the lens of computer vision, is a surprisingly great general introduction to deep learning. I think what makes it so great is that Andrej Karpathy originally designed the course and was its primary instructor. Unlike other deep learning courses, CS231n is very approachable. It isn't bogged down by heavy theory, and the prerequisites are minimal: basic differential calculus, some linear algebra, and probability theory.

There is an emphasis on understanding backpropagation through computational graphs, which I really like as it (1) builds strong intuition for how the gradient "flows" through the neural net, which is useful in practice, and (2) uncovers the core function of deep learning frameworks: autograd. The assignments even require you to manually backprop through several layers of the neural network, which I found to be an illuminating exercise.
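To make this concrete, here's a tiny sketch (my own toy example, not from the course assignments) of manually backpropagating through a small computational graph, the same exercise the assignments have you do at a larger scale:

```python
import numpy as np

# Tiny computational graph: z = w*x + b, a = sigmoid(z), L = (a - y)^2
# Forward pass: compute and cache each intermediate node.
x, y = 2.0, 1.0                  # input and target (made-up values)
w, b = 0.5, -1.0                 # parameters

z = w * x + b                    # linear node
a = 1.0 / (1.0 + np.exp(-z))     # sigmoid node
L = (a - y) ** 2                 # squared-error loss

# Backward pass: walk the graph in reverse, multiplying local gradients
# (the chain rule) so the gradient "flows" from the loss back to w and b.
dL_da = 2.0 * (a - y)            # d/da of (a - y)^2
da_dz = a * (1.0 - a)            # derivative of sigmoid
dL_dz = dL_da * da_dz
dL_dw = dL_dz * x                # z = w*x + b  =>  dz/dw = x
dL_db = dL_dz * 1.0              #                  dz/db = 1

print(f"loss={L:.4f}  dL/dw={dL_dw:.4f}  dL/db={dL_db:.4f}")
```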

You may want to supplement this course with Karpathy's excellent NN Zero to Hero series (there is quite a bit of overlap). Also, it looks like they've released lecture videos for the 2025 version of the course, which is more comprehensive than the original. In particular, it covers transformers and generative models like diffusion.

Stanford CS224n: Natural Language Processing with Deep Learning

This course starts from word vectors and builds up to transformers, so you really get to see the evolution of ideas in NLP. The downside is that there is quite a bit of historical content. For example, it seems that they still teach dependency parsing, and I seriously doubt how useful that topic is today given the dominance of LLMs. So, for most people I would instead recommend Stanford CS336: Language Modeling from Scratch. However, if you're mainly interested in research, CS224n does offer better theory. I found this course to be fairly easy, at least if you've already done an introductory deep learning course like the one above.

CMU Deep Learning Systems

In this course you'll build a mini PyTorch. It greatly deepened my understanding of how deep learning frameworks like PyTorch work under the hood.

In the assignments, you first implement autograd, the core function of deep learning frameworks. On top of this, you build an entire neural network library that is quite comprehensive (various NN layers, initialization schemes, optimizers, data primitives) and train neural networks using it. And if that weren't enough, you also build out an 'NDArray' library to effectively replace NumPy.
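To give a flavour of the autograd part, here's a minimal scalar sketch of the idea (my own toy illustration, not the course's API; the course's engine operates on tensors backed by the NDArray library you build):

```python
class Value:
    """Toy scalar autograd node: stores data, grad, and a backward closure."""
    def __init__(self, data, parents=()):
        self.data, self.grad = data, 0.0
        self._parents, self._backward = parents, lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward():
            # Addition passes the upstream gradient through unchanged.
            self.grad += out.grad
            other.grad += out.grad
        out._backward = backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward():
            # Chain rule: each parent gets the other's value times out.grad.
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = backward
        return out

    def backward(self):
        # Topologically order the graph, then run each node's backward fn.
        order, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for p in v._parents:
                    build(p)
                order.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

# y = w*x + b; gradients for w and b fall out of y.backward()
w, x, b = Value(0.5), Value(2.0), Value(-1.0)
y = w * x + b
y.backward()
print(y.data, w.grad, b.grad)   # 0.0 2.0 1.0
```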

This course does require quite a bit of work, but it's not too difficult for anyone with a decent background in CS and some deep learning experience. In the end, it's quite rewarding to see neural nets being trained on a deep learning framework you built yourself.

Berkeley Deep Unsupervised Learning

Taught by Pieter Abbeel, this course covers all the various flavours of generative models out there (autoregressive, flow, VAEs, GANs, and diffusion). Before starting, I would recommend a strong foundation in probability theory, as topics like VAEs and diffusion are especially math heavy. I found these topics difficult to grasp on first watch, so you'll likely need to read other resources online. Having at least some deep learning experience (like the previous courses mentioned) is recommended as well.
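To give a sense of what "math heavy" means here, the VAE training objective alone, the evidence lower bound (ELBO), already mixes an expectation over a learned approximate posterior with a KL divergence term (this is the standard formula, not anything course-specific):

$$\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] \;-\; \mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big)$$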

There are four assignments, and each one can take up to 40 hours of work (their words, not mine). You're required to implement generative models basically from scratch in your preferred deep learning framework (e.g., PyTorch or JAX). It's quite tricky to code up the model and algorithm correctly on the first pass, so you may need to spend a non-trivial amount of time debugging. Unlike regular software, you won't see any compile errors, only a loss that doesn't converge (and to make matters worse, even if your loss does converge, you may still have hidden bugs that ever so slightly hurt performance; hopefully that doesn't happen too often). While this was the most difficult course I've ever completed, it was also the one I grew the most from.

MIT Accelerated Computing

I am confident in saying that this is by far the best CUDA course out there (not that there are many to begin with, though). This course not only teaches you how to write high-performance code on GPUs, but also dives into the architectural details and how they manifest in the code you write.

The 8 labs are like mini-lectures themselves. Labs 1-3 are all about parallel computing and GPU programming basics. They are especially useful for comparing and contrasting the CPU and GPU: you learn that the two are more alike than you might expect and simply occupy different points in a trade-off space. I also found the first three lectures of Stanford CS149: Parallel Computing to be quite useful for understanding this part of the course. Labs 4-6 cover matrix multiply optimization (tiling, pipelining, avoiding bank conflicts, tensor cores), and labs 7-8 cover irregular workloads and the primitives useful for them (scan and stream compaction).
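To illustrate that last idea, here's a small sequential Python sketch (my own example, not taken from the labs) of how an exclusive scan turns stream compaction into independent writes, which is what makes it parallelizable on a GPU:

```python
def exclusive_scan(xs):
    """Exclusive prefix sum: out[i] = sum(xs[:i]). Sequential here, but on a
    GPU this would be a parallel scan (e.g. a work-efficient Blelloch scan)."""
    out, total = [], 0
    for x in xs:
        out.append(total)
        total += x
    return out

def stream_compaction(values, keep):
    """Keep values[i] where keep[i] is 1, writing them contiguously.
    Scanning the keep-flags gives each surviving element its output index,
    so every element can be written independently (no serial append)."""
    idx = exclusive_scan(keep)
    out = [None] * sum(keep)
    for i, k in enumerate(keep):
        if k:
            out[idx[i]] = values[i]
    return out

values = [5, -2, 7, 0, -9, 3]
keep = [1 if v > 0 else 0 for v in values]   # predicate: keep positives
print(stream_compaction(values, keep))       # [5, 7, 3]
```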

The 2025 edition of the course, which is rolling out at the time of writing, has publicly available lecture videos and slides, whereas the fully released 2024 edition does not. Also, since I don't own an NVIDIA GPU, I used Google Colab and RTX A4000 instances on Vast.ai (which are quite cheap, as long as you don't forget to turn off your instance 🙂) to compile and run my CUDA code.

MIT EfficientML

In this course you learn about methods that make neural nets much more efficient to deploy, with little to no performance degradation. Topics include pruning, quantization, neural architecture search, and knowledge distillation. Although the labs feel quite "fill in the blank" and the lecture videos sometimes feel too much like a survey of all relevant research, it is, I believe, the only publicly available course that covers these topics in any meaningful depth.
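As a rough illustration of two of these methods, here's a toy NumPy sketch (my own, far simpler than the lab implementations) of magnitude pruning and symmetric int8 quantization:

```python
import numpy as np

def magnitude_prune(w, sparsity=0.5):
    """Zero out roughly the fraction `sparsity` of weights with the smallest
    magnitude (ties at the threshold may zero slightly more)."""
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    threshold = np.sort(np.abs(w), axis=None)[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w)

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w ≈ scale * q, q in [-127, 127]."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

w = np.random.randn(4, 4).astype(np.float32)
w_pruned = magnitude_prune(w, sparsity=0.5)
q, scale = quantize_int8(w)
print("zeros after pruning:", np.sum(w_pruned == 0))
print("max dequantization error:", np.max(np.abs(w - q.astype(np.float32) * scale)))
```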

Extras