ML Q&A
Fundamental Concepts
- What is bias vs variance?
- What is overfitting vs underfitting?
- How does regularization help prevent overfitting?
- What is the difference between L1 and L2 regularization? (see the sketch after this list)
- What is the difference between generative and discriminative models?
- What is the difference between supervised and unsupervised learning?
- What is the difference between LDA and PCA?
- What is the curse of dimensionality?
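For the L1 vs L2 question above, a minimal numpy sketch of how each penalty attaches to a squared-error loss (the `regularized_loss` helper, data, and λ value are illustrative, not from any library):

```python
import numpy as np

def regularized_loss(w, X, y, lam, kind="l2"):
    """Squared-error loss plus an L1 or L2 penalty on the weights."""
    residual = X @ w - y
    data_loss = 0.5 * np.mean(residual ** 2)
    if kind == "l1":
        penalty = lam * np.sum(np.abs(w))      # encourages sparse weights (lasso)
    else:
        penalty = lam * 0.5 * np.sum(w ** 2)   # shrinks weights smoothly (ridge)
    return data_loss + penalty

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 5)), rng.normal(size=100)
w = rng.normal(size=5)
print(regularized_loss(w, X, y, lam=0.1, kind="l1"),
      regularized_loss(w, X, y, lam=0.1, kind="l2"))
```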
Evaluation Metrics
- What are precision, recall, and F1 score? (see the sketch after this list)
- What is an ROC curve, and how is it used?
- What is AUC and how is it interpreted?
- What is log loss / cross-entropy loss?
- What are confusion matrices and how do you interpret them?
- What is the difference between micro, macro, and weighted averaging in classification metrics?
- How do you evaluate a model for multi-class vs multi-label classification?
- What metrics are suitable for ranking models?
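For the precision/recall/F1 question above, a small self-contained sketch that computes all three from raw binary predictions (the helper name is our own, not a library API):

```python
import numpy as np

def precision_recall_f1(y_true, y_pred):
    """Binary precision, recall, and F1 computed from raw counts."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

print(precision_recall_f1([1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 1, 1]))
# tp=3, fp=1, fn=1 -> precision 0.75, recall 0.75, f1 0.75
```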
Optimization
- How does gradient descent work? (see the sketch after this list)
- What is stochastic gradient descent, and how is it different from full-batch gradient descent?
- What is backpropagation, and how does it relate to neural networks?
- What are common variants of gradient descent (SGD, Momentum, Adam, RMSProp)?
- What are vanishing and exploding gradients?
- How does learning rate affect training?
- What is gradient clipping and why is it used?
- How do optimizers differ in convergence speed and stability?
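For the gradient descent question above, a minimal numpy sketch of full-batch gradient descent on a linear least-squares problem (toy data and an illustrative learning rate):

```python
import numpy as np

# Gradient descent on mean squared error for a linear model y ≈ X @ w.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)

w = np.zeros(3)
lr = 0.1                                  # too high diverges, too low is slow
for step in range(500):
    grad = X.T @ (X @ w - y) / len(y)     # gradient of 0.5 * mean((Xw - y)^2)
    w -= lr * grad                        # step against the gradient
print(w)                                  # should approach true_w
```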
ML Algorithms
- What is logistic regression? (forward calculation, loss function; see the sketch after this list)
- What is K-Nearest Neighbors (KNN)?
- What is decision tree learning?
- What is random forest, and how does it differ from gradient boosting?
- What is the difference between bagging and boosting?
- What is K-Means clustering, and how does it work?
- What is Support Vector Machine (SVM)?
- What is Bayesian learning?
- What is the difference between MAP and MLE?
- What is Naive Bayes and when is it effective?
- What are the assumptions behind linear regression?
- How does ridge regression differ from lasso regression?
- What are the strengths and weaknesses of tree-based methods?
- What are ensemble methods, and why do they work?
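For the logistic regression question above, a minimal sketch of the forward pass and the binary cross-entropy loss (toy data; `logistic_forward_and_loss` is an illustrative helper):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_forward_and_loss(w, b, X, y):
    """Forward pass p = sigmoid(Xw + b) and binary cross-entropy loss."""
    p = sigmoid(X @ w + b)
    eps = 1e-12                                    # avoid log(0)
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    return p, loss

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))
y = (X[:, 0] > 0).astype(float)
w, b = rng.normal(size=4), 0.0
p, loss = logistic_forward_and_loss(w, b, X, y)
print(p.round(2), loss)
```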
Data Issues
- How does imbalanced data affect model performance?
- What are common strategies to handle imbalanced data?
- What is data drift and how do you handle it?
- How do you handle missing data in ML pipelines?
- How do you split your data into train/validation/test? (see the sketch after this list)
- What are the benefits and risks of oversampling and undersampling?
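For the train/validation/test question above, a minimal sketch of a random index split (the fractions and helper name are illustrative):

```python
import numpy as np

def train_val_test_split(n, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle indices once, then slice into train/validation/test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test, val, train = idx[:n_test], idx[n_test:n_test + n_val], idx[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(1000)
print(len(train), len(val), len(test))  # 700 150 150
```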
Deep Learning
- What is the ReLU activation function?
- How do you deal with vanishing gradients?
- What are common activation functions and when to use them (ReLU, sigmoid, tanh, GELU, etc.)?
- What are fully connected layers?
- What is the role of initialization in deep learning?
- What is dropout and how does it prevent overfitting? (see the sketch after this list)
- What is an epoch, batch, and iteration?
- What is early stopping and how does it help generalization?
- What is the difference between feedforward networks and recurrent networks?
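For the dropout question above, a minimal sketch of inverted dropout, the formulation most frameworks use (the helper is illustrative, not a framework API):

```python
import numpy as np

def dropout(x, p_drop, training, rng):
    """Inverted dropout: zero units with probability p_drop at train time,
    scale survivors by 1/(1 - p_drop) so the expected activation is unchanged,
    and do nothing at inference time."""
    if not training or p_drop == 0.0:
        return x
    mask = rng.random(x.shape) >= p_drop
    return x * mask / (1.0 - p_drop)

rng = np.random.default_rng(0)
h = np.ones((2, 8))
print(dropout(h, p_drop=0.5, training=True, rng=rng))
print(dropout(h, p_drop=0.5, training=False, rng=rng))
```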
Transformers
- Why do we divide the attention score by √dₖ in the Transformer? (see the sketch after this list)
- Why are different weight matrices used to compute Q, K, and V?
- Why do we use Multi-Head Attention?
- What is the time complexity of attention?
- What is KV cache and why is it used?
- What is Multi-Query Attention (MQA)?
- What is Grouped Query Attention (GQA)?
- What are the shapes of Q, K, V in MHA, MQA, and GQA?
- What is Flash Attention and how does it work?
- How do you optimize memory usage in attention mechanisms?
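For the √dₖ question above, a minimal numpy sketch of single-head scaled dot-product attention (shapes and data are toy values):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.
    Dividing by sqrt(d_k) keeps the logits' variance roughly constant,
    so the softmax does not saturate as the head dimension grows."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)     # (..., seq_q, seq_k)
    scores -= scores.max(axis=-1, keepdims=True)       # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over keys
    return weights @ V                                  # (..., seq_q, d_v)

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4, 8))   # (batch, seq, d_k) for a single head
K = rng.normal(size=(2, 6, 8))
V = rng.normal(size=(2, 6, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (2, 4, 8)
```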
LLM Architecture & Inference
- What are the key factors that affect LLM inference latency?
- What are common LLM inference optimization techniques?
- What is KV cache and how does it improve inference? (see the sketch after this list)
- What is smart batching and how does it affect performance?
- What is quantization and how does it help inference?
- What is the tradeoff in using MQA or GQA?
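For the KV cache question above, a toy single-head decode loop showing how cached keys and values grow by one entry per generated token instead of being recomputed for the whole prefix (all weights and hidden states are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

k_cache, v_cache = [], []
for step in range(5):                      # autoregressive decode loop
    x_t = rng.normal(size=(1, d))          # hidden state of the newest token
    q_t = x_t @ W_q
    k_cache.append(x_t @ W_k)              # cache grows by one entry per step
    v_cache.append(x_t @ W_v)
    K = np.concatenate(k_cache, axis=0)    # (step + 1, d)
    V = np.concatenate(v_cache, axis=0)
    attn = softmax(q_t @ K.T / np.sqrt(d)) @ V   # attend over all cached tokens
    print(step, attn.shape)                # always (1, d): one new token out
```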
LLM Fine-Tuning Techniques
- What are typical fine-tuning methods for LLMs?
- What is LoRA and how does it work? (see the sketch after this list)
- How is the loss calculated in supervised fine-tuning (SFT)?
- What is the difference between prefix tuning and prompt tuning?
- What is RLHF and how does it work?
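For the LoRA question above, a minimal numpy sketch of a linear layer with a frozen weight plus a trainable low-rank update (class name, rank, and initialization are illustrative, not the reference implementation):

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha/r) * B @ A."""
    def __init__(self, W, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W = W                                              # frozen pretrained weight (out, in)
        self.A = rng.normal(scale=0.01, size=(r, W.shape[1]))   # trainable, (r, in)
        self.B = np.zeros((W.shape[0], r))                      # trainable, starts at zero
        self.scale = alpha / r                                  # update is a no-op at init

    def __call__(self, x):
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

rng = np.random.default_rng(1)
layer = LoRALinear(W=rng.normal(size=(32, 64)))
x = rng.normal(size=(4, 64))
print(layer(x).shape)  # (4, 32); only A and B would receive gradient updates
```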
LLM Training Optimization
- What is mixed-precision training and how does it work? (see the sketch after this list)
- What are the benefits and tradeoffs of FP16 vs BF16?
- What is gradient checkpointing?
- What is Distributed Data Parallel (DDP)?
- What is Fully Sharded Data Parallel (FSDP)?
- What is ZeRO (Zero Redundancy Optimizer) and its stages?
- How does ZeRO enable large-scale model training?
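For the mixed-precision question above, a minimal PyTorch-style sketch of the autocast + GradScaler loop (the tiny model, data, and hyperparameters are placeholders; autocast and the scaler are only enabled when CUDA is available):

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(16, 1).to(device)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
# GradScaler rescales the FP16 loss so small gradients do not underflow;
# it is only meaningful on CUDA, so it is disabled on CPU here.
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(64, 16, device=device)
y = torch.randn(64, 1, device=device)

for step in range(20):
    opt.zero_grad()
    # Forward and loss run in reduced precision where it is safe;
    # master weights and optimizer state stay in FP32.
    with torch.autocast(device_type=device, enabled=(device == "cuda")):
        loss = nn.functional.mse_loss(model(x), y)
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(opt)                # unscale, skip the step if grads overflowed
    scaler.update()                 # adjust the scale factor
```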
Retrieval-Augmented Generation (RAG)
- Why use RAG for LLMs?
- How does RAG architecture work?
- What are the steps to build a RAG-based chatbot? (see the sketch below)
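As a rough outline for the RAG questions above, a minimal retrieval-then-prompt skeleton; the word-overlap scorer stands in for a real embedding model and vector index, and `call_llm` is a hypothetical placeholder, not a real API:

```python
# Minimal RAG skeleton: retrieve top-k documents by a crude word-overlap score,
# then build a prompt that grounds the LLM's answer in the retrieved context.
docs = [
    "The KV cache stores past keys and values so decoding reuses them.",
    "LoRA adds trainable low-rank adapters on top of frozen weights.",
    "RAG retrieves relevant documents and passes them to the LLM as context.",
]

def score(query, doc):
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / (len(q) ** 0.5 * len(d) ** 0.5)   # cosine on word sets

def retrieve(query, k=2):
    ranked = sorted(docs, key=lambda d: score(query, d), reverse=True)
    return ranked[:k]

query = "How does RAG pass retrieved documents to the LLM?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)
# call_llm(prompt)  # hypothetical generation step
```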
Learning Resources
GitHub Repositories
- andrewekhalel / MLQuestions
A curated list of ML interview questions categorized by topic, with some helpful explanations.
- khangich / machine-learning-interview
Interview prep notebook covering theory and code implementations for ML algorithms.
Blogs & Articles
General ML
- Multicollinearity in Regression Analysis – Statistics by Jim
Explains the impact of multicollinearity on model interpretability and stability.
- Feature Scaling Overview – Sebastian Raschka
Covers different scaling techniques and their effect on ML models.
- Gradient Boosting vs Random Forest
An intuitive side-by-side comparison with examples.
- L1 vs L2 Regularization (Visual Explanation)
Illustrates geometric differences and impact on sparsity.
- PCA Explained
A step-by-step walkthrough of PCA with visuals.
- Logistic Regression Detailed Overview
Covers math, intuition, and loss functions.
Clustering
- K-Means Intuition (Stanford CS221)
Visual and interactive explanation of the K-Means algorithm.
Transformers & DL
- Cross-Attention in Transformers
Explains how cross-attention works and how it differs from self-attention.
- BatchNorm vs LayerNorm
Explains when and why to use each normalization technique.
- Bagging vs Boosting
Good summary for tree-based ensemble learners.
Research Papers
- Attention Is All You Need – Vaswani et al.
The original paper that introduced the Transformer architecture.
- CLIP: Learning Transferable Visual Models From Natural Language Supervision – Radford et al. (OpenAI)
Vision-language model combining image and text representations.
Courses
- Deep Learning Specialization (Coursera by Andrew Ng)
A five-course series covering neural networks, CNNs, RNNs, optimization, and structuring deep learning projects. Ideal for building strong foundational intuition.
- Generative AI with LLMs (DeepLearning.AI)
Covers modern LLM workflows, from prompt engineering to RAG and fine-tuning.