🤖 Machine Learning — A Hands-On, Beginner-Friendly Collection

A tour through classical machine learning — implemented and explained from the ground up. Each notebook tackles a real dataset, and this README walks you through what each method actually does, in plain language.

Author: Mohammad Javad Maheronnaghsh

📘 New to ML? Every method below starts with a "💡 In plain English" box — no formulas, no jargon — so you can build intuition before touching the code.

🧭 The big picture

Machine learning is about learning patterns from data instead of being explicitly programmed. The two main families:

Supervised learning — you have labeled examples (input → correct answer) and the model learns to predict the answer for new inputs. Splits into:
- Regression → predict a number (e.g. house price).
- Classification → predict a category (e.g. spam / not-spam).
Unsupervised learning — no labels; the model finds structure on its own (e.g. grouping similar items, compressing data).

Folder	Methods	Type
`Regression/`	Linear · Lasso/Ridge · Polynomial & Splines	Supervised (number)
`Classification/`	Logistic Regression · Decision Tree · Bagging · AdaBoost · SVM · Neural Nets	Supervised (category)
`Classification/` (PCA, K-Means)	PCA · K-Means	Unsupervised
`LDA and QDA/`	Linear & Quadratic Discriminant Analysis	Supervised (category)
`NLP/`	Naive Bayes text classification	Supervised (text)

📚 The classic textbook An Introduction to Statistical Learning (ISLR) is included as a reference (`ISLRv2_website.pdf`).

📈 Regression — predicting a number

💡 In plain English: Regression draws the "best-fit" relationship between inputs and a numeric output, so you can predict that output for new inputs. Linear regression fits a straight line; if the trend in the data bends, you need a curve instead.

Insurance data — how charges relate to age. Regression learns the trend through points like these.
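As a rough illustration of what "best-fit" means for a single input, here is a minimal least-squares sketch in plain Python. The numbers are made up for the example and are not taken from the insurance dataset; the notebooks themselves may well use a library such as scikit-learn instead.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical (age, charge) pairs, chosen to lie exactly on a line.
ages = [20, 30, 40, 50, 60]
charges = [2500, 4000, 5500, 7000, 8500]

slope, intercept = fit_line(ages, charges)
predicted = slope * 45 + intercept  # prediction for a new age of 45
print(slope, intercept, predicted)
```

With these toy numbers the fitted slope is 150 per year of age and the prediction for age 45 is 6250. Real data is noisy, so the line will not pass through every point.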

Linear Regression — the foundational straight-line fit.
Lasso & Ridge — linear regression with regularization: a penalty that keeps the model simple to avoid overfitting (memorizing noise instead of learning the trend). Lasso can even switch off useless features entirely.
Polynomial Regression & Splines — fit curves instead of lines for non-linear data. But more flexibility isn't always better:

Choosing complexity: test error vs. polynomial degree. Too simple underfits; too complex overfits — the sweet spot is in between.
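To make the Lasso and Ridge penalties above concrete, the sketch below works out the slope for a single centered feature, where both have a simple closed form. The objective scaling (a factor of 0.5 on the squared error) and the toy data are assumptions made for this example; libraries may scale the penalty differently.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty, stopping at zero."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def one_feature_slopes(xs, ys, penalty):
    """Return (plain, ridge, lasso) slopes for centered one-feature data.

    Assumed objectives:
      ridge: 0.5 * sum((y - w*x)^2) + 0.5 * penalty * w^2
      lasso: 0.5 * sum((y - w*x)^2) + penalty * |w|
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    plain = sxy / sxx
    ridge = sxy / (sxx + penalty)             # shrinks, never exactly zero
    lasso = soft_threshold(sxy, penalty) / sxx  # can become exactly zero
    return plain, ridge, lasso


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
weak_ys = [-0.2, -0.1, 0.0, 0.1, 0.2]    # feature barely matters
strong_ys = [-4.0, -2.0, 0.0, 2.0, 4.0]  # feature matters a lot

weak = one_feature_slopes(xs, weak_ys, penalty=2.0)
strong = one_feature_slopes(xs, strong_ys, penalty=2.0)
print("weak feature  :", weak)
print("strong feature:", strong)
```

For the weak feature, Ridge keeps a small non-zero slope while Lasso sets it to exactly zero, which is the "switching off" behaviour described above. The strong feature survives both penalties, only slightly shrunk.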

📂 Regression/

🏷️ Classification — predicting a category

💡 In plain English: Classification sorts inputs into buckets. Logistic regression — despite the name — is a classifier: it outputs the probability that something belongs to a class (e.g. "85% likely diabetic").
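The probability output can be sketched in a few lines: a weighted sum of the inputs is squashed into the range 0 to 1 by the sigmoid function. The weights, bias, and feature values below are hypothetical and set by hand; in practice they are learned from the training data.

```python
import math


def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of the positive class for one example."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical weights for two standardized features.
weights = [1.2, 0.8]
bias = -0.5

p = predict_proba([1.0, 0.5], weights, bias)
label = 1 if p >= 0.5 else 0  # 0.5 is a common default cut-off
print(round(p, 3), label)
```

The cut-off of 0.5 is a convention, not a requirement; a medical screening task might prefer a lower threshold to miss fewer positive cases.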

Before modeling, explore the data — here a correlation heatmap of the Pima diabetes features shows which ones move together.

Logistic Regression — probabilistic linear classifier (on the diabetes dataset).
Decision Tree — a flowchart of yes/no questions that splits data into classes.
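Bagging — train many trees on bootstrap samples (random subsets of the data drawn with replacement) and average or vote their predictions to reduce variance. Random Forests build on this idea and also randomize which features each split may use.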
AdaBoost — train models in sequence, each one focusing on the mistakes of the last (boosting).
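The "yes/no question" at the heart of a decision tree can be sketched as a search for the best threshold on one feature. This example scores candidate thresholds with Gini impurity, which is one common choice and an assumption here; the values are made up and are not from the diabetes dataset.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0 means perfectly pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_threshold(values, labels):
    """Find the question 'is value <= t?' with the lowest weighted impurity."""
    pairs = sorted(zip(values, labels))
    best_t, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold fits between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical glucose readings and diabetes labels.
glucose = [85, 90, 100, 140, 150, 160]
diabetic = [0, 0, 0, 1, 1, 1]

threshold, impurity = best_threshold(glucose, diabetic)
print(threshold, impurity)
```

A full tree repeats this search on each resulting subset, across all features, until the subsets are pure enough or a depth limit is reached.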

💡 Why ensembles (Bagging / Boosting)? One model can be wrong; a committee of models that vote is usually more accurate and stable.
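The committee intuition can be checked with a small calculation. The sketch assumes every model is right with the same probability and that their mistakes are independent, which real ensembles only approximate, so the actual gain is usually smaller than this idealized number.

```python
from math import comb


def majority_vote_accuracy(n_models, p_correct):
    """Probability that more than half of n independent models are right.

    Assumes an odd number of models so that ties cannot happen.
    """
    needed = n_models // 2 + 1
    return sum(
        comb(n_models, k) * p_correct**k * (1 - p_correct) ** (n_models - k)
        for k in range(needed, n_models + 1)
    )


single = majority_vote_accuracy(1, 0.7)
committee = majority_vote_accuracy(11, 0.7)
print(f"one model: {single:.3f}, committee of 11: {committee:.3f}")
```

Bagging tries to get closer to the independence assumption by training each model on a different random sample of the data.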

📂 Classification/

📐 SVM, PCA, K-Means & Neural Networks

💡 Support Vector Machine (SVM): finds the boundary that separates two classes with the widest possible margin — the cleanest dividing line.
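For a straight-line boundary, the margin has a simple formula. The sketch assumes the boundary is written as w·x + b = 0 and scaled so that the closest points satisfy |w·x + b| = 1 (the usual convention); the weights are hypothetical rather than learned.

```python
import math


def margin_width(weights):
    """Distance between the two margin lines w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(w * w for w in weights))


def signed_distance(point, weights, bias):
    """Distance from a point to the boundary; the sign tells which side."""
    norm = math.sqrt(sum(w * w for w in weights))
    return (sum(w * x for w, x in zip(weights, point)) + bias) / norm


# Hypothetical boundary 3*x1 + 4*x2 - 5 = 0.
weights = [3.0, 4.0]
bias = -5.0

width = margin_width(weights)
distance = signed_distance([3.0, 4.0], weights, bias)
print(width, distance)
```

Training an SVM amounts to finding the weights that make this width as large as possible while keeping the training points on the correct side.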

💡 PCA (Principal Component Analysis): a way to compress data by keeping only its most important directions of variation. Applied to face images, the "principal components" look like ghostly template faces (eigenfaces) that can be combined to approximately reconstruct the faces in the dataset; the more components you keep, the closer the reconstruction.

Left: an original face. Right: a PCA component ("eigenface") — a reusable building block PCA learns from the dataset.
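The same idea works on tiny data. The sketch below finds the single most important direction of some made-up 2-D points using power iteration, which is one simple way to get the top component and not necessarily what the notebook does, then compresses each point to one number and rebuilds an approximation from it.

```python
import math


def first_principal_component(points, iterations=100):
    """Estimate the top principal direction with power iteration."""
    n = len(points)
    dim = len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(dim)]
    centered = [[p[j] - means[j] for j in range(dim)] for p in points]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(row[a] * row[b] for row in centered) / (n - 1) for b in range(dim)]
        for a in range(dim)
    ]
    # Starting guess; power iteration assumes it is not exactly
    # perpendicular to the direction being searched for.
    vec = [1.0 / (j + 1) for j in range(dim)]
    for _ in range(iterations):
        product = [sum(cov[a][b] * vec[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(v * v for v in product))
        vec = [v / norm for v in product]
    return means, vec


# Hypothetical points lying roughly along the diagonal.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
means, direction = first_principal_component(points)

# Compress: one number per point. Rebuild: mean + score * direction.
scores = [sum((p[j] - means[j]) * direction[j] for j in range(2)) for p in points]
rebuilt = [tuple(means[j] + s * direction[j] for j in range(2)) for s in scores]
print(direction)
print(rebuilt)
```

Each eigenface plays the role of `direction` here, just with one entry per pixel instead of two.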

💡 K-Means (clustering): an unsupervised method that groups data into K clusters by repeatedly assigning points to the nearest cluster center and updating the centers.
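The assign-then-update loop can be sketched directly. Starting centers are passed in by hand here to keep the example deterministic; real implementations usually pick them randomly or with a smarter seeding scheme, and the points are made up.

```python
def kmeans(points, centers, iterations=10):
    """Plain K-Means: assign points to the nearest center, then move centers."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[distances.index(min(distances))].append(p)
        # Update step: each center moves to the mean of its points.
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster
            else center  # keep an empty cluster's center where it was
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters


# Two hypothetical, well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, clusters = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
print(centers)
```

On harder data the result depends on the starting centers, which is why K-Means is often run several times with different starts.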

💡 Neural Networks: layers of simple units that together learn complex patterns — the foundation of deep learning.
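A tiny example of "simple units learning a complex pattern": two hidden units are enough to compute XOR, which no single straight-line classifier can. The weights below are set by hand purely for illustration; a real network would learn them from data with gradient descent.

```python
def relu(x):
    """A simple unit: pass positive values through, block negative ones."""
    return max(0.0, x)


def tiny_network(x1, x2):
    """Two hidden ReLU units followed by one linear output unit."""
    h1 = relu(1.0 * x1 + 1.0 * x2 + 0.0)
    h2 = relu(1.0 * x1 + 1.0 * x2 - 1.0)
    return 1.0 * h1 - 2.0 * h2


# Evaluate the network on all four binary inputs.
outputs = {(a, b): tiny_network(a, b) for a in (0, 1) for b in (0, 1)}
for inputs, value in outputs.items():
    print(inputs, value)
```

The output is 1 when exactly one input is 1 and 0 otherwise. Each hidden unit alone computes something simple; the pattern only appears when they are combined.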

📂 Classification/SVM - PCA - KMeans - Neural Netwroks.ipynb

🎯 LDA & QDA

💡 In plain English: LDA and QDA classify by modeling what each class's data looks like (its distribution, assumed to be bell-shaped) and asking "which class most likely produced this point?" LDA assumes every class has the same spread, which gives a straight boundary between classes; QDA lets each class have its own spread, which allows a curved one.
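The difference between the two shows up even with one feature. In the sketch below each class is modelled as a Gaussian; the LDA variant shares one pooled variance across classes and the QDA variant gives each class its own, which is the standard formulation. The data is made up.

```python
import math


def gaussian_log_density(x, mean, variance):
    """Log of the normal density at x."""
    return -0.5 * math.log(2 * math.pi * variance) - (x - mean) ** 2 / (2 * variance)


def fit_class(values):
    """Mean, sample variance, and size of one class."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, variance, n


def classify(x, class_values, shared_variance):
    """Pick the class most likely to have produced x.

    shared_variance=True behaves like LDA, False like QDA.
    """
    stats = {label: fit_class(values) for label, values in class_values.items()}
    total = sum(n for _, _, n in stats.values())
    pooled = sum((n - 1) * var for _, var, n in stats.values()) / (total - len(stats))
    scores = {}
    for label, (mean, var, n) in stats.items():
        variance = pooled if shared_variance else var
        scores[label] = math.log(n / total) + gaussian_log_density(x, mean, variance)
    return max(scores, key=scores.get)


# Hypothetical data: class "a" is tightly packed, class "b" is widely spread.
data = {"a": [1.0, 2.0, 3.0], "b": [4.0, 8.0, 12.0]}

lda_label = classify(-6.0, data, shared_variance=True)
qda_label = classify(-6.0, data, shared_variance=False)
print(lda_label, qda_label)
```

The point -6 is closer to the mean of "a", so the LDA variant picks "a". The QDA variant picks "b" because a point that far out is much more plausible under the widely spread class. In one dimension the "curved" boundary shows up as two cut-points instead of one.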

📂 LDA and QDA/

💬 NLP — Naive Bayes text classification

💡 In plain English: To classify text (e.g. positive vs. negative review), Naive Bayes counts how typical each word is for each class and multiplies the evidence together. It's "naive" because it pretends that, within a class, words appear independently of each other, yet it works remarkably well. Often paired with TF-IDF, which weights words by how informative they are.
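The counting-and-multiplying step can be sketched with word counts alone. The tiny corpus is invented for the example, add-one (Laplace) smoothing is an assumption used to avoid zero probabilities, and logarithms are added instead of multiplying raw probabilities so that long texts do not underflow. TF-IDF is left out to keep the sketch short.

```python
import math
from collections import Counter


def train(documents):
    """Count words per class from a list of (text, label) pairs."""
    word_counts = {}
    doc_counts = Counter()
    vocabulary = set()
    for text, label in documents:
        words = text.lower().split()
        word_counts.setdefault(label, Counter()).update(words)
        doc_counts[label] += 1
        vocabulary.update(words)
    return word_counts, doc_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability score."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            count = word_counts[label][word] + 1  # add-one smoothing
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training reviews.
reviews = [
    ("great fun movie", "pos"),
    ("loved it great acting", "pos"),
    ("boring slow movie", "neg"),
    ("terrible acting boring plot", "neg"),
]
model = train(reviews)
print(classify("great acting", model), classify("boring plot", model))
```

Words that appear equally often in both classes, such as "acting" here, cancel out; the decision is driven by words like "great" and "boring" that favour one class.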

📂 NLP/

🧮 Loss Functions — how a model measures its mistakes

A model improves by minimizing a loss (a score of how wrong it is). Which loss you use depends on the task:

Loss	Used for	Intuition
MSE (Mean Squared Error)	Regression	Penalizes big errors quadratically
Cross-Entropy (binary / categorical)	Classification	Punishes confident wrong answers
Hinge Loss	SVMs	Penalizes points inside the margin or on the wrong side of it
Logistic Loss	Logistic Regression	Binary cross-entropy under another name; a smooth stand-in for classification error

💡 Good to know: many common loss functions come from Maximum Likelihood Estimation (MLE). For example, binary Cross-Entropy is the negative log-likelihood of a Bernoulli distribution, and MSE follows from assuming Normal (Gaussian) noise on the target.
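The losses in the table can each be written in a couple of lines. This is a minimal sketch with made-up numbers; it assumes 0/1 labels for cross-entropy and -1/+1 labels for hinge loss, which are the usual conventions.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error for regression."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_cross_entropy(y_true, p_pred):
    """Cross-entropy for 0/1 labels and predicted probabilities."""
    return -sum(
        t * math.log(p) + (1 - t) * math.log(1 - p) for t, p in zip(y_true, p_pred)
    ) / len(y_true)


def hinge(y_true, scores):
    """Hinge loss for -1/+1 labels and raw (unsquashed) scores."""
    return sum(max(0.0, 1 - t * s) for t, s in zip(y_true, scores)) / len(y_true)


regression_loss = mse([3.0, 5.0], [2.0, 7.0])

# Same true label, two different predictions.
hesitant_wrong = binary_cross_entropy([1], [0.4])
confident_wrong = binary_cross_entropy([1], [0.01])

# First point is safely beyond the margin, second is on the wrong side.
margin_loss = hinge([1, -1], [2.0, 0.5])

print(regression_loss, hesitant_wrong, confident_wrong, margin_loss)
```

The confidently wrong prediction costs several times more than the hesitant one, and the hinge loss is zero for the point that already sits beyond the margin.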

📝 Handy notes

Logistic Regression is classification, not regression — the name is historical.
The three classical linear regressions are Linear (ordinary least squares, no penalty), Lasso (L1 penalty), and Ridge (L2 penalty).
Regularization (Lasso/Ridge) is one of the main tools against overfitting in linear models.

🔗 Useful Links

What is ML, in simple words (English)
Papers With Code — methods catalog
ML in simple words (Persian)
Dr. Sharifi Zarchi & Mr. Azarkhalili's ML course (Sharif University)

🚀 Getting Started

git clone https://github.com/mjmaher987/Machine-Learning.git
cd Machine-Learning

pip install numpy pandas scikit-learn matplotlib seaborn notebook

# open any notebook, e.g.:
jupyter notebook "Regression/Linear Regression.ipynb"

If you're new, work through the material in order: Regression → Classification → SVM/PCA/K-Means (a notebook inside `Classification/`) → LDA/QDA → NLP.

🗺️ Roadmap

Upload lecture notes & useful slides
Add curated courses (videos) with assignments and solutions
Grow into a question bank for teaching assistants
