mjmaher987/Machine-Learning

🤖 Machine Learning — A Hands-On, Beginner-Friendly Collection

A tour through classical machine learning — implemented and explained from the ground up. Each notebook tackles a real dataset, and this README walks you through what each method actually does, in plain language.

  • Author: Mohammad Javad Maheronnaghsh

📘 New to ML? Every method below starts with a "💡 In plain English" box — no formulas, no jargon — so you can build intuition before touching the code.


🧭 The big picture

Machine learning is about learning patterns from data instead of being explicitly programmed. The two main families covered here:

  • Supervised learning — you have labeled examples (input → correct answer) and the model learns to predict the answer for new inputs. Splits into:
    • Regression → predict a number (e.g. house price).
    • Classification → predict a category (e.g. spam / not-spam).
  • Unsupervised learning — no labels; the model finds structure on its own (e.g. grouping similar items, compressing data).

| Folder | Methods | Type |
| --- | --- | --- |
| Regression/ | Linear · Lasso/Ridge · Polynomial & Splines | Supervised (number) |
| Classification/ | Logistic Regression · Decision Tree · Bagging · AdaBoost · SVM · Neural Nets | Supervised (category) |
| Classification/ (PCA, K-Means) | PCA · K-Means | Unsupervised |
| LDA and QDA/ | Linear & Quadratic Discriminant Analysis | Supervised (category) |
| NLP/ | Naive Bayes text classification | Supervised (text) |

📚 The classic textbook An Introduction to Statistical Learning (ISLR) is included as a reference.


📈 Regression — predicting a number

💡 In plain English: Regression draws the "best-fit" relationship between inputs and a numeric output, so you can predict that output for new inputs. Linear regression fits a straight line; if the line bends, you need a curve.


Insurance data — how charges relate to age. Regression learns the trend through points like these.

  • Linear Regression — the foundational straight-line fit.
  • Lasso & Ridge — linear regression with regularization: a penalty that keeps the model simple to avoid overfitting (memorizing noise instead of learning the trend). Lasso can even switch off useless features entirely.
  • Polynomial Regression & Splines — fit curves instead of lines for non-linear data. But more flexibility isn't always better:


Choosing complexity: test error vs. polynomial degree. Too simple underfits; too complex overfits — the sweet spot is in between.

📂 Regression/


🏷️ Classification — predicting a category

💡 In plain English: Classification sorts inputs into buckets. Logistic regression — despite the name — is a classifier: it outputs the probability that something belongs to a class (e.g. "85% likely diabetic").
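
To see the "probability, then category" behaviour in code, here is a small sketch assuming scikit-learn and a made-up one-feature dataset (it is not the diabetes data used in the notebook). `predict_proba` returns the class probabilities and `predict` turns them into a category.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical dataset: larger feature values make class 1 more likely.
feature = rng.normal(size=(300, 1))
label = (feature[:, 0] + rng.normal(scale=0.5, size=300) > 0).astype(int)

model = LogisticRegression()
model.fit(feature, label)

# predict_proba returns one column per class; column 1 is P(class = 1).
new_points = np.array([[-2.0], [0.0], [2.0]])
probabilities = model.predict_proba(new_points)[:, 1]
predicted_classes = model.predict(new_points)

for point, prob, cls in zip(new_points[:, 0], probabilities, predicted_classes):
    print(f"x={point:+.1f}  P(class 1)={prob:.2f}  predicted class={cls}")
```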


Before modeling, explore the data — here a correlation heatmap of the Pima diabetes features shows which ones move together.

  • Logistic Regression: a probabilistic linear classifier (applied here to the diabetes dataset).
  • Decision Tree: a flowchart of yes/no questions that splits data into classes.
  • Bagging: train many trees on bootstrap samples (random subsets drawn with replacement) and average their predictions to reduce variance. Random Forests build on this idea and also randomize the features considered at each split.
  • AdaBoost: train models in sequence, each one giving more weight to the examples the previous ones got wrong (boosting).

💡 Why ensembles (Bagging / Boosting)? One model can be wrong; a committee of models that vote is usually more accurate and stable.
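
A quick way to try the committee idea is to compare a single tree with the two ensembles on the same data. The sketch below assumes scikit-learn and uses its synthetic `make_moons` dataset as a stand-in; the dataset, the 50-estimator setting, and the split are illustrative choices, not the notebook's configuration.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data (assumed for illustration).
X, y = make_moons(n_samples=1000, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    # BaggingClassifier uses decision trees by default.
    "bagging (50 trees)": BaggingClassifier(n_estimators=50, random_state=0),
    # AdaBoostClassifier uses depth-1 trees ("stumps") by default.
    "AdaBoost (50 stumps)": AdaBoostClassifier(n_estimators=50, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name:22s} test accuracy = {scores[name]:.3f}")
```

On noisy data the ensembles often match or beat the single tree, but the ranking is not guaranteed for every dataset or seed.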

📂 Classification/


📐 SVM, PCA, K-Means & Neural Networks

💡 Support Vector Machine (SVM): finds the boundary that separates two classes with the widest possible margin — the cleanest dividing line.
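
For a linear SVM the margin has a simple formula: its width is 2 / ‖w‖, where w is the learned weight vector. The sketch below assumes scikit-learn and two made-up, well-separated clusters; the large `C` value is an illustrative choice that approximates a hard margin.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Two well-separated clusters (assumed data for illustration).
class_a = rng.normal(loc=[-3.0, -3.0], scale=0.5, size=(50, 2))
class_b = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(50, 2))
X = np.vstack([class_a, class_b])
y = np.array([0] * 50 + [1] * 50)

# A large C leaves little room for margin violations.
svm = SVC(kernel="linear", C=1000.0)
svm.fit(X, y)

weights = svm.coef_[0]
margin_width = 2.0 / np.linalg.norm(weights)
n_support = len(svm.support_vectors_)

print(f"margin width: {margin_width:.2f}")
print(f"support vectors: {n_support} of {len(X)} points")
```

Only the support vectors (the points closest to the boundary) determine where the dividing line ends up; the rest of the data could move around without changing it.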

💡 PCA (Principal Component Analysis): a way to compress data by keeping only its most important directions of variation. Applied to face images, the "principal components" look like ghostly template faces (eigenfaces); weighted combinations of them approximately reconstruct the faces in the dataset.


Left: an original face. Right: a PCA component ("eigenface") — a reusable building block PCA learns from the dataset.
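
The compress-then-reconstruct idea can be tried without downloading a face dataset. The sketch below assumes scikit-learn and swaps in its bundled 8×8 digit images as a stand-in for faces; the component counts are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# 1797 images of 8x8 pixels, flattened to 64 numbers each (stand-in for faces).
images = load_digits().data


def reconstruction_error(n_components):
    """Compress to n_components numbers per image, restore, and measure the loss."""
    pca = PCA(n_components=n_components, svd_solver="full")
    compressed = pca.fit_transform(images)
    restored = pca.inverse_transform(compressed)
    return float(np.mean((images - restored) ** 2))


errors = {k: reconstruction_error(k) for k in (5, 15, 30)}
for k, err in errors.items():
    print(f"{k:2d} components -> mean squared reconstruction error {err:.2f}")

# Each component has the same shape as an image, so it can be viewed as one.
first_component = PCA(n_components=1, svd_solver="full").fit(images).components_[0]
component_image = first_component.reshape(8, 8)
```

Keeping more components always lowers the reconstruction error on the data PCA was fitted to; the practical question is how few you can keep before the images stop being recognizable.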

💡 K-Means (clustering): an unsupervised method that groups data into K clusters by repeatedly assigning points to the nearest cluster center and updating the centers.
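
The assign-then-update loop is short enough to write from scratch. This is a minimal NumPy sketch, not the notebook's implementation; the function name `k_means`, the random initialization, and the two synthetic blobs are assumptions for illustration.

```python
import numpy as np


def k_means(points, k, n_iterations=50, seed=0):
    """Plain K-Means: alternate between assigning points and moving centers."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iterations):
        # Assignment step: each point joins its nearest center.
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: each center moves to the mean of its points.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # Centers stopped moving, so the assignments are stable.
        centers = new_centers
    return centers, labels


rng = np.random.default_rng(1)
blob_a = rng.normal(loc=[-5.0, -5.0], scale=0.5, size=(100, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(100, 2))
points = np.vstack([blob_a, blob_b])

centers, labels = k_means(points, k=2)
print("cluster centers:\n", centers.round(2))
```

With more clusters or overlapping groups the result depends on the starting centers, which is why library implementations usually run several initializations.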

💡 Neural Networks: layers of simple units that together learn complex patterns — the foundation of deep learning.
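
To make "layers of simple units" concrete, here is a tiny two-layer network in NumPy that computes XOR, a pattern no single straight-line classifier can represent. The weights are hand-picked for illustration rather than learned, and all names are hypothetical; a real network would find such weights by training.

```python
import numpy as np


def relu(z):
    """A simple unit's activation: pass positive values, zero out the rest."""
    return np.maximum(z, 0.0)


# Hand-picked weights (not learned), chosen so the network computes XOR.
W_hidden = np.array([[1.0, 1.0],
                     [1.0, 1.0]])   # shape: (2 inputs, 2 hidden units)
b_hidden = np.array([0.0, -1.0])
w_output = np.array([1.0, -2.0])


def forward(inputs):
    """Layer 1: two ReLU units. Layer 2: a weighted sum of their outputs."""
    hidden = relu(inputs @ W_hidden + b_hidden)
    return hidden @ w_output


inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
outputs = forward(inputs)

for pair, out in zip(inputs.astype(int), outputs):
    print(f"{pair[0]} XOR {pair[1]} = {out:.0f}")
```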

📂 Classification/SVM - PCA - KMeans - Neural Netwroks.ipynb


🎯 LDA & QDA

💡 In plain English: LDA and QDA classify by modeling what each class's data looks like (its distribution) and asking "which class most likely produced this point?" LDA assumes all classes share the same spread (covariance), which gives a straight boundary between classes; QDA lets each class have its own spread, which allows a curved one.
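
The difference shows up clearly when two classes share a center but differ in spread. The sketch below assumes scikit-learn and a made-up dataset (one tight cluster inside one wide cluster); it is an illustration, not the data used in the folder's notebooks.

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Same center, different spread: no straight line separates these well.
tight = rng.normal(scale=0.5, size=(500, 2))
wide = rng.normal(scale=3.0, size=(500, 2))
X = np.vstack([tight, wide])
y = np.array([0] * 500 + [1] * 500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

lda = LinearDiscriminantAnalysis().fit(X_train, y_train)
qda = QuadraticDiscriminantAnalysis().fit(X_train, y_train)

lda_accuracy = lda.score(X_test, y_test)
qda_accuracy = qda.score(X_test, y_test)
print(f"LDA test accuracy: {lda_accuracy:.2f}")
print(f"QDA test accuracy: {qda_accuracy:.2f}")
```

Here QDA can draw a roughly circular boundary around the tight cluster, while LDA is limited to a straight line. When the classes really do share a covariance, LDA is the simpler choice and needs fewer parameters.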

📂 LDA and QDA/


💬 NLP — Naive Bayes text classification

💡 In plain English: To classify text (e.g. positive vs. negative review), Naive Bayes counts how typical each word is for each class and multiplies the evidence together. It's "naive" because it pretends the words are independent of each other once the class is known, yet it works remarkably well. Often paired with TF-IDF, which weights words by how informative they are.
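
A TF-IDF plus Naive Bayes classifier takes only a few lines with scikit-learn. The six training sentences below are invented for illustration; a real experiment would use a proper labeled corpus such as the one in the notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus (assumed for illustration only).
train_texts = [
    "great movie, loved it",
    "wonderful acting and a great story",
    "loved the wonderful ending",
    "terrible movie, hated it",
    "boring plot and terrible acting",
    "hated the boring ending",
]
train_labels = ["positive", "positive", "positive",
                "negative", "negative", "negative"]

# Step 1 turns text into TF-IDF word weights; step 2 applies Naive Bayes.
classifier = make_pipeline(TfidfVectorizer(), MultinomialNB())
classifier.fit(train_texts, train_labels)

new_reviews = ["a great and wonderful story", "terrible, boring plot"]
predictions = classifier.predict(new_reviews)

for review, label in zip(new_reviews, predictions):
    print(f"{review!r} -> {label}")
```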

📂 NLP/


🧮 Loss Functions — how a model measures its mistakes

A model improves by minimizing a loss (a score of how wrong it is). Which loss you use depends on the task:

| Loss | Used for | Intuition |
| --- | --- | --- |
| MSE (Mean Squared Error) | Regression | Penalizes errors quadratically, so large errors dominate |
| Cross-Entropy (binary / categorical) | Classification | Punishes confident wrong answers |
| Hinge Loss | SVMs | Penalizes points that are misclassified or fall inside the margin |
| Logistic Loss | Logistic Regression | Binary cross-entropy under another name: the negative log-likelihood of the labels |
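
Each of these losses is a one-line formula. The sketch below writes them out in NumPy; the function names and the small example arrays are assumptions for illustration, and the hinge loss expects labels encoded as -1 / +1.

```python
import numpy as np


def mean_squared_error(y_true, y_pred):
    """Average of squared differences."""
    return float(np.mean((y_true - y_pred) ** 2))


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Negative average log-probability assigned to the true labels (0 or 1)."""
    p = np.clip(p_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))


def hinge_loss(y_signed, scores):
    """Zero for points beyond the margin, growing linearly otherwise."""
    return float(np.mean(np.maximum(0.0, 1.0 - y_signed * scores)))


# One prediction is off by 2, so MSE = (0 + 0 + 4) / 3.
mse = mean_squared_error(np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.0, 5.0]))

# The true label is 1: compare a confident right answer with a confident wrong one.
confident_right = binary_cross_entropy(np.array([1.0]), np.array([0.95]))
confident_wrong = binary_cross_entropy(np.array([1.0]), np.array([0.05]))

# First point is safely beyond the margin; second is on the right side but inside it.
hinge = hinge_loss(np.array([1.0, -1.0]), np.array([2.0, -0.5]))

print(f"MSE = {mse:.3f}")
print(f"cross-entropy: right = {confident_right:.3f}, wrong = {confident_wrong:.3f}")
print(f"hinge = {hinge:.2f}")
```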

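💡 Good to know: many common loss functions come from Maximum Likelihood Estimation (MLE). For example, Cross-Entropy arises from the Bernoulli distribution, and MSE from the Normal (Gaussian) distribution. Hinge loss is an exception: it comes from the margin idea rather than from a likelihood.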
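
The link to MLE can be made explicit by writing down the negative log-likelihood in each case. The notation here is an assumption for illustration: y_i are the observed targets, p_i the predicted probabilities, ŷ_i the predicted values, and σ a fixed noise level.

```latex
% Bernoulli targets y_i \in \{0, 1\} with predicted probabilities p_i:
\begin{align}
-\log \prod_{i=1}^{n} p_i^{\,y_i} (1 - p_i)^{1 - y_i}
  &= -\sum_{i=1}^{n} \bigl[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \bigr]
\end{align}
% This is n times the binary cross-entropy.

% Gaussian targets y_i \sim \mathcal{N}(\hat{y}_i, \sigma^2):
\begin{align}
-\log \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
      \exp\!\left( -\frac{(y_i - \hat{y}_i)^2}{2\sigma^2} \right)
  &= \frac{n}{2} \log (2\pi\sigma^2)
     + \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
\end{align}
% With \sigma fixed, the first term is constant, so maximizing the likelihood
% is the same as minimizing the sum of squared errors (and hence the MSE).
```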


📝 Handy notes

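  • Logistic Regression is classification, not regression: the name is historical and refers to fitting a linear model to the log-odds of a class.
  • The three classical linear regressions covered here are ordinary Linear (least squares), Lasso (L1 penalty), and Ridge (L2 penalty).
  • Regularization (Lasso/Ridge) is a key tool against overfitting, alongside more data and simpler models.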
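
The practical difference between the two penalties is easy to check: Lasso can set coefficients to exactly zero, while Ridge only shrinks them. The sketch below assumes scikit-learn and a synthetic dataset with 10 features of which only the first two matter; the `alpha` values are illustrative, not tuned.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)

# Synthetic data (assumed): only features 0 and 1 influence the target.
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.2).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

lasso_zeroed = int(np.sum(lasso.coef_ == 0.0))
ridge_zeroed = int(np.sum(ridge.coef_ == 0.0))

print("Lasso coefficients:", lasso.coef_.round(2))
print("Ridge coefficients:", ridge.coef_.round(2))
print(f"coefficients set exactly to zero: Lasso={lasso_zeroed}, Ridge={ridge_zeroed}")
```

Note that Lasso also shrinks the coefficients it keeps, so the two informative weights come out somewhat smaller in magnitude than the true 3 and -2.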

🔗 Useful Links


🚀 Getting Started

git clone https://github.com/mjmaher987/Machine-Learning.git
cd Machine-Learning

pip install numpy pandas scikit-learn matplotlib seaborn notebook

# open any notebook, e.g.:
jupyter notebook "Regression/Linear Regression.ipynb"

If you're new, follow the folders in order: Regression → Classification → SVM/PCA/K-Means → LDA/QDA → NLP.


🗺️ Roadmap

  • Upload lecture notes & useful slides
  • Add curated courses (videos) with assignments and solutions
  • Grow into a question bank for teaching assistants
