shaheennabi/README.md

Thanks for stopping by 👋






Who I am

╔══════════════════════════════════════╗
║ research: thinking, reasoning models ║
╚══════════════════════════════════════╝


I study how large language models perform multi-step reasoning and how training and post-training methods can improve their reliability, efficiency, and scalability.

My work focuses on the post-training stack for LLMs — supervised fine-tuning (SFT), preference optimization, reinforcement learning methods such as RLVR, and inference-time compute strategies that improve reasoning without requiring larger models.

I’m also interested in the interpretability of reasoning models: understanding the internal mechanisms that support multi-step reasoning and diagnosing failures such as shortcut reasoning, reward hacking, and unfaithful chain-of-thought.

Currently building and open-sourcing implementations of reasoning-focused training pipelines and contributing to LLM infrastructure and post-training frameworks.
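As a tiny, concrete illustration of the inference-time compute idea: self-consistency samples several independent reasoning traces and majority-votes their final answers, improving reliability with more samples rather than a larger model. This is a minimal sketch, not code from any of my pipelines; `sample_fn` is a hypothetical stand-in for an actual model call.

```python
from collections import Counter

def majority_vote(answers):
    """Return the most common final answer across sampled reasoning traces."""
    return Counter(answers).most_common(1)[0][0]

def self_consistency(sample_fn, question, n_samples=16):
    """Self-consistency: draw n independent chains of thought via sample_fn,
    extract each final answer, and return the majority answer."""
    return majority_vote([sample_fn(question) for _ in range(n_samples)])
```

In a real pipeline, `sample_fn` would sample a chain-of-thought at nonzero temperature and parse out the final answer; the vote is over answers, not over the reasoning text itself.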


*I love SpaceX rockets*

Pinned

  1. Olmo3-from-scratch

     A clean, from-scratch implementation of the OLMo architecture with KV caching, RoPE, and an efficient autoregressive inference pipeline. Designed as a minimal yet extensible foundation for post-tr…

     Jupyter Notebook

  2. Production-Ready-Instruction-Finetuning-of-Meta-Llama-3.2-3B-Instruct-Project

     Instruction fine-tuning of Meta Llama 3.2-3B Instruct on Kannada conversations, tailoring the model to follow specific instructions in Kannada and enhancing its ability to generate relevant, context-a…

     Jupyter Notebook

  3. Proximal-Policy-Optimization-PPO

     Modular implementation of Proximal Policy Optimization (PPO), a policy gradient reinforcement learning algorithm introduced by OpenAI in 2017. It's designed to be simpler, more stable, and more…

     Python
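For context, the clipped surrogate objective at the heart of PPO fits in a few lines. This is a minimal NumPy sketch of the standard formulation, not the repository's actual code:

```python
import numpy as np

def ppo_clip_objective(log_prob_new, log_prob_old, advantage, eps=0.2):
    """PPO's clipped surrogate objective (to be maximized): clipping the
    probability ratio to [1 - eps, 1 + eps] keeps each update close to
    the old policy, which is the source of PPO's stability."""
    ratio = np.exp(log_prob_new - log_prob_old)
    return np.minimum(ratio * advantage,
                      np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage).mean()
```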

  4. Denoising-Diffusion-Probabilistic-Models

     Reference implementation of Denoising Diffusion Probabilistic Models (DDPM): the forward and reverse diffusion processes.

     Python
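The DDPM forward (noising) process has a convenient closed form that lets you jump straight to any timestep. A minimal sketch, assuming a precomputed beta schedule (illustrative only, not the repository's code):

```python
import numpy as np

def ddpm_forward(x0, t, betas, rng):
    """Closed-form DDPM forward process:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps,
    where alpha_bar_t is the cumulative product of (1 - beta)."""
    alphas_bar = np.cumprod(1.0 - betas)
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps
    return xt, eps
```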

  5. Group-Query-Attention

     An implementation of Group Query Attention (GQA), an efficient variant of multi-head attention used in modern transformer models like LLaMA.

     Python
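The core trick in GQA, sharing each key/value head across a group of query heads, fits in a short NumPy sketch (illustrative only, not the repository's code):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def grouped_query_attention(q, k, v):
    """q: (n_heads, seq, d); k, v: (n_kv_heads, seq, d), with n_kv_heads
    dividing n_heads. Each group of query heads attends with one shared
    K/V head, shrinking the KV cache without collapsing to a single head."""
    n_heads, _, d = q.shape
    group = n_heads // k.shape[0]
    # Repeat each K/V head across its query group (cf. LLaMA's repeat_kv).
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)  # (n_heads, seq, seq)
    return softmax(scores) @ v                      # (n_heads, seq, d)
```

With `n_kv_heads == n_heads` this reduces to standard multi-head attention; with `n_kv_heads == 1` it is multi-query attention.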

  6. ROPE--Rotary-Positional-Embeddings

     A clean, efficient implementation of Rotary Positional Embeddings (RoPE) for transformers, with support for advanced variants like YaRN scaling.

     Python
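The basic RoPE rotation (without the YaRN scaling variant) can be sketched as follows; a minimal NumPy illustration, not the repository's code:

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary positional embeddings to x of shape (seq, d), d even.
    Each channel pair (2i, 2i+1) at position p is rotated by the angle
    p * base^(-2i/d), so Q.K dot products depend on relative position."""
    seq, d = x.shape
    pos = np.arange(seq)[:, None]                 # (seq, 1)
    inv_freq = base ** (-np.arange(0, d, 2) / d)  # (d/2,)
    theta = pos * inv_freq                        # (seq, d/2)
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Because each pair is only rotated, norms are preserved and position 0 is left unchanged, which is a handy sanity check for any RoPE implementation.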