Projects

Selected work

Engineering projects from internship work, founding-team builds, and personal experiments. Each card opens with the headline, recruiter-facing claim; deeper writeups and source code live on the linked pages.

ZeroFalse

Multi-stage LLM pipeline that reduces false positives in static analysis.

Best-in-class F1 on both synthetic and real-world CodeQL alerts.

Takes raw CodeQL alerts and runs them through contextual reasoning + structured evidence validation to filter false positives. Evaluated 10 LLMs across 6 model families on two benchmarks (synthetic + real-world Java CVE-grounded alerts).
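
A minimal sketch of that two-stage shape, assuming a generic llm(prompt) -> str callable; the stage prompts and names are illustrative, not ZeroFalse's actual interface:

```python
from dataclasses import dataclass

@dataclass
class Alert:
    rule_id: str   # CodeQL rule, e.g. "java/sql-injection"
    snippet: str   # flagged source region plus surrounding context
    message: str   # CodeQL's alert message

def is_true_positive(alert: Alert, llm) -> bool:
    """Two-stage filter: contextual reasoning, then evidence validation."""
    # Stage 1: free-form reasoning about reachability and sanitization.
    reasoning = llm(
        f"Rule {alert.rule_id}: {alert.message}\n"
        f"Code:\n{alert.snippet}\n"
        "Explain whether attacker-controlled data actually reaches the sink."
    )
    # Stage 2: structured verdict that must be grounded in the stage-1
    # reasoning, so confident-sounding but unsupported claims get dropped.
    verdict = llm(
        f"Reasoning:\n{reasoning}\n"
        "Answer exactly TRUE_POSITIVE or FALSE_POSITIVE, based only on "
        "claims above that point to concrete lines of the snippet."
    )
    return "TRUE_POSITIVE" in verdict

# alerts = [a for a in codeql_alerts if is_true_positive(a, llm)]
```

Separating the reasoning pass from the final verdict mirrors the contextual-reasoning vs. evidence-validation split described above.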

  • LLMs
  • CodeQL
  • Python
  • Static Analysis
  • Multi-Stage Prompting

Fabric — Agentic IDE (Farpoint)

LLM-powered agentic IDE. I own the multi-agent DAG orchestration, subagent system, and context-management layers.

Authored the empirical study behind Fabric’s externally-published March-2026 benchmark report — 99% of frontier accuracy at 18% of frontier cost on Aider Polyglot (225+ exercises, 6 languages).

Production agentic IDE in the Cursor product space. Shipped: a six-tool subagent surface (DelegateTask/SendMessage/WaitForTask/etc.), a TDD-style RED→GREEN DAG orchestrator with Mission Control dashboard, chain-of-density + KV-cache-aware summarization, the prepare→permission→execute tool lifecycle, SWE-Bench evaluation, and an MCP server exposing the test-and-break loop to AI agents.
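
A rough sketch of the prepare→permission→execute lifecycle, written here as Python pseudotypes for brevity (Fabric itself is TypeScript and its real types differ):

```python
from dataclasses import dataclass
from typing import Callable, Protocol

@dataclass
class Plan:
    preview: str        # human-readable summary of the side effects
    destructive: bool   # would executing mutate files / run commands?

class Tool(Protocol):
    def prepare(self, args: dict) -> Plan: ...
    def execute(self, plan: Plan) -> object: ...

def run_tool(name: str, args: dict, tools: dict[str, Tool],
             ask_user: Callable[[str], bool]) -> dict:
    tool = tools[name]
    # prepare: validate args and render a preview *before* any side effect.
    plan = tool.prepare(args)
    # permission: destructive plans are gated on explicit approval.
    if plan.destructive and not ask_user(plan.preview):
        return {"status": "denied", "tool": name}
    # execute: the side effect happens only after the gate.
    return {"status": "ok", "result": tool.execute(plan)}
```

The point of the split is that nothing mutates state until the permission gate has passed.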

  • TypeScript
  • Electron
  • React
  • LLM Agents
  • MCP
  • SWE-Bench
  • Docker

Pabla — Crypto Social-Trading Engine

Real-time copy-trading engine for crypto markets. Acquired by Nobitex (Middle East’s largest crypto exchange).

Iran’s leading crypto social-trading platform — ~40k users in 18 months, then acquired.

Architected the trading engine: smart order routing across 5+ exchanges (Binance, KuCoin, regional), best-execution price aggregation, per-exchange adapter pattern, async Python + Celery, copy-replication idempotency with slippage controls, circuit breakers, ~99.5% uptime SLO. Shipped MVP in ~2 months; platform reached ~40k users in 18 months.
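
In sketch form, the adapter pattern, best-execution pick, and idempotency key fit together roughly like this (the interface and names are illustrative):

```python
import asyncio
from abc import ABC, abstractmethod

class ExchangeAdapter(ABC):
    """One adapter per venue (Binance, KuCoin, regional exchanges)."""
    @abstractmethod
    async def quote(self, symbol: str, side: str, qty: float) -> float: ...
    @abstractmethod
    async def place_order(self, symbol: str, side: str, qty: float,
                          client_order_id: str) -> dict: ...

async def route_order(symbol: str, side: str, qty: float,
                      adapters: list[ExchangeAdapter],
                      client_order_id: str) -> dict:
    # Best execution: quote every venue concurrently, take the best price.
    quotes = await asyncio.gather(*(a.quote(symbol, side, qty) for a in adapters))
    pick = min if side == "buy" else max
    _, venue = pick(zip(quotes, adapters), key=lambda pair: pair[0])
    # Idempotency: a replicated leader trade always reuses one
    # client_order_id, so retries can never double-fill a follower.
    return await venue.place_order(symbol, side, qty, client_order_id)
```

The slippage controls and circuit breakers sit around this routing core, rejecting fills that drift too far from the leader's price or cutting off a misbehaving venue.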

  • Python
  • Django
  • PostgreSQL
  • Celery
  • Redis
  • asyncio
  • WebSocket
  • Docker
  • Real-time Systems

SnappFood — ETA, Churn, Fraud Models (10M+ users)

Production ML on Iran’s largest food-delivery platform: 27% better ETA, 13% lower churn, 10% CSAT lift.

Measured outcomes: 27% ETA-accuracy improvement, 13% churn reduction, 10% CSAT lift.

On the Customer Experience team: built the Octopus BI layer (department-specific KPI dashboards); adapted Uber's DeepETA to motorbike delivery, improving ETA accuracy by 27% and cutting delivery delays by 24%; shipped a churn-prediction pipeline (RFM features + logistic regression over 3M+ users) that fed reactivation campaigns and dropped monthly churn by 13%; and built a vendor-fraud detection system that lifted CSAT by 10%.
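
The churn pipeline's core is small enough to sketch; the schema and the 30-day churn window below are illustrative, not the production definitions:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def rfm_features(orders: pd.DataFrame, as_of: pd.Timestamp) -> pd.DataFrame:
    """orders: one row per order with user_id, ts, amount (assumed schema)."""
    g = orders.groupby("user_id")
    return pd.DataFrame({
        "recency_days": (as_of - g["ts"].max()).dt.days,  # R
        "frequency": g.size(),                            # F
        "monetary": g["amount"].sum(),                    # M
    })

# Label: "no order within 30 days of the cutoff" (hypothetical window).
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
# model.fit(rfm_features(orders_before_cutoff, cutoff), churn_labels)
# Predicted probabilities rank users for the reactivation campaigns.
```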

  • Python
  • PyTorch
  • Keras
  • scikit-learn
  • SQL
  • Power BI
  • Pandas

Clarion — Voice-to-Prompt Desktop Agent

Tauri/Rust macOS menu-bar agent: hotkey → Whisper → Haiku rewrite → paste. Built for bilingual developers.

Three shipped components: desktop agent + LoRA fine-tune + MCP server.

Personal project. Global-hotkey audio capture, dual-path Whisper (OpenAI API + local whisper.cpp), Claude-Haiku prompt structuring with project context, auto-paste via osascript. Also shipped a companion Whisper-Large-V3 LoRA fine-tuned on a bilingual Persian-English technical-speech corpus, and an MCP-server wrapper exposing the pipeline to Claude Code / Cursor.
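
The cloud path of the pipeline, sketched in Python for brevity (Clarion itself is Rust/Tauri; the model ids and prompts here are illustrative):

```python
import subprocess
from anthropic import Anthropic
from openai import OpenAI

def transcribe(path: str) -> str:
    # Cloud half of the dual-path setup; the local half shells out to
    # whisper.cpp instead.
    with open(path, "rb") as f:
        return OpenAI().audio.transcriptions.create(
            model="whisper-1", file=f).text

def structure(raw: str, project_context: str) -> str:
    # Haiku rewrite: turn rambling dictation into a precise coding prompt.
    # Model id is illustrative; any current Haiku version works.
    msg = Anthropic().messages.create(
        model="claude-3-5-haiku-latest",
        max_tokens=512,
        messages=[{"role": "user", "content":
            f"Project context:\n{project_context}\n\n"
            f"Rewrite this dictation as a clear, imperative prompt:\n{raw}"}],
    )
    return msg.content[0].text

def paste(text: str) -> None:
    # Clipboard + synthesized Cmd+V, mirroring the osascript auto-paste.
    subprocess.run(["pbcopy"], input=text.encode(), check=True)
    subprocess.run(["osascript", "-e",
        'tell application "System Events" to keystroke "v" using command down'],
        check=True)
```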

  • Rust
  • Tauri
  • Svelte
  • Whisper
  • LoRA
  • MCP
  • Anthropic SDK

Research infrastructure & experiments

cv-and-jobs — Multi-Agent Job-Application Harness

A Claude-Code-orchestrated multi-subagent system that composes role-targeted CVs from a fact-oriented YAML KB.

Five specialized subagents, multi-model CV critique, no-fabrication audit trail.

Personal infrastructure. Five subagents with strict ownership boundaries (kb-curator, jd-analyzer, cv-composer, cv-reviewer, application-tracker); LaTeX-direct rendering through Jinja2 templates; multi-model parallel CV review via GPT-5.5 + Gemini 3.1 Pro on OpenRouter with Pydantic-validated structured outputs; an /ask command that runs Claude + GPT + Gemini in parallel and reconciles their answers. This site itself was scaffolded with help from that harness.
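
The multi-model review step, roughly (the OpenRouter model slugs and the Review schema are placeholders, not the harness's actual definitions):

```python
import asyncio
from openai import AsyncOpenAI
from pydantic import BaseModel

class Review(BaseModel):
    score: int                   # 1-10 fit for the target role
    issues: list[str]            # concrete problems to fix
    missing_keywords: list[str]  # JD terms absent from the CV

# OpenRouter speaks the OpenAI wire format, so the OpenAI SDK works as-is.
client = AsyncOpenAI(base_url="https://openrouter.ai/api/v1", api_key="...")

async def review(model: str, cv: str, jd: str) -> Review:
    resp = await client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content":
            f"Job description:\n{jd}\n\nCV:\n{cv}\n\nReply with JSON only: "
            '{"score": int, "issues": [...], "missing_keywords": [...]}'}],
    )
    # Pydantic validation rejects malformed output instead of passing it on.
    return Review.model_validate_json(resp.choices[0].message.content)

async def parallel_review(cv: str, jd: str) -> list[Review]:
    models = ["openai/gpt-5.5", "google/gemini-3.1-pro"]  # illustrative slugs
    return list(await asyncio.gather(*(review(m, cv, jd) for m in models)))
```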

  • Claude Code
  • Claude Code SDK
  • Python
  • OpenRouter
  • Pydantic
  • LaTeX

CVE-Bench — Reproducible CVE Exploitation & Patching

LangGraph-orchestrated agentic pipeline that reproduces and patches CVEs end-to-end. 100+ curated CVEs.

100+ CVEs reproduced with full vulnerable/patched docker pairs and exploit verification.

SFU lab project. A LangGraph state machine drives the full exploit-and-patch lifecycle per CVE: parallel PoC analysis + advisory search → dockerized vulnerable + patched builds → an automated exploit-validation loop that verifies the PoC succeeds on the vulnerable image and fails on the patched one. Curated dataset of 100+ documented CVEs with structured exploitation metadata.
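
A condensed sketch of the graph's shape; node bodies are stubbed out, and the state fields and node names are illustrative:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class CveState(TypedDict, total=False):
    cve_id: str
    poc: str          # candidate exploit script
    vuln_hit: bool    # PoC succeeded against the vulnerable image
    patch_safe: bool  # PoC failed against the patched image

def analyze(state):  return {}  # PoC analysis + advisory search
def build(state):    return {}  # build vulnerable + patched docker images
def validate(state): return {}  # run PoC against both images, set the flags
def refine(state):   return {}  # revise the PoC when validation fails

def verdict(state: CveState) -> str:
    # Done only when the exploit fires on vulnerable AND not on patched.
    return "done" if state.get("vuln_hit") and state.get("patch_safe") else "retry"

g = StateGraph(CveState)
for name, fn in [("analyze", analyze), ("build", build),
                 ("validate", validate), ("refine", refine)]:
    g.add_node(name, fn)
g.set_entry_point("analyze")
g.add_edge("analyze", "build")
g.add_edge("build", "validate")
g.add_conditional_edges("validate", verdict, {"done": END, "retry": "refine"})
g.add_edge("refine", "validate")
pipeline = g.compile()
```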

  • LangGraph
  • LangChain
  • Docker
  • Python
  • Claude Code SDK

Security-Research RAG

Retrieval-augmented QA over ~350 papers from FSE, USENIX Security, IEEE S&P, NDSS, CCS, ICSE, RAID.

Ragas-evaluated RAG with chunking-strategy ablation and grounded citations.

Personal + SFU lab project. Corpus assembly (~350 PDFs across 7 venues + CVE-Bench READMEs), pgvector vector store, LangChain orchestration, Cohere embed-v3 embeddings, and Claude Haiku as the answering model. Built a Ragas-based eval harness on a 50-question hand-curated test set and ran a chunking-strategy ablation (fixed-size vs recursive-character vs semantic).
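
Retrieval in miniature; the table and column names are hypothetical, and chunks are assumed to have been embedded with input_type="search_document":

```python
import cohere
import psycopg

co = cohere.Client()  # reads COHERE_API_KEY from the environment

def retrieve(question: str, k: int = 5) -> list[str]:
    # Queries use input_type="search_query"; embed-v3 is asymmetric, so
    # query and document embeddings must use their respective input types.
    q = co.embed(texts=[question], model="embed-english-v3.0",
                 input_type="search_query").embeddings[0]
    with psycopg.connect("dbname=security_rag") as conn:
        rows = conn.execute(
            # <=> is pgvector's cosine-distance operator.
            "SELECT chunk FROM papers ORDER BY embedding <=> %s::vector LIMIT %s",
            (str(q), k),
        ).fetchall()
    return [r[0] for r in rows]

# The retrieved chunks go into the Claude Haiku prompt with per-chunk
# markers so answers can cite the papers they draw on.
```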

  • pgvector
  • LangChain
  • Cohere Embeddings
  • Claude Haiku
  • Ragas
  • PostgreSQL

Nano-Transformer-Security

50M-param decoder-only transformer trained from scratch on a CVE/exploit corpus.

End-to-end pretraining loop: corpus → tokenizer → 1B tokens → eval on H100.

Personal project, nanoGPT-style. Assembled the security corpus (~200k NVD CVE descriptions + CVE-Bench exploit scripts + sampled public advisories), trained a BPE tokenizer (~32k vocab), and implemented and ran the pretraining loop (12 layers, 768 dim, 12 heads, ~1B tokens seen) in PyTorch on a Compute Canada H100. Evaluated on held-out perplexity with qualitative generation samples.
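
The heart of the loop, sketched; bf16 autocast because the run was on an H100, and details like the clip norm are illustrative:

```python
import math
import torch
import torch.nn.functional as F

def train_step(model, batch: torch.Tensor, opt: torch.optim.Optimizer) -> float:
    """One pretraining step. batch: (B, T+1) LongTensor of BPE token ids;
    the objective is plain next-token prediction."""
    x, y = batch[:, :-1], batch[:, 1:]           # inputs, shifted targets
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        logits = model(x)                        # (B, T, vocab)
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               y.reshape(-1))
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # illustrative
    opt.step()
    opt.zero_grad(set_to_none=True)
    return loss.item()

# Held-out perplexity is just exp of the mean eval loss:
# ppl = math.exp(sum(eval_losses) / len(eval_losses))
```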

  • PyTorch
  • HuggingFace Tokenizers
  • Distributed Training
  • Mixed Precision
  • Weights & Biases

CVE-Bench LoRA Fine-tune (Qwen2.5-Coder-7B)

Domain-specific LoRA on the CVE-Bench corpus for code-vulnerability specialization.

Baseline vs LoRA comparison on docker-validated exploit-generation success rate.

SFU lab + personal project. ~80 CVE train / ~20 held-out split, stratified across vulnerability categories. LoRA fine-tune of Qwen2.5-Coder-7B using HuggingFace Transformers + PEFT + TRL on a bitsandbytes 4-bit quantized base, trained on a Compute Canada H100. The eval harness runs the fine-tuned model's generated exploits through CVE-Bench's docker-based verify scripts.
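
The QLoRA setup in outline; the rank, target modules, dataset path, and exact base checkpoint are assumptions, not the project's recorded config:

```python
import torch
from datasets import load_from_disk
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

base = "Qwen/Qwen2.5-Coder-7B"  # assumed checkpoint id
bnb = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # bf16 compute on the H100
)
model = AutoModelForCausalLM.from_pretrained(
    base, quantization_config=bnb, device_map="auto")

lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,  # illustrative
                  target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
                  task_type="CAUSAL_LM")

train_ds = load_from_disk("cve_train")  # hypothetical path to the ~80-CVE split
trainer = SFTTrainer(
    model=model,
    peft_config=lora,
    train_dataset=train_ds,
    args=SFTConfig(output_dir="qwen-cve-lora", bf16=True),
)
trainer.train()
```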

  • HuggingFace TRL
  • PEFT
  • QLoRA
  • bitsandbytes
  • Transformers
  • CUDA H100