Projects

Selected work

Engineering projects from internship work, founding-team builds, and personal experiments. Each card opens with the headline, recruiter-facing claim; deeper writeups and source code live on the linked pages.

ZeroFalse

Multi-stage LLM pipeline that reduces false positives in static analysis.

Best-in-class F1 on both synthetic and real-world CodeQL alerts.

Takes raw CodeQL alerts and runs them through contextual reasoning + structured evidence validation to filter false positives. Evaluated 10 LLMs across 6 model families on two benchmarks (synthetic + real-world Java CVE-grounded alerts).
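
A minimal sketch of that two-stage shape, assuming a generic llm(prompt) -> str callable; the stage prompts and names are illustrative, not ZeroFalse's actual interface:

```python
from dataclasses import dataclass

@dataclass
class Alert:
    rule_id: str   # CodeQL rule, e.g. "java/sql-injection"
    snippet: str   # flagged source region plus surrounding context
    message: str   # CodeQL's alert message

def is_true_positive(alert: Alert, llm) -> bool:
    """Two-stage filter: contextual reasoning, then evidence validation."""
    # Stage 1: free-form reasoning about reachability and sanitization.
    reasoning = llm(
        f"Rule {alert.rule_id}: {alert.message}\n"
        f"Code:\n{alert.snippet}\n"
        "Explain whether attacker-controlled data actually reaches the sink."
    )
    # Stage 2: structured verdict that must be grounded in the stage-1
    # reasoning, so confident-sounding but unsupported claims get dropped.
    verdict = llm(
        f"Reasoning:\n{reasoning}\n"
        "Answer exactly TRUE_POSITIVE or FALSE_POSITIVE, based only on "
        "claims above that point to concrete lines of the snippet."
    )
    return "TRUE_POSITIVE" in verdict

# alerts = [a for a in codeql_alerts if is_true_positive(a, llm)]
```

Separating the reasoning pass from the final verdict mirrors the contextual-reasoning vs. evidence-validation split described above.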

  • LLMs
  • CodeQL
  • Python
  • Static Analysis
  • Multi-Stage Prompting

Fabric — Agentic IDE (Farpoint)

LLM-powered agentic IDE. I own the multi-agent DAG orchestration, subagent system, and context-management layers.

Authored the empirical study behind Fabric’s externally-published March-2026 benchmark report — 99% of frontier accuracy at 18% of frontier cost on Aider Polyglot (225+ exercises, 6 languages).

Production agentic IDE in the Cursor product space. Shipped: a six-tool subagent surface (DelegateTask/SendMessage/WaitForTask/etc.), a TDD-style RED→GREEN DAG orchestrator with Mission Control dashboard, chain-of-density + KV-cache-aware summarization, the prepare→permission→execute tool lifecycle, SWE-Bench evaluation, and an MCP server exposing the test-and-break loop to AI agents.
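
A rough sketch of the prepare→permission→execute lifecycle, written here as Python pseudotypes for brevity (Fabric itself is TypeScript and its real types differ):

```python
from dataclasses import dataclass
from typing import Callable, Protocol

@dataclass
class Plan:
    preview: str        # human-readable summary of the side effects
    destructive: bool   # would executing mutate files / run commands?

class Tool(Protocol):
    def prepare(self, args: dict) -> Plan: ...
    def execute(self, plan: Plan) -> object: ...

def run_tool(name: str, args: dict, tools: dict[str, Tool],
             ask_user: Callable[[str], bool]) -> dict:
    tool = tools[name]
    # prepare: validate args and render a preview *before* any side effect.
    plan = tool.prepare(args)
    # permission: destructive plans are gated on explicit approval.
    if plan.destructive and not ask_user(plan.preview):
        return {"status": "denied", "tool": name}
    # execute: the side effect happens only after the gate.
    return {"status": "ok", "result": tool.execute(plan)}
```

The point of the split is that nothing mutates state until the permission gate has passed.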

  • TypeScript
  • Electron
  • React
  • LLM Agents
  • MCP
  • SWE-Bench
  • Docker

Pabla — Crypto Social-Trading Engine

Real-time copy-trading engine for crypto markets. Acquired by Nobitex (Middle East’s largest crypto exchange).

Iran’s leading crypto social-trading platform — ~40k users in 18 months, then acquired.

Architected the trading engine: smart order routing across 5+ exchanges (Binance, KuCoin, regional), best-execution price aggregation, per-exchange adapter pattern, async Python + Celery, copy-replication idempotency with slippage controls, circuit breakers, ~99.5% uptime SLO. Shipped MVP in ~2 months; platform reached ~40k users in 18 months.
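
In sketch form, the adapter pattern, best-execution pick, and idempotency key fit together roughly like this (the interface and names are illustrative):

```python
import asyncio
from abc import ABC, abstractmethod

class ExchangeAdapter(ABC):
    """One adapter per venue (Binance, KuCoin, regional exchanges)."""
    @abstractmethod
    async def quote(self, symbol: str, side: str, qty: float) -> float: ...
    @abstractmethod
    async def place_order(self, symbol: str, side: str, qty: float,
                          client_order_id: str) -> dict: ...

async def route_order(symbol: str, side: str, qty: float,
                      adapters: list[ExchangeAdapter],
                      client_order_id: str) -> dict:
    # Best execution: quote every venue concurrently, take the best price.
    quotes = await asyncio.gather(*(a.quote(symbol, side, qty) for a in adapters))
    pick = min if side == "buy" else max
    _, venue = pick(zip(quotes, adapters), key=lambda pair: pair[0])
    # Idempotency: a replicated leader trade always reuses one
    # client_order_id, so retries can never double-fill a follower.
    return await venue.place_order(symbol, side, qty, client_order_id)
```

The slippage controls and circuit breakers sit around this routing core, rejecting fills that drift too far from the leader's price or cutting off a misbehaving venue.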

  • Python
  • Django
  • PostgreSQL
  • Celery
  • Redis
  • asyncio
  • WebSocket
  • Docker
  • Real-time Systems

SnappFood — ETA, Churn, Fraud Models (10M+ users)

Production ML on Iran’s largest food-delivery platform: 27% better ETA, 13% lower churn, 10% CSAT lift.

Measured outcomes: 27% ETA-accuracy improvement, 13% churn reduction, 10% CSAT lift.

On the Customer Experience team: built the Octopus BI layer (department-specific KPI dashboards); adapted Uber's DeepETA to motorbike delivery, improving ETA accuracy by 27% and cutting delivery delays by 24%; shipped a churn-prediction pipeline (RFM features + logistic regression over 3M+ users) that fed reactivation campaigns and dropped monthly churn by 13%; and built a vendor-fraud detection system that lifted CSAT by 10%.
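
The churn pipeline's core is small enough to sketch; the schema and the 30-day churn window below are illustrative, not the production definitions:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def rfm_features(orders: pd.DataFrame, as_of: pd.Timestamp) -> pd.DataFrame:
    """orders: one row per order with user_id, ts, amount (assumed schema)."""
    g = orders.groupby("user_id")
    return pd.DataFrame({
        "recency_days": (as_of - g["ts"].max()).dt.days,  # R
        "frequency": g.size(),                            # F
        "monetary": g["amount"].sum(),                    # M
    })

# Label: "no order within 30 days of the cutoff" (hypothetical window).
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
# model.fit(rfm_features(orders_before_cutoff, cutoff), churn_labels)
# Predicted probabilities rank users for the reactivation campaigns.
```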

  • Python
  • PyTorch
  • Keras
  • scikit-learn
  • SQL
  • Power BI
  • Pandas

Clarion — Voice-to-Prompt Desktop Agent

Tauri/Rust macOS menu-bar agent: hotkey → Whisper → Haiku rewrite → paste. Built for bilingual developers.

Three shipped components: desktop agent + LoRA fine-tune + MCP server.

Personal project. Global-hotkey audio capture, dual-path Whisper (OpenAI API + local whisper.cpp), Claude-Haiku prompt structuring with project context, auto-paste via osascript. Also shipped a companion Whisper-Large-V3 LoRA fine-tuned on a bilingual Persian-English technical-speech corpus, and an MCP-server wrapper exposing the pipeline to Claude Code / Cursor.
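
The cloud path of the pipeline, sketched in Python for brevity (Clarion itself is Rust/Tauri; the model ids and prompts here are illustrative):

```python
import subprocess
from anthropic import Anthropic
from openai import OpenAI

def transcribe(path: str) -> str:
    # Cloud half of the dual-path setup; the local half shells out to
    # whisper.cpp instead.
    with open(path, "rb") as f:
        return OpenAI().audio.transcriptions.create(
            model="whisper-1", file=f).text

def structure(raw: str, project_context: str) -> str:
    # Haiku rewrite: turn rambling dictation into a precise coding prompt.
    # Model id is illustrative; any current Haiku version works.
    msg = Anthropic().messages.create(
        model="claude-3-5-haiku-latest",
        max_tokens=512,
        messages=[{"role": "user", "content":
            f"Project context:\n{project_context}\n\n"
            f"Rewrite this dictation as a clear, imperative prompt:\n{raw}"}],
    )
    return msg.content[0].text

def paste(text: str) -> None:
    # Clipboard + synthesized Cmd+V, mirroring the osascript auto-paste.
    subprocess.run(["pbcopy"], input=text.encode(), check=True)
    subprocess.run(["osascript", "-e",
        'tell application "System Events" to keystroke "v" using command down'],
        check=True)
```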

  • Rust
  • Tauri
  • Svelte
  • Whisper
  • LoRA
  • MCP
  • Anthropic SDK

Research infrastructure & experiments

cv-and-jobs — Multi-Agent Job-Application Harness

A Claude-Code-orchestrated multi-subagent system that composes role-targeted CVs from a fact-oriented YAML KB.

Five specialized subagents, multi-model CV critique, no-fabrication audit trail.

Personal infrastructure. Five subagents with strict ownership boundaries (kb-curator, jd-analyzer, cv-composer, cv-reviewer, application-tracker); LaTeX-direct rendering through Jinja2 templates; multi-model parallel CV review via GPT-5.5 + Gemini 3.1 Pro on OpenRouter with Pydantic-validated structured outputs; an /ask command that runs Claude + GPT + Gemini in parallel and reconciles their answers. This site itself was scaffolded with help from that harness.
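
The multi-model review step, roughly (the OpenRouter model slugs and the Review schema are placeholders, not the harness's actual definitions):

```python
import asyncio
from openai import AsyncOpenAI
from pydantic import BaseModel

class Review(BaseModel):
    score: int                   # 1-10 fit for the target role
    issues: list[str]            # concrete problems to fix
    missing_keywords: list[str]  # JD terms absent from the CV

# OpenRouter speaks the OpenAI wire format, so the OpenAI SDK works as-is.
client = AsyncOpenAI(base_url="https://openrouter.ai/api/v1", api_key="...")

async def review(model: str, cv: str, jd: str) -> Review:
    resp = await client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content":
            f"Job description:\n{jd}\n\nCV:\n{cv}\n\nReply with JSON only: "
            '{"score": int, "issues": [...], "missing_keywords": [...]}'}],
    )
    # Pydantic validation rejects malformed output instead of passing it on.
    return Review.model_validate_json(resp.choices[0].message.content)

async def parallel_review(cv: str, jd: str) -> list[Review]:
    models = ["openai/gpt-5.5", "google/gemini-3.1-pro"]  # illustrative slugs
    return list(await asyncio.gather(*(review(m, cv, jd) for m in models)))
```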

  • Claude Code
  • Claude Code SDK
  • Python
  • OpenRouter
  • Pydantic
  • LaTeX

CVE-Bench — Reproducible CVE Exploitation & Patching

LangGraph-orchestrated agentic pipeline that reproduces and patches CVEs end-to-end. 100+ curated CVEs.

100+ CVEs reproduced with full vulnerable/patched docker pairs and exploit verification.

SFU lab project. A LangGraph state machine drives the full exploit-and-patch lifecycle per CVE: parallel PoC analysis + advisory search → dockerized vulnerable + patched builds → an automated exploit-validation loop that verifies the PoC succeeds on the vulnerable image and fails on the patched one. Curated dataset of 100+ documented CVEs with structured exploitation metadata.
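
A condensed sketch of the graph's shape; node bodies are stubbed out, and the state fields and node names are illustrative:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class CveState(TypedDict, total=False):
    cve_id: str
    poc: str          # candidate exploit script
    vuln_hit: bool    # PoC succeeded against the vulnerable image
    patch_safe: bool  # PoC failed against the patched image

def analyze(state):  return {}  # PoC analysis + advisory search
def build(state):    return {}  # build vulnerable + patched docker images
def validate(state): return {}  # run PoC against both images, set the flags
def refine(state):   return {}  # revise the PoC when validation fails

def verdict(state: CveState) -> str:
    # Done only when the exploit fires on vulnerable AND not on patched.
    return "done" if state.get("vuln_hit") and state.get("patch_safe") else "retry"

g = StateGraph(CveState)
for name, fn in [("analyze", analyze), ("build", build),
                 ("validate", validate), ("refine", refine)]:
    g.add_node(name, fn)
g.set_entry_point("analyze")
g.add_edge("analyze", "build")
g.add_edge("build", "validate")
g.add_conditional_edges("validate", verdict, {"done": END, "retry": "refine"})
g.add_edge("refine", "validate")
pipeline = g.compile()
```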

  • LangGraph
  • LangChain
  • Docker
  • Python
  • Claude Code SDK

Security-Research RAG

Retrieval-augmented QA over ~350 papers from FSE, USENIX Security, IEEE S&P, NDSS, CCS, ICSE, RAID.

Ragas-evaluated RAG with chunking-strategy ablation and grounded citations.

Personal + SFU lab project. Corpus assembly (~350 PDFs across 7 venues + CVE-Bench READMEs), pgvector vector store, LangChain orchestration, Cohere embed-v3 embeddings, and Claude Haiku as the answering model. Built a Ragas-based eval harness on a 50-question hand-curated test set and ran a chunking-strategy ablation (fixed-size vs recursive-character vs semantic).
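
Retrieval in miniature; the table and column names are hypothetical, and chunks are assumed to have been embedded with input_type="search_document":

```python
import cohere
import psycopg

co = cohere.Client()  # reads COHERE_API_KEY from the environment

def retrieve(question: str, k: int = 5) -> list[str]:
    # Queries use input_type="search_query"; embed-v3 is asymmetric, so
    # query and document embeddings must use their respective input types.
    q = co.embed(texts=[question], model="embed-english-v3.0",
                 input_type="search_query").embeddings[0]
    with psycopg.connect("dbname=security_rag") as conn:
        rows = conn.execute(
            # <=> is pgvector's cosine-distance operator.
            "SELECT chunk FROM papers ORDER BY embedding <=> %s::vector LIMIT %s",
            (str(q), k),
        ).fetchall()
    return [r[0] for r in rows]

# The retrieved chunks go into the Claude Haiku prompt with per-chunk
# markers so answers can cite the papers they draw on.
```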

  • pgvector
  • LangChain
  • Cohere Embeddings
  • Claude Haiku
  • Ragas
  • PostgreSQL

Nano-Transformer-Security

50M-param decoder-only transformer trained from scratch on a CVE/exploit corpus.

End-to-end pretraining loop: corpus → tokenizer → 1B tokens → eval on H100.

Personal project, nanoGPT-style. Assembled the security corpus (~200k NVD CVE descriptions + CVE-Bench exploit scripts + sampled public advisories), trained a BPE tokenizer (~32k vocab), and implemented and ran the pretraining loop (12 layers, 768 dim, 12 heads, ~1B tokens seen) in PyTorch on a Compute Canada H100. Evaluated on held-out perplexity with qualitative generation samples.
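
The heart of the loop, sketched; bf16 autocast because the run was on an H100, and details like the clip norm are illustrative:

```python
import math
import torch
import torch.nn.functional as F

def train_step(model, batch: torch.Tensor, opt: torch.optim.Optimizer) -> float:
    """One pretraining step. batch: (B, T+1) LongTensor of BPE token ids;
    the objective is plain next-token prediction."""
    x, y = batch[:, :-1], batch[:, 1:]           # inputs, shifted targets
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        logits = model(x)                        # (B, T, vocab)
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               y.reshape(-1))
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # illustrative
    opt.step()
    opt.zero_grad(set_to_none=True)
    return loss.item()

# Held-out perplexity is just exp of the mean eval loss:
# ppl = math.exp(sum(eval_losses) / len(eval_losses))
```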

  • PyTorch
  • HuggingFace Tokenizers
  • Distributed Training
  • Mixed Precision
  • Weights & Biases

CVE-Bench LoRA Fine-tune (Qwen2.5-Coder-7B)

Domain-specific LoRA on the CVE-Bench corpus for code-vulnerability specialization.

Baseline vs LoRA comparison on docker-validated exploit-generation success rate.

SFU lab + personal project. ~80 CVE train / ~20 held-out split, stratified across vulnerability categories. LoRA fine-tune of Qwen2.5-Coder-7B using HuggingFace Transformers + PEFT + TRL on a bitsandbytes 4-bit quantized base, trained on a Compute Canada H100. The eval harness runs the fine-tuned model's generated exploits through CVE-Bench's docker-based verify scripts.
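
The QLoRA setup in outline; the rank, target modules, dataset path, and exact base checkpoint are assumptions, not the project's recorded config:

```python
import torch
from datasets import load_from_disk
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

base = "Qwen/Qwen2.5-Coder-7B"  # assumed checkpoint id
bnb = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # bf16 compute on the H100
)
model = AutoModelForCausalLM.from_pretrained(
    base, quantization_config=bnb, device_map="auto")

lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,  # illustrative
                  target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
                  task_type="CAUSAL_LM")

train_ds = load_from_disk("cve_train")  # hypothetical path to the ~80-CVE split
trainer = SFTTrainer(
    model=model,
    peft_config=lora,
    train_dataset=train_ds,
    args=SFTConfig(output_dir="qwen-cve-lora", bf16=True),
)
trainer.train()
```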

  • HuggingFace TRL
  • PEFT
  • QLoRA
  • bitsandbytes
  • Transformers
  • CUDA H100