SREECHANDH
Software Engineer at Amazon Web Services
Focus: ML Systems · CUDA · Full-stack · APIs
Based: Arlington, VA
Status: Always open to connect
01
Experience
May 2026 — Present
Amazon Web Services
Software Development Engineer
Building and scaling cloud infrastructure.
May 2025 — Aug 2025
Amazon Web Services
Software Development Engineer Intern
Shipped production features end-to-end on an AWS team.
02
Education
Arizona State University
B.S. Computer Science
Barrett Honors College · 4.0 · Summa Cum Laude · MOEUR Award
03
Projects
001
Systems · ML
Inference Engine
From-scratch LLM serving engine with PagedAttention, continuous batching, and a custom CUDA kernel. 21.2x throughput over HuggingFace at 64 concurrent requests.
Python · CUDA · pybind11 · PyTorch · Modal
002
Quant
MonteCUDA
CUDA C++ derivatives pricing engine with Heston stochastic volatility, four variance reduction techniques (up to 10,918x reduction), and all five Greeks. 1611x speedup over CPU, validated against QuantLib.
CUDA · C++ · Python · pybind11 · QuantLib
003
Full-Stack · ML
Echoes
Real-time multimodal medical interpreter that transcribes 40+ languages, detects visible patient distress via camera, and flags life-threatening symptoms for triage.
Next.js · Python · FastAPI · Gemma 4 · Modal · vLLM
004
ML · Tools
DriftScope
Python library for detecting semantic drift in LLM outputs: wraps model calls, embeds responses locally, and alerts when outputs drift outside a calibrated baseline.
Python · SQLite · Embeddings · LLM · Observability
005
Full-Stack
WikiMind
Knowledge base builder that ingests PDFs, Markdown, and chat exports and compiles them into a structured, cross-linked wiki using an LLM.
Node.js · Next.js · Supabase · LLM · Vercel
006
Tools
promptpricer
CLI tool that estimates token count and cost across all major LLMs: no API keys, live pricing from LiteLLM, cached for 24h.
Python · LiteLLM · CLI · PyPI
04
Skills
Languages
Python · TypeScript · CUDA C++ · SQL · Java
Frameworks
Next.js · React · FastAPI · PyTorch · Node.js
Cloud
AWS · Modal · Vercel · Supabase
Tools
PostgreSQL · SQLite · Git · Docker
05
Writing
- Inference Engine: Building a Production LLM Inference Engine from Scratch
  A deep-dive into building vLLM's core ideas from scratch: PagedAttention, continuous batching, and a custom CUDA kernel. 21.2x throughput over HuggingFace at 64 concurrent requests.
  PagedAttention · Continuous Batching · Custom CUDA Kernels
- MonteCUDA: Building a GPU Monte Carlo Options Pricer from Scratch
  A deep-dive into building a GPU derivatives pricing engine from scratch: custom CUDA kernels, Philox RNG, warp-shuffle reductions, and four variance reduction techniques. 1611x CPU speedup at 10M paths, validated against QuantLib.
  Variance Reduction · Heston Stochastic Vol · Custom CUDA Kernels
06
Contact
Feel free to reach out.