SREECHANDH
Software Engineer at Amazon Web Services
Focus: ML Systems · CUDA · Full-stack · APIs
Based: Arlington, VA
Status: Always open to connect
01
Experience
May 2026 — Present
Amazon Web Services
Software Development Engineer
Building and scaling cloud infrastructure.
May 2025 — Aug 2025
Amazon Web Services
Software Development Engineer Intern
Shipped production features end-to-end on an AWS team.
02
Education
Arizona State University
B.S. Computer Science
Barrett Honors College · 4.0 · Summa Cum Laude · MOEUR Award
03
Projects
001
Systems · ML
Inference Engine
LLM inference engine built from scratch: PagedAttention, continuous batching, and a custom pybind11-bound CUDA kernel for non-contiguous KV cache access. 21.2x throughput over HuggingFace at 64 concurrent requests, GPU memory flat at ~3 GB regardless of load.
Python · CUDA · PyBind11 · PyTorch · Modal
002
Full-Stack · ML
Echoes
Real-time multimodal medical interpreter for clinical settings. Patients speak in any of 40+ languages — Echoes transcribes, translates, watches for visible distress via camera, flags life-threatening symptoms, and suggests triage follow-ups.
Next.js · Python · FastAPI · Gemma 4 · Modal · vLLM
003
Full-Stack
WikiMind
Personal knowledge base builder — upload PDFs, Markdown, plain text, or a ChatGPT/Claude export ZIP and an LLM incrementally compiles a structured, cross-linked wiki from them. Workspace data stored in Supabase; backend on Render; frontend on Vercel. Total hosting cost $0.
Node.js · Next.js · Supabase · LLM · Vercel
004
ML
DriftScope
Local-first Python library for detecting semantic drift in LLM outputs. Wraps model calls, stores prompts and responses in SQLite, embeds outputs locally, and alerts when later outputs move outside a calibrated baseline — no cloud account required.
Python · SQLite · Embeddings · LLM · Observability
04
Skills
Languages
Python · TypeScript · CUDA C++ · SQL · Java
Frameworks
Next.js · React · FastAPI · PyTorch · Node.js
Cloud
AWS · Modal · Vercel · Supabase
Tools
PostgreSQL · SQLite · Git · Docker
05
Writing
- Inference Engine: Building a Production LLM Inference Engine from Scratch
  A deep dive into building vLLM's core ideas from scratch: PagedAttention, continuous batching, and a custom CUDA kernel. 21.2x throughput over HuggingFace at 64 concurrent requests.
  PagedAttention · Continuous Batching · Custom CUDA Kernels
06
Contact
Feel free to reach out.