SREECHANDH
Software Engineer at Amazon Web Services
Focus: ML Systems · CUDA · Full-stack · APIs
Based: Arlington, VA
Status: Always open to connect
01
Experience
May 2026 — Present
Amazon Web Services
Software Development Engineer
Building and scaling cloud infrastructure.
May 2025 — Aug 2025
Amazon Web Services
Software Development Engineer Intern
Shipped production features end-to-end on an AWS team.
02
Education
Arizona State University
B.S. Computer Science
Barrett Honors College
03
Projects
001
Systems · ML
Inference Engine
An LLM inference engine built from scratch that implements the core ideas behind vLLM: PagedAttention for on-demand KV cache block allocation, continuous batching to keep the GPU saturated, and a custom CUDA kernel (bound to Python via pybind11) for non-contiguous block table access. Result: 21.2x the throughput of HuggingFace at 64 concurrent requests (1340 tok/sec vs. 63 tok/sec), with GPU memory flat at ~3 GB regardless of load.
Python · CUDA · PyBind11 · PyTorch · Modal
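To make the on-demand block allocation idea concrete, here is a minimal sketch in plain Python. It is not the project's actual code: the `BlockAllocator` class, its method names, and the block size of 16 tokens are all hypothetical, chosen only to show how a per-sequence block table can map logical token positions to non-contiguous physical blocks.

```python
from dataclasses import dataclass, field


@dataclass
class BlockAllocator:
    """Toy KV cache allocator (hypothetical): fixed-size blocks handed out on demand."""

    num_blocks: int
    block_size: int = 16  # tokens per block; an assumed value, not from the project
    free_blocks: list[int] = field(init=False)
    block_tables: dict[int, list[int]] = field(default_factory=dict)
    seq_lens: dict[int, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Stack of free physical block ids; block 0 is handed out first.
        self.free_blocks = list(range(self.num_blocks - 1, -1, -1))

    def append_token(self, seq_id: int) -> tuple[int, int]:
        """Reserve a slot for one new token and return (physical_block, offset)."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.seq_lens.get(seq_id, 0)
        # A new physical block is claimed only when the last one is full.
        if length % self.block_size == 0:
            if not self.free_blocks:
                raise MemoryError("no free KV cache blocks")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1
        return table[length // self.block_size], length % self.block_size

    def free(self, seq_id: int) -> None:
        """Return all of a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


alloc = BlockAllocator(num_blocks=4, block_size=2)
slots = [alloc.append_token(seq_id=0) for _ in range(3)]
other = alloc.append_token(seq_id=1)
```

Because blocks are claimed one at a time and returned as soon as a sequence finishes, memory use in a scheme like this follows the number of live tokens rather than a per-request maximum length.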
002
Full-Stack · ML
Echoes
Real-time multimodal medical interpreter for clinical settings. Patients speak in any of 40+ languages — Echoes transcribes, translates, watches for visible distress via camera, flags life-threatening symptoms, and suggests triage follow-ups.
Next.js · Python · FastAPI · Gemma 4 · Modal · vLLM
003
Full-Stack
WikiMind
Personal knowledge base builder — upload PDFs, Markdown, plain text, or a ChatGPT/Claude export ZIP and an LLM incrementally compiles a structured, cross-linked wiki from them. Workspace data stored in Supabase; backend on Render; frontend on Vercel. Total hosting cost $0.
Node.js · Next.js · Supabase · LLM · Vercel
004
ML
DriftScope
Local-first Python library for detecting semantic drift in LLM outputs. Wraps model calls, stores prompts and responses in SQLite, embeds outputs locally, and alerts when later outputs move outside a calibrated baseline — no cloud account required.
Python · SQLite · Embeddings · LLM · Observability
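One way to picture "outside a calibrated baseline" is a distance threshold around the centroid of baseline embeddings. The sketch below is an illustration under that assumption, not DriftScope's actual API: `calibrate`, `has_drifted`, the choice of cosine distance, and the `k = 3.0` multiplier are all hypothetical, and embeddings are passed in as plain lists so the example needs no model.

```python
import math
from statistics import mean, pstdev


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def calibrate(baseline, k=3.0):
    """Return (centroid, threshold) computed from baseline embedding vectors.

    The threshold is the mean baseline distance to the centroid plus
    k population standard deviations (k is an assumed tuning knob).
    """
    dim = len(baseline[0])
    centroid = [mean(vec[i] for vec in baseline) for i in range(dim)]
    dists = [cosine_distance(vec, centroid) for vec in baseline]
    return centroid, mean(dists) + k * pstdev(dists)


def has_drifted(embedding, centroid, threshold):
    """Flag an output whose embedding lies farther out than the threshold."""
    return cosine_distance(embedding, centroid) > threshold


baseline = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1]]
centroid, threshold = calibrate(baseline)
near = has_drifted([1.0, 0.05], centroid, threshold)
far = has_drifted([0.0, 1.0], centroid, threshold)
```

In this toy data the first probe points in nearly the same direction as the baseline and is not flagged, while the second is almost orthogonal to it and is.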
04
Skills
Languages
Python · TypeScript · CUDA C++ · SQL · Java
Frameworks
Next.js · React · FastAPI · PyTorch · Node.js
Cloud
AWS · Modal · Vercel · Supabase
Tools
PostgreSQL · SQLite · Git · Docker
05
Writing
Inference Engine
Building a Production LLM Inference Engine from Scratch
A deep dive into building vLLM's core ideas from scratch: PagedAttention, continuous batching, and a custom CUDA kernel. 21.2x throughput over HuggingFace at 64 concurrent requests.
PagedAttention · Continuous Batching · Custom CUDA Kernels
06
Contact
Feel free to reach out.