SREECHANDH
Software Engineer at Amazon Web Services
Focus: ML Systems · CUDA · Full-stack · APIs
Based: Arlington, VA
Status: Always open to connect
01
Experience
May 2026 – Present
Amazon Web Services
Software Development Engineer
Building and scaling cloud infrastructure.
Full-time
May 2025 – Aug 2025
Amazon Web Services
Software Development Engineer Intern
Shipped production features end-to-end on an AWS team.
Intern
02
Education
Arizona State University
B.S. Computer Science
Barrett Honors College
2022 – 2026
03
Projects
001
Systems · ML
Inference Engine
LLM inference engine built from scratch that implements the core ideas behind vLLM: PagedAttention for on-demand KV cache block allocation, continuous batching to keep the GPU saturated, and a custom CUDA kernel (bound to Python via pybind11) that reads KV cache blocks stored non-contiguously in memory through a block table. Result: 21.2x the throughput of Hugging Face at 64 concurrent requests (1340 tok/sec vs. 63 tok/sec), with GPU memory holding flat at ~3 GB regardless of load.
Python · CUDA · pybind11 · PyTorch · Modal
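A minimal sketch of the on-demand KV cache block allocation behind PagedAttention, assuming a fixed pool of physical blocks. The names (`BlockAllocator`, `append_token`, `release`, `OutOfBlocks`) and the 16-token default block size are hypothetical and not taken from the engine itself. The sketch tracks only the block-table bookkeeping, not the key/value tensors or the CUDA kernel that reads them.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class OutOfBlocks(Exception):
    """Raised when the pool has no free physical block for a new allocation."""


@dataclass
class BlockAllocator:
    """Hypothetical PagedAttention-style allocator: fixed pool, per-sequence block tables."""

    num_blocks: int
    block_size: int = 16  # tokens per KV block (assumed default)
    free: list[int] = field(default_factory=list, init=False)
    tables: dict[int, list[int]] = field(default_factory=dict, init=False)
    lengths: dict[int, int] = field(default_factory=dict, init=False)

    def __post_init__(self) -> None:
        # All physical blocks start free; reversed so pop() hands out block 0 first.
        self.free = list(range(self.num_blocks - 1, -1, -1))

    def append_token(self, seq_id: int) -> tuple[int, int]:
        """Reserve a KV slot for one new token; return (physical_block, offset)."""
        table = self.tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        if n % self.block_size == 0:
            # Last block is full (or the sequence is new): allocate one block on demand.
            if not self.free:
                raise OutOfBlocks(f"no free block for sequence {seq_id}")
            table.append(self.free.pop())
        self.lengths[seq_id] = n + 1
        # Logical block n // block_size maps to a physical block that need not be contiguous.
        return table[n // self.block_size], n % self.block_size

    def release(self, seq_id: int) -> None:
        """Return all blocks of a finished sequence to the free pool."""
        self.free.extend(reversed(self.tables.pop(seq_id, [])))
        self.lengths.pop(seq_id, None)
```

Because every block comes from a pool sized once up front, an allocator like this bounds KV cache memory by the pool size rather than by the number of concurrent requests. A continuous-batching scheduler would call `release` as each sequence finishes, so waiting requests can claim the freed blocks on the next step.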
002
Full-Stack · ML
Echoes
Real-time multimodal medical interpreter for clinical settings. Patients speak in any of 40+ languages, and Echoes transcribes and translates their speech, watches for visible distress via the camera, flags life-threatening symptoms, and suggests triage follow-ups.
Next.js · Python · FastAPI · Gemma 4 · Modal · vLLM
003
Full-Stack
WikiMind
Personal knowledge base builder: upload PDFs, Markdown, plain text, or a ChatGPT/Claude export ZIP, and an LLM incrementally compiles them into a structured, cross-linked wiki. Workspace data is stored in Supabase, the backend runs on Render, and the frontend is hosted on Vercel, for a total hosting cost of $0.
Node.js · Next.js · Supabase · LLM · Vercel
004
ML
DriftScope
Local-first Python library for detecting semantic drift in LLM outputs. It wraps model calls, stores prompts and responses in SQLite, embeds outputs locally, and raises an alert when later outputs move outside a calibrated baseline. No cloud account is required.
Python · SQLite · Embeddings · LLM · Observability
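One way a calibrated baseline could work, sketched here under stated assumptions. The baseline is the centroid of a set of calibration embeddings, and an output counts as drift when its cosine distance from that centroid exceeds the calibration mean plus `k` standard deviations. DriftScope's actual calibration rule is not described above, so `DriftBaseline`, `is_drift`, `cosine_distance`, and the `k=3.0` default are hypothetical. Embeddings are assumed to be plain float lists produced by whatever local embedding model is in use.

```python
from __future__ import annotations

import math
import statistics


def _normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; 0.0 means the vectors point the same way."""
    ua, ub = _normalize(a), _normalize(b)
    return 1.0 - sum(x * y for x, y in zip(ua, ub))


class DriftBaseline:
    """Hypothetical baseline: centroid of calibration embeddings plus a distance threshold."""

    def __init__(self, embeddings: list[list[float]], k: float = 3.0) -> None:
        if len(embeddings) < 2:
            raise ValueError("need at least two calibration embeddings")
        unit = [_normalize(e) for e in embeddings]
        dim = len(unit[0])
        # Centroid of the unit-length calibration embeddings.
        self.centroid = [sum(u[i] for u in unit) / len(unit) for i in range(dim)]
        dists = [cosine_distance(u, self.centroid) for u in unit]
        # Assumed rule: threshold = mean + k * sample stdev of calibration distances.
        self.threshold = statistics.mean(dists) + k * statistics.stdev(dists)

    def is_drift(self, embedding: list[float]) -> bool:
        """True when an output's embedding falls outside the calibrated baseline."""
        return cosine_distance(embedding, self.centroid) > self.threshold
```

A mean-plus-`k`-sigma threshold is simple and cheap to compute locally. A quantile of the calibration distances would be a reasonable alternative when those distances are far from normally distributed.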
04
Skills
Languages
Python · TypeScript · CUDA C++ · SQL · Java
Frameworks
Next.js · React · FastAPI · PyTorch · Node.js
Cloud
AWS · Modal · Vercel · Supabase
Tools
PostgreSQL · SQLite · Git · Docker
05
Writing
  • Inference Engine
    Building a Production LLM Inference Engine from Scratch
    A deep dive into building vLLM's core ideas from scratch: PagedAttention, continuous batching, and a custom CUDA kernel, reaching 21.2x the throughput of Hugging Face at 64 concurrent requests.
    PagedAttention · Continuous Batching · Custom CUDA Kernels
06
Contact

Feel free to reach out.

github.com/Sreechandh22
linkedin.com/in/sreechandhdevireddy