Sreechandh — Software Engineer

SREECHANDH

Software Engineer at Amazon Web Services

Focus: ML Systems · CUDA · Full-stack · APIs

Based: Arlington, VA

Status: Always open to connect

Experience

May 2026 — Present

Amazon Web Services

Software Development Engineer

Building and scaling cloud infrastructure.

Full-time

May 2025 — Aug 2025

Amazon Web Services

Software Development Engineer Intern

Shipped production features end-to-end on an AWS team.

Intern

Education

Arizona State University

B.S. Computer Science

Barrett Honors College · 4.0 · Summa Cum Laude · MOEUR Award

2022 – 2026

Projects

001

Systems · ML

Inference Engine

From-scratch LLM serving engine with PagedAttention, continuous batching, and a custom CUDA kernel. 21.2x throughput over HuggingFace at 64 concurrent requests.

Python · CUDA · PyBind11 · PyTorch · Modal

Demo ↗

002

Quant

MonteCUDA

CUDA C++ derivatives pricing engine with Heston stochastic vol, four variance reduction techniques (up to 10,918x variance reduction), and all five Greeks. 1611x speedup over CPU, validated against QuantLib.

CUDA · C++ · Python · pybind11 · QuantLib

Demo ↗

003

Full-Stack · ML

Echoes

Real-time multimodal medical interpreter that transcribes 40+ languages, detects visible patient distress via camera, and flags life-threatening symptoms for triage.

Next.js · Python · FastAPI · Gemma 4 · Modal · vLLM

Demo ↗

004

ML · Tools

DriftScope

Python library for detecting semantic drift in LLM outputs: wraps model calls, embeds responses locally, and alerts when outputs drift outside a calibrated baseline.

Python · SQLite · Embeddings · LLM · Observability

PyPI ↗

005

Full-Stack

WikiMind

Knowledge base builder that ingests PDFs, Markdown, and chat exports and compiles them into a structured, cross-linked wiki using an LLM.

Node.js · Next.js · Supabase · LLM · Vercel

Demo ↗

006

Tools

promptpricer

CLI tool that estimates token count and cost across all major LLMs: no API keys, live pricing from LiteLLM, cached for 24h.

Python · LiteLLM · CLI · PyPI

PyPI ↗

Skills

Languages

Python · TypeScript · CUDA C++ · SQL · Java

Frameworks

Next.js · React · FastAPI · PyTorch · Node.js

Cloud

AWS · Modal · Vercel · Supabase

Tools

PostgreSQL · SQLite · Git · Docker

Writing

Inference Engine
Building a Production LLM Inference Engine from Scratch
A deep-dive into building vLLM's core ideas from scratch: PagedAttention, continuous batching, and a custom CUDA kernel. 21.2x throughput over HuggingFace at 64 concurrent requests.
PagedAttention · Continuous Batching · Custom CUDA Kernels
Read ↗
MonteCUDA
Building a GPU Monte Carlo Options Pricer from Scratch
A deep-dive into building a GPU derivatives pricing engine from scratch: custom CUDA kernels, Philox RNG, warp-shuffle reductions, and four variance reduction techniques. 1611x CPU speedup at 10M paths, validated against QuantLib.
Variance Reduction · Heston Stochastic Vol · Custom CUDA Kernels
Read ↗

Contact

Feel free to reach out.

→ github.com/Sreechandh22
→ linkedin.com/in/sreechandhdevireddy