My Projects

A more efficient inference architecture

Leverages perplexity to switch to larger LLMs when a small 'draft' model is uncertain. If MoE and SpecDec had a baby. Works best with two models, one of them being distilled from the other.

Python Llama.cpp Perplexity LLMs

Quinn — AI Email & Workflow Assistant

Built my own LLM assistant that replies to emails and messages the user via Telegram. Integrated memory, prompt optimization, and fallback logic for robust performance.

Quinn uses a sophisticated pipeline to understand context, generate appropriate responses, and maintain a persistent memory of interactions. The system includes safety mechanisms to prevent harmful outputs and leverages prompt engineering techniques to maximize LLM capabilities.

Python OpenAI API (gpt-4o, text-embedding-ada) HTTP Requests Google Cloud Functions Google Cloud Firestore

BrowserLAM — LLM-Powered Browser Agent

An automation layer that navigates Chrome based on natural language. Finetuned for better web browsing capabilities.

Created a mock loop where the user pretends to be the LLM, making optimal decisions, translating the user's decisions into function call examples and assistant responses for finetuning. Implemented light measures to prevent covariate shift.

Python OpenAI API (gpt-4o finetune)

BrowserLAM

Kalshi

Polymarket

Arbitrage Trading Bot (Kalshi/Polymarket)

A bot designed to identify arbitrage opportunities between Kalshi and Polymarket. It doesn't run automatically, requiring human verification due to its fast, rather than highly precise, "embedding" algorithm for matching markets.

Instead of using a true embedding model, it uses a faster, local alternative that functions similarly.

HTTP Requests Local Embedding Model