Important Notice

Certain specific implementations regarding the interaction between Google Chrome (via Selenium) in the this project were developed with AI assistance. All other aspects, including the core logic, finetuning data generation process, and function calling management, were created by me, Alex.

BrowserLAM — LLM-Powered Browser Agent

An automation layer that navigates Chrome based on natural language. Finetuned for better web browsing.

How BrowserLAM Works

BrowserLAM System Flow

BrowserLAM uses a loop: starting with Chrome setup, going to URLs, getting the page state, sending it to an LLM, processing responses (tool calls or messages), and executing actions until the user quits.

Finetuning Details

BrowserLAM is finetuned using a SL where a human simulates the LLM, making decisions based on the same state the model would see. The user doesn't see the Chrome window (This prevents hallucination since in some cases, looking at the chrome window could give you more info on what's happening. Ex: a native popup that doesn't show up in the screenshots, since it's an overlay)

This makes sure the LLM is trained on realistic, high-quality examples of browser automation, matching the deployment environment.

What are critics saying about the latest Marvel movie?

Plan a 1-day trip in Tokyo with food, sightseeing, and transportation info

Back to Projects