How to Create LLM WebSocket Chat with llama.cpp?
Learn how to setup a WebSocket server and create basic chat with an open source LLM.
Preparations
Before starting this tutorial, it might be useful for you to familiarize yourself with the other tutorials and documentation pages first:
- How to Serve LLM Completions (With llama.cpp)? - to setup llama.cpp server and configure Resonance to use it
- WebSockets - to learn how WebSocket features are implemented in Resonance
In this tutorial, we will also use Stimulus for the front-end code.
Once you have both Resonance and llama.cpp runing, we can continue.
Backend
Security
First, we must create a security gate and decide which users can use our chat. Let's say all authenticated users can do that:
app/SiteActionGate/StartWebSocketJsonRPCConnectionGate.phpphp<?php namespace App\SiteActionGate; use App\Role; use Distantmagic\Resonance\Attribute\DecidesSiteAction; use Distantmagic\Resonance\Attribute\Singleton; use Distantmagic\Resonance\AuthenticatedUser; use Distantmagic\Resonance\SingletonCollection; use Distantmagic\Resonance\SiteAction; use Distantmagic\Resonance\SiteActionGate; #[DecidesSiteAction(SiteAction::StartWebSocketJsonRPCConnection)] #[Singleton(collection: SingletonCollection::SiteActionGate)] final readonly class StartWebSocketJsonRPCConnectionGate extends SiteActionGate { public function can(?AuthenticatedUser $authenticatedUser): bool { return true === $authenticatedUser?->user->getRole()->isAtLeast(Role::User); } }
HTTP Responder
First, we need to create a route in our application that will render the HTML chat.
We are using a few attributes:
-
Can
- this is an Authorization attribute it checks if user can connect with WebSockets. If they can't - it will return 403 error page instead. That is optional, but withoutCan
attribute, all users will be able to visit the page, but they won't be able to establish a WebSocket connection anyway. -
RespondsToHttp
- registers route -
Singleton
- adds yourHttpResponder
to Dependency Injection container. We added#[WantsFeature(
attribute to enableFeature:: WebSocket) WebSocket
server.
app/HttpResponder/LlmChat.phpphp<?php namespace App\HttpResponder; use App\HttpRouteSymbol; use Distantmagic\Resonance\Attribute\Can; use Distantmagic\Resonance\Attribute\RespondsToHttp; use Distantmagic\Resonance\Attribute\Singleton; use Distantmagic\Resonance\Attribute\WantsFeature; use Distantmagic\Resonance\Feature; use Distantmagic\Resonance\HttpInterceptableInterface; use Distantmagic\Resonance\HttpResponder; use Distantmagic\Resonance\RequestMethod; use Distantmagic\Resonance\SingletonCollection; use Distantmagic\Resonance\SiteAction; use Distantmagic\Resonance\TwigTemplate; use Psr\Http\Message\ResponseInterface; use Psr\Http\Message\ServerRequestInterface; #[Can(SiteAction::StartWebSocketJsonRPCConnection)] #[RespondsToHttp( method: RequestMethod::GET, pattern: '/chat', routeSymbol: HttpRouteSymbol::LlmChat, )] #[Singleton(collection: SingletonCollection::HttpResponder)] #[WantsFeature(Feature::WebSocket)] final readonly class LlmChat extends HttpResponder { public function respond(ServerRequestInterface $request, ResponseInterface $response): HttpInterceptableInterface { return new TwigTemplate('turbo/llmchat/index.twig'); } }
WebSocket Message Validation
First, we need to register our WebSocket RPC methods. Default WebSocket implementation follows a simple RPC protocol. All messages need to be a tuple:
[method, payload, null|request id]
Method, the first field, is validated against RPCMethodInterface
, through
RPCMethodValidator
(you need to provide both):
app/RPCMethod.phpphp<?php namespace App; use Distantmagic\Resonance\EnumValuesTrait; use Distantmagic\Resonance\JsonRPCMethodInterface; enum JsonRPCMethod: string implements JsonRPCMethodInterface { use EnumValuesTrait; case LlmChatPrompt = 'llm_chat_prompt'; case LlmToken = 'llm_token'; public function getValue(): string { return $this->value; } }
app/RPCMethodValidator.phpphp<?php namespace App; use Distantmagic\Resonance\Attribute\Singleton; use Distantmagic\Resonance\Attribute\WantsFeature; use Distantmagic\Resonance\Feature; use Distantmagic\Resonance\JsonRPCMethodInterface; use Distantmagic\Resonance\JsonRPCMethodValidatorInterface; #[Singleton(provides: JsonRPCMethodValidatorInterface::class)] #[WantsFeature(Feature::WebSocket)] readonly class JsonRPCMethodValidator implements JsonRPCMethodValidatorInterface { public function cases(): array { return JsonRPCMethod::cases(); } public function castToRPCMethod(string $methodName): JsonRPCMethodInterface { return JsonRPCMethod::from($methodName); } public function values(): array { return JsonRPCMethod::values(); } }
The above means your application will support llm_chat_prompt
and
llm_chat_token
messages.
WebSocket Message Responder
Now, we need to register a WebSocket responder that will run every time
someone sends llama_chat_prompt
message to the server.
It will then use LlamaCppClient
to fetch tokens from
llama.cpp server and forward them
to the WebSocket connection:
app/WebSocketJsonRPCResponder/LlmChatPromptResponder.phpphp<?php namespace App\WebSocketJsonRPCResponder; use App\JsonRPCMethod; use Distantmagic\Resonance\Attribute\RespondsToWebSocketJsonRPC; use Distantmagic\Resonance\Attribute\Singleton; use Distantmagic\Resonance\Attribute\WantsFeature; use Distantmagic\Resonance\Constraint; use Distantmagic\Resonance\Constraint\ObjectConstraint; use Distantmagic\Resonance\Constraint\StringConstraint; use Distantmagic\Resonance\Feature; use Distantmagic\Resonance\JsonRPCNotification; use Distantmagic\Resonance\JsonRPCRequest; use Distantmagic\Resonance\JsonRPCResponse; use Distantmagic\Resonance\LlamaCppClientInterface; use Distantmagic\Resonance\LlamaCppCompletionRequest; use Distantmagic\Resonance\SingletonCollection; use Distantmagic\Resonance\WebSocketAuthResolution; use Distantmagic\Resonance\WebSocketConnection; use Distantmagic\Resonance\WebSocketJsonRPCResponder; #[RespondsToWebSocketJsonRPC(JsonRPCMethod::LlmChatPrompt)] #[Singleton(collection: SingletonCollection::WebSocketJsonRPCResponder)] #[WantsFeature(Feature::WebSocket)] final readonly class LlmChatPromptResponder extends WebSocketJsonRPCResponder { public function __construct( private LlamaCppClientInterface $llamaCppClient, ) {} public function getConstraint(): Constraint { return new ObjectConstraint( properties: [ 'prompt' => new StringConstraint(), ], ); } public function onNotification( WebSocketAuthResolution $webSocketAuthResolution, WebSocketConnection $webSocketConnection, JsonRPCNotification $rpcNotification, ): void { $request = new LlamaCppCompletionRequest($rpcNotification->payload->prompt); $completion = $this->llamaCppClient->generateCompletion($request); foreach ($completion as $token) { if ($webSocketConnection->status->isOpen()) { $webSocketConnection->push(new JsonRPCNotification( JsonRPCMethod::LlmToken, $token->content, )); } else { $completion->stop(); } } } }
Frontend
Stimulus controller starts WebSocket
connection and sends user prompts to the server. Not that WebSocket
connection uses both CSRF Protection and
jsonrpc
Protocols:
resources/ts/controller_llmchat.tstypescriptimport { Controller } from "@hotwired/stimulus"; import { stimulus } from "../stimulus"; @stimulus("llmchat") export class controller_llmchat extends Controller<HTMLElement> { public static targets = ["userInputField", "userInputForm", "chatLog"]; public static values = { csrfToken: String, }; private declare readonly chatLogTarget: HTMLElement; private declare readonly csrfTokenValue: string; private declare readonly userInputFieldTarget: HTMLTextAreaElement; private webSocket: null|WebSocket = null; public connect(): void { const servertUrl = new URL(__WEBSOCKET_URL); servertUrl.searchParams.append("csrf", this.csrfTokenValue); const webSocket = new WebSocket(servertUrl, ["jsonrpc"]); webSocket.addEventListener("close", () => { this.webSocket = null; }); webSocket.addEventListener("open", () => { this.webSocket = webSocket; }); webSocket.addEventListener("message", (evt: MessageEvent) => { if ("string" !== typeof evt.data) { return; } const parsed: unknown = JSON.parse(evt.data); this.chatLogTarget.append(parsed.result); }); } public disconnect(): void { this.webSocket?.close(); } public onFormSubmit(evt: Event): void { evt.preventDefault(); this.chatLogTarget.innerHTML = ''; this.webSocket?.send(JSON.stringify({ jsonrpc: "2.0", method: "llm_chat_prompt", params: { prompt: this.userInputFieldTarget.value, } })); this.userInputFieldTarget.value = ''; } }
We will use Asset Bundling (esbuild) to bundle front-end TypeScript code.
shell$ ./node_modules/.bin/esbuild \ --bundle \ --define:global=globalThis \ --entry-names="./[name]_$(BUILD_ID)" \ --format=esm \ --log-limit=0 \ --metafile=esbuild-meta-app.json \ --minify \ --outdir=./$(BUILD_TARGET_DIRECTORY) \ --platform=browser \ --sourcemap \ --target=es2022,safari16 \ --tree-shaking=true \ --tsconfig=tsconfig.json \ resources/ts/controller_llmchat.ts \ ;
Finally, the HTML:
app/views/llmchat.twigtwig<script defer type="module" src="{{ esbuild(request, 'controller_llmchat.ts') }}" ></script> <div data-controller="llmchat" data-llmchat-csrf-token-value="{{ csrf_token(request, 'llm_chat) }}" > <div data-llmchat-target="chatLog"></div> <form data-action="submit->llmchat#onFormSubmit"> <input autofocus data-llmchat-target="userInputField" type="text" ></input> <button type="submit">Send</button> </form> </div>
Summary
Now, you can serve LLM chat in your application. Enjoy!
Comments
If you want to leave a comment Start a discussion on GitHub or join our Community