llama.cpp
llama.cpp is an open-source framework capable of running various Large Language Models (LLMs). It has a built-in HTTP server that supports continuous batching and parallel requests, and is optimized for resource usage. You can use Resonance to connect to it and process LLM responses.
Usage
You can also check the tutorial: How to Serve LLM Completions (With llama.cpp)?
Configuration
All you need to do is add a configuration section that specifies the llama.cpp server location:
```ini
[llamacpp]
host = 127.0.0.1
port = 8081
```
Completions
In your class, you need to use Dependency Injection to inject LlamaCppClient. Then, you need to use the Prompt Template adequate for the model you are serving. In the following example we will use the Mistral-Instruct template:
```php
<?php

namespace App;

use Distantmagic\Resonance\Attribute\Singleton;
use Distantmagic\Resonance\LlamaCppClientInterface;
use Distantmagic\Resonance\LlamaCppCompletionRequest;
use Distantmagic\Resonance\LlamaCppPromptTemplate\MistralInstructChat;

#[Singleton]
class LlamaCppGenerate
{
    public function __construct(protected LlamaCppClientInterface $llamaCppClient)
    {
    }

    public function doSomething(): void
    {
        $template = new MistralInstructChat('How to make a cat happy?');
        $request = new LlamaCppCompletionRequest($template);

        $completion = $this->llamaCppClient->generateCompletion($request);

        // Each token is a chunk of text, usually a few letters, returned
        // from the model you are serving.
        foreach ($completion as $token) {
            swoole_error_log(SWOOLE_LOG_DEBUG, (string) $token);
        }
    }
}
```
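If you need the entire completion as a single string instead of processing tokens as they stream in, you can concatenate them. A minimal sketch using only the API shown above; the generateText method name is hypothetical, added for illustration:

```php
public function generateText(string $prompt): string
{
    $request = new LlamaCppCompletionRequest(new MistralInstructChat($prompt));

    $generated = '';

    // Tokens arrive incrementally; concatenating them produces the
    // complete response text.
    foreach ($this->llamaCppClient->generateCompletion($request) as $token) {
        $generated .= (string) $token;
    }

    return $generated;
}
```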
Stopping Generator
Using just the break keyword does not stop the completion request (llama.cpp will keep generating tokens, even though the PHP loop is stopped). You need to call $completion->stop() instead.
For example, this stops after generating 10 tokens:
```php
$i = 0;

foreach ($completion as $token) {
    if ($i > 9) {
        $completion->stop();
    } else {
        // do something
        $i += 1;
    }
}
```
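Another pattern along the same lines (a sketch using only the API shown above) is stopping once a chosen delimiter appears in the accumulated output:

```php
$generated = '';

foreach ($completion as $token) {
    $generated .= (string) $token;

    // "\n\n" is a hypothetical stopping condition; adjust it to your use case.
    if (str_contains($generated, "\n\n")) {
        $completion->stop();
    }
}
```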
Embeddings
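You can use the same client to generate embeddings (vector representations of text) by issuing a LlamaCppEmbeddingRequest: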
```php
<?php

namespace App;

use Distantmagic\Resonance\Attribute\Singleton;
use Distantmagic\Resonance\LlamaCppClientInterface;
use Distantmagic\Resonance\LlamaCppEmbeddingRequest;

#[Singleton]
class LlamaCppGenerate
{
    public function __construct(protected LlamaCppClientInterface $llamaCppClient)
    {
    }

    public function doSomething(): void
    {
        $request = new LlamaCppEmbeddingRequest('How to make a cat happy?');

        $response = $this->llamaCppClient->generateEmbedding($request);

        /**
         * @var array<float>
         */
        $response->embedding;
    }
}
```
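Embeddings are usually compared with a similarity measure. Below is a minimal sketch of cosine similarity over two such `array<float>` vectors; it is plain PHP (no Resonance API assumed) and expects both vectors to have the same dimensionality:

```php
/**
 * @param array<float> $a
 * @param array<float> $b
 */
function cosineSimilarity(array $a, array $b): float
{
    $dotProduct = 0.0;
    $normA = 0.0;
    $normB = 0.0;

    foreach ($a as $i => $value) {
        $dotProduct += $value * $b[$i];
        $normA += $value ** 2;
        $normB += $b[$i] ** 2;
    }

    // Ranges from -1 (opposite) to 1 (identical direction).
    return $dotProduct / (sqrt($normA) * sqrt($normB));
}
```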