Show HN: Ext-Infer – Native LLM Inference and Embeddings for PHP

eamann1 pts0 comments

ext-infer

Introduction

ext-infer is a PHP 8.3+ extension that loads a GGUF model and runs LLM inference inside the PHP process via llama.cpp. PHP-native semantic search, RAG pipelines, and CLI / worker inference run without shelling out to Python or hitting a remote API.

It is written in Rust on top of ext-php-rs and the llama-cpp-2 bindings. The public PHP surface is designed to feel native: a fluent, role-aware Prompt builder; a Response that splits reasoning from answer; an Embedding that knows how to normalize itself and compute cosine similarity. You should rarely, if ever, need to think about tokens.
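To make the Embedding operations concrete, here is a minimal plain-PHP sketch of L2 normalization and cosine similarity. It does not use the extension at all: the function names `l2Normalize` and `cosineSimilarity` are hypothetical helpers written for illustration, and the actual method names on Embedding may differ.

```php
<?php
declare(strict_types=1);

/**
 * Scale a vector to unit (L2) length.
 * Hypothetical helper for illustration; not part of ext-infer.
 */
function l2Normalize(array $v): array
{
    $norm = sqrt(array_sum(array_map(fn (float $x): float => $x * $x, $v)));
    if ($norm === 0.0) {
        return $v; // a zero vector has no direction; return it unchanged
    }
    return array_map(fn (float $x): float => $x / $norm, $v);
}

/**
 * Cosine similarity: the dot product of the two unit-length vectors.
 * Hypothetical helper for illustration; not part of ext-infer.
 */
function cosineSimilarity(array $a, array $b): float
{
    if (count($a) !== count($b)) {
        throw new InvalidArgumentException('vectors must have the same length');
    }
    // Re-index so both vectors are addressed by position 0..n-1.
    $a = l2Normalize(array_values($a));
    $b = l2Normalize(array_values($b));

    $dot = 0.0;
    foreach ($a as $i => $x) {
        $dot += $x * $b[$i];
    }
    return $dot;
}

// Orthogonal vectors score 0; vectors pointing the same way score close to 1.
printf("%.3f\n", cosineSimilarity([1.0, 0.0], [0.0, 1.0]));
printf("%.3f\n", cosineSimilarity([3.0, 4.0], [6.0, 8.0]));
```

Semantic search then reduces to computing this score between a query vector and each stored document vector and sorting by it.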

Why an extension?

Three reasons local inference belongs in PHP rather than next to it:

Latency. A subprocess fork or HTTP roundtrip costs milliseconds at minimum, often tens of milliseconds. An in-process call is bounded only by decode time.

Operational surface. No Python sidecar to package, no daemon to supervise, no inference server to scale alongside FPM. The PHP process is the inference server.

API ergonomics. Calling a local LLM should be as natural in PHP as calling intl or pdo. The extension API is shaped to match that; see Prompts and Chat completions.
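The latency point can be checked on your own machine with a rough timing sketch. It uses only core PHP (`hrtime`, `proc_open`) and no ext-infer calls. The helper names `elapsedMs` and `spawnNoopChild` are hypothetical, and the sketch assumes it runs under the CLI SAPI, where `PHP_BINARY` is set. It measures only the fixed cost of starting a child process, not any inference work.

```php
<?php
declare(strict_types=1);

/** Run $fn once and return the elapsed wall-clock time in milliseconds. */
function elapsedMs(callable $fn): float
{
    $start = hrtime(true); // monotonic clock, in nanoseconds
    $fn();
    return (hrtime(true) - $start) / 1e6;
}

/** Spawn a child PHP process that does nothing and wait for it to exit. */
function spawnNoopChild(): void
{
    // '-n' skips php.ini, so this measures mostly process startup.
    $proc = proc_open([PHP_BINARY, '-n', '-r', ''], [], $pipes);
    if ($proc === false) {
        throw new RuntimeException('could not spawn a child process');
    }
    proc_close($proc); // blocks until the child has exited
}

$spawnMs     = elapsedMs('spawnNoopChild');
$inProcessMs = elapsedMs(static fn () => strlen('hello'));

printf("subprocess spawn: %.3f ms\n", $spawnMs);
printf("in-process call:  %.3f ms\n", $inProcessMs);
```

The numbers vary by machine and load, so treat a single run as an order-of-magnitude check rather than a benchmark.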

What’s here

This guide is split into five layers, navigable from the sidebar:

Section          What you’ll find
Getting Started  Install, run hello-world, verify it loaded.
Guide            Conceptual walkthroughs of each public class. Read in order on first pass.
Recipes          Copy-paste-ready patterns: multi-turn chat, semantic search, RAG, worker pools.
Reference        Complete API listing, exceptions, environment variables, compatibility matrix.
Advanced         Threading model, Apple Metal, performance tuning.

Status

ext-infer is pre-release: the class surface is stable, but the first tagged release (v0.1.0) is still in flight. See RELEASE.md for the cut-a-release flow and PLAN.md for what’s coming next.

Conventions in this guide

Code blocks are runnable as written, with one exception: PHP code assumes the extension is loaded. Either install it system-wide or prepend -d extension=… to your php command. See Installation.

Model without a namespace prefix means Displace\Infer\Model; the same goes for Prompt, Response, and Embedding. Real code needs the corresponding use statements at the top of the file.

CLI snippets are written for a POSIX shell (bash / zsh). Adjust for fish / PowerShell as needed; the differences are usually limited to quoting.
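The first two conventions can be combined into a small preflight check. The sketch below shows the use statements written out in full, then confirms that the four classes named in this guide actually exist before any other code touches them. The helper `missingInferClasses` is hypothetical and written for illustration; only the class names come from this guide.

```php
<?php
declare(strict_types=1);

use Displace\Infer\Embedding;
use Displace\Infer\Model;
use Displace\Infer\Prompt;
use Displace\Infer\Response;

/**
 * Return the ext-infer class names that are not defined in this process.
 * Hypothetical helper for illustration; an empty array means all four exist.
 */
function missingInferClasses(): array
{
    // ::class resolves to the fully qualified name without loading the class.
    $expected = [Model::class, Prompt::class, Response::class, Embedding::class];

    return array_values(array_filter(
        $expected,
        static fn (string $class): bool => !class_exists($class)
    ));
}

$missing = missingInferClasses();

if ($missing === []) {
    echo "ext-infer classes are available\n";
} else {
    echo "extension not loaded; missing: " . implode(', ', $missing) . "\n";
}
```

Running this at the top of a worker or CLI script turns a missing extension into one clear message instead of a "class not found" error later on.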
