Changing AI math could reduce the hardware burden


Changing AI math could reduce the hardware burden, researchers show



SEMQ promises an abstraction layer for separating semantics from embeddings

Thomas Claburn


Senior reporter

Published Tue 30 Jun 2026 // 21:07 UTC

Sophisticated AI models tend to require a lot of memory and take up a lot of storage space. One of the ways to reduce that footprint involves a process called quantization, which changes how model weights are represented and stored. But quantization has its drawbacks.

Andrés Mac Allister, CEO and founder of The SEMQ Group, believes there's another way to make machine learning more efficient and less resource intensive. Instead of compressing model weights (specifically embeddings), he contends you can separate the semantics (the meaning) from how that meaning is represented.

Model weights, including embeddings (which map tokens to vectors), are the numbers in a machine learning model that determine how strongly one piece of information relates to another. Taken all together, they reflect learned behavior.


These parameters are commonly represented in full precision (FP32), which requires 4 bytes per parameter. A 7B parameter model at FP32 would need about 28 GB of disk space and memory.
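The arithmetic behind those figures is simply parameter count times bytes per parameter. The following is a minimal sketch of that calculation, not a sizing tool: the function name `footprint_gb` is hypothetical, gigabytes are taken as 10^9 bytes, and the 4-bit row is an idealized assumption that ignores the scale metadata real 4-bit formats carry.

```python
# Rough storage estimate: parameters x bytes per parameter.
# The 4-bit entry is an idealized assumption; real 4-bit formats
# store extra per-block metadata, so actual files run larger.
BYTES_PER_PARAM = {
    "FP32": 4.0,
    "FP16/BF16": 2.0,
    "FP8/INT8": 1.0,
    "4-bit (idealized)": 0.5,
}


def footprint_gb(num_params: float, bytes_per_param: float) -> float:
    """Return the size in decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    # A 7B parameter model at each width.
    for name, width in BYTES_PER_PARAM.items():
        print(f"{name:>18}: {footprint_gb(7e9, width):5.1f} GB")
```

The estimate covers weights only; activations, caches, and runtime overhead add to the memory a model needs in practice.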


To save space, the model might be quantized to FP16/BF16, which requires 2 bytes per parameter. The resulting model would need about 14 GB of disk space and memory. And there are smaller quantization options like FP8, INT8/Q8, Q6, Q5, Q4, Q3, and Q2, each of which reduces the storage and memory footprint while also reducing precision – the answers get worse.

SEMQ stands for Symbolic Embedding Multi-Quantization. As described in a paper published earlier this year, SEMQ "replaces raw vectors with fixed-dimensional symbolic structures that preserve relational properties, such as relative similarity ordering and neighborhood structure, while decoupling representation from metrics, indexing, and execution semantics."

Essentially, Mac Allister has devised a way to construct a semantic abstraction layer that decouples the meaning captured in embeddings – vectors representing data – from the way that data is represented.

The operative idea is that semantic relationships depend primarily on the relative orientation of embedding vectors, so the absolute magnitude of those vectors becomes less important to preserve. That's less data to store.

The potential impact on businesses running AI workloads depends on the portion of infrastructure costs attributable to semantic state.

"An embedding is usually represented as a long vector of floating-point numbers," Mac Allister explained in an email to The Register. "In conventional embedding systems, semantic state is typically stored as a sequence of high-precision numerical coordinates. Those coordinates jointly encode both magnitude and direction in the embedding space.


"Our original question was whether a substantial part of the useful semantic information could instead be represented through the structural relationship among components, how they move relative to one another, which regions they occupy and what directional configuration they form in the overall space."

To this end, SEMQ aims to represent relative geometry rather than an enumeration of independent floating-point magnitudes.
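The premise that orientation matters more than magnitude can be illustrated with plain cosine similarity, which ignores vector length entirely. The sketch below is not SEMQ's encoding, which the article does not detail; the vectors, document names, and helper functions are made up for illustration. It only shows that rescaling each stored vector by an arbitrary positive factor leaves a similarity ranking unchanged.

```python
import math


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def scale(v, k):
    """Multiply every coordinate by k, changing magnitude but not direction."""
    return [k * x for x in v]


def rank_by_cosine(query, docs):
    """Return document names ordered from most to least similar."""
    return sorted(docs, key=lambda name: cosine(query, docs[name]), reverse=True)


# Toy 3-dimensional "embeddings" (illustrative values, not real model output).
query = [0.9, 0.1, 0.3]
docs = {
    "a": [0.8, 0.2, 0.4],
    "b": [-0.5, 0.9, 0.1],
    "c": [0.1, 0.1, 0.9],
}

# Give each document a wildly different magnitude.
factors = {"a": 0.01, "b": 7.0, "c": 250.0}
rescaled = {name: scale(vec, factors[name]) for name, vec in docs.items()}

if __name__ == "__main__":
    print(rank_by_cosine(query, docs))
    print(rank_by_cosine(query, rescaled))
```

Both rankings come out identical, which is the sense in which a retrieval system that relies on cosine similarity does not need the original magnitudes preserved.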


"That matters because semantic systems generally care about relationships, similarity, neighborhood, continuity, retrieval behavior, change over time, rather than only about preserving each raw numeric value in isolation," said Mac Allister. "The result is a portable representation of semantic state that can be reproduced, audited, compared and transferred across processes."

According to Mac Allister, initial validation tests have shown good results. Those tests focused on converting the embedding-based semantic state into a deterministic .semq representation, restoring it, and evaluating the stability of retrieval and classification operations.

"For example, in one benchmark using the Banking77 dataset from MTEB and the all-MiniLM-L6-v2 embedding model, the FP32 baseline achieved 92.26 percent accuracy. SEMQ achieved 92.27 percent, effectively matching the FP32 baseline within 0.03 percentage points."

SEMQ thus did substantially better than 4-bit quantization, which registered 56.05 percent accuracy, 36.22 percentage points less than FP32.

"These are not claims that conventional quantization is universally ineffective but they show that, in this particular semantic classification setting, preserving the...
