How to Make xt850 Match xt 850

Author: Sergey Nikolaev Published: May 04, 2026 - 7 Min read

TL;DR

Since version 23.0.0, Manticore can make searches like xt850 match xt 850 using bigram_delimiter together with digit-aware bigram_index modes. This solves a common tokenization mismatch in product search, where users remove spaces from model names but the source data stores them as separate tokens.

Assumptions and verification

This article assumes:

- RT tables created with SQL examples exactly as shown
- default tokenization unless the example explicitly changes a setting
- ASCII digits in model names, because second_numeric and second_has_digit are digit-aware modes built around 0-9

All SQL examples and expected outputs in this article were verified against a real Manticore 23.0.0 instance before publishing, using fresh tables created from scratch for each scenario.

The broader search problem

Imagine a catalog containing:

- xt 850 action camera
- iphone 5se battery case
- canon eos 80d body
- thinkpad x1 carbon

Now imagine users searching for:

- xt850
- iphone5se
- eos80d
- thinkpadx1

From the user's point of view, these should obviously match. From the engine's point of view, they often do not, because the indexed text is tokenized as separate terms.

Search systems usually attack that mismatch in one of four ways:

- index prefixes or infixes
- add custom normalization rules
- duplicate content into alternate normalized fields
- index adjacent token pairs and optionally store glued variants too

Manticore's newer bigram functionality is a structured way to do the fourth option without awkward field duplication.

Baseline: why xt850 fails by default

Here is the problem in its simplest form:

DROP TABLE IF EXISTS bi_default_demo;

CREATE TABLE bi_default_demo(title text);

INSERT INTO bi_default_demo VALUES (1,'xt 850 action camera');

SELECT id, title FROM bi_default_demo WHERE MATCH('xt850');

Expected result: Empty set

Why does this fail? Because the document is indexed as two separate tokens, xt and 850, while the query is a single token, xt850. By default, Manticore does not assume that:

- xt850 should be split into xt + 850
- or xt + 850 should also be searchable as xt850

So this is not really a typo-tolerance problem or a phrase problem. It is a tokenization mismatch: the index sees two tokens, while the query provides one.

That is the gap the newer bigram settings are designed to close. They let Manticore index selected adjacent token pairs in a form that can also match glued queries.

Why bigrams help here

bigram_index can help with both phrase acceleration and model-name matching, and in this article we focus on the xt 850 vs xt850 problem. The key idea is simple:

- detect adjacent token pairs that look like model names
- store those pairs in a glued form too
- let queries such as xt850, iphone5se, or thinkpadx1 hit the spaced text

That is where bigram_delimiter matters.

A note about bigram_delimiter

bigram_index decides which adjacent pairs are eligible. bigram_delimiter decides how eligible bigrams are stored:

- true: internal delimited token only
- none: glued token only, such as galaxy24
- both: both forms

The practical difference is easiest to understand from the query side:

- with true, Manticore keeps the internal bigram form used for phrase optimization, but it does not keep the glued user-facing form, so a query like xt850 will not match xt 850
- with none, Manticore keeps only the glued form, so xt850 can match xt 850, but you are leaning entirely on the glued representation for those pairs
- with both, Manticore keeps both the internal bigram representation and the glued form, so xt850 can match xt 850 without giving up ordinary phrase behavior

For this use case, both is usually the safer default because it covers the user-visible problem directly while keeping behavior less surprising for normal phrase queries and mixed workloads.
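The three storage modes can be pictured with a small toy model. The Python sketch below is not Manticore's implementation: the tokenizer, the function names, and the INTERNAL_DELIM marker are all hypothetical stand-ins, and every adjacent pair is treated as eligible for simplicity. It only illustrates why a glued query needs a glued term in the index.

```python
import re

# Hypothetical marker for the internal bigram form. The article does not say
# what Manticore uses internally, so this value is only a stand-in.
INTERNAL_DELIM = "\x01"


def tokenize(text):
    """Lowercase and split on anything that is not an ASCII letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def indexed_terms(text, delimiter_mode=None):
    """Return the set of terms a toy index would store for `text`.

    delimiter_mode is None (no bigrams), "true", "none", or "both".
    Every adjacent pair counts as eligible here; in Manticore that
    decision belongs to bigram_index.
    """
    tokens = tokenize(text)
    terms = set(tokens)
    if delimiter_mode is None:
        return terms
    for first, second in zip(tokens, tokens[1:]):
        if delimiter_mode in ("true", "both"):
            terms.add(first + INTERNAL_DELIM + second)  # internal form only
        if delimiter_mode in ("none", "both"):
            terms.add(first + second)  # glued form, e.g. "xt850"
    return terms


def matches(query, text, delimiter_mode=None):
    """A query matches only if each of its tokens was stored as a term."""
    terms = indexed_terms(text, delimiter_mode)
    return all(token in terms for token in tokenize(query))


doc = "xt 850 action camera"
for mode in (None, "true", "none", "both"):
    print(mode, matches("xt850", doc, mode))
```

In this toy model, xt850 misses with no bigrams and with true, and hits with none and both, which mirrors the query-side behavior described above.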
Mode 1: second_numeric

bigram_index = second_numeric
bigram_delimiter = both

This mode is aimed at model names where the second token is purely numeric. That is common in product catalogs:

- xt 850
- galaxy 24
- playstation 5
- pixel 8

The idea is simple: users often search these as glued terms such as xt850, galaxy24, or playstation5, even though the source text stores them with a space. second_numeric stores the pair only when the second token is ASCII digits only.

Use it when:

- you have product generations and numbered models
- users often remove spaces in search
- the second token is usually just digits

Example

DROP TABLE IF EXISTS bi_second_numeric_demo;

CREATE TABLE bi_second_numeric_demo(title text) bigram_index='second_numeric' bigram_delimiter='both';

INSERT INTO bi_second_numeric_demo VALUES (1,'xt 850 action camera'), (2,'galaxy 24 ultra'), (3,'playstation 5 slim'), (4,'iphone 5se case'), (5,'canon eos 80d body'), (6,'thinkpad x1 carbon');
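Before running the queries, it can help to predict which rows gain a glued form. The Python sketch below is a toy model of the rule stated above (second token is ASCII digits only), not Manticore's code. The tokenizer and the function names are assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase and split on anything that is not an ASCII letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_ascii_digits(token):
    """True only for tokens made of 0-9; str.isdigit() alone also accepts non-ASCII digits."""
    return token.isascii() and token.isdigit()


def glued_pairs_second_numeric(text):
    """Glue each adjacent pair whose second token is ASCII digits only."""
    tokens = tokenize(text)
    return [a + b for a, b in zip(tokens, tokens[1:]) if is_ascii_digits(b)]


# Same rows as the INSERT statement above.
titles = {
    1: "xt 850 action camera",
    2: "galaxy 24 ultra",
    3: "playstation 5 slim",
    4: "iphone 5se case",
    5: "canon eos 80d body",
    6: "thinkpad x1 carbon",
}

for doc_id, title in titles.items():
    print(doc_id, glued_pairs_second_numeric(title))
```

Under this toy predicate, rows 1 to 3 produce xt850, galaxy24, and playstation5, while rows 4 to 6 produce nothing because 5se, 80d, and x1 are not purely numeric.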

Then test the queries one by one:

SELECT id, title FROM bi_second_numeric_demo WHERE MATCH('xt850');

+------+----------------------+ | id |...
