I built a local layer that kills Token Tax: Python lib + Chrome extension + Mac app

gauravji · 1 pts · 0 comments

Omna: Semantic Search & PII Masking for Polars
The universal semantic layer between enterprise data and AI.

// 01 · the pitch
Omna

Semantic search for Polars.
Stop writing regex to scrub patient data.
Omna gives Polars semantic search and PII masking in one line of Python.
Local-first. Rust-powered. Zero data egress.
Search your DataFrames by meaning, not strings. Mask sensitive columns before they ever reach a model.

Local-first · Rust kernel · 0 network calls · HIPAA-ready

Try it now:
$ pip install omna

Star us on GitHub and help us hit 10k ★

// 2027 mission
We're building the universal semantic layer between enterprise data and AI. We started with Polars because that's where the fastest-growing data engineering community is. Our roadmap takes us to every format and every model.

~12 ms p50 semantic search
1M rows · M2 MacBook

0 network calls, ever, by design

100% local execution
no vendor BAA needed

✕ Before · regex hell
patients.py

    # Before: regex hell
    import re
    import polars as pl

    df = pl.read_parquet("patients.parquet")

    PAIN = re.compile(r"chest|cardiac|angina|heart\s+pain", re.I)
    hits = []
    for row in df.iter_rows(named=True):
        # Guard against null notes before searching.
        if PAIN.search(row["notes"] or ""):
            hits.append(row)
    # ...still missing 'shortness of breath',
    # 'tightness', 'pressure'... etc. forever.

~30 lines · brittle · missing synonyms
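To make the "missing synonyms" point concrete, here is a minimal sketch that runs the same pattern over a few notes. The notes are hypothetical, invented for illustration, and only the standard library is used so it runs without Polars.

```python
import re

# Same pattern as the "before" snippet.
PAIN = re.compile(r"chest|cardiac|angina|heart\s+pain", re.I)

# Hypothetical clinical notes, invented for illustration only.
notes = [
    "Reports chest pain after climbing stairs",
    "Shortness of breath and tightness on exertion",
    "Pressure behind the sternum, radiating to left arm",
    None,  # a missing note, hence the `or ""` guard below
]

# Notes the keyword pattern catches.
matched = [n for n in notes if PAIN.search(n or "")]
# Non-empty notes the pattern silently skips.
missed = [n for n in notes if n and not PAIN.search(n)]

print("matched:", matched)
print("missed: ", missed)
```

Only the first note contains one of the listed keywords. The other two describe similar symptoms in different words, which is the gap a meaning-based search is meant to close.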


// why this matters
The bridge between smart models and invisible data.

// the principle
"They cannot see your data until you show it to them in exactly the right way."

// the analogy
"Omna is to AI models what a good search engine is to websites."

// the wedge
"Google didn't kill websites; it made them more findable, and therefore more valuable. Omna doesn't replace AI. It makes AI usable on data that was previously unreachable, and therefore drives more AI usage."

// the bridge
"AI is smart but blind. Your data is rich but invisible to AI. Omna is the bridge."

Performance-critical similarity engine written in Rust · local-first by design.

// 02 · try it now
Don't watch a demo. Break it.
No signup. No data leaves your machine. Real Polars syntax, real output, runs entirely in your browser.

methods: .search() · .mask_pii() · .understand() · + more
dataset: healthcare.parquet · support_tickets · legal_contracts
8 rows · parsed in your browser

playground.py

    import polars as pl
    import omna  # registers .omna namespace

    df = pl.read_parquet("healthcare.parquet")

    (df.omna
       .search("heart pain", top_k=3)
       .collect())

query: heart pain · anxiety symptoms · diabetes complications · mood disorder
top_k: 3

polars 1.18 · omna 0.3.1 · arrow-rs 53.0
browser demo · local · no telemetry

// compatible with
no lock-in · plays with your stack
Polars · Apache Arrow · Parquet · Rust · PyO3 · DuckDB · Hugging Face · ONNX · LanceDB · Delta Lake · Pandas (compat) · JupyterLab

// 03 · capabilities
Search first. Mask second. Then filter or ask questions. Everything else on the way.

Every tile below is live — the same Rust kernel that runs in your notebook is rendering this page.

// .search()
Search by meaning, not strings.
query: chest pain after exertion
scores: 0.94 · 0.87 · 0.82
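The page does not show how the scores are computed. A common approach to ranking by meaning is cosine similarity between embedding vectors, so the sketch below assumes that technique. The vectors, row texts, and the helpers `cosine` and `top_k` are all hypothetical and are not Omna's API or kernel.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, rows, k=3):
    """Return the k (score, text) pairs most similar to query_vec."""
    scored = [(cosine(query_vec, vec), text) for text, vec in rows]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

# Toy 3-dimensional "embeddings", invented for illustration.
rows = [
    ("angina on exertion", [0.9, 0.1, 0.0]),
    ("seasonal allergies", [0.0, 0.2, 0.9]),
    ("tightness in chest when walking", [0.8, 0.3, 0.1]),
    ("sprained ankle", [0.1, 0.9, 0.2]),
]
# Stand-in for an embedding of "chest pain after exertion".
query = [1.0, 0.2, 0.0]

results = top_k(query, rows, k=2)
for score, text in results:
    print(f"{score:.2f}  {text}")
```

In a real pipeline the vectors would come from an embedding model and the scan would run over a whole column, but the ranking step has the same shape.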

// latency
p50 on 1M rows: 12 ms (live · last 24 runs)
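For readers who want to reproduce a p50 figure on their own machine, one way to do it is to time repeated calls and take the median. This is a minimal sketch: `p50_latency_ms` and `dummy_search` are hypothetical names, and the workload is a placeholder, not an Omna call.

```python
import statistics
import time

def p50_latency_ms(fn, runs=24):
    """Call fn `runs` times and return the median wall-clock time in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

def dummy_search():
    # Placeholder workload standing in for a real search call.
    return sum(i * i for i in range(10_000))

latency = p50_latency_ms(dummy_search, runs=24)
print(f"p50 over 24 runs: {latency:.3f} ms")
```

The median is less sensitive to a single slow run than the mean, which is why latency is usually quoted as p50 alongside a tail figure such as p95 or p99.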

// pii masking
df.omna.mask_pii(): one line, audit-logged.

name | email | note
Sarah Chen | sarah.chen@stanford.edu | Patient 0421 · MRN 88231
James Patel | j.patel@mayo.org | Patient 0422 · MRN 88401
Lin Okafor | lokafor@hopkins.edu | Patient 0423 · MRN 88502

mode: raw · PERSON · EMAIL_ADDRESS
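The page only shows the raw rows, so the masked output is not known. As a rough illustration of the input and output shape of masking, the sketch below replaces the two listed entity types with placeholder tokens. It is a hand-rolled stand-in and not Omna's detector: the `mask_row` helper, the placeholder format, and the email pattern are all assumptions.

```python
import re

# Simplified email pattern, an assumption for this sketch only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def mask_row(row):
    """Return a copy of row with the name and any email address replaced by placeholders."""
    return {
        # Whole-column mask: detecting names inside free text would need an NER model.
        "name": "<PERSON>",
        "email": EMAIL.sub("<EMAIL_ADDRESS>", row["email"]),
        "note": row["note"],  # left untouched in this sketch
    }

rows = [
    {"name": "Sarah Chen", "email": "sarah.chen@stanford.edu", "note": "Patient 0421"},
    {"name": "James Patel", "email": "j.patel@mayo.org", "note": "Patient 0422"},
]

masked = [mask_row(r) for r in rows]
for r in masked:
    print(r)
```

A pattern like this only catches well-formed addresses in a known column, which is exactly the brittleness the page argues against. It is here to show what "masked" means, not how to detect PII robustly.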

// .understand()
Schemas that read themselves.

patient_id | ID | medium
email | EMAIL | high
diagnosis | SEARCHABLE | none
dob | DATE | high

// .filter()
Filter by intent, not LIKE.
› df.omna.filter("kids under 12 with chest pain")

// .ask()
Ask in English. Get rows.
› df.omna.ask("flag anyone at risk of CHF")

// local-first
Zero network calls. Ever.
egress packets: 0
vendor API calls: 0
data leaves machine: never

// why omna is fast
Built on Polars, so embeddings never leave Arrow memory.

Omna is built natively on Polars, which means it inherits Arrow's columnar memory format, SIMD vectorization, and zero-copy data structures.
Your embeddings never leave the Arrow memory layout. No serialization. No copying. No overhead.
When you call df.omna.search(), the Rust similarity kernel operates directly on the same memory Polars is already using.
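"Zero-copy" means two components read the same bytes rather than each holding its own duplicate. The standard-library sketch below shows the distinction with a `memoryview` over an `array`. It is an analogy for the idea only: it does not use Arrow, Polars, or Omna, and the variable names are illustrative.

```python
from array import array

# A contiguous buffer of 32-bit floats, standing in for one embedding column.
embeddings = array("f", [0.1, 0.2, 0.3, 0.4])

view = memoryview(embeddings)    # zero-copy: shares the underlying buffer
copied = array("f", embeddings)  # explicit copy: a separate allocation

embeddings[0] = 9.0              # mutate the original buffer in place

# The view reflects the change because it reads the same memory.
view_sees_change = view[0] == 9.0
# The copy still holds the old value.
copy_sees_change = copied[0] == 9.0

print("view sees change:", view_sees_change)
print("copy sees change:", copy_sees_change)
```

A kernel that takes the view pays nothing to receive the data, while one that takes the copy pays for an allocation and a pass over every element before any work starts.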

// architecture note
"Polars uses Apache Arrow's columnar memory format with SIMD vectorization, the same memory our Rust similarity kernel operates on directly. On Pandas we'd need to copy data into NumPy arrays first. On Polars it's zero-copy end to end."
Ritchie Vink's original post on the Polars architecture explains it better than we can.

memory: Apache Arrow
vectorization: SIMD
kernel: Rust · zero-copy
copies: 0

// 04 — what teams...
