Point Your AI at Your Documents | LightOn
Three endpoints. Zero pipeline maintenance.
Parse a document. Extract any field. Retrieve with citations. The same API key on Console, the same SDK on Enterprise.
/parse: Documents in. Structure out.
/extract: Pull the fields you care about.
/search: Grounded retrieval with citations.
LightOnOCR-2. State-of-the-art parsing.
Turns scans, tables, handwriting, and multi-column layouts into structured Markdown. 20+ languages natively. The parsing engine behind every retrieval workflow.
OLMOCR-BENCH (SOTA): 83.2
Price per page: €0.002
Native languages: 20+
Weights: open, on HuggingFace
cURL
$ curl https://api.lighton.ai/v3/parse \
  -H "Authorization: Bearer $LIGHTON_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"document":"https://console-examples.lighton.ai/AFD-091005-062.pdf"}'
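The same call can be sketched in Python with only the standard library. The endpoint, headers, and `document` key come from the cURL example; the function names are hypothetical, and the response is assumed to be a JSON object whose fields this page does not describe.

```python
import json
import urllib.request

PARSE_URL = "https://api.lighton.ai/v3/parse"  # endpoint from the cURL example


def build_parse_request(document_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the cURL call."""
    body = json.dumps({"document": document_url}).encode("utf-8")
    return urllib.request.Request(
        PARSE_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_document(document_url: str, api_key: str, timeout: float = 60.0) -> dict:
    """Send the request and decode the body as JSON.

    Assumption: the endpoint answers with a JSON object. Its fields are not
    documented on this page, so the decoded result is returned untouched.
    """
    request = build_parse_request(document_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `parse_document(url, os.environ["LIGHTON_API_KEY"])` would perform the network request; building the request separately keeps the payload easy to inspect and test offline.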
Define the schema. Get JSON back.
Pull any field, entity, or key-value pair you care about. Invoice numbers, lease end dates, claim IDs, contract clauses. You define the schema; LightOn returns structured JSON.
JSON Schema: in / out
Price per page: €0.004
Async: via webhooks
Cited: per field
cURL
$ curl https://api.lighton.ai/v3/extract \
  -H "Authorization: Bearer $LIGHTON_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"document":"https://console-examples.lighton.ai/AFD-091005-062.pdf","schema":""}'
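The `schema` value is left empty in the cURL example. As a rough illustration of what "define the schema" could look like, here is a hypothetical JSON Schema for an invoice, serialized into a request body. The field names are invented for the example, and passing the schema inline as a JSON object is an assumption, not something this page confirms.

```python
import json

EXTRACT_URL = "https://api.lighton.ai/v3/extract"  # endpoint from the cURL example

# Hypothetical JSON Schema for an invoice; field names are illustrative only.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "issue_date": {"type": "string", "format": "date"},
        "total_amount": {"type": "number"},
    },
    "required": ["invoice_number"],
}


def build_extract_payload(document_url: str, schema: dict) -> str:
    """Serialize a body with the "document" and "schema" keys from the cURL call.

    Assumption: "schema" accepts a JSON Schema object inline.
    """
    return json.dumps({"document": document_url, "schema": schema})


payload = build_extract_payload(
    "https://console-examples.lighton.ai/AFD-091005-062.pdf", INVOICE_SCHEMA
)
```

The resulting `payload` string is what would go in the `-d` argument of the cURL call.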
Grounded retrieval with citations.
One query, three signals: dense, sparse, late-interaction. The index picks the right signal, not the developer. Every result ships with the source passage that produced it. Built on LateOn and NextPlaid, our open-source ColBERT family.
Multi-vector: dense + sparse + late-interaction
Price per query: €0.006
P50 Latency
ACL: at chunk level
cURL
$ curl https://api.lighton.ai/v3/search \
  -H "Authorization: Bearer $LIGHTON_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"query":""}'
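This page says the index chooses among dense, sparse, and late-interaction signals but does not say how. One widely used way to merge several rankings is reciprocal rank fusion (RRF). The sketch below shows that generic technique, not LightOn's actual method; the document IDs and the constant `k = 60` are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one.

    Each document earns 1 / (k + rank) from every list it appears in,
    with rank starting at 1. Ties are broken alphabetically by ID.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Three hypothetical rankings, one per signal.
dense = ["a", "b", "c"]
sparse = ["b", "a", "d"]
late_interaction = ["b", "c", "a"]

fused = reciprocal_rank_fusion([dense, sparse, late_interaction])
print(fused)  # ['b', 'a', 'c', 'd']
```

Document `b` wins because it ranks first in two of the three lists, even though the dense signal alone would have preferred `a`.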
Try it in your browser. No install needed.
Test every endpoint in Console. Drop a file, get the response, copy the code into your project.
Open the playground: console.lighton.ai
BUILT ON OPEN RESEARCH
Our retrieval models are in your dependency tree.
HuggingFace downloads: 50M
PyPI installs per month: 916K
GitHub stars: 2,345
HuggingFace likes: 3,845
Open-source models in production
Empower developers with a production-ready Multimodal Retrieval API running on your infrastructure. Integrate secure reasoning into your apps (CRM, ERP) without managing the complex AI stack.
LateOn, NextPlaid, PyLate, DenseOn, LightOnOCR-2, and 8 more on HuggingFace.
RAG was built for chatbots. LightOn is built for agents.
Agents do not ask nicely. They dump raw PDFs, garbled tables, and off-domain queries into the same thread. The retrieval layer has to handle the input it gets, not the input you wish you had.
Retrieval
Hybrid retrieval
Dense, lexical, and late-interaction signals on one query. The index picks the right signal, not the developer. Built on LateOn and NextPlaid, our open-source ColBERT family.
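For readers unfamiliar with late interaction, the ColBERT-style scoring rule (often called MaxSim) can be sketched in a few lines. This is a toy illustration of the general technique with hand-written two-dimensional vectors; it does not use LateOn, NextPlaid, or PyLate, and it makes no claim about how those libraries are implemented.

```python
def dot(u: list[float], v: list[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs: list[list[float]], doc_vecs: list[list[float]]) -> float:
    """Late-interaction score: for each query token vector, take its best
    match among the document token vectors, then sum those maxima."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy token embeddings: one vector per token.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]  # covers both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.0]]  # covers only the first query token

print(maxsim_score(query, doc_a))  # 2.0
print(maxsim_score(query, doc_b))  # 1.0
```

Unlike a single dense vector per document, each query token gets to find its own best match, which is why `doc_a` outscores `doc_b` here.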
Trust
Grounded by default
Every answer ships with the exact passage that supports it. Retrieval and reasoning are separable. Auditable by design, not retrofitted.
Infra
LLM-agnostic
Bring your own model. Open-source, commercial, or private. No lock-in on the inference layer. Your security policy dictates where inference happens.
Protocol
MCP-native
Drop LightOn into any agent that speaks Model Context Protocol. Single agent, multi-agent system, or business application integration. Same API.
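Under Model Context Protocol, an agent invokes a server tool by sending a JSON-RPC 2.0 `tools/call` request. A minimal sketch of such a message follows; the tool name `search` and its `query` argument are hypothetical, since this page does not list the tools LightOn exposes.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a unique id


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and argument, for illustration only.
call = build_tool_call("search", {"query": "lease end date for site 12"})
```

The transport (stdio or HTTP) and the session setup are handled by the MCP client library the agent uses and are left out here.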
Scope
Workspaces and ACLs at chunk level
Multi-agent systems need scoped corpora. Each agent gets its own workspace, its own collections, its own permissions. Access control is enforced at the chunk, not at the document. An agent never sees a single token it should not see.
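To make the difference between chunk-level and document-level access control concrete, here is a small sketch. The `Chunk` type, the principal names, and the filtering function are all hypothetical; the page does not describe how LightOn stores or enforces permissions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    allowed: frozenset[str]  # principals (agents or users) allowed to read this chunk


def visible_chunks(chunks: list[Chunk], principal: str) -> list[Chunk]:
    """Keep only the chunks this principal may read, before anything is
    handed to the agent or the model."""
    return [chunk for chunk in chunks if principal in chunk.allowed]


# Two chunks of the same document with different permissions.
corpus = [
    Chunk("contract-1", "General terms.", frozenset({"sales-agent", "legal-agent"})),
    Chunk("contract-1", "Confidential pricing annex.", frozenset({"legal-agent"})),
]

sales_view = visible_chunks(corpus, "sales-agent")
legal_view = visible_chunks(corpus, "legal-agent")
print([c.text for c in sales_view])  # ['General terms.']
```

With document-level control, `sales-agent` would either see the whole contract or none of it; filtering per chunk lets it read the general terms while the pricing annex stays hidden.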
Built for search
LightOn supports every modern enterprise search behavior
Chat Search
Conversational Search & Q&A
Users ask questions, not keywords. Switch instantly between retrieving a list of documents or getting a synthesized answer backed by precise, clickable citations.
Massive RAG
Massive Multimodal RAG
Analyze more than just text. The engine ingests millions of files and understands complex formats: images, technical diagrams, tables, and handwritten notes with high precision.
Tool Chaining
Agentic Reasoning Chains
Execute complex tasks, not just search. For multi-step requests (e.g., "Find info + Cross-reference HR + Generate graph"), the AI autonomously chains tools and sources to deliver a complete result.
Team Agents
Custom Specialized Agents
Create dedicated experts for every team. Empower users to build custom agents with specific prompts and restricted document scopes tailored to their role.
Data Sync
Universal Data Synchronization
Connect all your knowledge silos. Seamlessly index and sync data from external sources (SharePoint, Drive, Confluence,...