LLM INQUISITOR: Evaluating how AI models handle long, realistic tasks

Repository: AssimilatedHuman/LLM-Inquisitor on GitHub. Evaluating AI behaviour under real‑world work conditions to surface issues before they become problems. LLM INQUISITOR identifies failures (drift, instability, etc.) by observing AI during normal tasks, a tool the industry desperately needs to stem the 85% failure rate. Includes Quick Start, Practitioner’s Guide and Methodology.


LICENSE

LLM INQUISITOR 1 Page User Report Aid.pdf

LLM INQUISITOR Methodology (GitHub Edition) V1.pdf

LLM INQUISITOR Practitioners Guide V 1.pdf

LLM INQUISITOR Quick Start Guide V 1.pdf

README.md


LLM INQUISITOR: GitHub Edition

The Behavioural Evaluation Standard for Real‑World AI

LLM INQUISITOR is a practical, workflow‑driven methodology for evaluating how AI systems behave when they’re actually used, not when they’re demoed, benchmarked, or prompt‑engineered.

If you want to know whether an AI is stable, reliable, predictable, and safe in real work, INQUISITOR is the tool.

Why INQUISITOR Exists

AI doesn’t fail in benchmarks. It fails in:

developer workflows

document editing

analysis tasks

coding sessions

customer‑facing interactions

That’s where drift, collapse, contradiction, contamination, and instability actually matter.

INQUISITOR reveals that behaviour using normal work, not adversarial tricks.
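One way to "observe during normal work" is to keep a small tagged log as the session goes on. The sketch below is a hypothetical illustration, not part of INQUISITOR: the README does not define a log format, so `FailureKind`, `Observation`, `summarise`, and the sample notes are all assumptions. Only the five category names come from the text above.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class FailureKind(Enum):
    # Labels copied from the README's list; the enum itself is an assumption.
    DRIFT = "drift"
    COLLAPSE = "collapse"
    CONTRADICTION = "contradiction"
    CONTAMINATION = "contamination"
    INSTABILITY = "instability"


@dataclass(frozen=True)
class Observation:
    turn: int          # position in the working session where the issue appeared
    kind: FailureKind  # which category the observer assigned
    note: str          # free-text description written by the observer


def summarise(observations):
    """Count observations per failure kind, keyed by the kind's label."""
    return dict(Counter(obs.kind.value for obs in observations))


# Invented sample entries, for illustration only.
session_log = [
    Observation(12, FailureKind.DRIFT, "Stopped following the agreed naming convention."),
    Observation(19, FailureKind.CONTRADICTION, "Reversed an earlier answer without being asked."),
    Observation(27, FailureKind.DRIFT, "Reintroduced a section that had been removed."),
]

print(summarise(session_log))
```

A flat list of records keeps the log easy to write by hand during real work and easy to tally afterwards. The guides in this repository may prescribe a different structure.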

What INQUISITOR Gives You

A repeatable way to evaluate AI behaviour

A shared vocabulary for describing failures and instabilities

A lightweight workflow for everyday testing

A formal methodology for audit, governance, and reproducibility

A developer‑friendly approach that fits into real tasks, not lab conditions

INQUISITOR is built for people who need AI to behave predictably inside real systems, real teams, and real workflows.

Who INQUISITOR Is For

Developers integrating AI into products

Engineers needing predictable behaviour

Analysts working with structured tasks

Researchers validating model behaviour

Product teams assessing reliability

Governance & risk functions needing evidence

Anyone using AI in real workflows

You don’t need expertise. You don’t need special prompts. You don’t need to run every test surface.

You only need to work normally and observe honestly.

What’s Included in This Repository

Quick Start Guide

A five‑minute behavioural check. Perfect for fast evaluation.

Practitioner’s Guide

The everyday operational guide. Use this for real‑world testing.

Methodology (GitHub Edition)

The structured, formal framework for reproducible behavioural evaluation.

Licence

Defines usage and redistribution rights for this edition.

How to Use INQUISITOR

Run the Quick Start to get a behavioural snapshot.

Use the Practitioner’s Guide for real‑world evaluation.

Apply the Methodology when you need structure, evidence, or audit‑grade documentation.

Follow the Licence for redistribution rules.

INQUISITOR scales from lightweight to formal depending on your needs.

Edition Notes

This is the GitHub Edition:

free to share

free to redistribute

free for personal or team use

not permitted for commercial use without permission

Future editions may include:

Enterprise Edition

Full Methodology Edition

Audit & Compliance Edition

Contact

For commercial licensing or permissions, contact the author directly.
