Guardian Angels: LLM Personalization for Productivity and Security

Guardian Angels: LLM Personalization for Productivity and Security · Gwern.net


GPT, mind, personality, Decision Transformer, AI mode collapse, AI safety, transhumanism

I propose an approach for highly personalized LLMs, for near-future productivity gains and personal info/cybersecurity against increasingly powerful LLMs: they should, in the spirit of uploading, try to emulate the user’s values and preferences in order to amplify the principal, not replace them. I discuss a package of techniques and proposals to accomplish such ‘guardian angels’: dynamic evaluation of LLMs combined with active learning and elicitation and heavy inner-monologue search/data-augmentation.

2025-12-01–2026-06-05 · finished · certainty: possible · importance: 10


Chatbot Incentives Are Misaligned

Chatbot Problems

Mode-Collapse

Laziness

Brittle Because Fast

Too Helpful

Amnesiac

Chatbot Fixes

Cooperative RL

Continual Learning

Catastrophic Forgetting

Generalizing

Creative Writing

Over-Parameterizing

Active Learning

Preference Learning

Brain Imitation Learning

Personality Emulation

Guardian Angels

Principles

Hardware

Cost

Organization

Startup Business Model

Competition

Initial Steps

GBT

For Writing

Data Augmentation

Powerful LLMs will be deployed at global scale in the next few years, and will dominate the Internet, and increasingly, ordinary life. As of mid-2026, there is no coherent vision for how knowledge professionals, or ordinary people, will be able to harness these LLMs for large productivity increases, or how they will handle cybersecurity and cognitive security.

I propose a goal of creating Guardian Angels (GA): LLMs which are personalized not to provide the stereotypical “assistant chatbot agent” persona, but to emulate a single user’s personality, values, and preferences. In a GA future, the “principal” user focuses on defining what is worth doing by the GA agents, and not on what or how to do things, functioning as the CEO or ‘board’ of an ‘AI corporation’. This allows the principal to deploy numerous agents to achieve desirable things and to handle security, like screening all messages for advanced attacks (like interlocking ecosystems of synthetic media for propaganda or spearphishing). GAs cannot solve larger AI alignment problems, but they can help individual humans as part of a society-wide defense-in-depth strategy.

A GA persona is productive because it learns to emulate the principal’s outputs but with higher quality. It is trustworthy because it is, by definition, allied with its principal and shares its values and goals. And it is secure in part by hardwiring a single, unique, situated user (for whom following a prompt attack would be absurd), avoiding ‘confused deputy’ problems, while periodic upgrades of the underlying model and the defenders’ advantage allow GAs to keep up with attackers.

Standard techniques like prompt programming of in-context-learning for “frozen” models will not create useful GAs due to the limitations of post-training, context windows and self-attention with frozen weights in compute-efficient-but-under-parameterized models, low-compute outputs, and the status quo of passive offline data collection—which are collectively responsible for chatbots’ disappointing results in knowledge worker amplification and creative writing and fatal errors in agentic settings.

We can try to create GAs by a combination of techniques: online learning (via dynamic evaluation) to update LLMs in realtime, avoiding ignorance and fatal errors while remaining competitive with frozen frontier models; sample efficiency from pretrained preference-oriented large models and active learning by querying the principal for corrections and preference data (obtaining low regret from DAgger-style bounds); and a local CLI-first logging-oriented UI/UX paradigm.
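To make the "dynamic evaluation" ingredient concrete, here is a minimal sketch, not the method proposed in this essay's later sections. It assumes a toy unigram softmax model standing in for an LLM, and a made-up token stream standing in for the principal's writing. The names `stream_loss` and `principal_stream`, the vocabulary size, and the learning rate are all illustrative assumptions. The idea it shows is only this: each token is scored with the current weights first, and then the weights take one gradient step on that same token, so the model drifts toward the principal's distribution instead of staying frozen.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stream_loss(tokens, vocab_size, lr=0.0):
    """Average next-token loss over a stream.

    lr == 0.0 is a frozen model; lr > 0.0 enables dynamic evaluation
    (score each token first, then update the weights on it).
    """
    # Hypothetical "pretrained" model: a uniform unigram distribution.
    logits = [0.0] * vocab_size
    total = 0.0
    for tok in tokens:
        probs = softmax(logits)
        total += -math.log(probs[tok])  # evaluate before adapting
        if lr > 0.0:
            # Cross-entropy gradient w.r.t. logits is probs - onehot(tok).
            for i in range(vocab_size):
                grad = probs[i] - (1.0 if i == tok else 0.0)
                logits[i] -= lr * grad
    return total / len(tokens)


# Illustrative stream: the principal strongly favors token 0.
principal_stream = [0, 0, 1, 0, 0, 2, 0, 0, 1, 0] * 5

frozen = stream_loss(principal_stream, vocab_size=4, lr=0.0)
adapted = stream_loss(principal_stream, vocab_size=4, lr=0.5)
print(f"frozen loss:  {frozen:.3f}")
print(f"adapted loss: {adapted:.3f}")
```

The frozen model pays the same uniform-guess loss on every token, while the adapted model's average loss falls as it sees more of the stream. A real GA would update a large pretrained network rather than four logits, and would pair this with active queries to the principal, but the score-then-update loop has the same shape.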

GAs could be done as an open-source community effort, but given the need for high security in deployment and the rising challenge of APTs equipped with Mythos-scale attackers, it probably makes more sense as a startup, catering initially to power-users and knowledge workers such as CEOs or researchers, and moving downwards as it is refined.

What do my next few years look like? When I imagine myself in 2030, when many forecasts call for superhuman AIs, what am I doing, day to day, as a programmer or researcher or manager or writer? I make my mug of tea, and open up my laptop and… Then what? Am I still typing prompts into a ChatGPT browser tab? Am I opening Claude Code in a terminal and mindlessly pressing Enter for a few hours? What is a vision of doing meaningful work for me? (It would be nice to have a plan beyond “hope”.) How am I avoiding “dead Internet”...
