Personal-Values Alignment Tech: Some Initial Motivations

Daniel Sosebee


AIs don't understand our individual values. We should change that.

Daniel Sosebee Jun 11, 2026

Dora Louise Murdoch, Parmelee Garden, c. 1920

In “We’re Arguing About AI Safety Wrong,” Helen Toner puts out a call for “dynamist vision[s] for safe superhuman AI”: visions of the future in which AI contributes to individual autonomy, allowing society to self-organize and rapidly adapt to future challenges. She presents dynamism in opposition to stasism (both terms from Virginia Postrel’s The Future and Its Enemies), with stasism being the approach to governance that favors top-down control. Toner asserts that many in the AI-alignment community once advanced stasist agendas (and that many still do), but that discourse has “shifted somewhat in [the] direction” of dynamism, citing writings such as a LessWrong post titled (and I quote) “Societal and technological progress as sewing an ever-growing, ever-changing, patchy, and polychrome quilt.”

I’m generally on board with dynamism, but what might an AI-empowered dynamist future look like in practice? And what kinds of tech/policy might support the evolution of a yet more patchy and polychromatic society? Seb Krier, a co-author of that LessWrong post who leads policy development and strategy at Google DeepMind, laid out one partial vision in his September 2025 essay “Coasean Bargaining at Scale”:

[…] consider AGI deployed as a vast ecology of personalized agents and systems. This emerging ecosystem is what Tomašev et al. (2025) characterize as the “virtual agent economy,” a new economic layer where agents transact and coordinate at scales and speeds beyond direct human oversight. While this ecology will contain countless specialized agents, let’s focus on the one that matters most from an individual’s perspective: your personal advocate. Think of it as a fiduciary extension of yourself: a tireless, extremely competent digital representative, closely tied to you, its principal. What could such an agent do? In principle, it can negotiate, calculate, compare, coordinate, verify, monitor, and much more in a split second.
Through many multi-turn conversations, tweaking knobs and sliders, and continuous learning, it could also develop an increasingly sophisticated (though never perfect) model of who you are, your preferences, personal circumstances, values, resources, and more. This should evolve over time - an agent’s alignment should follow the principal’s own evolution.

Krier spends much of the essay discussing certain bargaining scenarios, before returning to this core technical challenge:

The user should have immense freedom to tune their agent to their unique preferences and values. They should also have complete privacy and control over their “cognitive profile” developed by the agent, for obvious reasons. Practically speaking though, this is the hardest part: how do you design and evidence an agent (mostly) aligned to a user?

I might call this necessary piece of software a “values profile” and not a “cognitive profile,” as the former more strictly describes what’s required of it (to speak on behalf of the user’s values). In any case, Krier doesn’t offer a solution, but rather points to some theoretical research on individual alignment preferences. And indeed this problem of aligning AIs to individual values remains largely unsolved! So I’d argue that it deserves more attention.

While I’ve backed my way into this motivation statement by starting from a “dynamist vision for safe superhuman AI,” one does not need to be AI-safety-concerned to recognize the value here. Indeed, basically every AI application, from coding to personal projects, would straightforwardly benefit from this tech, since basically every AI use-case exists in the complex world of the user, and thus is ultimately judged in terms of alignment to their values (and to the values of organizations in which they participate). If built, AI agents that align well to individual values would be a win-win technology, existing in the sweet spot where alignment and capabilities overlap.

So how might an automated advocate be built?

High-level components of an automated personal advocate

This section is not a concrete proposal, but rather a framework for discussing various approaches.

When designing automated advocates we can leverage two methods of personalization:

Context engineering - influencing actions by providing value-related context that steers the AI towards the correct actions.

Training - influencing actions via training on data related to the user’s values. This could take many forms, for example, deliberative alignment where the specification used is user-specific.
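To make the context-engineering option a little more concrete, here is a minimal sketch of how a stored values profile might be rendered into prompt context. Everything in it is an assumption made for illustration: the `ValueStatement` and `ValuesProfile` classes, the `weight` convention, the `build_context` helper, and the example user are all hypothetical, and no particular model API is implied.

```python
from dataclasses import dataclass, field


@dataclass
class ValueStatement:
    """One user-stated value (hypothetical structure, not from the essay)."""
    text: str
    weight: float = 1.0  # relative importance; an assumed convention


@dataclass
class ValuesProfile:
    """A toy 'values profile' that the user owns and can edit."""
    owner: str
    statements: list[ValueStatement] = field(default_factory=list)

    def top(self, k: int) -> list[ValueStatement]:
        # Highest-weight statements first; ties keep insertion order.
        return sorted(self.statements, key=lambda s: s.weight, reverse=True)[:k]


def build_context(profile: ValuesProfile, task: str, k: int = 3) -> str:
    """Render the k most important values plus the task into prompt text."""
    lines = [f"You are acting on behalf of {profile.owner}."]
    lines.append("Weigh these stated values, most important first:")
    lines += [f"- {s.text}" for s in profile.top(k)]
    lines.append(f"Task: {task}")
    return "\n".join(lines)


# Hypothetical example user and values, for illustration only.
profile = ValuesProfile(
    owner="Alex",
    statements=[
        ValueStatement("Prefer privacy over convenience", weight=0.9),
        ValueStatement("Avoid long-term contracts", weight=0.6),
        ValueStatement("Support local businesses", weight=0.3),
    ],
)
prompt = build_context(profile, "Choose an internet provider", k=2)
print(prompt)
```

The resulting text would be passed to whatever model backs the advocate. The training route would instead use data like these statements to change the model itself, which this sketch does not attempt.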

An advocate would need to be paired with an advocate-maintenance system — a system which is responsible for efficiently bringing the advocate into sync with the user’s values, and...
