Privacy-Preserving Process Mining


Privacy-Preserving Process Mining – HASH Blog


Collecting event traces from employees' desktop activity in a privacy-first way
May 27th, 2026
Dei Vilkinsons, CEO & Founder

HASH's desktop task/process mining agent is only available to enterprise customers who have agreed to additional privacy and security safeguards. This post explores the technical measures we take to protect end-users beyond these contractual requirements placed on firms deploying our technology.

Why process mining matters

Process mining is the closest thing most organizations have to ground truth about how they actually operate. Where interviews, workshops and process diagrams capture the idealized version of a workflow, mined event logs capture what people and systems actually do — including the workarounds, exception paths, rework loops, and informal handoffs that determine whether a process succeeds or fails.

That ground truth is the foundation for almost everything we care about at HASH. It feeds the decision intelligence work we're doing in safety-critical domains; it grounds the agent-based simulations and digital twins of enterprises that are built on HASH; it provides conformance signal for the Petri net-based orchestration of agentic workflows; and — at scale — it's the raw material from which a process foundation model might one day learn the latent state of real organizations. Without good event data, all of these projects are working from a sketch rather than a photograph.

Why process mining sucks

Despite this, adoption of process mining inside many of the organizations that would benefit from it the most remains patchy. Process mining technology has existed in commercial form for well over a decade, and yet many of the firms we work with either don't use it at all, or only employ it on a narrow slice of their ERP-resident workflows where events are generally already centralized.

In our own research with biopharmaceutical supply-chain leaders, the single most consistent objection to outside AI tooling was opacity around data security, privacy and governance. Practitioners told us they routinely rejected solutions that required sensitive operational data to be shipped to an external cloud — not because they believed vendors untrustworthy, but because the certification and procurement burden was prohibitive, and the consequences of getting it wrong were severe. The same dynamic plays out across every regulated industry we've spoken to: life sciences, financial services, defence, healthcare, and critical infrastructure.

The constraints aren't only commercial. In the EU, comprehensive workplace monitoring runs straight into the GDPR's data-minimization principle and into the co-determination rights of works councils, both of which require a credible answer to "what exactly are you collecting, on whom, and why?" before deployment is even possible. Most incumbent process-mining tools — and the newer wave of agent-monitoring and "employee productivity" stacks — were architected from the opposite direction: capture everything that moves, ship it to a SaaS backend, and figure out the governance story afterwards. That posture is a non-starter in any environment where the cost of a mistake is measured in patient harm, regulatory exposure, or breach of trust.

The result is that the organizations that would derive the most value from honest, end-to-end process visibility are precisely the ones that cannot deploy the tools that currently provide it.

A privacy-first architecture

Using the provenance-aware data engine at the heart of HASH, we've been exploring what an alternative might look like: a desktop agent that collects rich event traces from the applications and websites users interact with, while making those traces structurally difficult, and in many cases impossible, to abuse.

The agent is built around a small number of architectural commitments. None of them is novel on its own, but combining all of them in one tool is, as far as we're aware, new.

1. Raw signals captured (on user device)

Screen content, window titles, and keystrokes are observed locally. Whitelist/blacklist and time-of-day rules determine whether anything is captured in the first place.
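As a rough illustration of how whitelist/blacklist and time-of-day rules could gate capture, here is a minimal Python sketch. It is not HASH's implementation: `CapturePolicy`, `should_capture`, the app names, and the 09:00 to 17:00 window are all hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class CapturePolicy:
    """Hypothetical capture rules; field names are illustrative only."""
    allowed_apps: frozenset = frozenset()  # empty means no whitelist is in force
    blocked_apps: frozenset = frozenset()  # the blacklist always wins
    start: time = time(9, 0)               # capture window opens (local time)
    end: time = time(17, 0)                # capture window closes (local time)


def should_capture(policy: CapturePolicy, app: str, when: datetime) -> bool:
    """Return True only if every rule permits capturing this app at this time."""
    if app in policy.blocked_apps:
        return False
    if policy.allowed_apps and app not in policy.allowed_apps:
        return False
    return policy.start <= when.time() < policy.end


policy = CapturePolicy(
    allowed_apps=frozenset({"erp-client", "mail"}),
    blocked_apps=frozenset({"password-manager"}),
)
```

The check runs before any signal is read, so an application that fails it produces no data at all rather than data that is filtered out later.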

2. Typed semantic extraction (on user device)

An on-device model converts raw signals into typed events (“invoice opened”, “approval submitted”). Raw pixels and keystrokes never leave the device.
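To make the separation between raw signals and typed events concrete, the sketch below uses a simple keyword match as a stand-in for the on-device model. That substitution is an assumption for illustration only, as are the names `RawSignal`, `TypedEvent`, `EventType`, and `extract`. The point it shows is structural: the type that leaves the device has no field that could carry pixels, window titles, or keystrokes.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class EventType(str, Enum):
    INVOICE_OPENED = "invoice opened"
    APPROVAL_SUBMITTED = "approval submitted"


@dataclass(frozen=True)
class RawSignal:
    """Stays on the device; nothing in this sketch serializes it."""
    window_title: str
    keystrokes: str
    captured_at: datetime


@dataclass(frozen=True)
class TypedEvent:
    """The only shape intended for transmission: no titles, pixels, or keys."""
    event_type: str
    occurred_at: str  # ISO 8601 timestamp


def extract(signal: RawSignal) -> Optional[TypedEvent]:
    """Keyword stand-in for the on-device model (an assumption, not HASH's method)."""
    title = signal.window_title.lower()
    if "invoice" in title:
        kind = EventType.INVOICE_OPENED
    elif "approval" in title:
        kind = EventType.APPROVAL_SUBMITTED
    else:
        return None  # unrecognized activity yields no event at all
    return TypedEvent(kind.value, signal.captured_at.isoformat())


signal = RawSignal(
    window_title="Invoice #881 - ERP",
    keystrokes="not for transmission",
    captured_at=datetime(2026, 5, 27, 10, 0, tzinfo=timezone.utc),
)
event = extract(signal)
payload = asdict(event) if event else None  # contains only the typed fields
```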

3. Optional on-device PII strip (on user device)

Off by default. Names, emails, and internal identifiers can be scrubbed before transmission for organizations under stricter privacy regimes.
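A minimal sketch of what an opt-in scrubbing pass might look like follows. It handles only email addresses and a made-up internal identifier format (`EMP-` followed by digits); both patterns and the function names are assumptions. Scrubbing personal names reliably would need something more capable than a regular expression, such as a directory lookup or an entity-recognition model, and is not attempted here.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
EMPLOYEE_ID_RE = re.compile(r"\bEMP-\d+\b")  # hypothetical internal ID format


def strip_pii(text: str) -> str:
    """Replace emails and hypothetical employee IDs with fixed tokens."""
    text = EMAIL_RE.sub("[email]", text)
    return EMPLOYEE_ID_RE.sub("[id]", text)


def scrub_event(event: dict, enabled: bool = False) -> dict:
    """Return a copy of the event; string fields are scrubbed only if enabled."""
    if not enabled:
        return dict(event)  # off by default, matching the step described above
    return {
        key: strip_pii(value) if isinstance(value, str) else value
        for key, value in event.items()
    }
```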

4. User-redactable log (on user device)

Every event is visible to the subject before leaving their device. Fields or whole events can be redacted, leaving an auditable placeholder.
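One way to picture redaction with an auditable placeholder is sketched below. The placeholder shape, the `event_id` key, and the function names are hypothetical. The design choice worth noting is that the placeholder records that a redaction happened and when, but keeps nothing derived from the original value (not even a hash, which could leak low-entropy values).

```python
from datetime import datetime, timezone
from typing import Optional


def _placeholder(now: datetime) -> dict:
    """Marker left behind in place of redacted content (hypothetical shape)."""
    return {"redacted": True, "redacted_at": now.isoformat()}


def redact_field(event: dict, field: str, now: Optional[datetime] = None) -> dict:
    """Return a copy of the event with one field replaced by a placeholder."""
    if field not in event:
        raise KeyError(field)
    now = now or datetime.now(timezone.utc)
    redacted = dict(event)  # never mutate the caller's copy
    redacted[field] = _placeholder(now)
    return redacted


def redact_event(event: dict, now: Optional[datetime] = None) -> dict:
    """Replace a whole event, keeping only its ID so the gap stays visible."""
    now = now or datetime.now(timezone.utc)
    return {"event_id": event["event_id"], **_placeholder(now)}
```

Because the placeholder remains in the log, a downstream analyst can see that something was withheld without learning what it was.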

5. Encrypted transmission (in transit)

Only typed events, never raw recordings, cross the network, going straight into the user's or their organization's HASH web.

6. Provenance-aware HASH... (within HASH instance)
