Anti-slopping: An innovation for rectifying LLM writing clichés



By Allen Roush and Parag Mahajani

Published: June 08, 2026

Introduction

Today’s LLMs suffer from “slop”, which we define as repetitive, predictable and robotic output that makes text not only lower in quality but also recognizable and bland. Suppressing such patterns looks like an obvious solution, but it is not. Naive suppression can damage the output further by removing other useful words: for example, banning the word “indigestible” can also ban words such as “in” and “digest”, making the text meaningless. Suppression may also introduce a “backfire effect”: if we forbid a term or concept in the prompt, the model can inadvertently prioritize it and mention it more.

This post, based on original research, presents a solution to this problem that achieves a 90% reduction in slop in model outputs. We provide a framework that combines three innovations to detect and replace overused patterns. It can eliminate 8000 patterns without degrading the output. The framework includes:

The anti-slop sampler, which backtracks when it detects a repetitive word and forces the model to select a new, more human-like word.

An automated pipeline that identifies “slop” by calculating the frequency ratio of words and phrases in model output versus human baselines. We used two baselines: the wordfreq library, and a human-written text corpus from Reddit and Project Gutenberg.

Final token-preference optimization (FTPO), a training algorithm that looks at the exact token where the model chose a “sloppy” word. It then adjusts only those specific logits, applying several “soft-touch” mechanisms along the way.

Fig 1. Anti-slopping pipeline

Current strategies

Below is a list of current strategies that can reduce repetitive patterns, or slop. While each strategy has its own features, each also has shortcomings. Here is a table:

| No. | Strategy | Shortcoming |
| --- | --- | --- |
| 1 | Top-k, top-p, and min-p | Don’t address repetitive tendencies in coherent outputs. |
| 2 | RLHF | Slow and less productive. |
| 3 | Exclude top choices (XTC) | Targets only high-probability tokens. |
| 4 | Don’t repeat yourself (DRY) | Can’t identify statistically emerging repetitive patterns. |
| 5 | String banning feature of ExLlama | Hard-bans a provided set of strings at inference time. |
| 6 | Beam search | Excludes forbidden words or phrases by beam pruning. |
| 7 | Direct preference optimization (DPO) | Lowers the likelihood of preferred responses, inducing diversity collapse and reducing syntactic and n-gram variety in outputs. |


Slop analysis

Every model is different and has its own inherent tendencies; we wanted to explore those in more depth. To do this, we used the frequency ratio to collect the overused words and n-grams that are responsible for slop patterns. An n-gram is a sequence of n consecutive items (usually words, sometimes characters) from a text. The frequency ratio was calculated with the following formula:

\[ \rho(\mathbf{p}) = \frac{f_{\text{LLM}}(\mathbf{p})}{f_{\text{human}}(\mathbf{p})} \]

\[ f_{\text{LLM}}(\mathbf{p}) \text{: frequency of pattern } \mathbf{p} \text{ in LLM output} \]

\[ f_{\text{human}}(\mathbf{p}) \text{: frequency of pattern } \mathbf{p} \text{ in human corpora} \]
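As an illustration of the frequency ratio, here is a minimal Python sketch for single words. It is not the pipeline used in the research: the function names, the simple regex tokenizer and the `floor` smoothing value (used to avoid dividing by zero for words missing from the human corpus) are all assumptions made for this example.

```python
from collections import Counter
import re


def relative_frequencies(texts):
    """Return each word's share of all word tokens found in `texts`."""
    counts = Counter()
    for text in texts:
        # Assumed tokenizer: lowercase runs of letters and apostrophes.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def frequency_ratios(llm_texts, human_texts, floor=1e-9):
    """Compute rho(p) = f_LLM(p) / f_human(p) for every word in the LLM texts.

    `floor` is an assumed smoothing value for words absent from the human corpus.
    """
    f_llm = relative_frequencies(llm_texts)
    f_human = relative_frequencies(human_texts)
    return {w: f / max(f_human.get(w, 0.0), floor) for w, f in f_llm.items()}


# Tiny made-up corpora, for illustration only.
llm_texts = ["a tapestry of light", "the tapestry of night"]
human_texts = ["the light of day", "a tapestry in the hall of the king"]

ratios = frequency_ratios(llm_texts, human_texts)
for word in sorted(ratios, key=ratios.get, reverse=True):
    print(f"{word}: {ratios[word]:.3g}")
```

A ratio well above 1 marks a word the model uses more often than the human baseline; a ratio below 1 marks one it uses less. Words that never occur in the human corpus get very large ratios under this smoothing, so a real pipeline would likely need a more careful treatment of rare words.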

We generated more than 2000 outputs using creative writing prompts from Reddit and discovered various overrepresentations in different language models.

The innovative sampler

The job of a sampler in LLMs is to select the next token from the probability distribution. The sampler controls creativity, diversity and the coherence of the model’s output.

Our sampler triggers only after an entire unwanted sequence of words has appeared in the inference trace. It works as follows:

As the model generates output for a given input, our algorithm keeps a trace of the tokens and logit distributions and scans for unwanted patterns after each token. Once such a pattern is detected, instead of suppressing it, the algorithm backtracks to where the pattern began and lowers the probability of the token that initiated it according to the configurable ban-strength parameter “s”, defined as:

\[ \mathbf{p}_{\text{new}} = \mathbf{p}_{\text{old}} \times 10^{-10s} \]

\[ \text{where } 0 \leq s \leq 1 \]
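To make the effect of the ban strength concrete, the following Python sketch applies the scaling factor to one token of a small, made-up distribution. The function name and the renormalization step after scaling are assumptions for this example; the formula above only specifies the multiplier.

```python
def apply_ban_strength(probs, token, s):
    """Scale probs[token] by 10**(-10*s), then renormalize (assumed step)."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must lie in [0, 1]")
    adjusted = dict(probs)
    adjusted[token] *= 10 ** (-10 * s)
    total = sum(adjusted.values())
    return {t: p / total for t, p in adjusted.items()}


# Hypothetical next-token distribution, for illustration only.
probs = {"tapestry": 0.7, "history": 0.2, "story": 0.1}
for s in (0.0, 0.1, 0.5, 1.0):
    new_p = apply_ban_strength(probs, "tapestry", s)["tapestry"]
    print(f"s={s}: p(tapestry)={new_p:.3g}")
```

With s = 0 the multiplier is 1 and the distribution is unchanged; with s = 0.1 it is 10^-1, and with s = 1 it is 10^-10, which removes the token for practical purposes.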

The next step uses min-p filtering to constrain the adjusted distribution. This step keeps only the coherent candidates that meet the predefined probability threshold. Our anti-slop backtracking algorithm is as follows:

Fig 2. Anti-slop backtracking algorithm
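The backtracking idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: `toy_model` is a hypothetical stand-in for an LLM, min-p filtering is taken to keep tokens whose probability is at least `min_p` times the top probability, penalties are stored per prefix, and a pattern that is re-selected despite its penalty is let through rather than penalized again.

```python
import random


def min_p_filter(probs, min_p=0.05):
    """Keep tokens with probability >= min_p * top probability, then renormalize."""
    threshold = min_p * max(probs.values())
    kept = {t: p for t, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    return {t: p / total for t, p in kept.items()}


def find_banned_suffix(tokens, banned):
    """Return the start index of a banned phrase ending at the last token, else None."""
    for phrase in banned:
        n = len(phrase)
        if n <= len(tokens) and tuple(tokens[-n:]) == phrase:
            return len(tokens) - n
    return None


def generate(next_probs, banned, max_tokens, s=1.0, min_p=0.05, seed=0):
    """Sample tokens, backtracking whenever a banned phrase has just been completed."""
    rng = random.Random(seed)
    tokens = []
    penalties = {}  # prefix (tuple of tokens) -> {token: multiplier}
    while len(tokens) < max_tokens:
        probs = dict(next_probs(tuple(tokens)))
        for tok, mult in penalties.get(tuple(tokens), {}).items():
            if tok in probs:
                probs[tok] *= mult
        probs = min_p_filter(probs, min_p)
        choices, weights = zip(*probs.items())
        tokens.append(rng.choices(choices, weights=weights)[0])

        start = find_banned_suffix(tokens, banned)
        if start is None:
            continue
        slot = penalties.setdefault(tuple(tokens[:start]), {})
        first = tokens[start]
        if first not in slot:
            # Lower the probability of the token that started the phrase,
            # then discard the phrase and resample from that position.
            slot[first] = 10 ** (-10 * s)
            del tokens[start:]
        # Otherwise the phrase was chosen again despite its penalty: let it through.
    return tokens


def toy_model(prefix):
    """Hypothetical stand-in for an LLM: next-token probabilities for a prefix."""
    if prefix and prefix[-1] == "rich":
        return {"tapestry": 0.9, "history": 0.1}
    return {"a": 0.4, "rich": 0.4, "story": 0.2}


banned = {("rich", "tapestry")}
output = generate(toy_model, banned, max_tokens=20, s=1.0, seed=1)
print(" ".join(output))
```

With s = 1 the penalized token falls below the min-p threshold and is filtered out, so the banned phrase cannot reappear from the same prefix. With a small s the penalized token can survive the filter and still be sampled, which is the softer behaviour the ban-strength parameter is meant to allow.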

The final step is soft banning, where a banned pattern is allowed through the forward pass only if it has a high enough probability. The mechanism provides incremental control through the ban-strength parameter “s”. When...
