The Explore-Exploit Dilemma in Media Consumption (2016)

The Explore-Exploit Dilemma in Media Consumption · Gwern.net

psychology, Bayes, decision theory, order statistics

How much should we rewatch our favorite movies (media) vs keep trying new movies? Most people spend most of their viewing time on new movies, which is unlikely to be a good strategy. I suggest an explicit Bayesian model of imprecise ratings + enjoyment recovering over time for Thompson sampling over movie watch choices.

2016-12-24–2019-04-14 · notes · certainty: possible · importance: 5

When you decide to watch a movie, it can be tough to pick. Do you pick a new movie or a classic you watched before & liked? If the former, how do you pick from all the thousands of plausible unwatched candidate movies? If the latter, since we forget, how soon is too soon to rewatch? And, if we forget, doesn’t that imply that there is, for each individual, a ‘perpetual library’: a sufficiently large but finite number of items such that one has forgotten the first item by the time one reaches the last item, and can begin again?

I tend to default to a new movie, reasoning that I might really like it and discover a new classic to add to my library. Once in a while, I rewatch some movie I really liked, and I like it almost as much as the first time, and I think to myself, “why did I wait 15 years to rewatch this, why didn’t I watch this last week instead of movie X which was mediocre, or Y before that which was crap? I’d forgotten most of the details, and it wasn’t boring at all! I should rewatch movies more often.” (Then of course I don’t because I think “I should watch Z to see if I like it…”) Maybe many other people do this too, judging from how often I see people mentioning watching a new movie and how rare it is for someone to mention rewatching a movie; it seems like people predominantly (maybe 80%+ of the time) watch new movies rather than rewatch a favorite. (Some, like Pauline Kael, refuse to ever rewatch movies, and people who rewatch a film more than 2 or 3 times come off as eccentric or true fans.) In other areas of media, we do seem to balance exploration and exploitation more - people often reread a favorite novel like a Harry Potter novel and everyone relistens to their favorite music countless times (perhaps too many times) - so perhaps there is something about movies & TV series which biases us away from rewatches which we ought to counteract with a more mindful approach to our choices. In general, I’m not confident I come near the optimal balance, whether it be exploring movies or music or anime or tea.

The tricky thing is that each watch of a movie decreases the value of another watch (diminishing marginal value), but in a time-dependent way: 1 day is usually much too short and the value may even be negative, but 1 decade may be too long - the movie’s entertainment value ‘recovers’ slowly and smoothly over time, like an exponential curve.
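To make the shape of that recovery concrete, here is a minimal sketch of one possible curve. The functional form, the `half_life_days` parameter, and the `fatigue_penalty` term (which lets an immediate rewatch be negative) are all illustrative assumptions, not something fitted to data:

```python
import math

def rewatch_value(first_rating, days_since_watch,
                  half_life_days=365.0, fatigue_penalty=1.0):
    """Hypothetical enjoyment of a rewatch after `days_since_watch` days.

    Assumed form: the value starts at -fatigue_penalty immediately after a
    watch and recovers exponentially toward `first_rating`, closing half of
    the remaining gap every `half_life_days`.
    """
    remaining_gap = math.exp(-math.log(2) * days_since_watch / half_life_days)
    return first_rating - (first_rating + fatigue_penalty) * remaining_gap

if __name__ == "__main__":
    # An 8/10 movie rewatched after a day, a month, a year, and a decade.
    for days in (1, 30, 365, 3650):
        print(days, round(rewatch_value(8.0, days), 2))
```

Under these assumptions a next-day rewatch is slightly worse than not watching at all, while a rewatch a decade later is worth nearly as much as the first viewing; the real half-life would have to be estimated per person and per work.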

This sounds like a classic reinforcement learning (RL) exploration-exploitation tradeoff problem: we don’t want to watch only new movies, because the average new movie is mediocre, but if we watch only known-good movies, then we miss out on all the good movies we haven’t seen and fatigue may make watching the known-good ones downright unpleasant.
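One standard way of handling this kind of tradeoff is Thompson sampling: draw a plausible value for each option from its posterior and pick the highest draw. The following is only a rough sketch under strong assumptions: enjoyment is modeled as a normal posterior per movie, ratings are noisy with a known `noise_sd`, and a watched movie's value is scaled by a hypothetical multiplicative `recovery` factor. The `Movie` class and every parameter value are invented for illustration:

```python
import math
import random
from dataclasses import dataclass
from typing import Optional

@dataclass
class Movie:
    title: str
    mean: float                               # posterior mean enjoyment (1-10 scale)
    sd: float                                 # posterior standard deviation
    days_since_watch: Optional[float] = None  # None means never watched

def recovery(days, half_life_days=365.0):
    """Assumed fraction of enjoyment recovered since the last watch."""
    if days is None:
        return 1.0
    return 1.0 - 0.5 ** (days / half_life_days)

def thompson_pick(movies, rng):
    """Sample one enjoyment value per movie and return the best draw."""
    best, best_draw = None, -math.inf
    for m in movies:
        draw = rng.gauss(m.mean, m.sd) * recovery(m.days_since_watch)
        if draw > best_draw:
            best, best_draw = m, draw
    return best

def update(movie, rating, noise_sd=1.5):
    """Conjugate normal update of the posterior after a noisy rating."""
    precision = 1.0 / movie.sd ** 2 + 1.0 / noise_sd ** 2
    movie.mean = (movie.mean / movie.sd ** 2 + rating / noise_sd ** 2) / precision
    movie.sd = math.sqrt(1.0 / precision)
    movie.days_since_watch = 0.0

if __name__ == "__main__":
    rng = random.Random(0)
    library = [
        Movie("old favorite", mean=8.5, sd=0.5, days_since_watch=2000),
        Movie("seen last week", mean=8.5, sd=0.5, days_since_watch=7),
        Movie("unseen candidate", mean=5.5, sd=2.0),
    ]
    print(thompson_pick(library, rng).title)
```

Because unseen movies carry wide posteriors, they occasionally produce the top draw and get explored, while well-rated movies become eligible again only as their recovery factor climbs back toward 1.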

In the language of optimal foraging theory (see ch4 of Foraging Theory, Stephens & Krebs 1986), we face a sequentially-dependent sampling patch problem - where the payoff of each patch can be estimated only by sampling each patch (before letting it recover) and where our choices will affect future choices; the usual marginal value theorem is of little help because we exhaust a ‘patch’ (each movie) before we know how we like it (as we can safely assume that no movie is so good that rewatching it twice in a row is superior to watching all other possible movies), and even if we could know, the marginal value theorem is known to over-exploit in situations of uncertainty because it ignores the fact that we are buying information for future decisions and not myopically greedily maximizing the next time-step’s return. Unfortunately, this is one of the hardest and thus least studied foraging problems, and Stephens & Krebs 1986 provides no easy answers (other than to note the applicability of POMDP-solving methods using DP which is, however, usually infeasible).

One could imagine some simple heuristics, such as setting a cutoff for ‘good’ movies and then alternating between watching whatever new movie sounds the best (and adding it to the good list if it is better than the cutoff) and watching the oldest unwatched good movie. This seems suboptimal because in a typical RL problem, exploration will decrease over time as most of the good decisions...
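The cutoff heuristic described above can be sketched in a few lines. This is one reading of it, not a reference implementation: the function name `alternate_schedule`, the `rate` callback standing in for actually watching and rating a movie, and the interpretation of "oldest unwatched good movie" as the least recently watched entry in the good list are all assumptions:

```python
from collections import deque

def alternate_schedule(candidates, rate, cutoff, n_watches):
    """Alternate between the most promising new movie and the least
    recently watched 'good' movie.

    candidates: list of (title, expected_appeal) pairs for unwatched movies
    rate:       hypothetical callback, title -> rating given after watching
    cutoff:     a new movie joins the good list if its rating exceeds this
    """
    new_queue = sorted(candidates, key=lambda c: c[1], reverse=True)
    good = deque()  # leftmost entry is the least recently watched
    log = []
    for step in range(n_watches):
        want_new = (step % 2 == 0)
        if (want_new or not good) and new_queue:
            title, _ = new_queue.pop(0)
            log.append(("new", title))
            if rate(title) > cutoff:
                good.append(title)
        elif good:
            title = good.popleft()
            log.append(("rewatch", title))
            good.append(title)
        else:
            break  # nothing left to watch
    return log

if __name__ == "__main__":
    ratings = {"A": 9, "B": 4, "C": 8}  # made-up ratings
    plan = alternate_schedule([("A", 0.9), ("B", 0.5), ("C", 0.7)],
                              ratings.get, cutoff=7, n_watches=6)
    for kind, title in plan:
        print(kind, title)
```

Running the toy example exposes one weakness right away: the first good movie is rewatched immediately after being discovered, since the rule ignores how long enjoyment takes to recover, and the exploration rate stays fixed at one half forever.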