Fable in a Data Analytics Harness


A First Look at Fable · Chris Parmer

Anthropic's latest model was released today. Here's a first glance at how it performs and what it feels like in Plotly Studio. We'll have a more detailed post once we put it through the full set of benchmarks, but for now here are our field notes.

We ran through five real-world, personal, and relatable use cases that allow for a wide range of analytical depth.

1. Housing Rent or Buy Analysis with FRED Data

2. SF Mayor Effectiveness via 311 Data

3. Workout Impact Analysis from Apple Health - See our previous deep dive on this analysis.

4. SF Water Temp Buoy Dashboard App - Examining the water temp in San Francisco for open water swimming

5. Iran Conflict Economic Impact via FRED

A few of the sessions we ran through Fable tonight in Plotly Studio

Remarkable World Knowledge

Others have commented that this model "feels big" in its world knowledge.

We noticed this right away: every step of the analysis presents much more detailed contextual information about the data.

For example, it presents station information about each buoy with more detailed location information.

Fable presents much more detailed "world knowledge" about the data that it analyses.

I've written previously about LLMs' curious and remarkable knowledge of public datasets, and this release is no different. Zillow Research data, for example, is a dataset that I haven't seen any model surface before for this type of analysis.

Pauses Appropriately

Fable is positioned as a highly autonomous model that can work for hours or days at a time. From its knowledge card (emphasis mine):

Claude Fable 5 [...] It is suited for long-running, complex, and asynchronous tasks that previously required frequent human check-ins.

It is particularly strong at end-to-end work that would otherwise take a person hours, days, or weeks - taking on problems that are long-running, ambiguous, or highly multi-step. It executes well-scoped tasks with few mistakes, automatically self-correcting through verification loops, and ships with robust safeguards.

In data work, we want our agentic loop to recognize ambiguity and, when that ambiguity is consequential, pause and ask the operator for clarification rather than just plough ahead. See more in our essay about designing agentic analytic benchmarks. I have been concerned that the over-emphasis on autonomy and long-running tasks would end up in conflict with this behavior.

However, Fable handles ambiguity with grace and curiosity within Plotly Studio's agentic loop. It raises good questions and seeks clarification with appropriate context under the scenarios we presented.

A question that Plotly Studio raised while running the Fable model on the Apple Health data. A great example of the model seeking clarification and providing good context.

Autonomy and Long Horizon Tasks

By default, the model still appears to work up to about 10 steps (about 15 minutes) for any given data analytics task. This is a reasonable default, since it prevents cost overruns and unnecessary depth of analysis.

It can be steered toward longer-horizon analytics tasks if you tell it to work longer in open exploration or if you give it a more detailed specification of the task.

However, I find that it still struggles to really go into open exploration on its own, where it might come up with new questions and ideas as it works or follow different rabbit holes - as is common with exploratory data work. I suspect that this is RL-trained behavior to prevent the loops from going "off the rails".

Solid Visualizations with Room for Improvement

I was delighted to see it build out these physiological subplots similar to what we designed by hand when working through the Apple Health data.

It does a better job at handling labels than I've seen previously as well.

Subplots with shared x-axis

But some of the charts it creates are pretty dense by default, and difficult to interpret at a glance:

But it takes direction well. I asked it to update the chart to display rolling averages and subplots instead of fixed aggregations and stacked bars and it had no problem:
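To make the difference between a fixed aggregation and a rolling average concrete, here is a minimal standard-library sketch of a trailing rolling mean. It is not the code Fable or Plotly Studio generated; the function name `rolling_mean` and the sample values are made up for illustration. In pandas, the equivalent would be `Series.rolling(window).mean()`.

```python
from collections import deque


def rolling_mean(values, window):
    """Trailing rolling mean.

    Positions that do not yet have `window` observations yield None,
    so the output has the same length as the input.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    buf = deque(maxlen=window)
    total = 0.0
    out = []
    for v in values:
        if len(buf) == window:
            total -= buf[0]  # this value is evicted by the append below
        buf.append(v)
        total += v
        out.append(total / window if len(buf) == window else None)
    return out


# Hypothetical daily workout minutes, invented for this example.
daily_minutes = [30, 0, 45, 60, 0, 20, 90, 30]
smoothed = rolling_mean(daily_minutes, window=7)
```

Unlike a weekly or monthly bucket, each point here summarizes the most recent `window` days, so the line moves smoothly instead of jumping at bucket boundaries.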

It defaults to clean line and bar charts, but if you ask it for a wider range of charts it does well across the Plotly visualization stack:

The Dash apps are very nice and clean as well:

The first shot of graphs, reports, and apps is remarkably good across the board. That said, it's still not uncommon to see a few visual quirks here and there that need a follow-up, like large-number formatting issues:
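As one illustration of what a large-number fix can look like, the sketch below abbreviates values with K/M/B/T suffixes. The helper `abbreviate` is hypothetical and only an assumption about the kind of formatting one might ask for in a follow-up. Inside a Plotly figure, the axis `tickformat` option, which takes a d3-format string, is another route to the same goal.

```python
def abbreviate(n, decimals=1):
    """Format a number with a K/M/B/T suffix, e.g. 1500000 -> '1.5M'.

    Rounding happens before the threshold check so that a value such as
    999999 becomes '1M' rather than '1000K'.
    """
    suffixes = ["", "K", "M", "B", "T"]
    value = float(n)
    idx = 0
    while idx < len(suffixes) - 1 and abs(round(value, decimals)) >= 1000:
        value /= 1000.0
        idx += 1
    text = f"{value:.{decimals}f}"
    if "." in text:
        # Drop trailing zeros after the decimal point: '12.0' -> '12'.
        text = text.rstrip("0").rstrip(".")
    return text + suffixes[idx]
```

Labels like `2.3B` are usually easier to scan on an axis or in a hover tooltip than `2300000000`.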

Stronger Analysis

The analysis was notably stronger across the board. It handled time series and lagging correlations better. In the Apple Health analysis, it correctly identified the different regimes of training and the hiking-to-VO2 max correlation that we found when steering the sessions in Plotly Studio more manually.
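For readers unfamiliar with lagging correlations, the idea is to correlate one series against a time-shifted copy of another and see which shift lines them up best. The sketch below uses only the standard library. The function names and the toy series are assumptions made for illustration; this is not the method Fable or Plotly Studio used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(vx * vy)


def lagged_correlations(driver, response, max_lag):
    """Correlate driver[t] with response[t + lag] for lag = 0..max_lag."""
    if len(driver) != len(response):
        raise ValueError("series must have the same length")
    result = {}
    for lag in range(max_lag + 1):
        x = driver[: len(driver) - lag]  # drop the last `lag` points
        y = response[lag:]               # drop the first `lag` points
        result[lag] = pearson(x, y)
    return result


# Toy data: `response` is `driver` delayed by two steps (zero-padded).
driver = [1, 3, 2, 5, 4, 6, 8, 7]
response = [0, 0, 1, 3, 2, 5, 4, 6]
by_lag = lagged_correlations(driver, response, max_lag=3)
best_lag = max(by_lag, key=by_lag.get)
```

On real data, a high correlation at some lag is only a hint: trends and seasonality shared by both series can produce it without any causal link, and longer lags leave fewer overlapping points to estimate from.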

It does a better job at highlighting some of the core assumptions in the...
