LLM temporal and causal reasoning research

krellixlabs/llm-reasoning-research: Curated, annotated research on reasoning gaps in large language models, covering temporal reasoning, causal reasoning, and beyond.

LLM Reasoning Research

A curated, annotated research collection on reasoning gaps in large language models.

This repository tracks active research on the cognitive limitations of large language models — the places where current LLMs struggle, why those gaps matter, and what's being done about them.

It is maintained by the team at Krellix as a public resource for developers, researchers, and product teams building with LLMs.

Why this exists

Large language models have advanced rapidly, but several core reasoning capabilities remain unsolved or actively contested. These gaps matter — for builders deciding what to ship, for researchers deciding what to work on, and for users trying to understand what LLMs can and cannot reliably do.

Most of the literature is scattered across arXiv, conference proceedings, and lab blog posts. This repo is an attempt to organize the most important work in one place, with honest annotations about what each paper contributes and what it leaves open.

This is curation, not original research. We are practitioners building an AI product. We read this work to understand the landscape we're operating in. This collection is a byproduct of that work, shared publicly because we think it's useful.

What's covered

Currently in the collection

Temporal Reasoning — How LLMs handle time, sequence, duration, and temporal context. Why models that can write a sonnet often fail at "what happened before what."

Causal Reasoning — The gap between correlation and causation in LLM outputs. Why current models struggle with cause-and-effect reasoning, and what the research community is doing about it.

Coming soon

We're expanding this collection over time. Topics on the roadmap:

Mathematical and logical reasoning

Planning and multi-step problem solving

Theory of mind and social reasoning

World models and physical reasoning

Counterfactual reasoning

Long-horizon coherence

If you'd like to suggest a topic or contribute papers to existing sections, see CONTRIBUTING.md.

How this collection is organized

Each topic folder contains:

README.md — An accessible introduction to the reasoning gap, why it matters, and where the research currently stands

foundational-papers.md — The core papers that defined the problem

recent-research.md — Recent work (last ~24 months)

benchmarks-and-datasets.md — How researchers measure progress on the problem

practical-implications.md — What this means for people building with LLMs

Every paper entry follows a consistent format:

### [Paper Title]
**Authors** · **Year** · **Venue**
Links · TL;DR · Why it matters · Key insight · Limitations

The goal is for each entry to be useful in 60 seconds, with enough signal to decide whether to read the full paper.
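For contributors who want to draft entries programmatically, the entry format above could be sketched roughly as follows. This is an illustrative sketch, not tooling that ships with the repo: the `PaperEntry` class, the `render_entry` function, and the bold `Label:` layout for the individual fields are all assumptions, and the example values are placeholders rather than a real paper.

```python
from dataclasses import dataclass, field


@dataclass
class PaperEntry:
    """Hypothetical container for the fields named in the entry format."""
    title: str
    authors: str
    year: int
    venue: str
    tldr: str
    why_it_matters: str
    key_insight: str
    limitations: str
    links: list[str] = field(default_factory=list)


def render_entry(entry: PaperEntry) -> str:
    """Render one entry as Markdown.

    The heading and the Authors/Year/Venue line follow the stated format;
    the labelled layout of the remaining fields is an assumption.
    """
    links = " · ".join(entry.links) if entry.links else "n/a"
    lines = [
        f"### {entry.title}",
        f"**{entry.authors}** · **{entry.year}** · **{entry.venue}**",
        "",
        f"**Links:** {links}",
        f"**TL;DR:** {entry.tldr}",
        f"**Why it matters:** {entry.why_it_matters}",
        f"**Key insight:** {entry.key_insight}",
        f"**Limitations:** {entry.limitations}",
    ]
    return "\n".join(lines)


# Placeholder values only; this is not a real paper.
example = PaperEntry(
    title="Example Paper Title",
    authors="A. Author, B. Author",
    year=2024,
    venue="Example Venue",
    tldr="One-sentence summary of the paper.",
    why_it_matters="Why a practitioner should care.",
    key_insight="The main idea worth remembering.",
    limitations="What the paper leaves open.",
    links=["https://example.org/paper"],
)

print(render_entry(example))
```

Keeping every field required except `links` mirrors the stated goal that an entry be useful in 60 seconds: an entry missing its TL;DR or limitations fails at construction time instead of rendering a partial annotation.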

How to use this repo

If you're a developer building with LLMs — Start with the practical-implications.md file in any topic. It translates research into things you can act on.

If you're a researcher — The foundational-papers.md and recent-research.md files give you a structured reading path. The benchmarks file points to where evaluation is happening.

If you're a product team — Read the topic READMEs first. They explain the gaps in language that doesn't require an ML background.

Contributing

This repo improves with community input. We welcome:

Suggestions for papers we've missed

Corrections to existing annotations

New topic proposals (see roadmap above)

Better organization of existing...
