Kaedim/perception-replay-ci: Replay-based regression testing for ROS 2 perception stacks. Same recorded /scan, two perception versions, automatic pass/fail.
perception-replay-ci
A proof-of-concept Robot CI regression test. Replay a recorded /scan log through two versions of a perception stack (baseline vs. candidate) and automatically detect whether the candidate produces incorrect output.
The thesis: given the same recorded robot sensor log, can we replay it through two versions of a perception stack and automatically catch a regression in the candidate?
This is open-loop replay testing — the robot does not move during the test. The bag is the fixture, the perception nodes are the unit under test, and the comparator decides pass/fail. Full spec in demo_spec.md.
Stack
ROS 2 Humble via RoboStack (robostack-humble conda channel)
TurtleBot3 in Gazebo for capturing the test fixture
MCAP-format rosbags
Python perception nodes (rclpy)
pixi for environment + tasks
Currently osx-arm64 only (see [workspace] platforms in pixi.toml).
Install
pixi install
This pulls the full ROS Humble desktop, TurtleBot3 packages, and the MCAP storage plugin.
End-to-end demo
1. Capture the golden bag (one time)
Three shells:
pixi run sim             # Gazebo + TurtleBot3 in turtlebot3_world
pixi run record-golden   # records /scan /tf /tf_static /odom to bags/golden_obstacle/
pixi run control         # teleop keyboard: drive toward obstacles
Drive forward, approach an obstacle until the front of /scan reads ~0.4–0.5 m, hold, back off, repeat for a second obstacle if you like. Ctrl-C the recorder first when done so the MCAP finalizes cleanly.
2. Replay the bag through each perception version
For each version, three shells:
pixi run baseline                          # or `candidate`
pixi run record-run runs/baseline.jsonl    # or runs/candidate.jsonl
pixi run -- ros2 bag play bags/golden_obstacle
When the bag finishes, Ctrl-C the recorder (it'll log the line count).
3. Compare
pixi run compare
Exits 0 on PASS, 1 on FAIL — CI-friendly.
Example FAIL output:
Test: turtlebot3_laserscan_obstacle_regression
Result: FAIL

Reason:
- Candidate failed to detect obstacle for 16.2s across 4 window(s)
- Minimum observed distance during misses: 0.25m

Disagreement windows:
99.75s → 108.55s ( 8.80s, n= 45, min_range=0.31m): miss
123.15s → 127.55s ( 4.40s, n= 23, min_range=0.25m): miss
127.95s → 130.35s ( 2.40s, n= 13, min_range=0.25m): miss
131.15s → 131.75s ( 0.60s, n= 4, min_range=0.27m): miss

Recommendation:
Do not deploy candidate perception config.
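The summary lines in the example report can be cross-checked against its window list. The short sketch below copies the four windows by hand (it does not parse real tool output) and recomputes the total miss time and the worst-case range.

```python
# (start_s, end_s, n_scans, min_range_m) copied from the example FAIL report.
windows = [
    (99.75, 108.55, 45, 0.31),
    (123.15, 127.55, 23, 0.25),
    (127.95, 130.35, 13, 0.25),
    (131.15, 131.75, 4, 0.27),
]

# Total time the candidate missed the obstacle: sum of window durations.
total_miss_s = sum(end - start for start, end, _, _ in windows)

# Closest the obstacle got while the candidate reported "not detected".
worst_range_m = min(min_range for _, _, _, min_range in windows)

print(f"{total_miss_s:.1f}s across {len(windows)} window(s)")
print(f"Minimum observed distance during misses: {worst_range_m:.2f}m")
```

The durations (8.80 + 4.40 + 2.40 + 0.60) add up to the 16.2 s quoted under "Reason". The scan counts are also consistent with scans arriving about every 0.2 s (45 scans spanning 8.80 s), though the README does not state the scan rate.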
How it works
Both perception nodes subscribe to /scan, find the minimum range in a forward ±30° wedge, and publish:
/obstacle/detected (std_msgs/Bool): fired when the min range is below the detection threshold
/obstacle/range (sensor_msgs/Range, stamped with the scan's original timestamp)
The baseline uses a 0.50 m threshold; the candidate uses 0.25 m (the intentional defect). Both publish to the same topics — the workflow runs the bag twice, once per node, never simultaneously.
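The wedge-and-threshold logic can be sketched without ROS as a pure function over the fields of a LaserScan-style message. This is an illustration, not the code in scripts/: the function names, the strict `<` comparison, and the handling of invalid readings are all assumptions.

```python
import math


def min_range_in_wedge(ranges, angle_min, angle_increment, half_width_deg=30.0):
    """Return the smallest finite range within +/- half_width_deg of forward.

    Assumes angle 0 is straight ahead. Returns math.inf when the wedge
    contains no finite reading.
    """
    half_width = math.radians(half_width_deg)
    best = math.inf
    for i, r in enumerate(ranges):
        # Wrap each beam angle into [-pi, pi) so 0..2*pi scans work too.
        angle = angle_min + i * angle_increment
        angle = (angle + math.pi) % (2.0 * math.pi) - math.pi
        if abs(angle) > half_width:
            continue
        # Skip inf/NaN readings (assumed policy for invalid beams).
        if not math.isfinite(r):
            continue
        best = min(best, r)
    return best


def detect(ranges, angle_min, angle_increment, threshold):
    """Return (detected, min_range); detected means min_range < threshold."""
    m = min_range_in_wedge(ranges, angle_min, angle_increment)
    return m < threshold, m


# A 360-beam scan at 1 degree per beam with an obstacle 0.40 m away at +10 degrees.
scan = [3.5] * 360
scan[10] = 0.40
scan[90] = 0.10  # off to the side, outside the forward wedge

baseline_hit, _ = detect(scan, 0.0, math.radians(1.0), threshold=0.50)
candidate_hit, _ = detect(scan, 0.0, math.radians(1.0), threshold=0.25)
```

With the obstacle at 0.40 m, the 0.50 m threshold reports a detection and the 0.25 m threshold does not, which is the kind of disagreement the comparator is meant to catch.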
record_run.py taps both topics during a replay and writes one JSONL row per scan: {"t": <timestamp>, "detected": bool, "min_range": float}.
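A minimal sketch of the row format, assuming `t` is the scan timestamp in seconds stored as a float. The helpers `write_row` and `read_rows` are hypothetical names for illustration and are not taken from record_run.py.

```python
import io
import json


def write_row(fh, t, detected, min_range):
    """Append one scan's result as a single JSON line."""
    row = {"t": float(t), "detected": bool(detected), "min_range": float(min_range)}
    fh.write(json.dumps(row) + "\n")


def read_rows(fh):
    """Load every non-empty line back into a list of dicts."""
    return [json.loads(line) for line in fh if line.strip()]


# Round-trip two rows through an in-memory file.
buf = io.StringIO()
write_row(buf, 99.75, False, 0.31)
write_row(buf, 99.95, True, 0.24)
buf.seek(0)
rows = read_rows(buf)
```

One row per line keeps the file appendable while the bag is still playing, and lets the comparator read the two runs back as plain lists.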
compare_runs.py reads two JSONL files, pairs rows by index (timestamps are identical across runs because both come from the same bag), groups disagreements into contiguous windows, classifies them as miss (baseline=true, candidate=false) or false_alarm (the inverse), and emits the pass/fail report.
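The pairing and windowing step described above might look roughly like the following. It is a sketch under stated assumptions rather than the contents of compare_runs.py: the function names are hypothetical, each window's `min_range` is taken from the baseline row, and any disagreement at all is treated as a FAIL.

```python
def disagreement_windows(baseline, candidate):
    """Pair rows by index and group consecutive disagreements of the same kind."""
    windows = []
    current = None
    # zip() stops at the shorter run; a real tool may want to flag a length mismatch.
    for b, c in zip(baseline, candidate):
        if b["detected"] == c["detected"]:
            current = None  # agreement closes any open window
            continue
        kind = "miss" if b["detected"] else "false_alarm"
        if current is None or current["kind"] != kind:
            current = {"kind": kind, "start": b["t"], "end": b["t"],
                       "n": 0, "min_range": b["min_range"]}
            windows.append(current)
        current["end"] = b["t"]
        current["n"] += 1
        current["min_range"] = min(current["min_range"], b["min_range"])
    return windows


def verdict(windows):
    """Assumed policy: PASS only when the two runs never disagree."""
    return "PASS" if not windows else "FAIL"


def make_rows(flags):
    """Build toy rows at 0.2 s spacing with a fixed range, for illustration."""
    return [{"t": round(0.2 * i, 1), "detected": f, "min_range": 0.4}
            for i, f in enumerate(flags)]


base = make_rows([False, True, True, False, True])
cand = make_rows([False, False, False, False, True])
found = disagreement_windows(base, cand)
```

Here the candidate misses the second and third scans, so `found` holds a single `miss` window spanning 0.2 s to 0.4 s with `n` equal to 2. A CI wrapper could then map the verdict to the exit code, for example `sys.exit(0 if verdict(found) == "PASS" else 1)`.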
Layout
scripts/
  perception_baseline.py   reference detector (threshold 0.50m)
  perception_candidate.py  intentionally broken (threshold...