MFM: PINN based Motion Foundation Model

JuSeongvin/pinn · Hugging Face

PINN-JEPA — Physics-Informed Encoder for 3D Human Motion (Step 1)

A physics-informed neural network (PINN) encoder for 3D skeletal human motion. The encoder is pretrained in a self-supervised way to produce motion representations whose kinematic structure (position → velocity → acceleration → jerk) and bone geometry stay physically consistent. This repository releases the Step 1 PINN pretraining stage (encoder body + pretraining head) and an older checkpoint, not the latest one.

Status: research release. The published weights are not the latest internal checkpoint, and are intended for reproducibility and experimentation, not production.

Model overview

Input:        (B, T, J, 12), per-joint state [p(3), v(3), a(3), j(3)]
Skeleton:     H36M 17-joint topology (J = 17)
Output:       token features (B, T, J, D) + reconstructed state s_hat (B, T, J, 12)
Backbone:     State embedding → (GraphMix spatial + TemporalBlock) × depth → LayerNorm
Default size: d_model = 256, depth = 6, d_state = 64
Framework:    PyTorch (custom modules, no transformers dependency)

The encoder predicts a residual on position only; velocity, acceleration, and jerk are derived deterministically from position via central differences, which keeps the representation kinematically consistent rather than letting each channel drift independently.
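The derivation of the higher-order channels from position can be sketched as follows. This is a minimal illustration in NumPy rather than PyTorch, and it assumes `np.gradient` as a stand-in for the repository's `central_diff` helper in Utils.py, whose exact signature and boundary handling are not shown here. The function name `build_state` is hypothetical.

```python
import numpy as np

def build_state(p, fps):
    """Stack [p, v, a, j] from positions p of shape (T, J, 3) into (T, J, 12).

    Assumption: np.gradient stands in for the repository's central_diff.
    It uses central differences on interior frames and one-sided
    differences at the first and last frame; the repository may treat
    the boundaries differently.
    """
    dt = 1.0 / fps
    v = np.gradient(p, dt, axis=0)  # velocity
    a = np.gradient(v, dt, axis=0)  # acceleration
    j = np.gradient(a, dt, axis=0)  # jerk
    return np.concatenate([p, v, a, j], axis=-1)

# Toy check: every coordinate follows p(t) = t^2, so v = 2t, a = 2, j = 0
# on frames far enough from the sequence boundaries.
fps, T, J = 50, 32, 17
t = np.arange(T) / fps
p = np.broadcast_to((t ** 2)[:, None, None], (T, J, 3)).copy()
state = build_state(p, fps)
```

Because only `p` is free and the other nine channels are functions of it, the four channels cannot disagree with each other by construction; adding a batch axis only requires changing `axis=0` to the time axis of the batched array.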

Training objective (Step 1)

Self-supervised state reconstruction combined with physics-aware regularizers:

State reconstruction on p / v / a / j (weighted)

Bone-length consistency over skeleton edges

Kinematic consistency (finite-difference agreement between channels)

Jerk regularization for motion smoothness

See PINN_Lossfunction.py for exact terms and default weights.
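To make the regularizers listed above concrete, here is a rough NumPy sketch of one plausible form for each. It is not taken from PINN_Lossfunction.py: the function names, the L1/L2 choices, the absence of weights, and the parent list (a commonly used H36M 17-joint layout) are all assumptions, and the repository's own edges in Utils.py may differ.

```python
import numpy as np

# Assumption: a commonly used H36M 17-joint parent list (root = joint 0).
# The repository defines its own skeleton edges, which may differ.
H36M_PARENTS = [-1, 0, 1, 2, 0, 4, 5, 0, 7, 8, 9, 8, 11, 12, 8, 14, 15]
EDGES = [(c, par) for c, par in enumerate(H36M_PARENTS) if par >= 0]

def bone_lengths(p):
    """p: (..., J, 3) positions -> (..., E) lengths, one per skeleton edge."""
    child, parent = np.array(EDGES).T
    return np.linalg.norm(p[..., child, :] - p[..., parent, :], axis=-1)

def bone_length_loss(p_hat, p_ref):
    """Mean absolute difference between predicted and reference bone lengths."""
    return np.mean(np.abs(bone_lengths(p_hat) - bone_lengths(p_ref)))

def central_diff(x, dt):
    """Central difference along time (axis 0); output covers interior frames."""
    return (x[2:] - x[:-2]) / (2.0 * dt)

def kinematic_consistency_loss(s_hat, fps):
    """s_hat: (T, J, 12). Each derivative channel should agree with the
    central difference of the channel below it (interior frames only)."""
    dt = 1.0 / fps
    p, v, a, j = np.split(s_hat, 4, axis=-1)
    pairs = [(p, v), (v, a), (a, j)]
    return sum(np.mean((central_diff(lo, dt) - hi[1:-1]) ** 2) for lo, hi in pairs)

def jerk_loss(s_hat):
    """Mean squared jerk, penalizing non-smooth motion."""
    return np.mean(s_hat[..., 9:12] ** 2)

# Toy motion: a rigid pose translating at constant velocity.
rng = np.random.default_rng(0)
fps, T = 50, 20
pose = rng.normal(size=(17, 3))
t = np.arange(T)[:, None, None] / fps
vel = np.array([0.5, 0.0, 0.0])
p = pose + t * vel                                   # (T, 17, 3)
zeros = np.zeros_like(p)
s = np.concatenate([p, np.broadcast_to(vel, p.shape), zeros, zeros], axis=-1)
```

On this toy motion all three terms are (numerically) zero, since the bones never change length and the channels are exact derivatives of each other; scaling the skeleton or corrupting the velocity channel makes the corresponding term positive.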

Repository contents

PINN_EncoderBody.py                  # backbone (StateEmbedding, GraphMix, TemporalBlock, EncoderBody)
PINN_PretrainModel.py                # Step 1 model: encoder + residual-p head -> s_hat
PINN_Lossfunction.py                 # physics-aware pretraining losses
PINN_Training.py                     # train step, checkpoint save/load
PINN_ModelEvaluation_downstream.py   # representation-quality eval (clustering)
PINN_ModelEvaluation_itself4.py      # model self-evaluation
PINN_visualization_for_model3.py     # 3D skeleton render / input-vs-output compare
Utils.py                             # skeleton edges, central_diff, masked_mean, etc.
config.json                          # architecture hyperparameters (edit to match the checkpoint)
export_weights.py                    # slim a training checkpoint -> release weights
inference_example.py                 # minimal load + forward example

Note on imports. Modules use flat imports (from Utils import ...). Keep all .py files at the repository root, or add the repo root to PYTHONPATH, before importing.
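If setting PYTHONPATH in the shell is inconvenient (for example in a notebook), the same effect can be had at runtime. The sketch below is only an illustration; `add_repo_to_path` is a hypothetical helper and `./pinn` is a placeholder for wherever the repository was cloned.

```python
import sys
from pathlib import Path

def add_repo_to_path(repo_root):
    """Prepend repo_root to sys.path so flat imports such as
    `from Utils import ...` resolve. Safe to call more than once."""
    root = str(Path(repo_root).resolve())
    if root not in sys.path:
        sys.path.insert(0, root)
    return root

# Hypothetical checkout location; adjust to your own clone.
root = add_repo_to_path("./pinn")
```

After this call, `from PINN_EncoderBody import EncoderBody` should resolve as long as the .py files sit directly under that directory.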

Usage

import json
import torch
from PINN_EncoderBody import EncoderBody
from PINN_PretrainModel import PINNPretrainModel

# Build the model from the architecture settings in config.json
with open("config.json") as f:
    cfg = json.load(f)
encoder = EncoderBody(**cfg["encoder"])
model = PINNPretrainModel(encoder=encoder, fps=cfg["fps"])

# Load the released weights on CPU and switch to inference mode
state = torch.load("pytorch_model.bin", map_location="cpu")
model.load_state_dict(state)
model.eval()

# x: (B, T, J=17, 12) = [p, v, a, j] per joint
with torch.no_grad():
    out = model(x)
features = out["token_feat"]  # (B, T, J, D) representation
s_hat = out["s_hat"]          # (B, T, J, 12) reconstructed state

config.json ships with the architecture defaults. If the released checkpoint was trained with different settings, edit config.json so the shapes match before loading.
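One way to find out which settings disagree is to compare parameter shapes before calling `load_state_dict`. The helper below is a framework-free sketch operating on plain `{name: shape}` mappings; `find_shape_mismatches` and the parameter names in the toy example are hypothetical and do not come from the repository.

```python
def find_shape_mismatches(model_shapes, ckpt_shapes):
    """Compare two {parameter_name: shape} mappings.

    Returns (missing, unexpected, mismatched):
      missing    - names the model expects but the checkpoint lacks
      unexpected - names in the checkpoint that the model does not have
      mismatched - {name: (model_shape, checkpoint_shape)} for shared names
    """
    model_keys, ckpt_keys = set(model_shapes), set(ckpt_shapes)
    missing = sorted(model_keys - ckpt_keys)
    unexpected = sorted(ckpt_keys - model_keys)
    mismatched = {
        name: (tuple(model_shapes[name]), tuple(ckpt_shapes[name]))
        for name in sorted(model_keys & ckpt_keys)
        if tuple(model_shapes[name]) != tuple(ckpt_shapes[name])
    }
    return missing, unexpected, mismatched

# With PyTorch, the two mappings could be built like this:
#   model_shapes = {k: tuple(v.shape) for k, v in model.state_dict().items()}
#   ckpt_shapes  = {k: tuple(v.shape) for k, v in state.items()}

# Toy example with made-up parameter names: config.json says d_model = 256,
# but the checkpoint was trained with d_model = 128.
model_shapes = {"embed.weight": (256, 12), "norm.weight": (256,)}
ckpt_shapes = {"embed.weight": (128, 12), "norm.weight": (128,)}
missing, unexpected, mismatched = find_shape_mismatches(model_shapes, ckpt_shapes)
```

A mismatch in the leading dimension of most parameters usually points at `d_model`, while missing or unexpected block indices point at `depth`.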

Intended use

Self-supervised motion representation learning research

Feature extraction for downstream pose/motion tasks

Studying physics-informed regularization for skeletal motion

Out of scope

Not a clinical, diagnostic, biometric, or safety-critical tool

Not trained or validated for person identification or surveillance

Tuned for the H36M 17-joint topology; other skeletons need adaptation/retraining

Limitations

Released weights are an older checkpoint and may underperform the internal latest version.

Assumes a fixed 17-joint topology and a consistent (p, v, a, j) input layout.

fps at inference should match the value used to build the (v, a, j) channels.

Evaluation utilities depend on scikit-learn; UMAP is optional.
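The fps limitation above matters because finite-difference derivatives scale with powers of the frame rate. The NumPy sketch below illustrates this on a single sine-wave coordinate; it assumes `np.gradient` as the differencing scheme, which may not match the repository's `central_diff` exactly, and the `derivatives` helper is hypothetical.

```python
import numpy as np

def derivatives(p, fps):
    """Finite-difference velocity, acceleration and jerk along axis 0."""
    dt = 1.0 / fps
    v = np.gradient(p, dt, axis=0)
    a = np.gradient(v, dt, axis=0)
    j = np.gradient(a, dt, axis=0)
    return v, a, j

true_fps, wrong_fps = 50, 25
t = np.arange(64) / true_fps
p = np.sin(2.0 * np.pi * t)  # one coordinate of one joint, sampled at 50 fps

v_ok, a_ok, j_ok = derivatives(p, true_fps)
v_bad, a_bad, j_bad = derivatives(p, wrong_fps)

# Ratio between the assumed and the true frame rate
r = wrong_fps / true_fps
```

With the wrong frame rate the velocity is off by a factor `r`, the acceleration by `r**2`, and the jerk by `r**3`, so a 2x fps error already shrinks the jerk channel by 8x relative to what the encoder saw during pretraining.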

License

This release is distributed under the Academic Free License v3.0 (AFL-3.0).

Source code (*.py): AFL-3.0 — see LICENSE.

Model weights (released checkpoint): AFL-3.0, with the disclaimer below.

The weights are provided "as is", for research and reproducibility, without warranty of any kind. They are not the latest internal checkpoint and carry no fitness guarantee for any particular use. See NOTICE for the scope split between code and weights.

Citation

@misc{pinn_jepa_pose,
  title        = {PINN-JEPA: Physics-Informed Encoder for 3D Human Motion},
  author       = {},
  year         = {2026},
  note         = {Research code and weights, AFL-3.0},
  howpublished = {\url{https://huggingface.co//}}
}
