Python implementation for text generation using EEG signals from the brain


VanshShah1/Thought2Text (GitHub): Python implementation of the research paper on text generation from EEG signals using large language models.

Thought2Text

Thought2Text is a Python-based neural decoding pipeline designed for reproducing and experimenting with landmark brain-to-text systems. It supports both invasive intracortical speech decoding and non-invasive M/EEG-based typing reconstruction.

Overview

This project provides a modular framework for transforming neural signals into text. It implements a core workflow common to many state-of-the-art systems:

neural signal -> preprocessing -> time-aligned neural features -> neural sequence model -> token probabilities -> beam search + language model -> text
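The following toy, dependency-free sketch shows how these stages compose. It is not code from `thought2text.py`, and every detail is an assumption made for illustration:

- The function names (`preprocess`, `toy_model`, `greedy_ctc_decode`, `run_pipeline`) are hypothetical.
- Per-channel z-scoring stands in for real filtering.
- A random linear layer stands in for the neural sequence model.
- The vocabulary has three symbols, with the CTC blank at index 0.
- Greedy (best-path) CTC decoding stands in for beam search with a language model.

```python
import math
import random
from typing import List, Sequence

BLANK = 0  # assumption: CTC blank token is index 0
VOCAB = ["<blank>", "a", "b"]  # hypothetical toy vocabulary


def preprocess(signal: List[List[float]]) -> List[List[float]]:
    """Z-score each channel across time (stand-in for real filtering). signal is [time][channel]."""
    out = [row[:] for row in signal]
    for c in range(len(signal[0])):
        col = [row[c] for row in signal]
        mean = sum(col) / len(col)
        # A flat channel has std 0; fall back to 1.0 to avoid dividing by zero.
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col)) or 1.0
        for t in range(len(signal)):
            out[t][c] = (signal[t][c] - mean) / std
    return out


def log_softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def toy_model(features, weights):
    """Hypothetical 'neural sequence model': a per-frame linear layer followed by log-softmax."""
    return [
        log_softmax([sum(w * x for w, x in zip(w_row, frame)) for w_row in weights])
        for frame in features
    ]


def greedy_ctc_decode(log_probs) -> str:
    """Take the best symbol per frame, merge consecutive repeats, then drop blanks."""
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in log_probs]
    out, prev = [], None
    for idx in best:
        if idx != prev and idx != BLANK:
            out.append(VOCAB[idx])
        prev = idx
    return "".join(out)


def run_pipeline(signal, weights) -> str:
    return greedy_ctc_decode(toy_model(preprocess(signal), weights))


if __name__ == "__main__":
    rng = random.Random(0)
    signal = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(10)]  # 10 frames x 4 channels
    weights = [[rng.gauss(0, 1) for _ in range(4)] for _ in VOCAB]
    print(run_pipeline(signal, weights))
```

Because each stage is a plain function, replacing the greedy decoder with a beam search decoder changes only the last step of `run_pipeline`.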

Key inspirations include:

Willett et al. 2023 (Nature): High-performance speech neuroprosthesis using RNN phoneme decoders with CTC.

Kunz et al. 2025 (Cell): Inner speech decoding with motor-intent gating and stack-gated RNNs.

Lévy et al. 2025 (Meta Brain2Qwerty): Non-invasive M/EEG-to-text decoding using convolutional transformers.

Features

Multi-modality Support: Handles intracortical spikes (RNN-based) and M/EEG sensor data (Transformer-based).

Synthetic Data Generators: Built-in smoke tests for both spikes and M/EEG to verify the pipeline without clinical data.

Preprocessing Utilities: Threshold-crossing detection, spike binning, Gaussian smoothing, and MNE-integrated EEG/MEG filtering.

CTC-based Decoding: Connectionist Temporal Classification (CTC) for handling variable-length neural-to-token sequences.

Beam Search & LM Rescoring: Prefix beam search with optional integration of HuggingFace causal LMs (e.g., GPT-2) for improved accuracy.

Privacy Gating: Scaffold for intent classification to distinguish between private thought and intended communication.

Streaming Scaffold: Architecture for online, chunked decoding of neural streams.
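The spike preprocessing steps listed above are threshold-crossing detection, binning, and Gaussian smoothing. The sketch below implements them in pure Python. The function names, the default threshold of -4.5 × RMS, and the kernel handling are assumptions for illustration, not values taken from `thought2text.py`. A negative RMS-multiple threshold is a common choice for intracortical recordings. `bin_size` is expressed in samples; at an assumed 30 kHz sampling rate, for example, 20 ms bins would be 600 samples.

```python
import math
from typing import List, Sequence


def threshold_crossings(voltage: Sequence[float], k: float = -4.5) -> List[int]:
    """Sample indices where the signal drops below k * RMS (assumes k < 0)."""
    rms = math.sqrt(sum(v * v for v in voltage) / len(voltage))
    thr = k * rms
    return [
        i for i in range(1, len(voltage))
        if voltage[i] < thr <= voltage[i - 1]  # downward crossing only
    ]


def bin_events(event_idx: Sequence[int], n_samples: int, bin_size: int) -> List[int]:
    """Count events per non-overlapping bin of `bin_size` samples."""
    counts = [0] * math.ceil(n_samples / bin_size)
    for i in event_idx:
        counts[i // bin_size] += 1
    return counts


def gaussian_smooth(x: Sequence[float], sigma: float) -> List[float]:
    """Convolve with a Gaussian kernel truncated at 3 sigma.

    Near the edges, the output is renormalized by the kernel weights that fall inside the signal.
    """
    radius = max(1, math.ceil(3 * sigma))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (j / sigma) ** 2) for j in offsets]
    out = []
    for t in range(len(x)):
        acc = wsum = 0.0
        for j, w in zip(offsets, kernel):
            if 0 <= t + j < len(x):
                acc += w * x[t + j]
                wsum += w
        out.append(acc / wsum)
    return out


if __name__ == "__main__":
    signal = [0.1, -0.2, 0.05, -3.0, 0.2, 0.1, -2.9, 0.0]
    spikes = threshold_crossings(signal, k=-1.5)
    counts = bin_events(spikes, len(signal), bin_size=4)
    print(spikes, counts, gaussian_smooth(counts, sigma=1.0))
```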

Installation

Prerequisites

Python 3.10 or 3.11

PyTorch (with CUDA for GPU acceleration)

Setup

# Clone the repository
git clone https://github.com/VanshShah1/Thought2Text.git
cd Thought2Text

# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate

# Install core dependencies
pip install numpy scipy torch mne h5py jiwer editdistance transformers
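After installing, you can confirm that each package is importable without actually importing it by calling `importlib.util.find_spec`. `check_dependencies` is a hypothetical helper, not part of the repository. The sketch assumes the module names match the pip package names above, which is true for these packages.

```python
import importlib.util
from typing import Dict, Iterable

# Module names assumed to match the pip packages in the install command.
REQUIRED = ("numpy", "scipy", "torch", "mne", "h5py", "jiwer", "editdistance", "transformers")


def check_dependencies(modules: Iterable[str] = REQUIRED) -> Dict[str, bool]:
    """Map each top-level module name to whether Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    status = check_dependencies()
    missing = [name for name, ok in status.items() if not ok]
    print("missing:", ", ".join(missing) if missing else "none")
```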

Usage

Running Synthetic Smoke Tests

Verify the pipeline using generated data:

# Test intracortical spike decoding pipeline
python thought2text.py --mode synthetic --modality spikes

# Test M/EEG transformer decoding pipeline
python thought2text.py --mode synthetic --modality meeg
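To illustrate what synthetic spike data for such a smoke test can look like, here is a toy generator. Each token in a random target sequence drives a token-specific firing-rate pattern for a few bins, and binned counts are drawn from a Poisson distribution. This is a hypothetical sketch: the name `make_synthetic_spikes`, the rate model, the blank at index 0, and all parameter values are assumptions, and the repository's own generator may work differently.

```python
import math
import random
from typing import List, Tuple


def poisson(rng: random.Random, lam: float) -> int:
    """Knuth's multiplication method; adequate for the small rates used here."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def make_synthetic_spikes(
    n_channels: int = 16,
    vocab_size: int = 5,  # index 0 reserved for the CTC blank (assumption)
    seq_len: int = 4,
    bins_per_token: int = 5,
    seed: int = 0,
) -> Tuple[List[List[int]], List[int]]:
    """Return (binned spike counts as [time][channel], target token ids)."""
    rng = random.Random(seed)
    # One mean firing rate (spikes per bin) per token id and channel; row 0 (blank) is unused.
    tuning = [[rng.uniform(0.1, 3.0) for _ in range(n_channels)] for _ in range(vocab_size)]
    targets = [rng.randrange(1, vocab_size) for _ in range(seq_len)]
    counts = []
    for tok in targets:
        for _ in range(bins_per_token):
            counts.append([poisson(rng, rate) for rate in tuning[tok]])
    return counts, targets


if __name__ == "__main__":
    x, y = make_synthetic_spikes()
    print(len(x), "bins x", len(x[0]), "channels; targets:", y)
```

Because the targets contain no blanks and each token lasts several bins, pairs like `(x, y)` fit the CTC setting: the input is longer than the label sequence and carries no explicit alignment.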

Command Line Arguments

--mode: synthetic (default) or real-placeholder.

--modality: spikes (default) or meeg.

--epochs: Number of training epochs (default: 2).

--batch-size: Training batch size (default: 16).

--input-dim: Dimension of neural input features.

--vocab-size: Size of the token vocabulary.

--device: Target device (e.g., cuda, cpu).
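These flags map directly onto `argparse`. The sketch below mirrors the options listed above. The defaults for `--mode`, `--modality`, `--epochs`, and `--batch-size` come from this README. The defaults for `--input-dim`, `--vocab-size`, and `--device` are not documented here, so the sketch leaves them as `None`; consult `thought2text.py` for the actual values. `build_parser` and `parse` are hypothetical names.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Thought2Text (sketch of the documented CLI)")
    p.add_argument("--mode", choices=["synthetic", "real-placeholder"], default="synthetic")
    p.add_argument("--modality", choices=["spikes", "meeg"], default="spikes")
    p.add_argument("--epochs", type=int, default=2, help="number of training epochs")
    p.add_argument("--batch-size", type=int, default=16, help="training batch size")
    # Not documented in the README; None here means "let the script choose".
    p.add_argument("--input-dim", type=int, default=None, help="neural input feature dimension")
    p.add_argument("--vocab-size", type=int, default=None, help="token vocabulary size")
    p.add_argument("--device", default=None, help="target device, e.g. cuda or cpu")
    return p


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # argparse exposes --batch-size as args.batch_size, --input-dim as args.input_dim, etc.
    return build_parser().parse_args(argv)


if __name__ == "__main__":
    print(parse(["--modality", "meeg", "--epochs", "5"]))
```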

Codebase Structure

thought2text.py: The main executable module containing the entire pipeline.

Workflow.md: Detailed implementation plan and research background.

CODEBASE_DIAGRAM.md: Mermaid-based architectural overview.

Components

NeuralDataLoader: Handles loading of NWB, MAT, HDF5, and MNE formats.

NeuralGRUCTC: Recurrent model for high-SNR intracortical features.

MEEGTransformerCTC: Conv-Transformer model for lower-SNR non-invasive signals.

CTCBeamSearchDecoder: Pure Python prefix beam search implementation.

HFTextScorer: Wrapper for using HuggingFace LMs for hypothesis...
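`CTCBeamSearchDecoder` is described as a pure-Python prefix beam search. Below is a compact, generic version of that algorithm that works on per-frame log-probabilities. It is not the repository's implementation. The function name, the `beam_width` parameter, and the blank at index 0 are assumptions, and LM rescoring (the `HFTextScorer` role) is omitted. For each prefix, the algorithm tracks two scores: the probability of ending in a blank and the probability of ending in a non-blank. This split lets repeated tokens be handled correctly.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

NEG_INF = float("-inf")


def logsumexp(*xs: float) -> float:
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_prefix_beam_search(
    log_probs: Sequence[Sequence[float]],  # [time][vocab] log-probabilities
    beam_width: int = 8,
    blank: int = 0,
) -> List[Tuple[Tuple[int, ...], float]]:
    """Return surviving prefixes (token-id tuples) with total log-probability, best first."""
    # prefix -> (log p ending in blank, log p ending in non-blank)
    beams: Dict[Tuple[int, ...], Tuple[float, float]] = {(): (0.0, NEG_INF)}
    for frame in log_probs:
        nxt: Dict[Tuple[int, ...], List[float]] = defaultdict(lambda: [NEG_INF, NEG_INF])
        for prefix, (p_b, p_nb) in beams.items():
            last = prefix[-1] if prefix else None
            for s, lp in enumerate(frame):
                if s == blank:
                    # A blank leaves the prefix unchanged.
                    e = nxt[prefix]
                    e[0] = logsumexp(e[0], p_b + lp, p_nb + lp)
                    continue
                e_ext = nxt[prefix + (s,)]
                if s == last:
                    # Repeating the last token without a blank collapses into the same prefix...
                    e_same = nxt[prefix]
                    e_same[1] = logsumexp(e_same[1], p_nb + lp)
                    # ...but a repeat after a blank emits a new token.
                    e_ext[1] = logsumexp(e_ext[1], p_b + lp)
                else:
                    e_ext[1] = logsumexp(e_ext[1], p_b + lp, p_nb + lp)
        # Keep only the beam_width most probable prefixes.
        ranked = sorted(nxt.items(), key=lambda kv: logsumexp(*kv[1]), reverse=True)
        beams = {k: (v[0], v[1]) for k, v in ranked[:beam_width]}
    return sorted(
        ((k, logsumexp(*v)) for k, v in beams.items()), key=lambda kv: kv[1], reverse=True
    )


if __name__ == "__main__":
    probs = [[0.4, 0.6], [0.4, 0.6]]  # two frames, vocab = {blank, token 1}
    lp = [[math.log(p) for p in f] for f in probs]
    for prefix, score in ctc_prefix_beam_search(lp):
        print(prefix, math.exp(score))
```

When `beam_width` is large enough that nothing is pruned, the scores equal the exact CTC label probabilities. This makes a brute-force comparison on tiny inputs a convenient unit test. A language-model scorer would typically add a weighted LM log-probability to each prefix score before pruning; that step is not shown here.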
