Python implementation for text generation using EEG signals from the brain

VanshShah1/Thought2Text: Python implementation of the research paper on text generation from EEG signals using large language models.

Thought2Text

Thought2Text is a Python-based neural decoding pipeline designed for reproducing and experimenting with landmark brain-to-text systems. It supports both invasive intracortical speech decoding and non-invasive M/EEG-based typing reconstruction.

Overview

This project provides a modular framework for transforming neural signals into text. It implements a core workflow common to many state-of-the-art systems:

neural signal -> preprocessing -> time-aligned neural features -> neural sequence model -> token probabilities -> beam search + language model -> text
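To make the last stages of that workflow concrete, here is a minimal sketch of greedy CTC decoding, the simplest stand-in for the "token probabilities -> text" step (the README itself names beam search with a language model for this step). The function name `greedy_ctc_decode`, the blank index of 0, and the toy vocabulary are assumptions for illustration and are not taken from `thought2text.py`.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank token


def greedy_ctc_decode(probs: Sequence[Sequence[float]], vocab: Sequence[str]) -> str:
    """Take the most likely token per time step, merge repeats, drop blanks."""
    best_path = [max(range(len(frame)), key=lambda i: frame[i]) for frame in probs]
    decoded: List[str] = []
    previous = BLANK
    for token in best_path:
        # A token is emitted only when it is not blank and differs from the
        # token at the previous time step (the CTC collapse rule).
        if token != BLANK and token != previous:
            decoded.append(vocab[token])
        previous = token
    return "".join(decoded)


# Toy example: four time steps over a three-symbol vocabulary.
vocab = ["_", "h", "i"]
probs = [
    [0.10, 0.80, 0.10],
    [0.10, 0.70, 0.20],
    [0.90, 0.05, 0.05],
    [0.10, 0.10, 0.80],
]
text = greedy_ctc_decode(probs, vocab)
```

Greedy decoding commits to one best path, so it can miss outputs whose probability is spread over several alignments; that is the gap a prefix beam search is meant to close.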

Key inspirations include:

Willett et al. 2023 (Nature): High-performance speech neuroprosthesis using RNN phoneme decoders with CTC.

Kunz et al. 2025 (Cell): Inner speech decoding with motor-intent gating and stack-gated RNNs.

Lévy et al. 2025 (Meta Brain2Qwerty): Non-invasive M/EEG-to-text decoding using convolutional transformers.

Features

Multi-modality Support: Handles intracortical spikes (RNN-based) and M/EEG sensor data (Transformer-based).

Synthetic Data Generators: Built-in smoke tests for both spikes and M/EEG to verify the pipeline without clinical data.

Preprocessing Utilities: Threshold-crossing detection, spike binning, Gaussian smoothing, and MNE-integrated EEG/MEG filtering.

CTC-based Decoding: Connectionist Temporal Classification (CTC) for handling variable-length neural-to-token sequences.

Beam Search & LM Rescoring: Prefix beam search with optional integration of HuggingFace causal LMs (e.g., GPT-2) for improved accuracy.

Privacy Gating: Scaffold for intent classification to distinguish between private thought and intended communication.

Streaming Scaffold: Architecture for online, chunked decoding of neural streams.
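The spike preprocessing steps named in the feature list (threshold-crossing detection, spike binning, Gaussian smoothing) can be sketched in pure Python as follows. This is an illustrative sketch, not the repository's code: the function names, the negative-going threshold convention, and the edge handling are all assumptions, and a real pipeline would more likely use NumPy/SciPy arrays.

```python
import math
from typing import List, Sequence


def threshold_crossings(voltage: Sequence[float], threshold: float) -> List[int]:
    """Return sample indices where the signal first drops below `threshold`.

    Assumes a negative-going threshold, as is common for extracellular spikes.
    """
    return [
        i for i in range(1, len(voltage))
        if voltage[i] < threshold <= voltage[i - 1]
    ]


def bin_spikes(spike_samples: Sequence[int], n_samples: int, bin_size: int) -> List[int]:
    """Count spikes in consecutive, non-overlapping bins of `bin_size` samples."""
    counts = [0] * math.ceil(n_samples / bin_size)
    for sample in spike_samples:
        counts[sample // bin_size] += 1
    return counts


def gaussian_smooth(counts: Sequence[float], sigma_bins: float) -> List[float]:
    """Smooth binned counts with a Gaussian kernel, renormalized at the edges."""
    radius = max(1, int(3 * sigma_bins))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma_bins) ** 2) for k in offsets]
    smoothed = []
    for i in range(len(counts)):
        total, weight = 0.0, 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < len(counts):
                total += w * counts[j]
                weight += w
        smoothed.append(total / weight)
    return smoothed


# Toy single-channel trace: two downward crossings of the -4.0 threshold.
voltage = [0.0, -1.0, -5.0, -6.0, 0.0, -5.0, 0.0, 0.0]
spikes = threshold_crossings(voltage, threshold=-4.0)
counts = bin_spikes(spikes, n_samples=len(voltage), bin_size=4)
rates = gaussian_smooth(counts, sigma_bins=1.0)
```

Each channel would be processed this way and the smoothed rates stacked per time bin to form the time-aligned features fed to the sequence model.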

Installation

Prerequisites

Python 3.10 or 3.11

PyTorch (with CUDA for GPU acceleration)

Setup

# Clone the repository
git clone https://github.com/vanshshah/Thought2Text.git
cd Thought2Text

# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install core dependencies
pip install numpy scipy torch mne h5py jiwer editdistance transformers

Usage

Running Synthetic Smoke Tests

Verify the pipeline using generated data:

# Test intracortical spike decoding pipeline
python thought2text.py --mode synthetic --modality spikes

# Test M/EEG transformer decoding pipeline
python thought2text.py --mode synthetic --modality meeg
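For intuition about what a synthetic smoke test can feed the pipeline, the sketch below generates one fake trial in which each target token boosts a single channel above Gaussian noise. It is a hypothetical generator written for illustration only; the name `make_synthetic_trial`, its parameters, and the token-to-channel mapping are assumptions and do not describe the generators built into `thought2text.py`.

```python
import random
from typing import List, Tuple


def make_synthetic_trial(
    n_channels: int,
    n_tokens: int,
    vocab_size: int,
    bins_per_token: int,
    seed: int,
) -> Tuple[List[List[float]], List[int]]:
    """Return (features, targets) for one fake trial.

    features: one list of `n_channels` values per time bin.
    targets:  token ids in [1, vocab_size), with 0 reserved for the CTC blank.
    """
    rng = random.Random(seed)
    targets = [rng.randrange(1, vocab_size) for _ in range(n_tokens)]
    features: List[List[float]] = []
    for token in targets:
        for _ in range(bins_per_token):
            frame = [rng.gauss(0.0, 0.5) for _ in range(n_channels)]
            frame[token % n_channels] += 2.0  # token-specific channel is boosted
            features.append(frame)
    return features, targets


features, targets = make_synthetic_trial(
    n_channels=8, n_tokens=3, vocab_size=5, bins_per_token=4, seed=0
)
```

Because the input has more time bins than target tokens, data of this shape exercises the variable-length alignment that CTC is meant to handle.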

Command Line Arguments

--mode: synthetic (default) or real-placeholder.

--modality: spikes (default) or meeg.

--epochs: Number of training epochs (default: 2).

--batch-size: Training batch size (default: 16).

--input-dim: Dimension of neural input features.

--vocab-size: Size of the token vocabulary.

--device: Target device (e.g., cuda, cpu).
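The flags above map naturally onto `argparse`. The sketch below mirrors the documented names and defaults; it is not copied from `thought2text.py`, and because the README states no defaults for `--input-dim`, `--vocab-size`, or `--device`, those are left as `None` placeholders here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the flags listed in the README."""
    parser = argparse.ArgumentParser(description="Thought2Text runner (sketch)")
    parser.add_argument("--mode", choices=["synthetic", "real-placeholder"],
                        default="synthetic")
    parser.add_argument("--modality", choices=["spikes", "meeg"], default="spikes")
    parser.add_argument("--epochs", type=int, default=2)
    parser.add_argument("--batch-size", type=int, default=16)
    # No defaults are documented for the next three flags; None is a placeholder.
    parser.add_argument("--input-dim", type=int, default=None)
    parser.add_argument("--vocab-size", type=int, default=None)
    parser.add_argument("--device", default=None)
    return parser


# Parse an explicit argument list instead of sys.argv for a reproducible demo.
args = build_parser().parse_args(["--modality", "meeg", "--epochs", "5"])
```

Note that `argparse` exposes `--batch-size` as `args.batch_size`, with the hyphen converted to an underscore.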

Codebase Structure

thought2text.py: The main executable module containing the entire pipeline.

Workflow.md: Detailed implementation plan and research background.

CODEBASE_DIAGRAM.md: Mermaid-based architectural overview.

Components

| Component | Responsibility |
| --- | --- |
| NeuralDataLoader | Handles loading of NWB, MAT, HDF5, and MNE formats. |
| NeuralGRUCTC | Recurrent model for high-SNR intracortical features. |
| MEEGTransformerCTC | Conv-Transformer model for lower-SNR non-invasive signals. |
| CTCBeamSearchDecoder | Pure Python prefix beam search implementation. |
| HFTextScorer | Wrapper for using HuggingFace LMs for hypothesis... |
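As a rough illustration of what a pure Python CTC prefix beam search involves, here is a minimal sketch of the standard algorithm without language-model rescoring. It is not the repository's `CTCBeamSearchDecoder`; the function name, the blank index of 0, and the use of plain probabilities instead of log-probabilities are assumptions made to keep the example short.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

BLANK = 0  # assumed index of the CTC blank token
Prefix = Tuple[int, ...]


def prefix_beam_search(probs: Sequence[Sequence[float]], beam_width: int = 4) -> Prefix:
    """Return the most probable token sequence under the CTC collapse rule."""
    # Each prefix tracks [prob. of paths ending in blank, prob. ending in non-blank].
    beams: Dict[Prefix, List[float]] = {(): [1.0, 0.0]}
    for frame in probs:
        next_beams: Dict[Prefix, List[float]] = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_blank, p_non_blank) in beams.items():
            for token, p in enumerate(frame):
                if token == BLANK:
                    # A blank keeps the prefix unchanged.
                    next_beams[prefix][0] += (p_blank + p_non_blank) * p
                elif prefix and token == prefix[-1]:
                    # A repeated token extends the prefix only after a blank...
                    next_beams[prefix + (token,)][1] += p_blank * p
                    # ...otherwise it collapses into the same prefix.
                    next_beams[prefix][1] += p_non_blank * p
                else:
                    next_beams[prefix + (token,)][1] += (p_blank + p_non_blank) * p
        # Keep only the `beam_width` most probable prefixes.
        ranked = sorted(next_beams.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = dict(ranked[:beam_width])
    best_prefix, _ = max(beams.items(), key=lambda kv: sum(kv[1]))
    return best_prefix


# Two time steps where blank is the single most likely token at each step,
# yet the total probability of all alignments that collapse to "a" is higher.
best = prefix_beam_search([[0.6, 0.4], [0.6, 0.4]])
```

In this toy case the greedy path is blank at both steps (probability 0.36), while the alignments that collapse to a single token 1 sum to 0.64, so the beam search returns `(1,)`. For longer sequences, summing log-probabilities would be the usual way to avoid numerical underflow.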
