Neuralwatt Cloud | Neuralwatt Cloud
Neuralwatt Cloud<br>powered by Neuralwatt Optimize<br>Learn more
Neuralwatt Cloud
Run Inference with Real Visibility<br>into Power, Cost, and Efficiency
The first AI inference API with energy-based pricing. Know exactly what your AI costs —<br>in dollars and kilowatt-hours.
Use Neuralwatt Cloud as a hosted service, or bring Neuralwatt Deploy into your own data center.
Get Started
Playground
Neuralwatt Deploy
$5/kWh
Energy-Based Pricing
100%
Energy Transparency
Median Time to First Token
4+
Models Available
Try it now
Send a prompt and see energy-aware inference in action.
Try It
Industry First
Inference Priced by Energy Consumed
Token-based pricing hides the true cost of AI inference. We're changing that.<br>Pay per kilowatt-hour and know exactly what resources your AI workloads consume.
Transparent
See energy consumption per request. No hidden costs, no opaque token multipliers.
Predictable
Energy costs are consistent. No surprises from model-specific pricing variations.
Efficient
Optimize your AI workloads. Compare energy efficiency across models and make informed decisions.
Compare energy vs token pricing
Why Neuralwatt?
Three pillars that define every layer of our platform.
Included Free
Energy Reporting
Every customer gets real-time energy metrics. Know exactly what your AI workloads consume.
Per-request energy metrics
Dashboard with usage trends
Model efficiency comparisons
Performance
State-of-the-art inference powered by vLLM with tensor parallelism, continuous batching, and advanced KV caching.
As low as 15ms time to first token
High throughput at scale
Multi-GPU tensor parallelism
Efficiency
More intelligence per kilowatt-hour. Optimized infrastructure for maximum compute efficiency.
40% more energy efficient
Energy-aware scheduling
Optimized GPU utilization
Multi-Model API
Access multiple LLMs through a single API. Switch models seamlessly without managing separate connections.
OpenAI Compatible
Drop-in replacement for OpenAI APIs. Just change your base URL and you're ready to go.
The Neuralwatt Platform
Three integrated capabilities for high-performance, energy-efficient AI — from the data center to the API.
Neuralwatt Cloud
YOU ARE HERE
Hosted Inference Service
The first AI inference service with energy-based pricing. OpenAI-compatible API with real-time energy transparency per request.
Neuralwatt Deploy
On-Premise Optimization
Bring Neuralwatt's energy optimization directly into your data center. Full control over your hardware, security, and power consumption.
Neuralwatt Optimize
Power Optimization Engine
Intelligent layer between AI workloads and GPUs that continuously tunes power consumption in real time with less than 0.1% performance overhead.
Learn more about Neuralwatt's full platform
Featured Models
Access the latest open-source models from leading providers.<br>All with OpenAI-compatible APIs.
Reasoning
GLM-5.2
ZhipuAI
Context<br>1048K
Input<br>$0.0014/1K
Output<br>$0.0045/1K
Tools
Try Now
GLM-5.2 (fast)
ZhipuAI
Context<br>1048K
Input<br>$0.0014/1K
Output<br>$0.0045/1K
Tools
Try Now
Reasoning
GLM-5.2 (short)
ZhipuAI
Context<br>200K
Input<br>$0.0014/1K
Output<br>$0.0045/1K
Tools
Try Now
GLM-5.2 (short, fast)
ZhipuAI
Context<br>200K
Input<br>$0.0014/1K
Output<br>$0.0045/1K
Tools
Try Now
Coming Soon
Devstral 2 123B
Mistral
Context<br>131K
Input<br>$0.0002/1K
Output<br>$0.0005/1K
Tools
JSON
Request Access
Coming Soon
Devstral Small
Mistral
Context<br>131K
Input<br>$0.0001/1K
Output<br>$0.0002/1K
Tools
JSON
Request Access
Coming Soon
Gemma 4 31B
Context<br>256K
Input<br>$0.0000/1K
Output<br>$0.0000/1K
Tools
Request Access
Coming Soon
GPT-OSS 120B
OpenAI
Context<br>131K
Input<br>$0.0001/1K
Output<br>$0.0003/1K
Reasoning
Tools
JSON
Request Access
Coming Soon
Nemotron 3 Super 120B
NVIDIA
Context<br>1000K
Input<br>$0.0000/1K
Output<br>$0.0000/1K
Tools
Request Access
Coming Soon
Nemotron 3 Ultra
NVIDIA
Context<br>1000K
Input<br>$0.0040/1K
Output<br>$0.0120/1K
Tools
JSON
Request Access
Coming Soon
Qwen3.5 122B
Qwen
Context<br>128K
Input<br>$0.0000/1K
Output<br>$0.0000/1K
Request Access
Coming Soon
Qwen3.5 27B (FP8)
Qwen
Context<br>131K
Input<br>$0.0000/1K
Output<br>$0.0000/1K
Request Access
View all models
Start with Energy-Transparent AI
Get started with $5 in free credits. Pay per kWh or per token — your choice.<br>Real-time energy reporting included with every account.
Try the Playground
Create Free Account
Enterprise & Dedicated Inference
Need dedicated GPU capacity, custom SLAs, or on-premises deployment?<br>Our enterprise solutions offer guaranteed performance with full energy transparency.
Dedicated GPU infrastructure
SLA guarantees up to 99.9%
Volume pricing & custom models
Contact Enterprise Sales