Now serving H100 clusters in 12 regions

Train, deploy, scale on demand.

Train models in the cloud on high-end GPUs, deploy them to fast inference endpoints, and let DeepRun autoscale as your traffic grows. No servers to manage, no cold starts.

01 $ deeprun train ./llama-3-8b --gpu h100 --epochs 10
Provisioning 8x H100 cluster...
Epoch 01/10 █░░░░░░░░░ loss 1.842
Epoch 04/10 ████░░░░░░ loss 0.673
Epoch 07/10 ███████░░░ loss 0.214
Epoch 10/10 ██████████ loss 0.041
Training complete · checkpoint saved
02 $ deeprun deploy ./checkpoints/llama-ft.pt --gpu h100
Optimizing graph kernels...
Endpoint active: https://api.deeprun.ai/v1/inference
03 $ curl https://api.deeprun.ai/v1/inference -H 'Content-Type: application/json' -d '{"prompt":"hi"}'
{ "output": "Hello from DeepRun" }
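The `curl` step in the demo can also be issued from Python using only the standard library. This is a minimal sketch under stated assumptions: it uses the endpoint URL printed by the deploy step, and it assumes the endpoint accepts a JSON body with a `prompt` field and returns JSON with an `output` field, as the demo shows. Authentication is not described on this page, so any API key or extra header a real endpoint needs is left out. `build_request` and `parse_output` are hypothetical helper names.

```python
import json
import urllib.request

# URL from the deploy output in the demo; auth requirements are an open
# assumption, so no API-key header is added here.
ENDPOINT = "https://api.deeprun.ai/v1/inference"


def build_request(prompt: str, url: str = ENDPOINT) -> urllib.request.Request:
    """Build (but do not send) a JSON POST equivalent to the curl call."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_output(raw: bytes) -> str:
    """Pull the "output" field out of a response body shaped like the demo's."""
    return json.loads(raw)["output"]
```

To actually send it, pass the request to `urllib.request.urlopen(req, timeout=30)` and hand the response body to `parse_output`. Keeping request construction separate from sending makes the payload easy to inspect before any network call.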

Train, deploy, scale.

  1. Train in the cloud

    Push your code and spin up clusters of high-end GPUs in seconds. Train on A100s and H100s without managing a single server.

  2. Deploy & inference

    Ship any model to a production endpoint with one command. Serve low-latency inference with built-in load balancing.

  3. Autoscale

    Traffic spikes? DeepRun scales replicas up automatically and back down to zero when it's quiet, so you only pay for what runs.
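The scale-up and scale-to-zero behavior described above can be illustrated with a simple concurrency-target rule, similar in spirit to Knative's concurrency-based autoscaler: divide in-flight requests by a per-replica target, cap the result, and drop to zero only after a quiet period. This is a minimal sketch, not DeepRun's actual algorithm (the page does not describe it); `ScalePolicy`, `desired_replicas`, and every threshold in it are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalePolicy:
    # All values are HYPOTHETICAL defaults, not DeepRun settings.
    target_in_flight_per_replica: int = 4    # concurrent requests one replica should carry
    max_replicas: int = 20                   # upper bound on scale-out
    idle_seconds_before_zero: float = 300.0  # quiet period before scaling to zero


def desired_replicas(in_flight: int, seconds_since_last_request: float,
                     policy: ScalePolicy = ScalePolicy()) -> int:
    """Concurrency-target autoscaling with scale-to-zero."""
    if in_flight <= 0:
        # No traffic right now: keep one warm replica until the idle window
        # has passed, then scale all the way down to zero.
        if seconds_since_last_request >= policy.idle_seconds_before_zero:
            return 0
        return 1
    # Under load: enough replicas that each carries at most the target concurrency.
    wanted = math.ceil(in_flight / policy.target_in_flight_per_replica)
    return min(wanted, policy.max_replicas)
```

For example, 9 in-flight requests with the default target of 4 per replica yields 3 replicas. The idle window is the main design trade-off: a longer window costs a little idle time but avoids cold starts when traffic resumes shortly after a lull.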

The hard parts, handled.

What used to take a platform team now takes a single command.

Without DeepRun

  • Hand-write Kubernetes YAML and node pools
  • Pay for idle GPUs 24/7
  • Patch drivers and CUDA versions yourself
  • Wait minutes for cold starts

With DeepRun

  • One CLI command provisions the cluster
  • Autoscale to zero, pay by the second
  • Pre-tuned images, kept up to date for you
  • Sub-50ms warm starts across regions

From notebook to production.

Everything you need to train, serve, and scale models, running on enterprise-grade GPUs.

High-end GPU training

Train on clusters of A100s and H100s that spin up in seconds. We handle the orchestration so you can focus on the model.

Fast inference, autoscaled

Deploy to production endpoints that scale replicas up under load and down to zero when idle. Low latency, anywhere.

Secure by default

SOC 2 Type II compliant. Your data and model weights stay encrypted at rest and in transit on a zero-trust network.


Train in the cloud

Spin up multi-GPU clusters of A100s and H100s in seconds. DeepRun handles the networking and orchestration across nodes, so you watch your loss curve instead of babysitting infrastructure.

Start training

Deploy & inference

Ship any Hugging Face model or custom container with one command. Your endpoint serves production traffic and autoscales replicas up and down on its own, so you never over-provision.

Deploy a model
import deeprun

# Initialize the DeepRun client
client = deeprun.Client()

# Run streaming inference against a hosted model on an H100
response = client.inference(
    model="deeprun/llama-3-70b",
    prompt="Write a system design...",
    stream=True,
    gpu="H100",
)
# Output streaming at 420 tokens/sec

Current Usage · Live

Billed for what you run, no minimums

  • GPU compute: 412 H100-hrs
  • Inference requests: 1.8M calls
  • Storage: 640 GB-mo

Pay for what runs

We meter GPU compute by the second, inference by the request, and storage by the GB-month, so your bill matches exactly what you used. When your endpoints scale to zero, they cost nothing. No idle charges, no surprises.
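The metering model on this page (GPU compute by the second, inference per request, storage per GB-month, no minimums) can be made concrete with a small calculator. This is a minimal sketch: the rates passed in are hypothetical placeholders, not DeepRun's prices (the page does not list them), and `Rates`, `Usage`, `line_items`, and `invoice_total` are illustrative names. It uses `Decimal` so cent rounding is exact.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


@dataclass(frozen=True)
class Rates:
    # HYPOTHETICAL prices for illustration; not DeepRun's actual rates.
    gpu_per_hour: Decimal           # per H100-hour, prorated per second
    per_million_requests: Decimal   # per 1M inference calls
    storage_per_gb_month: Decimal   # per GB-month stored


@dataclass(frozen=True)
class Usage:
    gpu_seconds: int                # metered GPU time; 0 when scaled to zero
    requests: int                   # inference calls served
    storage_gb_months: Decimal      # GB-months of storage


def line_items(usage: Usage, rates: Rates) -> dict[str, Decimal]:
    """Price each meter separately and round each line to the cent."""
    raw = {
        "gpu_compute": rates.gpu_per_hour * usage.gpu_seconds / 3600,
        "inference_requests": rates.per_million_requests * usage.requests / 1_000_000,
        "storage": rates.storage_per_gb_month * usage.storage_gb_months,
    }
    return {name: amount.quantize(CENT, rounding=ROUND_HALF_UP)
            for name, amount in raw.items()}


def invoice_total(items: dict[str, Decimal]) -> Decimal:
    """Sum the rounded line items; there is no minimum charge."""
    return sum(items.values(), Decimal("0"))
```

With placeholder rates of $2.50 per H100-hour, $1.00 per million requests, and $0.10 per GB-month, the usage card's 412 H100-hours, 1.8M calls, and 640 GB-months price out to $1,030.00, $1.80, and $64.00, for $1,095.80 in total. Per-second proration means a 90-second job bills $0.06 of GPU time, and an endpoint sitting at zero replicas adds nothing to the compute line.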

See pricing

Pricing that scales with you.

Start free, then pay only for the GPU time and requests you actually use.

Developer

$0/mo
  • 1 concurrent job
  • Shared CPU
  • Community support
  • Free forever

Pro (recommended)

Usage-based
  • Unlimited concurrent jobs
  • Dedicated A100/H100 GPUs
  • Priority support
  • API access

Enterprise

Custom
  • SLA guarantees
  • Custom regions
  • Dedicated VPC
  • 24/7 technical lead

Questions, answered.

Everything you need to know before you deploy your first model.

You're metered by the second for GPU compute, per request for inference, and per GB-month for storage. There are no minimums or seat fees, and endpoints that scale to zero cost nothing.

Start building.

Join the next generation of AI companies building on the most reliable infrastructure in the world.