BYO-GPU — single VM happy path

This guide gets you from zero to Keys calling your GPU with minimal moving parts. You run inference on your VM (cloud or bare metal). Restormel provides the control layer (routes, policies, aliases)—not the GPU bill.

Prereqs — A machine with an NVIDIA driver + CUDA stack appropriate for your server image; outbound HTTPS from your app to the VM (or VPC peering / private link per your design).

1. Run an OpenAI-compatible server

Pick one stack (examples only—pin versions in production). Expose /v1/chat/completions (or your vendor’s documented OpenAI-compatible path). NVIDIA NGC and similar catalogs ship GPU-optimized images; bind ports only on private interfaces unless you intend public access.

# Example shape only — replace with your image and model paths.
docker run --gpus all -p 127.0.0.1:8000:8000 \
  -e ... \
  your-registry/your-openai-compatible-server:tag

2. Smoke-test from your laptop or bastion

curl -sS "\${BASE}/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $YOUR_SERVER_KEY" \
  -d '{"model":"your-model-id","messages":[{"role":"user","content":"ping"}],"max_tokens":8}'

Do not paste live keys into tickets or CI logs.

3. Register the endpoint in Keys

  1. In the dashboard, add a custom / private provider or gateway configuration pointing at https://your-vm-or-lb/v1 (or HTTP inside VPC).
  2. Create a logical model alias in the project model index so your app uses a stable ID (e.g. private/llama-l4).
  3. Add a policy that allows that alias only where appropriate.

4. Validate with Restormel Testing

Add a short smoke goal that hits the resolved route (latency, HTTP 200, JSON shape). Use your GitHub Actions minutes or self-hosted runners so runner COGS stay on your side. See GPU route smoke (Testing).

5. Operations

  • Rotation — Rotate server API keys in your secret store; update Keys integration references.
  • Upgrades — When you bump container tags (e.g. new NGC build), re-run smoke ACs and capture a Release pack for audit (CLI: testing release-pack).