OpenAI-compatible API server for diffusion code models. Code doesn't get written left to right — it gets spray-painted all at once.
"I didn't write it. Nobody saw me write it. You can't prove anything."— Bart Simpson, on how diffusion models generate code
Stable-DiffCoder is ByteDance's mask-diffusion code LLM that spray-paints code through iterative denoising instead of boring left-to-right token generation.
Stable-DiffCoder-8B-Instruct leads the 8B code-model class on most benchmarks, beating Qwen2.5-Coder, CodeLlama, and every other diffusion LLM. But it uses a non-standard diffusion inference pipeline that no existing serving framework supports (not vLLM, not Ollama, not TensorRT-LLM).
El Barto wraps the custom diffusion generation in a standard /v1/chat/completions endpoint so you can use it with any OpenAI-compatible client.
Traditional LLMs generate code left-to-right, one token at a time. Stable-DiffCoder works differently: it starts from a sequence of [MASK] tokens and iteratively denoises them, filling in positions in whatever order the model is most confident about. This "any-order" generation means the model can consider the full structure simultaneously, making it naturally better at maintaining syntax, matching brackets, and reasoning about code structure.
Option A: Native Install (DGX Spark)
```bash
git clone https://github.com/NathanMaine/el-barto-serve.git
cd el-barto-serve

# Automated setup (creates venv, installs CUDA 13.0 PyTorch, deps)
./setup-spark.sh

# Activate and run
source .venv/bin/activate
python server.py
```

Option B: Docker (NGC Container)
```bash
docker build -t el-barto-serve .
docker run -it --gpus all \
  -p 8000:8000 \
  -e ELBARTO_MODEL_PATH=/models/Stable-DiffCoder-8B-Instruct \
  -v /path/to/your/model:/models/Stable-DiffCoder-8B-Instruct \
  el-barto-serve
```

Option C: Other CUDA GPUs
```bash
git clone https://github.com/NathanMaine/el-barto-serve.git
cd el-barto-serve
python -m venv .venv && source .venv/bin/activate
pip install torch  # Standard PyTorch for your GPU
pip install -r requirements.txt
python server.py
```

Test with curl
```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "stable-diffcoder",
    "messages": [{"role": "user", "content": "Write a binary search in Python"}],
    "temperature": 0.0
  }'
```

Connect from VS Code (Continue.dev)
1. Install the Continue extension
2. Open Continue settings (~/.continue/config.json)
3. Add El Barto as a model:
```json
{
  "models": [
    {
      "title": "El Barto (Stable-DiffCoder)",
      "provider": "openai",
      "model": "stable-diffcoder",
      "apiBase": "http://YOUR_SPARK_IP:8000/v1",
      "apiKey": "not-needed"
    }
  ]
}
```

All settings are via environment variables (or a .env file; copy from .env.example):
| Variable | Default | Description |
|---|---|---|
| `ELBARTO_MODEL_PATH` | `ByteDance-Seed/Stable-DiffCoder-8B-Instruct` | Local path or HuggingFace model ID |
| `ELBARTO_HOST` | `0.0.0.0` | Bind address |
| `ELBARTO_PORT` | `8000` | Server port |
| `ELBARTO_STEPS` | `256` | Diffusion denoising steps (more = higher quality, slower) |
| `ELBARTO_GEN_LENGTH` | `512` | Max output tokens |
| `ELBARTO_BLOCK_LENGTH` | `4` | Block diffusion granularity |
| `ELBARTO_THRESHOLD` | `None` | Early-stopping confidence (0.0–1.0); lower = faster |
| `ELBARTO_REMASKING` | `low_confidence` | Remasking strategy (`low_confidence` or `random`) |
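How the server might map these variables onto a config dict. This is a hypothetical sketch using the table's names and defaults; the real parsing lives in server.py and may differ, e.g. in how an unset threshold is represented:

```python
import os

def load_config(env=os.environ):
    """Read El Barto settings from the environment (sketch, not server.py)."""
    raw_threshold = env.get("ELBARTO_THRESHOLD", "")
    return {
        "model_path": env.get("ELBARTO_MODEL_PATH",
                              "ByteDance-Seed/Stable-DiffCoder-8B-Instruct"),
        "host": env.get("ELBARTO_HOST", "0.0.0.0"),
        "port": int(env.get("ELBARTO_PORT", "8000")),
        "steps": int(env.get("ELBARTO_STEPS", "256")),
        "gen_length": int(env.get("ELBARTO_GEN_LENGTH", "512")),
        "block_length": int(env.get("ELBARTO_BLOCK_LENGTH", "4")),
        # Empty or unset threshold means "no early stopping" (None),
        # which is what `ELBARTO_THRESHOLD= python server.py` sets.
        "threshold": float(raw_threshold) if raw_threshold else None,
        "remasking": env.get("ELBARTO_REMASKING", "low_confidence"),
    }

print(load_config({"ELBARTO_STEPS": "128", "ELBARTO_THRESHOLD": "0.5"}))
```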
```bash
# Maximum quality (slow: 512 steps, no early stopping)
ELBARTO_STEPS=512 ELBARTO_THRESHOLD= python server.py

# Balanced (default)
ELBARTO_STEPS=256 python server.py

# Fast mode (fewer steps + early stopping)
ELBARTO_STEPS=128 ELBARTO_THRESHOLD=0.5 python server.py

# Fastest (aggressive early stopping: "eat my shorts" mode)
ELBARTO_STEPS=64 ELBARTO_THRESHOLD=0.3 python server.py
```

Standard OpenAI chat completions format. Supports both streaming and non-streaming.
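A minimal non-streaming call from Python using only the standard library. The payload mirrors the curl example above; running `chat()` assumes the server is up on localhost:8000:

```python
import json
import urllib.request

# Same request body as the curl example.
payload = {
    "model": "stable-diffcoder",
    "messages": [{"role": "user", "content": "Write a binary search in Python"}],
    "temperature": 0.0,
}

def chat(url="http://localhost:8000/v1/chat/completions"):
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Standard OpenAI response shape: choices[0].message.content
    return body["choices"][0]["message"]["content"]

# chat()  # requires a running El Barto server
```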
Extra diffusion-specific fields you can pass via the request body:
```json
{
  "steps": 256,
  "gen_length": 512,
  "block_length": 4,
  "threshold": null,
  "remasking": "low_confidence"
}
```

List available models.
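Per-request fields override the server's environment defaults, with null falling back to the default. A hypothetical sketch of that merge (the names are illustrative, not server.py's actual code):

```python
# Mirrors the env-var defaults from the configuration table.
DIFFUSION_DEFAULTS = {
    "steps": 256,
    "gen_length": 512,
    "block_length": 4,
    "threshold": None,
    "remasking": "low_confidence",
}

def merge_diffusion_params(request_body):
    """Overlay request-body overrides on the defaults, skipping nulls and
    unknown keys. Hypothetical sketch of how the server might merge settings."""
    params = dict(DIFFUSION_DEFAULTS)
    for key in DIFFUSION_DEFAULTS:
        value = request_body.get(key)
        if value is not None:
            params[key] = value
    return params

print(merge_diffusion_params({"steps": 64, "threshold": 0.3}))
```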
Health check with model status and device info.
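Streaming responses follow the standard OpenAI server-sent-events format: `data: {...}` chunks carrying `choices[0].delta`, terminated by `data: [DONE]`. A sketch of parsing that stream, using a canned sample rather than a live connection:

```python
import json

def collect_stream(lines):
    """Concatenate delta content from OpenAI-style SSE chunks."""
    text = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        text.append(delta.get("content", ""))
    return "".join(text)

# Canned sample of what a streaming response looks like on the wire:
sample = [
    'data: {"choices": [{"delta": {"content": "def "}}]}',
    'data: {"choices": [{"delta": {"content": "search():"}}]}',
    'data: [DONE]',
]
print(collect_stream(sample))  # def search():
```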
| Model | HumanEval | MBPP | MHPP | BigCodeBench |
|---|---|---|---|---|
| Qwen2.5-Coder-7B-Instruct | 88.4 | 83.5 | 26.7 | 48.8 |
| Seed-Coder-8B-Instruct | 84.8 | 85.2 | 36.2 | 53.3 |
| Stable-DiffCoder-8B-Instruct | 86.6 | 85.7 | 42.4 | 54.8 |
Things we learned so the Spark doesn't have a cow: