llms.py - Lightweight OpenAI-compatible CLI and server gateway for multiple LLMs

Introducing llms.py 🚀
We're excited to announce the release of llms.py - a super-lightweight CLI tool and OpenAI-compatible server that acts as a configurable gateway over multiple Large Language Model (LLM) providers.
As part of our work developing a new OSS AI generation platform, we needed a lightweight LLM gateway for use within ComfyUI. Unfortunately the popular Python option, litellm, is anything but lightweight - requiring 60 dependencies! - as its VC-funded development sees its scope creep into Enterprise features.
This is a deal-breaker within an open Python plugin ecosystem like ComfyUI, where every dependency has a chance to break a Python environment. It's also unnecessary for a simple CLI tool and server gateway, where the hard work is already done by the LLM providers in their OpenAI-compatible APIs - a simplicity capitalized on by llms.py.
🎯 OpenRouter but Local
llms.py is designed as a unified gateway that seamlessly connects you to multiple LLM providers through a single, consistent interface. Whether you're using cloud APIs or local models, llms provides intelligent routing and automatic failover to ensure your AI workflows connect to your chosen providers in your preferred priority - whether optimizing for cost, performance or availability.
⚡ Ultra-Lightweight Architecture
- Single File: Just one llms.py file (easily customizable)
- Single Dependency: Just the single aiohttp dependency
- Zero Dependencies for ComfyUI: Ideal for use in ComfyUI Custom Nodes
- No Setup: Just download and use, configure preferred LLMs in llms.json
📦 Installation Options
Option 1: PyPI Package
pip install llms-py
Option 2: Direct Download
For standalone use, download llms.py and make it executable:
curl -O https://raw.githubusercontent.com/ServiceStack/llms/main/llms.py
chmod +x llms.py
mv llms.py ~/.local/bin/llms
Then install its only dependency:
pip install aiohttp
Ideal for use with ComfyUI Custom Nodes
Simply drop llms.py into your ComfyUI custom nodes directory - no additional dependencies required!
🔧 Quick Start
# Initialize configuration (saved in ~/.llms/llms.json)
llms --init
# Enable providers
llms --enable openrouter_free google_free groq openai anthropic grok
# List available providers and models
llms ls
llms ls openrouter_free groq
# Start chatting
llms "Explain quantum computing in simple terms"
# With preferred model
llms -m grok-4-fast "jq command to sort openai models by created"
# With system prompt
llms -s "You are a quantum computing expert" "Explain quantum computing"
# Use with images
llms --image photo.jpg "What's in this image?"
# Use with audio
llms --audio talk.mp3 "Transcribe this audio file"
# Use with attached file
llms --file report.pdf "Summarize this PDF"
# With custom request template
llms --chat request.json "Explain quantum computing in simple terms"
llms --chat image-request.json --image photo.jpg "What's in this image?"
llms --chat audio-request.json --audio talk.mp3 "Transcribe this audio file"
llms --chat file-request.json --file report.pdf "Summarize this PDF"
# Set default model
llms --default grok-4-fast
# Run an OpenAI-compatible chat completions server
llms --serve 8000
Example client usage:
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
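Because the server implements the standard OpenAI chat completions API, any OpenAI-compatible client library can talk to it too. Here's a minimal sketch using the official openai Python package - the api_key value is just a placeholder (the gateway manages the real provider keys itself), and the model name should be one reported by llms ls:
from openai import OpenAI

# Point the standard OpenAI client at the local llms.py gateway
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="kimi-k2",  # any unified model name exposed by your enabled providers
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)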
🌐 Configurable Multi-Provider Gateway
Acts as an intelligent gateway that can route requests for 160+ models across:
Cloud Providers with Free Tiers
- OpenRouter
- Groq
- Codestral
Premium Cloud Providers
- OpenAI
- Anthropic
- Grok
- Qwen
- Mistral
Local Providers
- Ollama
  - Restrict access to custom models
  - Or auto-discover installed models
Custom Providers
Use JSON config to add any OpenAI-compatible API endpoints and models
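For example, a custom provider entry in llms.json might look something like the sketch below. The field names here are illustrative assumptions - run llms --init and inspect the generated ~/.llms/llms.json for the authoritative schema:
{
  "providers": {
    "my_provider": {
      "enabled": true,
      "base_url": "https://api.example.org",
      "api_key": "$MY_PROVIDER_API_KEY",
      "models": {
        "my-model": "provider-specific/model-id"
      }
    }
  }
}
Here my_provider, api.example.org and the model ids are placeholders; $MY_PROVIDER_API_KEY illustrates pulling the API key from an environment variable.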
🔄 Intelligent Request Routing
- Automatic Failover: If one provider fails, automatically retry with the next available provider
- Cost Optimization: Define free/cheap/local providers first to minimize costs
- Model Mapping: Use unified model names that map to different provider-specific names
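To sketch how model mapping and failover combine (again with illustrative config fields): if two enabled providers both map the unified name kimi-k2 to their own provider-specific ids, a request for kimi-k2 goes to the first provider in your configuration and falls back to the next on failure:
{
  "providers": {
    "openrouter_free": {
      "models": { "kimi-k2": "moonshotai/kimi-k2:free" }
    },
    "groq": {
      "models": { "kimi-k2": "moonshotai/kimi-k2-instruct" }
    }
  }
}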
🚀 Key Features
Multi-Modal Support
- Text Generation: Chat completions with any supported model
- Vision Models: Process images through vision-capable models (GPT-4V, Gemini Vision, etc.)
- Audio Processing: Handle audio inputs through audio-capable models
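Since the gateway speaks the standard OpenAI chat completions protocol, multi-modal requests use the usual content-parts format. A sketch of an image request against the local server (the model name and image URL are placeholders - use a vision-capable model from llms ls, or pass a base64 data URI for local images):
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image"},
          {"type": "image_url", "image_url": {"url": "https://example.org/photo.jpg"}}
        ]
      }
    ]
  }'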
Flexible Deployment Options
- CLI Tool: Interactive command-line interface for quick queries
- HTTP Server: OpenAI-compatible server at http://localhost:{PORT}/v1/chat/completions
- Python Module: Import and use programmatically in your applications
- ComfyUI Node: Embed directly in ComfyUI workflows
Simple and Customizable
- Environment Variables: Secure API key management
- Provider Management: Easy enable/disable of providers
- Custom Models: Define your own model aliases and mappings
- Unified Configuration: Single llms.json to configure all providers and models
🎯 Use Cases
For Developers
- API Gateway: Centralize all LLM provider access through one endpoint
- Cost Management: Automatically route to cheapest available providers
- Reliability: Built-in failover ensures high availability
- Testing: Easily switch between models and providers for comparison
For ComfyUI Users
- Hybrid Workflows: Combine local Ollama models with cloud APIs
- Zero Setup: No dependency management headaches
- Provider Flexibility: Switch providers without changing your workflow
- Cost Control: Use free tiers and local models when possible
For Enterprises
- Vendor Independence: Avoid lock-in to any single LLM provider
- Scalability: Distribute load across multiple providers
- Compliance: Keep sensitive data local while using cloud for general tasks
- Budget Control: Intelligent routing to optimize costs
🌟 Why llms.py?
- Simplicity: One file, one dependency, infinite possibilities
- Flexibility: Works with any OpenAI-compatible client or framework
- Reliability: Automatic failover keeps your workflows running when a provider fails
- Economy: Intelligent routing minimizes API costs
- Privacy: Mix local and cloud models based on your data sensitivity
- Future-Proof: Easily add new providers as they emerge
llms.py transforms the complexity of managing multiple LLM providers into a simple, unified experience. Whether you're researching the capabilities of new models, building the next breakthrough AI application, or just want reliable access to the best models available, llms.py has you covered.
Get started today and avoid expensive cloud lock-ins with the freedom of provider-agnostic AI development! 🎉
Links: