Self Hosted AI Server for LLMs, Ollama, Comfy UI & FFmpeg


AI Server now ready to serve!

We're excited to announce the first release of AI Server - a free OSS self-hosted private Docker gateway for managing API access to multiple LLM APIs, Ollama endpoints, Media APIs, Comfy UI and FFmpeg Agents.

Centralized Management

Designed as a one-stop solution for managing an organization's AI integrations across all its System Apps, using developer-friendly HTTP JSON APIs that support any programming language or framework.
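
Since the gateway speaks plain HTTP and JSON, any language with an HTTP client can integrate with it. As a minimal sketch, a raw chat request could look like this (the host is a placeholder, the /api/{RequestDto} route is ServiceStack's default convention and the snake_case body assumes the OpenAI-compatible chat schema):

POST /api/OpenAiChatCompletion HTTP/1.1
Host: example.org
Content-Type: application/json
Authorization: Bearer <your-api-key>

{
  "model": "mixtral:8x22b",
  "messages": [
    { "role": "user", "content": "What's the capital of France?" }
  ],
  "max_tokens": 50
}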

Distribute load across multiple Ollama, Open AI Gateway and Comfy UI Agents

It works as a private gateway that processes the LLM, AI and image transformation requests any of our Apps need, dynamically load balancing requests across our local GPU servers, cloud GPU instances and API gateways running multiple instances of Ollama, Open AI Chat, LLM Gateway, Comfy UI, Whisper and FFmpeg providers.

In addition to maintaining a history of AI Requests, it also provides file storage for its CDN-hostable AI-generated assets and on-the-fly, cacheable image transformations.

Native Typed Integrations

Uses Add ServiceStack Reference to enable simple, native typed integrations for the most popular Web, Mobile and Desktop languages, including: C#, TypeScript, JavaScript, Python, Java, Kotlin, Dart, PHP, Swift, F# and VB.NET.

Each AI Feature supports multiple call styles to suit different integration patterns:

  • Synchronous API · Simplest API ideal for small workloads where the Response is returned in the same Request
  • Queued API · Returns a reference to the queued job executing the AI Request which can be used to poll for the API Response
  • Reply to Web Callback · Ideal for reliable App integrations where responses are posted back to a custom URL Endpoint

Live Monitoring and Analytics

Monitor the performance and statistics of all your Apps' AI usage with real-time logging of executing APIs and automatic archival of completed AI Requests into monthly rolling SQLite databases.

Protected Access with API Keys

AI Server utilizes Simple Auth with API Keys, letting Admins create and distribute API Keys so that only authorized clients can access their AI Server's APIs. API Keys can optionally be restricted further to only allow access to specific APIs.
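
Typed clients send their API Key as a Bearer Token. A minimal C# sketch (the base URL is a placeholder and the AI_SERVER_API_KEY environment variable is an assumed convention, not part of AI Server):

using ServiceStack;

// The API Key is sent as a Bearer Token on every request
var client = new JsonApiClient("https://example.org") {
    BearerToken = Environment.GetEnvironmentVariable("AI_SERVER_API_KEY")
};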

Install

AI Server can be installed on macOS and Linux with Docker by running install.sh:

  1. Clone the AI Server repository from GitHub:

git clone https://github.com/ServiceStack/ai-server

  2. Run the Installer

cd ai-server && cat install.sh | bash

The installer will detect common environment variables for the supported AI Providers like OpenAI, Google, Anthropic and others, and will prompt you to add them to your AI Server configuration.
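
For example, exporting provider keys in your shell before running the installer allows it to pick them up automatically. A sketch using typical variable names (an assumption; the installer prompts will confirm the exact names it detects):

export OPENAI_API_KEY=...
export GOOGLE_API_KEY=...
export ANTHROPIC_API_KEY=...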

Optional - Install ComfyUI Agent

If your server also has a GPU, you can ask the installer to also install the ComfyUI Agent.

The ComfyUI Agent is a separate Docker agent for running ComfyUI, Whisper and FFmpeg on servers with GPUs, used to handle AI Server's Image and Video transformations and Media Requests.

Comfy UI Agent Installer

To install the ComfyUI Agent on a separate server (with a GPU), you can clone and run the ComfyUI Agent installer on that server instead:

  1. Clone the Comfy Agent

git clone https://github.com/ServiceStack/agent-comfy.git

  2. Run the Installer

cd agent-comfy && cat install.sh | bash

Running in Production

We've been developing and running AI Server for several months now, processing millions of LLM and Comfy UI Requests to generate Open AI Chat Answers and Generated Images used to populate the pvq.app and blazordiffusion.com websites.

Our production instance with more info about AI Server is available at:

API Explorer

Whilst our production instance is protected by API Keys, you can still use it to explore available APIs in its API Explorer:

Documentation

The documentation for AI Server is being maintained at:

Built-in UIs

Built-in UIs give users with API Keys access to custom UIs for different AI features.

Admin UIs

Use the Admin UI to manage the API Keys that can access AI Server's APIs and features.

Features

The current release of AI Server supports a number of different modalities, including:

Large Language Models

  • Open AI Chat
    • Support for Ollama endpoints
    • Support for Open Router, Anthropic, Open AI, Mistral AI, Google and Groq API Gateways

Comfy UI Agent / Replicate / DALL-E 3

  • Text to Image

Comfy UI Agent

  • Image to Text
  • Speech to Text

FFmpeg

  • Image Transformations

    • Crop Image - Crop an image to a specific size
    • Convert Image - Convert an image to a different format
    • Scale Image - Scale an image to a different resolution
    • Watermark Image - Add a watermark to an image
  • Video Transformations

    • Crop Video - Crop a video to a specific size
    • Convert Video - Convert a video to a different format
    • Scale Video - Scale a video to a different resolution
    • Watermark Video - Add a watermark to a video
    • Trim Video - Trim a video to a specific length
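
As a taste of these transformations, here's a minimal sketch of a synchronous Trim Video request using the same C# client as the examples later in this post (the StartTime/EndTime property names are assumptions based on the Trim Video description above):

using var fsVideo = File.OpenRead("files/test_video.mp4");
var response = client.PostFileWithRequest(new TrimVideo {
    StartTime = "00:05",  // assumed property names for the trim window
    EndTime = "00:10"
}, new UploadFile("test_video.mp4", fsVideo, "video"));

File.WriteAllBytes(saveToPath, response.Results[0].Url.GetBytesFromUrl());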

Managed File Storage

  • Blob Storage - isolated and restricted by API Key

AI Server API Examples

To simplify integrations with AI Server, each API Request can be called with 3 different call styles to better support different use cases and integration patterns.

Synchronous Open AI Chat Example

The Synchronous API is the simplest API ideal for small workloads where the Response is returned in the same Request:

var client = new JsonApiClient(baseUrl);
client.BearerToken = apiKey;

var response = client.Post(new OpenAiChatCompletion {
    Model = "mixtral:8x22b",
    Messages = [
        new() {
            Role = "user",
            Content = "What's the capital of France?"
        }
    ],
    MaxTokens = 50
});

var answer = response.Choices[0].Message.Content;

Synchronous Media Generation Request Example

Other AI Requests can be called synchronously in the same way, where each API is named after the modality it implements, e.g. you'd instead call TextToImage to generate an Image from a Text description:

var response = client.Post(new TextToImage {
    PositivePrompt = "A serene landscape with mountains and a lake",
    Model = "flux-schnell",
    Width = 1024,
    Height = 1024,
    BatchSize = 1
});

File.WriteAllBytes(saveToPath, response.Results[0].Url.GetBytesFromUrl());

Queued Open AI Chat Example

The Queued API immediately returns a reference to the queued job executing the AI Request:

var response = client.Post(new QueueOpenAiChatCompletion
{
    Request = new()
    {
        Model = "gpt-4-turbo",
        Messages = [
            new() { Role = "system", Content = "You are a helpful AI assistant." },
            new() { Role = "user", Content = "How do LLMs work?" }
        ],
        MaxTokens = 50
    }
});

Which can be used to poll for the API Response of any Job by calling GetOpenAiChatStatus and checking when its state has finished running to get the completed OpenAiChatResponse:

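// Poll for Job Completion Status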
GetOpenAiChatStatusResponse status = new();
while (status.JobState is BackgroundJobState.Started or BackgroundJobState.Queued)
{
    status = await client.GetAsync(new GetOpenAiChatStatus { RefId = response.RefId });
    await Task.Delay(1000);
}

var answer = status.Result.Choices[0].Message.Content;

Queued Media Artifact Generation Request Example

Most other AI Server Requests are artifact generation requests, which instead call GetArtifactGenerationStatus to get the artifacts response of a queued job, e.g:

var response = client.Post(new QueueTextToImage {
    PositivePrompt = "A serene landscape with mountains and a lake",
    Model = "flux-schnell",
    Width = 1024,
    Height = 1024,
    BatchSize = 1
});

// Poll for Job Completion Status
GetArtifactGenerationStatusResponse status = new();
while (status.JobState is BackgroundJobState.Queued or BackgroundJobState.Started)
{
    status = client.Get(new GetArtifactGenerationStatus { JobId = response.JobId });
    Thread.Sleep(1000);
}

File.WriteAllBytes(saveToPath, status.Results[0].Url.GetBytesFromUrl());

Queued Media Text Generation Request Example

Whilst Media API Requests that generate text, like SpeechToText or ImageToText, instead call GetTextGenerationStatus to get the text response of a queued job, e.g:

using var fsAudio = File.OpenRead("files/test_audio.wav");
var response = client.PostFileWithRequest(new QueueSpeechToText(),
    new UploadFile("test_audio.wav", fsAudio, "audio"));

// Poll for Job Completion Status
GetTextGenerationStatusResponse status = new();
while (status.JobState is BackgroundJobState.Started or BackgroundJobState.Queued)
{
    status = client.Get(new GetTextGenerationStatus { RefId = response.RefId });
    Thread.Sleep(1000);
}

var answer = status.Results[0].Text;

Open AI Chat with Callback Example

The Queued API also accepts a Reply to Web Callback for a more reliable push-based App integration where responses are posted back to a custom URL Endpoint:

var correlationId = Guid.NewGuid().ToString("N");
var response = client.Post(new QueueOpenAiChatCompletion
{
    //...
    ReplyTo = $"https://example.org/api/OpenAiChatResponseCallback?CorrelationId={correlationId}"
});

Your callback URL can include any additional metadata to help your App correlate the response with its initiating request. The callback Request DTO just needs to contain the properties of the OpenAiChatResponse you're interested in, along with any metadata added to the callback URL, e.g:

public class OpenAiChatResponseCallback : OpenAiChatResponse, IPost, IReturnVoid
{
    public Guid CorrelationId { get; set; }
}

// Hosted by a ServiceStack Service in your App (class name is illustrative)
public class OpenAiChatCallbackService : Service
{
    public void Post(OpenAiChatResponseCallback request)
    {
        // Handle OpenAiChatResponse callback
    }
}

Unless your callback API is restricted to only accept requests from your AI Server, you should include a unique Id like a Guid in the callback URL that can be validated against an initiating request to ensure the callback can't be spoofed.
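
A minimal sketch of that validation inside the callback handler, assuming your App records each CorrelationId when it creates the initiating request (PendingRequests is a hypothetical application-side store, not part of AI Server):

// Hypothetical store of CorrelationIds created with each initiating request
static readonly ConcurrentDictionary<Guid,bool> PendingRequests = new();

public void Post(OpenAiChatResponseCallback request)
{
    // Reject callbacks that don't correspond to a request this App initiated
    if (!PendingRequests.TryRemove(request.CorrelationId, out _))
        throw HttpError.Forbidden("Unknown CorrelationId");

    // Handle the verified OpenAiChatResponse...
}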

Feedback

Feel free to reach out to us at ai-server/discussions with any AI Server questions.