The OpenAI-Compatible SDK allows you to connect Langdock to any API that follows the OpenAI API specification. This includes popular inference servers like vLLM, LiteLLM, Ollama, and many other self-hosted or custom LLM deployments.

What is OpenAI-Compatible?

Many LLM inference solutions implement the OpenAI API specification as a standard interface. This means they accept requests and return responses in the same format as OpenAI’s API, making them interchangeable from an integration perspective. Common OpenAI-compatible solutions include:
  • vLLM - High-throughput inference server for large language models
  • LiteLLM - Proxy server that provides a unified interface to 100+ LLM providers
  • Ollama - Run large language models locally
  • Text Generation Inference (TGI) - Hugging Face’s inference server
  • LocalAI - Self-hosted, OpenAI-compatible API
  • Custom deployments - Any service implementing the OpenAI chat completions API
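What "interchangeable from an integration perspective" means in practice is that the request and response bodies share one shape. A minimal sketch of that shared format, with a placeholder model ID and an illustrative response body:

```python
import json

# A chat completions request in the OpenAI format. The model ID is a
# placeholder; use whatever identifier your inference server exposes.
request_payload = {
    "model": "meta-llama/Llama-3.1-70B-Instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "max_tokens": 256,
}

# Every OpenAI-compatible server returns responses in this shape, so the
# same parsing code works regardless of which backend produced them.
sample_response = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hi there!"},
            "finish_reason": "stop",
        }
    ],
}

reply = sample_response["choices"][0]["message"]["content"]
print(json.dumps(request_payload, indent=2))
print(reply)
```

Because the parsing path (`choices[0].message.content`) is identical across vLLM, LiteLLM, Ollama, and the others, swapping backends does not require changing client code.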

Prerequisites

Before setting up an OpenAI-compatible model, you need:
  1. A running OpenAI-compatible inference endpoint accessible over HTTPS
  2. The base URL of your endpoint
  3. The model ID/name as configured in your inference server
  4. An API key (if your endpoint requires authentication)
  5. Admin access to your Langdock workspace

Setup Steps

  1. Go to the model settings and click on Add Model
  2. Configure the Display Settings:
    • Provider: Select the organization that built the model (e.g., Meta for Llama, Mistral for Mistral models)
    • Model name: The name users will see in the model selector
    • Hosting provider: Your hosting solution (e.g., “Self-hosted”, “vLLM”, “Internal”)
    • Region: Select based on where your endpoint is hosted
    • Image analysis: Enable only if your model supports vision capabilities
  3. Configure the Model Configuration:
    • SDK: Select OpenAI Compatible
    • Base URL: Your endpoint URL (e.g., https://your-server.com/v1). This field is required.
    • Model ID: The exact model identifier as configured in your inference server
    • API key: Your authentication key (leave empty if not required)
    • Context Size: The context window size of your model in tokens
  4. Click Save and test the model by sending a prompt before making it visible to all users
Your endpoint must be publicly accessible over HTTPS. Langdock blocks requests to private IPs (e.g., 10.x.x.x, 192.168.x.x), localhost, and other internal hostnames for security reasons. If you need to connect to an internal endpoint, contact support@langdock.com.
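Before adding the model in Langdock, it can help to confirm the endpoint answers from outside your network. A small sketch using only the Python standard library; the URL, model ID, and key below are placeholders for your own values:

```python
import json
import urllib.request

def build_chat_request(base_url, model_id, api_key=None):
    """Build a POST request to /chat/completions on an OpenAI-compatible endpoint."""
    url = base_url.rstrip("/") + "/chat/completions"
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Standard OpenAI-style bearer authentication
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps({
        "model": model_id,
        "messages": [{"role": "user", "content": "Reply with the word: pong"}],
        "max_tokens": 10,
    }).encode()
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

# Hypothetical endpoint and key, for illustration only:
req = build_chat_request(
    "https://your-server.com/v1",
    "meta-llama/Llama-3.1-70B-Instruct",
    api_key="sk-example",
)
# To actually send the request against your server:
#   with urllib.request.urlopen(req, timeout=30) as resp:
#       print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

If this request succeeds from a machine outside your network but the model fails in Langdock, the problem is likely in the configured values rather than the endpoint itself.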

Example Configurations

vLLM

| Setting | Value |
| --- | --- |
| SDK | OpenAI Compatible |
| Base URL | https://your-vllm-server.com/v1 |
| Model ID | The model name you specified when starting vLLM (e.g., meta-llama/Llama-3.1-70B-Instruct) |
| API key | Your configured API key, or leave empty |

LiteLLM Proxy

| Setting | Value |
| --- | --- |
| SDK | OpenAI Compatible |
| Base URL | https://your-litellm-proxy.com |
| Model ID | The model alias configured in your LiteLLM config |
| API key | Your LiteLLM proxy API key |

Ollama (via public endpoint)

| Setting | Value |
| --- | --- |
| SDK | OpenAI Compatible |
| Base URL | https://your-ollama-server.com/v1 (must be publicly accessible over HTTPS) |
| Model ID | The model name as shown in ollama list (e.g., llama3.1, mistral) |
| API key | Leave empty (Ollama typically does not require authentication) |
For Azure OpenAI, use the dedicated Azure SDK instead of OpenAI Compatible — it provides better support including automatic API version management and deployment-based URL routing.

Common Use Cases

Self-Hosted LLMs for Data Privacy

Organizations with strict data residency requirements can run models on their own infrastructure. All prompts and responses stay within your network.

Cost Optimization

Running open-source models on your own hardware can significantly reduce costs for high-volume use cases compared to commercial API pricing.

Custom Fine-Tuned Models

Connect models you have fine-tuned for specific tasks or domains. Deploy them with vLLM or similar servers and integrate directly into Langdock.

Multi-Provider Abstraction

Use LiteLLM as a proxy to route requests to different providers while maintaining a consistent interface in Langdock.

Troubleshooting

Connection refused or timeout:
  • Verify your endpoint URL is accessible from external servers over HTTPS
  • Check that your firewall allows incoming connections
  • Ensure your inference server is running and healthy
  • The endpoint must be publicly accessible — localhost and private IPs are blocked
Authentication errors:
  • Verify your API key is correct
  • Check if your endpoint requires a specific authentication header format
  • Some servers expect the key as a Bearer token in the Authorization header (Authorization: Bearer <key>)
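One way these mismatches show up: most OpenAI-compatible clients send the key in the standard Authorization header, so a server that expects a different header (the x-api-key name below is a hypothetical example) will reject otherwise valid requests:

```python
api_key = "sk-example"  # placeholder key

# Standard OpenAI-style authentication header, as sent by most
# OpenAI-compatible clients:
standard_headers = {"Authorization": f"Bearer {api_key}"}

# A server expecting a custom header instead (hypothetical example)
# will return an authentication error for requests that only carry
# the standard header above:
custom_headers = {"x-api-key": api_key}

print(standard_headers)
print(custom_headers)
```

If your server uses a non-standard scheme, check whether it can be configured to accept the standard Bearer format.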
Model not found:
  • Ensure the Model ID matches exactly what your inference server expects
  • Check case sensitivity in the model name
  • Verify the model is loaded and available on your server
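Most OpenAI-compatible servers also expose GET /v1/models, which lists the exact identifiers they accept; comparing that list against your configured Model ID catches naming and case mismatches. A sketch of parsing that response (the model IDs below are illustrative):

```python
def extract_model_ids(models_response: dict) -> list[str]:
    """Pull model IDs out of a GET /v1/models response body."""
    return [m["id"] for m in models_response.get("data", [])]

# Sample response in the OpenAI /v1/models list format:
sample = {
    "object": "list",
    "data": [
        {"id": "meta-llama/Llama-3.1-70B-Instruct", "object": "model"},
        {"id": "mistral", "object": "model"},
    ],
}

# The Model ID configured in Langdock must match one of these exactly,
# including case:
print(extract_model_ids(sample))
```

If your configured Model ID is not in this list, the server will return a "model not found" error even though the endpoint itself is reachable.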
Responses are cut off:
  • Check the max output tokens setting in Langdock
  • Verify your inference server’s generation length limits
Slow responses:
  • Check your server’s available GPU memory and compute resources
  • Consider using quantized model versions for faster inference
  • Monitor your server’s queue length and scaling configuration
Incompatible API format:
  • Not all “OpenAI-compatible” servers implement the full API specification
  • Verify your server supports the /v1/chat/completions endpoint
  • Check if your server requires specific API version headers
If you run into any issues, contact support@langdock.com.