The OpenAI-Compatible SDK allows you to connect Langdock to any API that follows the OpenAI API specification. This includes popular inference servers like vLLM, LiteLLM, Ollama, and many other self-hosted or custom LLM deployments.

What is OpenAI-Compatible?

Many LLM inference solutions implement the OpenAI API specification as a standard interface. This means they accept requests and return responses in the same format as OpenAI’s API, making them interchangeable from an integration perspective. Common OpenAI-compatible solutions include:
  • vLLM - High-throughput inference server for large language models
  • LiteLLM - Proxy server that provides a unified interface to 100+ LLM providers
  • Ollama - Run large language models locally
  • Text Generation Inference (TGI) - Hugging Face’s inference server
  • LocalAI - Self-hosted, OpenAI-compatible API
  • Custom deployments - Any service implementing the OpenAI chat completions API
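What "interchangeable from an integration perspective" means in practice is that the request and response bodies share one shape. A minimal sketch of that shared format, with a placeholder model ID and an illustrative response body:

```python
import json

# A chat completions request in the OpenAI format. The model ID is a
# placeholder; use whatever identifier your inference server exposes.
request_payload = {
    "model": "meta-llama/Llama-3.1-70B-Instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "max_tokens": 256,
}

# Every OpenAI-compatible server returns responses in this shape, so the
# same parsing code works regardless of which backend produced them.
sample_response = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hi there!"},
            "finish_reason": "stop",
        }
    ],
}

reply = sample_response["choices"][0]["message"]["content"]
print(json.dumps(request_payload, indent=2))
print(reply)
```

Because the parsing path (`choices[0].message.content`) is identical across vLLM, LiteLLM, Ollama, and the others, swapping backends does not require changing client code.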

Prerequisites

Before setting up an OpenAI-compatible model, you need:
  1. A running OpenAI-compatible inference endpoint accessible over HTTPS
  2. The base URL of your endpoint
  3. The model ID/name as configured in your inference server
  4. An API key (if your endpoint requires authentication)
  5. Admin access to your Langdock workspace

Setup Steps

  1. Go to the model settings and click on Add Model
  2. Configure the Display Settings:
    • Provider: Select the organization that built the model (e.g., Meta for Llama, Mistral for Mistral models)
    • Model name: The name users will see in the model selector
    • Hosting provider: Your hosting solution (e.g., “Self-hosted”, “vLLM”, “Internal”)
    • Region: Select based on where your endpoint is hosted
    • Image analysis: Enable only if your model supports vision capabilities
  3. Configure the Model Configuration:
    • SDK: Select OpenAI Compatible
    • Base URL: Your endpoint URL (e.g., https://your-server.com/v1). This field is required.
    • Model ID: The exact model identifier as configured in your inference server
    • API key: Your authentication key (leave empty if not required)
    • Context Size: The context window size of your model in tokens
  4. Click Save and test the model by sending a prompt before making it visible to all users
Your endpoint must be publicly accessible over HTTPS. Langdock blocks requests to private IPs (e.g., 10.x.x.x, 192.168.x.x), localhost, and other internal hostnames for security reasons. If you need to connect to an internal endpoint, contact support@langdock.com.
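Before adding the model in Langdock, it can help to confirm the endpoint answers from outside your network. A small sketch using only the Python standard library; the URL, model ID, and key below are placeholders for your own values:

```python
import json
import urllib.request

def build_chat_request(base_url, model_id, api_key=None):
    """Build a POST request to /chat/completions on an OpenAI-compatible endpoint."""
    url = base_url.rstrip("/") + "/chat/completions"
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Standard OpenAI-style bearer authentication
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps({
        "model": model_id,
        "messages": [{"role": "user", "content": "Reply with the word: pong"}],
        "max_tokens": 10,
    }).encode()
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

# Hypothetical endpoint and key, for illustration only:
req = build_chat_request(
    "https://your-server.com/v1",
    "meta-llama/Llama-3.1-70B-Instruct",
    api_key="sk-example",
)
# To actually send the request against your server:
#   with urllib.request.urlopen(req, timeout=30) as resp:
#       print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

If this request succeeds from a machine outside your network but the model fails in Langdock, the problem is likely in the configured values rather than the endpoint itself.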

Example Configurations

vLLM

| Setting | Value |
| --- | --- |
| SDK | OpenAI Compatible |
| Base URL | https://your-vllm-server.com/v1 |
| Model ID | The model name you specified when starting vLLM (e.g., meta-llama/Llama-3.1-70B-Instruct) |
| API key | Your configured API key, or leave empty |

LiteLLM Proxy

| Setting | Value |
| --- | --- |
| SDK | OpenAI Compatible |
| Base URL | https://your-litellm-proxy.com |
| Model ID | The model alias configured in your LiteLLM config |
| API key | Your LiteLLM proxy API key |

Ollama (via public endpoint)

| Setting | Value |
| --- | --- |
| SDK | OpenAI Compatible |
| Base URL | https://your-ollama-server.com/v1 (must be publicly accessible over HTTPS) |
| Model ID | The model name as shown in ollama list (e.g., llama3.1, mistral) |
| API key | Leave empty (Ollama typically does not require authentication) |
For Azure OpenAI, use the dedicated Azure SDK instead of OpenAI Compatible — it provides better support including automatic API version management and deployment-based URL routing.

Common Use Cases

Self-Hosted LLMs for Data Privacy

Organizations with strict data residency requirements can run models on their own infrastructure. All prompts and responses stay within your network.

Cost Optimization

Running open-source models on your own hardware can significantly reduce costs for high-volume use cases compared to commercial API pricing.

Custom Fine-Tuned Models

Connect models you have fine-tuned for specific tasks or domains. Deploy them with vLLM or similar servers and integrate directly into Langdock.

Multi-Provider Abstraction

Use LiteLLM as a proxy to route requests to different providers while maintaining a consistent interface in Langdock.

Troubleshooting

Connection refused or timeout:
  • Verify your endpoint URL is accessible from external servers over HTTPS
  • Check that your firewall allows incoming connections
  • Ensure your inference server is running and healthy
  • The endpoint must be publicly accessible — localhost and private IPs are blocked
Authentication errors:
  • Verify your API key is correct
  • Check if your endpoint requires a specific authentication header format
  • Some servers expect the key as a Bearer token in the Authorization header (Authorization: Bearer <key>)
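One way these mismatches show up: most OpenAI-compatible clients send the key in the standard Authorization header, so a server that expects a different header (the x-api-key name below is a hypothetical example) will reject otherwise valid requests:

```python
api_key = "sk-example"  # placeholder key

# Standard OpenAI-style authentication header, as sent by most
# OpenAI-compatible clients:
standard_headers = {"Authorization": f"Bearer {api_key}"}

# A server expecting a custom header instead (hypothetical example)
# will return an authentication error for requests that only carry
# the standard header above:
custom_headers = {"x-api-key": api_key}

print(standard_headers)
print(custom_headers)
```

If your server uses a non-standard scheme, check whether it can be configured to accept the standard Bearer format.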
Model not found:
  • Ensure the Model ID matches exactly what your inference server expects
  • Check case sensitivity in the model name
  • Verify the model is loaded and available on your server
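Most OpenAI-compatible servers also expose GET /v1/models, which lists the exact identifiers they accept; comparing that list against your configured Model ID catches naming and case mismatches. A sketch of parsing that response (the model IDs below are illustrative):

```python
def extract_model_ids(models_response: dict) -> list[str]:
    """Pull model IDs out of a GET /v1/models response body."""
    return [m["id"] for m in models_response.get("data", [])]

# Sample response in the OpenAI /v1/models list format:
sample = {
    "object": "list",
    "data": [
        {"id": "meta-llama/Llama-3.1-70B-Instruct", "object": "model"},
        {"id": "mistral", "object": "model"},
    ],
}

# The Model ID configured in Langdock must match one of these exactly,
# including case:
print(extract_model_ids(sample))
```

If your configured Model ID is not in this list, the server will return a "model not found" error even though the endpoint itself is reachable.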
Responses are cut off:
  • Check the max output tokens setting in Langdock
  • Verify your inference server’s generation length limits
Slow responses:
  • Check your server’s available GPU memory and compute resources
  • Consider using quantized model versions for faster inference
  • Monitor your server’s queue length and scaling configuration
Incompatible API format:
  • Not all “OpenAI-compatible” servers implement the full API specification
  • Verify your server supports the /v1/chat/completions endpoint
  • Check if your server requires specific API version headers
If you run into any issues, contact support@langdock.com.