Anthropic Messages
Creates a model response for a given chat conversation, passed as a structured list of input messages. This endpoint follows the Anthropic API specification, and requests are sent to the AWS Bedrock Anthropic endpoint.
To use the API, you need an API key. To request access, please contact us at support@langdock.com.
All parameters from the Anthropic “Create a message” endpoint are supported according to the Anthropic specifications, with the following exception:
model: The supported models depend on the region.
Currently supported in the EU: claude-3-5-sonnet-20240620, claude-3-sonnet-20240229, claude-3-haiku-20240307.
Currently supported in the US: claude-3-5-sonnet-20240620, claude-3-sonnet-20240229, claude-3-haiku-20240307, claude-3-opus-20240229.
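The per-region model list can be checked on the client before sending a request. The following is a minimal sketch, not part of the API: the table is copied from the list above, and the helper name is_supported is hypothetical.

```python
# Model availability per region, copied from the list in this section.
# This table is a local convenience and will go stale if the list changes.
SUPPORTED_MODELS = {
    "eu": {
        "claude-3-5-sonnet-20240620",
        "claude-3-sonnet-20240229",
        "claude-3-haiku-20240307",
    },
    "us": {
        "claude-3-5-sonnet-20240620",
        "claude-3-sonnet-20240229",
        "claude-3-haiku-20240307",
        "claude-3-opus-20240229",
    },
}


def is_supported(region: str, model: str) -> bool:
    """Return True if the model is listed for the region (hypothetical helper)."""
    return model in SUPPORTED_MODELS.get(region, set())
```

For example, is_supported("eu", "claude-3-opus-20240229") is False, because that model is only listed for the US region.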
Rate limits
The rate limit for the Messages endpoint is 500 RPM (requests per minute) and 60,000 TPM (tokens per minute). Rate limits are defined at the workspace level, not at the API key level. Each model has its own rate limit. If you exceed your rate limit, you will receive a 429 Too Many Requests response.
Rate limits are subject to change; refer to this documentation for the most up-to-date information. If you need a higher rate limit, please contact us at support@langdock.com.
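One common way to handle a 429 response is to retry with exponential backoff. The sketch below is an illustration under assumptions, not behavior prescribed by this API: RateLimited is a hypothetical exception that your own request code would raise on a 429, and the attempt count and delays are arbitrary.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error raised by the caller's code when the API answers 429."""


def with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry on RateLimited, doubling the wait each time."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            # Exponential delay plus a little jitter to avoid synchronized retries.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
```

Because limits apply per workspace, several API keys in the same workspace share the same budget, so backoff on one client does not free capacity used by another.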
Using Anthropic-compatible libraries
As the request and response format is the same as the Anthropic API, you can use popular libraries like the Anthropic Python library or the Vercel AI SDK to use the Langdock API.
Example using the Anthropic Python library
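A minimal sketch of what such a call might look like. The base URL template and the LANGDOCK_API_KEY environment variable name are assumptions made for illustration and are not stated on this page; check your workspace settings for the actual values. The Anthropic constructor arguments base_url and api_key, and client.messages.create, are part of the Anthropic Python library.

```python
import os

# Assumption: illustrative base URL pattern, with the region as a path segment.
ASSUMED_BASE_URL = "https://api.langdock.com/anthropic/{region}/"


def client_options(region: str, api_key: str) -> dict:
    """Build keyword arguments for anthropic.Anthropic (hypothetical helper)."""
    if region not in ("eu", "us"):
        raise ValueError("region must be 'eu' or 'us'")
    return {"base_url": ASSUMED_BASE_URL.format(region=region), "api_key": api_key}


def ask(prompt: str, region: str = "eu") -> str:
    """Send a single user message and return the text of the first content block."""
    from anthropic import Anthropic  # requires: pip install anthropic

    # LANGDOCK_API_KEY is an assumed environment variable name.
    client = Anthropic(**client_options(region, os.environ["LANGDOCK_API_KEY"]))
    message = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return message.content[0].text
```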
Example using the Vercel AI SDK in Node.js
Path Parameters
region: The region of the API to use.
Available options: eu, us
Body
model: The model that will complete your prompt. See models for additional details and options.
Available options: claude-3-5-sonnet-20240620, claude-3-opus-20240229, claude-3-sonnet-20240229, claude-3-haiku-20240307
messages: Input messages.
Anthropic's models are trained to operate on alternating user and assistant conversational turns. When creating a new Message, you specify the prior conversational turns with the messages parameter, and the model then generates the next Message in the conversation.
Each input message must be an object with a role and content. You can specify a single user-role message, or you can include multiple user and assistant messages. The first message must always use the user role.
If the final message uses the assistant role, the response content will continue immediately from the content in that message. This can be used to constrain part of the model's response.
Example with a single user message:
[{"role": "user", "content": "Hello, Claude"}]
Example with multiple conversational turns:
[
  {"role": "user", "content": "Hello there."},
  {"role": "assistant", "content": "Hi, I'm Claude. How can I help you?"},
  {"role": "user", "content": "Can you explain LLMs in plain English?"}
]
Example with a partially-filled response from Claude:
[
  {"role": "user", "content": "What's the Greek name for Sun? (A) Sol (B) Helios (C) Sun"},
  {"role": "assistant", "content": "The best answer is ("}
]
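The rules for input messages stated in this section can be checked locally before a request is sent. This is a minimal sketch covering only those rules; validate_messages is a hypothetical helper, and the server may apply further checks.

```python
def validate_messages(messages: list) -> list:
    """Return a list of problems; an empty list means the basic rules are met."""
    if not messages:
        # Assumption: at least one message is needed, since the first must be a user turn.
        return ["messages must contain at least one message"]
    problems = []
    for i, message in enumerate(messages):
        if message.get("role") not in ("user", "assistant"):
            problems.append(f"message {i}: role must be 'user' or 'assistant'")
        if "content" not in message:
            problems.append(f"message {i}: missing content")
    if messages[0].get("role") != "user":
        problems.append("first message must use the user role")
    return problems
```

A list that ends with an assistant message passes this check, since that is the partially-filled response case shown above.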
Each input message content may be either a single string or an array of content blocks, where each block has a specific type. Using a string for content is shorthand for an array of one content block of type "text". The following input messages are equivalent:
{"role": "user", "content": "Hello, Claude"}
{"role": "user", "content": [{"type": "text", "text": "Hello, Claude"}]}
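The equivalence between the two forms can be expressed as a small normalization step, which is handy when code needs to append further blocks to a message. A minimal sketch; to_content_blocks is a hypothetical helper name.

```python
def to_content_blocks(content) -> list:
    """Expand the string shorthand into an explicit list of content blocks."""
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    return list(content)  # already a sequence of blocks; copy it
```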
Starting with Claude 3 models, you can also send image content blocks:
{"role": "user", "content": [
  {
    "type": "image",
    "source": {
      "type": "base64",
      "media_type": "image/jpeg",
      "data": "/9j/4AAQSkZJRg..."
    }
  },
  {"type": "text", "text": "What is in this image?"}
]}
We currently support the base64 source type for images, and the image/jpeg, image/png, image/gif, and image/webp media types.
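An image content block can be built from raw bytes with the standard library. This is a minimal sketch: image_block is a hypothetical helper, and the set of media types is copied from the sentence above.

```python
import base64

# Media types listed as supported in this section.
SUPPORTED_MEDIA_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def image_block(data: bytes, media_type: str) -> dict:
    """Wrap raw image bytes in a base64 image content block."""
    if media_type not in SUPPORTED_MEDIA_TYPES:
        raise ValueError(f"unsupported media type: {media_type}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("ascii"),
        },
    }
```

The media type is passed explicitly here rather than guessed from the bytes, so the caller stays responsible for matching it to the file.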
See examples for more input examples.
Note that if you want to include a system prompt, you can use the top-level system parameter; there is no "system" role for input messages in the Messages API.
max_tokens: The maximum number of tokens to generate before stopping.
Note that Anthropic's models may stop before reaching this maximum. This parameter only specifies the absolute maximum number of tokens to generate.
Different models have different maximum values for this parameter. See models for details.
Required range: x > 1
metadata: An object describing metadata about the request.
stop_sequences: Custom text sequences that will cause the model to stop generating.
Anthropic's models will normally stop when they have naturally completed their turn, which will result in a response stop_reason of "end_turn".
If you want the model to stop generating when it encounters custom strings of text, you can use the stop_sequences parameter. If the model encounters one of the custom sequences, the response stop_reason value will be "stop_sequence" and the response stop_sequence value will contain the matched stop sequence.
stream: Whether to incrementally stream the response using server-sent events.
See streaming for details.
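When streaming, the response text arrives in pieces. The sketch below collects those pieces from server-sent event lines. It assumes the event shape used by Anthropic's streaming format (content_block_delta events carrying a text_delta), which this page does not describe; text_from_sse is a hypothetical helper.

```python
import json


def text_from_sse(lines) -> str:
    """Concatenate text deltas from an iterable of server-sent event lines."""
    parts = []
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip "event:" lines, comments, and blank separators
        event = json.loads(line[len("data:"):].strip())
        # Assumed event shape: text arrives in content_block_delta events.
        if event.get("type") == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                parts.append(delta["text"])
    return "".join(parts)
```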
system: System prompt.
A system prompt is a way of providing context and instructions to Claude, such as specifying a particular goal or role. See Anthropic's guide to system prompts.
temperature: Amount of randomness injected into the response.
Defaults to 1.0. Ranges from 0.0 to 1.0. Use temperature closer to 0.0 for analytical or multiple-choice tasks, and closer to 1.0 for creative and generative tasks.
Note that even with temperature of 0.0, the results will not be fully deterministic.
Required range: 0 ≤ x ≤ 1
tool_choice: How the model should use the provided tools. The model can use a specific tool, any available tool, or decide by itself.
tools: Definitions of tools that the model may use.
If you include tools in your API request, the model may return tool_use content blocks that represent the model's use of those tools. You can then run those tools using the tool input generated by the model and then optionally return results back to the model using tool_result content blocks.
Each tool definition includes:
name: Name of the tool.
description: Optional, but strongly-recommended description of the tool.
input_schema: JSON schema for the tool input shape that the model will produce in tool_use output content blocks.
For example, if you defined tools as:
[
  {
    "name": "get_stock_price",
    "description": "Get the current stock price for a given ticker symbol.",
    "input_schema": {
      "type": "object",
      "properties": {
        "ticker": {
          "type": "string",
          "description": "The stock ticker symbol, e.g. AAPL for Apple Inc."
        }
      },
      "required": ["ticker"]
    }
  }
]
And then asked the model "What's the S&P 500 at today?", the model might produce tool_use content blocks in the response like this:
[
  {
    "type": "tool_use",
    "id": "toolu_01D7FLrfh4GYq7yT1ULFeyMV",
    "name": "get_stock_price",
    "input": { "ticker": "^GSPC" }
  }
]
You might then run your get_stock_price tool with {"ticker": "^GSPC"} as an input, and return the following back to the model in a subsequent user message:
[
  {
    "type": "tool_result",
    "tool_use_id": "toolu_01D7FLrfh4GYq7yT1ULFeyMV",
    "content": "259.75 USD"
  }
]
Tools can be used for workflows that include running client-side tools and functions, or more generally whenever you want the model to produce a particular JSON structure of output.
See Anthropic's guide for more details.
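The round trip described above, from tool_use blocks to a user message with tool_result blocks, can be sketched as follows. The get_stock_price function here is a stand-in that returns the fixed value from the example, and TOOLS and tool_results_message are hypothetical names.

```python
def get_stock_price(ticker: str) -> str:
    """Stand-in tool: a real implementation would query a market data source."""
    return "259.75 USD"  # fixed value taken from the example above


# Maps tool names from the tools definition to local functions.
TOOLS = {"get_stock_price": get_stock_price}


def tool_results_message(response_content: list) -> dict:
    """Run every tool_use block and wrap the outputs in a user message."""
    results = []
    for block in response_content:
        if block.get("type") != "tool_use":
            continue  # text blocks and other types are not tool calls
        output = TOOLS[block["name"]](**block["input"])
        results.append(
            {"type": "tool_result", "tool_use_id": block["id"], "content": output}
        )
    return {"role": "user", "content": results}
```

The returned message would be appended to the conversation after the assistant message that contained the tool_use blocks, and the request sent again.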
top_k: Only sample from the top K options for each subsequent token.
Used to remove "long tail" low probability responses. Learn more technical details here.
Recommended for advanced use cases only. You usually only need to use temperature.
Required range: x > 0
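To illustrate what top-K filtering does, here is a toy sketch over a hand-written token distribution. It shows the general technique only; how the model applies it internally is not specified on this page, and top_k_filter is a hypothetical name.

```python
def top_k_filter(probs: dict, k: int) -> dict:
    """Keep the k most probable tokens and renormalize their probabilities."""
    kept = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}
```

With probabilities {"a": 0.5, "b": 0.3, "c": 0.2} and k = 2, token "c" is dropped and the remaining two are rescaled to sum to 1.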
top_p: Use nucleus sampling.
In nucleus sampling, we compute the cumulative distribution over all the options for each subsequent token in decreasing probability order and cut it off once it reaches a particular probability specified by top_p. You should either alter temperature or top_p, but not both.
Recommended for advanced use cases only. You usually only need to use temperature.
Required range: 0 < x < 1
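The cutoff described for nucleus sampling can be shown on a toy distribution. This is a sketch of the general technique as described above, not of the model's internal implementation; nucleus_filter is a hypothetical name.

```python
def nucleus_filter(probs: dict, top_p: float) -> dict:
    """Keep the most probable tokens until their cumulative mass reaches top_p."""
    kept, cumulative = [], 0.0
    for token, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break  # the nucleus is complete; drop the remaining tail
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}
```

Unlike a fixed top-K cut, the number of tokens kept here varies with how peaked the distribution is.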
Response
id: Unique object identifier.
The format and length of IDs may change over time.
type: Object type.
For Messages, this is always "message".
Available options: message
role: Conversational role of the generated message.
This will always be "assistant".
Available options: assistant
content: Content generated by the model.
This is an array of content blocks, each of which has a type that determines its shape.
Example:
[{"type": "text", "text": "Hi, I'm Claude."}]
If the request input messages ended with an assistant turn, then the response content will continue directly from that last turn. You can use this to constrain the model's output.
For example, if the input messages were:
[
  {"role": "user", "content": "What's the Greek name for Sun? (A) Sol (B) Helios (C) Sun"},
  {"role": "assistant", "content": "The best answer is ("}
]
Then the response content might be:
[{"type": "text", "text": "B)"}]
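Because the response only contains the continuation, the full answer is the partially-filled assistant text followed by the returned text blocks. A minimal sketch; full_answer is a hypothetical helper.

```python
def full_answer(prefill: str, response_content: list) -> str:
    """Join the assistant prefill with the text blocks of the response."""
    continuation = "".join(
        block["text"] for block in response_content if block.get("type") == "text"
    )
    return prefill + continuation
```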
model: The model that will complete your prompt. See models for additional details and options.
Available options: claude-3-5-sonnet-20240620, claude-3-opus-20240229, claude-3-sonnet-20240229, claude-3-haiku-20240307
stop_reason: The reason that we stopped.
This may be one of the following values:
"end_turn": the model reached a natural stopping point
"max_tokens": we exceeded the requested max_tokens or the model's maximum
"stop_sequence": one of your provided custom stop_sequences was generated
"tool_use": the model invoked one or more tools
In non-streaming mode this value is always non-null. In streaming mode, it is null in the message_start event and non-null otherwise.
Available options: end_turn, max_tokens, stop_sequence, tool_use
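Client code typically branches on stop_reason to decide what to do next. The sketch below maps each value listed above to a suggested follow-up; the returned labels and the function name next_step are illustrative choices, not part of the API.

```python
def next_step(message: dict) -> str:
    """Suggest a follow-up action for a response, based on its stop_reason."""
    reason = message.get("stop_reason")
    if reason == "end_turn":
        return "done"
    if reason == "max_tokens":
        return "truncated"  # consider a larger max_tokens or a follow-up request
    if reason == "stop_sequence":
        # The matched sequence is reported in the stop_sequence field.
        return f"stopped at {message.get('stop_sequence')!r}"
    if reason == "tool_use":
        return "run tools"  # execute the tool_use blocks and send tool_result blocks
    return "pending"  # null, e.g. in the message_start event while streaming
```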
stop_sequence: Which custom stop sequence was generated, if any.
This value will be a non-null string if one of your custom stop sequences was generated.
usage: Input and output token counts, representing the underlying cost to our systems.
Under the hood, the API transforms requests into a format suitable for the model. The model's output then goes through a parsing stage before becoming an API response. As a result, the token counts in usage will not match one-to-one with the exact visible content of an API request or response.
For example, output_tokens will be non-zero, even for an empty string response from Claude.
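Since the visible text is not a reliable guide to token counts, a client that wants to track consumption should sum the reported usage values. A minimal sketch, assuming each response carries usage.input_tokens alongside usage.output_tokens as in the Anthropic response format; total_tokens is a hypothetical helper.

```python
def total_tokens(responses: list) -> int:
    """Sum input and output tokens reported across a list of responses."""
    return sum(
        response["usage"]["input_tokens"] + response["usage"]["output_tokens"]
        for response in responses
    )
```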