# Agents

URL: https://developers.cloudflare.com/workers-ai/agents/

import { LinkButton } from "~/components";
Build AI assistants that can perform complex tasks on behalf of your users using Cloudflare Workers AI and Agents.
---

# JSON Mode

URL: https://developers.cloudflare.com/workers-ai/features/json-mode/

Workers AI supports JSON Mode, which lets your application request a structured JSON response from a text generation model instead of free-form text.
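To enable JSON Mode, add a `response_format` property to the request payload. The original snippet is not preserved in this extract, so the following is a minimal sketch of the convention, matching the field names referenced just below:

```json
{
  "response_format": {
    "type": "json_schema",
    "json_schema": {}
  }
}
```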
Where `json_schema` must be a valid [JSON Schema](https://json-schema.org/) declaration.
## JSON Mode example
When using JSON Mode, pass the schema as in the example below as part of the request you send to the LLM.
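The original example payload is not preserved here, so this is an illustrative sketch; the prompt and schema fields are assumptions for demonstration:

```json title="Example request (illustrative)"
{
  "messages": [
    { "role": "system", "content": "Extract data about a country." },
    { "role": "user", "content": "Tell me about India." }
  ],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "capital": { "type": "string" },
        "languages": {
          "type": "array",
          "items": { "type": "string" }
        }
      },
      "required": ["name", "capital", "languages"]
    }
  }
}
```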
The LLM will follow the schema and return a response such as the one below:
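Continuing the sketch above (illustrative values, not actual model output):

```json output
{
  "response": {
    "name": "India",
    "capital": "New Delhi",
    "languages": ["Hindi", "English", "Bengali", "Tamil", "Telugu"]
  }
}
```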
As you can see, the model is complying with the JSON schema definition in the request and responding with a validated JSON object.
## Supported Models
This is the list of models that now support JSON Mode:
- [@cf/meta/llama-3.1-8b-instruct-fast](/workers-ai/models/llama-3.1-8b-instruct-fast/)
- [@cf/meta/llama-3.1-70b-instruct](/workers-ai/models/llama-3.1-70b-instruct/)
- [@cf/meta/llama-3.3-70b-instruct-fp8-fast](/workers-ai/models/llama-3.3-70b-instruct-fp8-fast/)
- [@cf/meta/llama-3-8b-instruct](/workers-ai/models/llama-3-8b-instruct/)
- [@cf/meta/llama-3.1-8b-instruct](/workers-ai/models/llama-3.1-8b-instruct/)
- [@cf/meta/llama-3.2-11b-vision-instruct](/workers-ai/models/llama-3.2-11b-vision-instruct/)
- [@hf/nousresearch/hermes-2-pro-mistral-7b](/workers-ai/models/hermes-2-pro-mistral-7b/)
- [@hf/thebloke/deepseek-coder-6.7b-instruct-awq](/workers-ai/models/deepseek-coder-6.7b-instruct-awq/)
- [@cf/deepseek-ai/deepseek-r1-distill-qwen-32b](/workers-ai/models/deepseek-r1-distill-qwen-32b/)
We will continue extending this list to keep up with new and requested models.
Note that Workers AI cannot guarantee that the model will respond according to the requested JSON Schema. Depending on the complexity of the task and the adequacy of the JSON Schema, the model may not be able to satisfy the request in extreme situations. In that case, the error `JSON Mode couldn't be met` is returned and must be handled.
JSON Mode currently doesn't support streaming.
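Because the schema can fail to be met, it is worth guarding the call. Below is a minimal sketch from a Worker, assuming an `AI` binding is configured (the model choice and schema here are illustrative, not prescribed by this page):

```ts
// Hedged sketch: request JSON Mode output and handle the failure case.
// Assumes an `AI` binding (type `Ai` from @cloudflare/workers-types);
// the model and schema are illustrative.
export default {
  async fetch(request: Request, env: { AI: Ai }): Promise<Response> {
    try {
      const result = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
        messages: [{ role: "user", content: "Tell me about India." }],
        response_format: {
          type: "json_schema",
          json_schema: {
            type: "object",
            properties: {
              name: { type: "string" },
              capital: { type: "string" },
            },
            required: ["name", "capital"],
          },
        },
      });
      // On success, the response conforms to the requested schema.
      return Response.json(result);
    } catch (err) {
      // Surface the `JSON Mode couldn't be met` error to the caller.
      return new Response(String(err), { status: 500 });
    }
  },
};
```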
---
# Prompting
URL: https://developers.cloudflare.com/workers-ai/features/prompting/
import { Code } from "~/components";
export const scopedExampleOne = `{
  messages: [
    { role: "system", content: "you are a very funny comedian and you like emojis" },
    { role: "user", content: "tell me a joke about cloudflare" },
  ],
};`;
export const scopedExampleTwo = `{
  messages: [
    { role: "system", content: "you are a professional computer science assistant" },
    { role: "user", content: "what is WASM?" },
    { role: "assistant", content: "WASM (WebAssembly) is a binary instruction format that is designed to be platform-agnostic" },
    { role: "user", content: "does Python compile to WASM?" },
    { role: "assistant", content: "No, Python does not directly compile to WebAssembly" },
    { role: "user", content: "what about Rust?" },
  ],
};`;
export const unscopedExampleOne = `{
  prompt: "tell me a joke about cloudflare"
}`;
export const unscopedExampleTwo = `{
  prompt: "<s>[INST]comedian[/INST]</s>\\n[INST]tell me a joke about cloudflare[/INST]",
  raw: true
};`;
### Scoped Prompts

<Code code={scopedExampleOne} lang="js" />

Here's a better example of a chat session using multiple iterations between the user and the assistant.

<Code code={scopedExampleTwo} lang="js" />
Note that different LLMs are trained with different templates for different use cases. While Workers AI tries its best to abstract the specifics of each LLM template from the developer through a unified API, you should always refer to the model documentation for details (we provide links in the table above). For example, instruct models like Codellama are fine-tuned to respond to a user-provided instruction, while chat models expect fragments of dialogs as input.
### Unscoped Prompts
You can use unscoped prompts to send a single question to the model without worrying about providing any context. Workers AI will automatically convert your `prompt` input to a reasonable default scoped prompt internally so that you get the best possible prediction.

<Code code={unscopedExampleOne} lang="js" />
You can also use unscoped prompts to construct the model chat template manually. In this case, you can use the `raw` parameter. Here's an input example of a [Mistral](https://docs.mistral.ai/models/#chat-template) chat template prompt:

<Code code={unscopedExampleTwo} lang="js" />
---
# Asynchronous Batch API
URL: https://developers.cloudflare.com/workers-ai/features/batch-api/
import { Render, PackageManagers, WranglerConfig, CURL } from "~/components";
Asynchronous batch processing lets you send a collection (batch) of inference requests in a single call. Instead of expecting immediate responses for every request, the system queues them for processing and returns the results later.
Batch processing is useful for large workloads such as summarization or embeddings when there is no human interaction. Using the Batch API guarantees that your requests are fulfilled eventually, rather than erroring out if Cloudflare does not have enough capacity at a given time.
When you send a batch request, the API immediately acknowledges receipt with a status like `queued` and provides a unique `request_id`. This ID is later used to poll for the final responses once the processing is complete.
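From a Worker, this flow looks roughly like the sketch below. This is a minimal sketch assuming an `AI` binding and the `queueRequest` option described in the [Workers binding docs](/workers-ai/features/batch-api/workers-binding/); the field names follow the REST examples later on this page:

```ts
// Minimal sketch: queue a batch request, then poll with the returned request_id.
// Assumes an `AI` binding (type `Ai` from @cloudflare/workers-types).
export default {
  async fetch(request: Request, env: { AI: Ai }): Promise<Response> {
    // 1. Queue the batch; the API acknowledges immediately with `queued`.
    const queued = await env.AI.run(
      "@cf/baai/bge-m3",
      {
        requests: [
          {
            query: "This is a story about Cloudflare",
            contexts: [{ text: "This is a story about a llama" }],
          },
        ],
      },
      { queueRequest: true },
    );
    // 2. Later, poll with the request_id until the results are ready.
    const results = await env.AI.run(
      "@cf/baai/bge-m3",
      { request_id: (queued as { request_id: string }).request_id },
      { queueRequest: true },
    );
    return Response.json(results);
  },
};
```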
You can use the Batch API by either creating and deploying a Cloudflare Worker that leverages the [Batch API with the AI binding](/workers-ai/features/batch-api/workers-binding/), using the [REST API](/workers-ai/features/batch-api/rest-api/) directly, or by starting from a [template](https://github.com/craigsdennis/batch-please-workers-ai).
:::note[Note]
Ensure that the total payload is under 10 MB.
:::
## Demo application
If you want to get started quickly, click the button below:
[![Deploy to Cloudflare Workers](https://deploy.workers.cloudflare.com/button)](https://deploy.workers.cloudflare.com/?url=https://github.com/craigsdennis/batch-please-workers-ai)
This will create a repository in your GitHub account and deploy a ready-to-use Worker that demonstrates how to use Cloudflare's Asynchronous Batch API. The template includes preconfigured AI bindings and examples for sending and retrieving batch requests with and without external references. Once deployed, you can visit the live Worker and start experimenting with the Batch API immediately.
## Supported Models
- [@cf/meta/llama-3.3-70b-instruct-fp8-fast](/workers-ai/models/llama-3.3-70b-instruct-fp8-fast/)
- [@cf/baai/bge-small-en-v1.5](/workers-ai/models/bge-small-en-v1.5/)
- [@cf/baai/bge-base-en-v1.5](/workers-ai/models/bge-base-en-v1.5/)
- [@cf/baai/bge-large-en-v1.5](/workers-ai/models/bge-large-en-v1.5/)
- [@cf/baai/bge-m3](/workers-ai/models/bge-m3/)
- [@cf/meta/m2m100-1.2b](/workers-ai/models/m2m100-1.2b/)
---
# REST API
URL: https://developers.cloudflare.com/workers-ai/features/batch-api/rest-api/
If you prefer to work directly with the REST API instead of a [Cloudflare Worker](/workers-ai/features/batch-api/workers-binding/), follow the steps below:
## 1. Sending a Batch Request
Make a POST request using the following pattern. You can pass `external_reference` as a unique ID per prompt; it will be returned in the response.
```bash title="Sending a batch request" {11,15,19}
curl "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/ai/run/@cf/baai/bge-m3?queueRequest=true" \
--header "Authorization: Bearer $API_TOKEN" \
--header 'Content-Type: application/json' \
--json '{
"requests": [
{
"query": "This is a story about Cloudflare",
"contexts": [
{
"text": "This is a story about an orange cloud",
"external_reference": "story1"
},
{
"text": "This is a story about a llama",
"external_reference": "story2"
},
{
"text": "This is a story about a hugging emoji",
"external_reference": "story3"
}
]
}
]
}'
```
```json output {4}
{
  "result": {
    "status": "queued",
    "request_id": "768f15b7-4fd6-4498-906e-ad94ffc7f8d2",
    "model": "@cf/baai/bge-m3"
  },
  "success": true,
  "errors": [],
  "messages": []
}
```
## 2. Retrieving the Batch Response
After receiving a `request_id` from your initial POST, you can poll for or retrieve the results with another POST request:
```bash title="Retrieving a response"
curl "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/ai/run/@cf/baai/bge-m3?queueRequest=true" \
--header "Authorization: Bearer $API_TOKEN" \
--header 'Content-Type: application/json' \
--json '{
"request_id": "Get started by creating your first note
Configure post-processing of recording transcriptions with AI models.
Settings changes are auto-saved locally.
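Once processing has finished, the same endpoint returns the batch results. The exact layout of the result payload depends on the model, so the following is only an illustrative sketch with hypothetical values; note that any `external_reference` values you supplied are echoed back so you can correlate results with your inputs:

```json output
{
  "result": {
    "status": "complete",
    "responses": [
      { "external_reference": "story1", "success": true, "result": {} },
      { "external_reference": "story2", "success": true, "result": {} },
      { "external_reference": "story3", "success": true, "result": {} }
    ]
  },
  "success": true
}
```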