discourse/plugins/discourse-ai/spec/fabricators/llm_model_fabricator.rb
Rafael dos Santos Silva 18a0a8daeb
FEATURE: Add AWS Bedrock Converse API provider (#38903)
## Summary

Adds a new `aws_bedrock_converse` inference provider that uses the
official AWS SDK (`aws-sdk-bedrockruntime`) and the Converse API. This
runs alongside the existing `aws_bedrock` provider — fully additive,
zero risk to existing configurations.

### Why a new provider?

The existing `aws_bedrock` provider manually handles SigV4 signing, URL
construction, binary event stream decoding, and maintains a hardcoded
model ID mapping table. It only supports Claude and Nova models.

The new provider delegates all of this to the official AWS SDK, which
means:

- **Model-agnostic** — works with any model available on Bedrock
(Claude, Nova, Kimi, MiniMax, Mistral, Llama, DeepSeek, NVIDIA, Qwen,
GLM, etc.) without any model-specific code
- **Application Inference Profiles** — users can set cross-region
profiles (`us.anthropic.claude-sonnet-4-20250514-v1:0`) or application
inference profile ARNs directly as the model name
- **Bedrock API Key auth** — supports the new AWS Bedrock API keys
(Bearer token auth) in addition to IAM access keys, STS role assumption,
and automatic credential resolution from environment/instance profiles
- **No maintenance burden** — no model ID mapping table to update when
AWS adds new models, no manual SigV4 signing, no binary event stream
decoding
- **Native tools only** — no XML tool fallback; uses the Converse API's
built-in tool support

### Authentication options (priority order)

| Config | Auth method |
|---|---|
| `role_arn` set | STS AssumeRole (SigV4) |
| `access_key_id` set | Static IAM credentials (SigV4) |
| API key set (no `access_key_id`/`role_arn`) | Bearer token (Bedrock API key) |
| Nothing set | SDK auto-resolves (env vars, instance profile, ECS task role) |

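The priority order in the table reads as a simple cascade. The sketch below is a hypothetical helper: `resolve_bedrock_auth`, `present?`, and the returned symbols are illustrative names, not the endpoint's actual API. It only decides which strategy applies and does not build any AWS credential objects.

```ruby
# Hypothetical helper mirroring the documented priority order; it only names
# the strategy and does not create any AWS credential objects.
def resolve_bedrock_auth(role_arn: nil, access_key_id: nil, api_key: nil)
  if present?(role_arn)
    :assume_role        # STS AssumeRole, requests signed with SigV4
  elsif present?(access_key_id)
    :static_credentials # IAM access key pair, requests signed with SigV4
  elsif present?(api_key)
    :bearer_token       # Bedrock API key sent as a Bearer token
  else
    :default_chain      # SDK resolves env vars, instance profile, ECS task role
  end
end

# True for non-nil values that are not empty after stripping whitespace.
def present?(value)
  !value.nil? && !value.to_s.strip.empty?
end
```

With `access_key_id` set (as in the `bedrock_converse_model` fabricator's `provider_params`), this picks `:static_credentials`; an API key on its own picks `:bearer_token`.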
### Features supported

- Streaming and non-streaming completions
- Native tool use with tool_choice (auto/any/specific tool)
- Structured output via Converse API's `output_config` (for models that
support it)
- Extended thinking / adaptive thinking with signature preservation for
multi-turn
- Interleaved thinking with tool calls (thinking blocks preserved per
tool_call message)
- Prompt caching via `cache_point` blocks
- Effort parameter (low/medium/high/max)
- `extra_model_fields` provider param for arbitrary
`additionalModelRequestFields` (beta features like `anthropic_beta`, 1M
context, interleaved thinking)

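Several of these features map onto fields of a Converse request. As a minimal sketch, assuming the snake_case parameter names accepted by the Ruby SDK's `Aws::BedrockRuntime::Client#converse`, a hypothetical `build_converse_params` helper could assemble tool choice, a system-prompt `cache_point`, and `extra_model_fields` as shown. It handles only plain text messages, omits thinking, effort, and structured output, and is not the dialect's actual code.

```ruby
# Hypothetical helper: builds keyword arguments in the shape accepted by
# Aws::BedrockRuntime::Client#converse (snake_case keys). Illustrative only.
def build_converse_params(model_id:, system:, messages:, tools: [], tool_choice: :auto,
                          max_tokens: 1000, cache_system: false, extra_model_fields: nil)
  system_blocks = [{ text: system }]
  # A cache_point block asks Bedrock to cache the prompt prefix that precedes it.
  system_blocks << { cache_point: { type: "default" } } if cache_system

  params = {
    # Base model ID, cross-region profile ID, or profile ARN, passed through as-is.
    model_id: model_id,
    system: system_blocks,
    messages: messages.map { |m| { role: m[:role], content: [{ text: m[:text] }] } },
    inference_config: { max_tokens: max_tokens },
  }

  unless tools.empty?
    choice =
      case tool_choice
      when :auto then { auto: {} }
      when :any then { any: {} }
      else { tool: { name: tool_choice.to_s } } # force one specific tool
      end

    tool_specs = tools.map do |t|
      { tool_spec: { name: t[:name], description: t[:description], input_schema: { json: t[:schema] } } }
    end
    params[:tool_config] = { tools: tool_specs, tool_choice: choice }
  end

  # Arbitrary provider-specific fields go through additional_model_request_fields.
  params[:additional_model_request_fields] = extra_model_fields if extra_model_fields
  params
end
```

With `aws-sdk-bedrockruntime` installed, the resulting hash could be passed as `client.converse(**params)`. Passing `model_id` through unchanged mirrors the point that cross-region profile IDs and inference profile ARNs work without a model ID mapping table.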
### New files

- `lib/completions/endpoints/aws_bedrock_converse.rb` — endpoint using
`Aws::BedrockRuntime::Client`
- `lib/completions/dialects/converse.rb` — unified Converse API dialect
- `lib/completions/dialects/converse_tools.rb` — tool formatting
- `lib/completions/converse_message_processor.rb` — response processing
for SDK typed objects

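Streaming response processing has to stitch content-block deltas back together. The following is a hypothetical sketch, not the contents of `converse_message_processor.rb`: it models ConverseStream events as plain hashes (the SDK yields typed event structs instead) and assumes the event names `content_block_start`, `content_block_delta`, `content_block_stop`, `message_stop`, and `metadata`. Tool input arrives as JSON string fragments and is parsed only when its block stops; the reasoning signature is kept so thinking could be replayed on a later turn.

```ruby
require "json"

# Hypothetical sketch: folds ConverseStream-style events (plain hashes here,
# not the SDK's typed event structs) into text, thinking, and tool calls.
class ConverseStreamAccumulator
  attr_reader :text, :thinking, :signature, :tool_calls, :stop_reason, :usage

  def initialize
    @text = +""
    @thinking = +""
    @signature = nil
    @tool_calls = []
    @partial_tools = {} # content_block_index => { id:, name:, json: }
    @stop_reason = nil
    @usage = nil
  end

  def <<(event)
    case event[:type]
    when :content_block_start
      if (tool = event.dig(:start, :tool_use))
        @partial_tools[event[:content_block_index]] = { id: tool[:tool_use_id], name: tool[:name], json: +"" }
      end
    when :content_block_delta
      delta = event[:delta]
      if delta[:text]
        @text << delta[:text]
      elsif delta[:tool_use]
        # Tool input arrives as fragments of a JSON string.
        @partial_tools[event[:content_block_index]][:json] << delta[:tool_use][:input]
      elsif (reasoning = delta[:reasoning_content])
        @thinking << reasoning[:text].to_s
        @signature = reasoning[:signature] if reasoning[:signature]
      end
    when :content_block_stop
      # Only tool blocks need finalizing: parse the accumulated JSON input.
      if (tool = @partial_tools.delete(event[:content_block_index]))
        args = tool[:json].empty? ? {} : JSON.parse(tool[:json])
        @tool_calls << { id: tool[:id], name: tool[:name], arguments: args }
      end
    when :message_stop
      @stop_reason = event[:stop_reason]
    when :metadata
      @usage = event[:usage]
    end
    self
  end
end
```

Keying partial tool calls by `content_block_index` keeps interleaved text, thinking, and tool blocks from bleeding into each other.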
## Tested against real Bedrock API

All tests were run with Bedrock API key auth (Bearer token) against
live endpoints, using 9 models from 8 providers:

| Test | Claude Sonnet 4 | Claude Haiku 4.5 | Kimi K2.5 | MiniMax M2 | DeepSeek 3.2 | NVIDIA Nemotron 3 120B | Qwen3 Next 80B | GLM 5 | Mistral Small |
|---|---|---|---|---|---|---|---|---|---|
| Non-streaming text | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Streaming text | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Multi-turn conversation | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Tool use (non-streaming) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Tool use (streaming) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ model unsupported |
| Structured output (non-streaming) | — | ✅ | ❌ model unsupported | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ model unsupported |
| Structured output (streaming) | — | ✅ | ❌ model unsupported | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ model unsupported |
| Bearer token auth | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Cross-region inference profile | ✅ | ✅ | — | — | — | — | — | — | — |
| Audit logging + token tracking | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |

> **Notes:**
> - Claude Sonnet 4 structured output was not tested: the feature
requires Claude 4.5 or later, and the corresponding 4.5+ cross-region
profiles were not available in the test region.
> - Kimi K2.5 and Mistral Small do not support Bedrock's native
structured output.
> - Mistral Small does not support streaming tool use.
> - All "model unsupported" results are model-level limitations, not
code issues; the Converse API correctly surfaces the error.

## Test plan

- [ ] Existing `aws_bedrock` provider tests pass (`bin/rspec
spec/lib/completions/endpoints/aws_bedrock_spec.rb`)
- [ ] New provider tests pass (`bin/rspec
spec/lib/completions/endpoints/aws_bedrock_converse_spec.rb`)
- [ ] Create an LLM model with provider "AWS Bedrock (Converse API)" in
admin UI
- [ ] Verify basic completion works with a Bedrock API key (just region + API key, no IAM keys needed)
- [ ] Verify tool use works in AI bot conversations
- [ ] Verify structured output works with a supported model (Claude
Haiku 4.5+)
2026-03-30 12:37:30 -03:00


# frozen_string_literal: true

Fabricator(:llm_model) do
  display_name "A good model"
  name "gpt-4-turbo"
  provider "open_ai"
  tokenizer "DiscourseAi::Tokenizer::OpenAiTokenizer"
  api_key "123"
  url "https://api.openai.com/v1/chat/completions"
  max_prompt_tokens 131_072
  input_cost 10
  cached_input_cost 2.5
  output_cost 40
end

Fabricator(:anthropic_model, from: :llm_model) do
  display_name "Claude 3 Opus"
  name "claude-3-opus"
  max_prompt_tokens 200_000
  url "https://api.anthropic.com/v1/messages"
  tokenizer "DiscourseAi::Tokenizer::AnthropicTokenizer"
  provider "anthropic"
end

Fabricator(:hf_model, from: :llm_model) do
  display_name "Llama 3.1"
  name "meta-llama/Meta-Llama-3.1-70B-Instruct"
  max_prompt_tokens 64_000
  tokenizer "DiscourseAi::Tokenizer::Llama3Tokenizer"
  url "https://test.dev/v1/chat/completions"
  provider "hugging_face"
end

Fabricator(:open_router_model, from: :llm_model) do
  display_name "OpenRouter"
  name "openrouter-1.0"
  provider "open_router"
  tokenizer "DiscourseAi::Tokenizer::OpenAiTokenizer"
  max_prompt_tokens 64_000
  url "https://openrouter.ai/api/v1/chat/completions"
end

Fabricator(:vllm_model, from: :llm_model) do
  display_name "Llama 3.1 vLLM"
  name "meta-llama/Meta-Llama-3.1-70B-Instruct"
  max_prompt_tokens 64_000
  tokenizer "DiscourseAi::Tokenizer::Llama3Tokenizer"
  url "https://test.dev/v1/chat/completions"
  provider "vllm"
end

Fabricator(:fake_model, from: :llm_model) do
  display_name "Fake model"
  name "fake"
  provider "fake"
  tokenizer "DiscourseAi::Tokenizer::OpenAiTokenizer"
  max_prompt_tokens 32_000
  api_key "fake"
  url "https://fake.test/"
end

Fabricator(:gemini_model, from: :llm_model) do
  display_name "Gemini"
  name "gemini-1.5-pro"
  provider "google"
  tokenizer "DiscourseAi::Tokenizer::OpenAiTokenizer"
  max_prompt_tokens 800_000
  url "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-pro-latest"
end

Fabricator(:bedrock_model, from: :anthropic_model) do
  url ""
  provider "aws_bedrock"
  api_key "asd-asd-asd"
  name "claude-3-sonnet"
  provider_params { { region: "us-east-1", access_key_id: "123456" } }
end

Fabricator(:bedrock_converse_model, from: :anthropic_model) do
  url ""
  provider "aws_bedrock_converse"
  api_key "asd-asd-asd"
  name "claude-3-sonnet"
  provider_params { { region: "us-east-1", access_key_id: "123456" } }
end

Fabricator(:nova_model, from: :llm_model) do
  display_name "Amazon Nova pro"
  name "amazon.nova-pro-v1:0"
  provider "aws_bedrock"
  tokenizer "DiscourseAi::Tokenizer::OpenAiTokenizer"
  max_prompt_tokens 300_000
  api_key "fake"
  url ""
  provider_params { { region: "us-east-1", access_key_id: "123456" } }
end

Fabricator(:cohere_model, from: :llm_model) do
  display_name "Cohere Command R+"
  name "command-r-plus"
  provider "cohere"
  api_key "ABC"
  url "https://api.cohere.ai/v1/chat"
end

Fabricator(:samba_nova_model, from: :llm_model) do
  display_name "Samba Nova"
  name "samba-nova"
  provider "samba_nova"
  api_key "ABC"
  url "https://api.sambanova.ai/v1/chat/completions"
end

Fabricator(:ollama_model, from: :llm_model) do
  display_name "Ollama llama 3.1"
  name "llama-3.1"
  provider "ollama"
  api_key "ABC"
  tokenizer "DiscourseAi::Tokenizer::Llama3Tokenizer"
  url "http://api.ollama.ai/api/chat"
  provider_params { { enable_native_tool: true } }
end

Fabricator(:mistral_model, from: :llm_model) do
  display_name "Mistral Large"
  name "mistral-large-latest"
  provider "mistral"
  api_key "ABC"
  tokenizer "DiscourseAi::Tokenizer::MistralTokenizer"
  url "https://api.mistral.ai/v1/chat/completions"
  provider_params { { disable_native_tools: false } }
end

Fabricator(:seeded_model, from: :llm_model) do
  id "-2"
  display_name "CDCK Hosted Model"
  name "cdck-hosted"
  provider "fake"
  api_key "DSC"
  tokenizer "DiscourseAi::Tokenizer::OpenAiTokenizer"
  url "https://cdck.test/"
end