mirror of
https://gh.wpcy.net/https://github.com/discourse/discourse.git
synced 2026-06-19 05:35:40 +08:00
## Summary

Adds a new `aws_bedrock_converse` inference provider that uses the official AWS SDK (`aws-sdk-bedrockruntime`) and the Converse API. This runs alongside the existing `aws_bedrock` provider: fully additive, zero risk to existing configurations.

### Why a new provider?

The existing `aws_bedrock` provider manually handles SigV4 signing, URL construction, binary event stream decoding, and maintains a hardcoded model ID mapping table. It only supports Claude and Nova models.

The new provider delegates all of this to the official AWS SDK, which means:

- **Model-agnostic**: works with any model available on Bedrock (Claude, Nova, Kimi, MiniMax, Mistral, Llama, DeepSeek, NVIDIA, Qwen, GLM, etc.) without any model-specific code
- **Application Inference Profiles**: users can set cross-region profiles (`us.anthropic.claude-sonnet-4-20250514-v1:0`) or application inference profile ARNs directly as the model name
- **Bedrock API Key auth**: supports the new AWS Bedrock API keys (Bearer token auth) in addition to IAM access keys, STS role assumption, and automatic credential resolution from environment/instance profiles
- **No maintenance burden**: no model ID mapping table to update when AWS adds new models, no manual SigV4 signing, no binary event stream decoding
- **Native tools only**: no XML tool fallback; uses the Converse API's built-in tool support

### Authentication options (priority order)

| Config | Auth method |
|---|---|
| `role_arn` set | STS AssumeRole (SigV4) |
| `access_key_id` set | Static IAM credentials (SigV4) |
| API key set (no `access_key_id`/`role_arn`) | Bearer token (Bedrock API key) |
| Nothing set | SDK auto-resolves (env vars, instance profile, ECS task role) |

### Features supported

- Streaming and non-streaming completions
- Native tool use with `tool_choice` (auto/any/specific tool)
- Structured output via Converse API's `output_config` (models that support it)
- Extended thinking / adaptive thinking with signature preservation for multi-turn
- Interleaved thinking with tool calls (thinking blocks preserved per tool_call message)
- Prompt caching via `cache_point` blocks
- Effort parameter (low/medium/high/max)
- `extra_model_fields` provider param for arbitrary `additionalModelRequestFields` (beta features like `anthropic_beta`, 1M context, interleaved thinking)

### New files

- `lib/completions/endpoints/aws_bedrock_converse.rb`: endpoint using `Aws::BedrockRuntime::Client`
- `lib/completions/dialects/converse.rb`: unified Converse API dialect
- `lib/completions/dialects/converse_tools.rb`: tool formatting
- `lib/completions/converse_message_processor.rb`: response processing for SDK typed objects

## Tested against real Bedrock API

All tests performed using Bedrock API Key auth (Bearer token) against live endpoints with 9 different models from 8 providers. A `-` cell means the combination was not tested.

| Test | Claude Sonnet 4 | Claude Haiku 4.5 | Kimi K2.5 | MiniMax M2 | DeepSeek 3.2 | NVIDIA Nemotron 3 120B | Qwen3 Next 80B | GLM 5 | Mistral Small |
|---|---|---|---|---|---|---|---|---|---|
| Non-streaming text | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Streaming text | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Multi-turn conversation | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Tool use (non-streaming) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Tool use (streaming) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ model unsupported |
| Structured output (non-streaming) | - | ✅ | ❌ model unsupported | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ model unsupported |
| Structured output (streaming) | - | ✅ | ❌ model unsupported | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ model unsupported |
| Bearer token auth | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Cross-region inference profile | ✅ | ✅ | - | - | - | - | - | - | - |
| Audit logging + token tracking | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |

> **Notes:**
> - Claude Sonnet 4 structured output not tested: requires 4.5+ for this feature and those cross-region profiles were not available in the test region.
> - Kimi K2.5 and Mistral Small do not support Bedrock's native structured output.
> - Mistral Small does not support streaming tool use.
> - All ❌ results are model-level limitations, not code issues; the Converse API correctly surfaces the error.

## Test plan

- [ ] Existing `aws_bedrock` provider tests pass (`bin/rspec spec/lib/completions/endpoints/aws_bedrock_spec.rb`)
- [ ] New provider tests pass (`bin/rspec spec/lib/completions/endpoints/aws_bedrock_converse_spec.rb`)
- [ ] Create an LLM model with provider "AWS Bedrock (Converse API)" in admin UI
- [ ] Verify basic completion works with a Bedrock API key (just region + API key, no IAM keys needed)
- [ ] Verify tool use works in AI bot conversations
- [ ] Verify structured output works with a supported model (Claude Haiku 4.5+)
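The authentication priority order from the summary can be read as a simple first-match rule. The following is a minimal sketch of that rule in plain Ruby, not the endpoint's actual code: the method name `resolve_bedrock_auth`, the returned symbols, and the choice to treat blank strings as unset are all assumptions made for illustration.

```ruby
# Hypothetical helper (not part of the PR): picks an auth method from the
# configured values, following the priority order in the table above.
# Assumption: nil and blank strings both count as "not set".
def resolve_bedrock_auth(role_arn: nil, access_key_id: nil, api_key: nil)
  set = ->(value) { !value.nil? && !value.to_s.strip.empty? }

  if set.call(role_arn)
    :sts_assume_role          # STS AssumeRole (SigV4)
  elsif set.call(access_key_id)
    :static_iam_credentials   # Static IAM credentials (SigV4)
  elsif set.call(api_key)
    :bearer_token             # Bedrock API key
  else
    :sdk_default_chain        # env vars, instance profile, ECS task role
  end
end

# Example: only an API key configured, so Bearer token auth is chosen.
puts resolve_bedrock_auth(api_key: "example-key")
```

Under this reading, a model configured like the `bedrock_converse_model` fabricator (an `api_key` plus an `access_key_id` in `provider_params`) would fall into the static IAM credentials row, because `access_key_id` is checked before the API key.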
142 lines
3.8 KiB
Ruby
Vendored
# frozen_string_literal: true

Fabricator(:llm_model) do
  display_name "A good model"
  name "gpt-4-turbo"
  provider "open_ai"
  tokenizer "DiscourseAi::Tokenizer::OpenAiTokenizer"
  api_key "123"
  url "https://api.openai.com/v1/chat/completions"
  max_prompt_tokens 131_072
  input_cost 10
  cached_input_cost 2.5
  output_cost 40
end

Fabricator(:anthropic_model, from: :llm_model) do
  display_name "Claude 3 Opus"
  name "claude-3-opus"
  max_prompt_tokens 200_000
  url "https://api.anthropic.com/v1/messages"
  tokenizer "DiscourseAi::Tokenizer::AnthropicTokenizer"
  provider "anthropic"
end

Fabricator(:hf_model, from: :llm_model) do
  display_name "Llama 3.1"
  name "meta-llama/Meta-Llama-3.1-70B-Instruct"
  max_prompt_tokens 64_000
  tokenizer "DiscourseAi::Tokenizer::Llama3Tokenizer"
  url "https://test.dev/v1/chat/completions"
  provider "hugging_face"
end

Fabricator(:open_router_model, from: :llm_model) do
  display_name "OpenRouter"
  name "openrouter-1.0"
  provider "open_router"
  tokenizer "DiscourseAi::Tokenizer::OpenAiTokenizer"
  max_prompt_tokens 64_000
  url "https://openrouter.ai/api/v1/chat/completions"
end

Fabricator(:vllm_model, from: :llm_model) do
  display_name "Llama 3.1 vLLM"
  name "meta-llama/Meta-Llama-3.1-70B-Instruct"
  max_prompt_tokens 64_000
  tokenizer "DiscourseAi::Tokenizer::Llama3Tokenizer"
  url "https://test.dev/v1/chat/completions"
  provider "vllm"
end

Fabricator(:fake_model, from: :llm_model) do
  display_name "Fake model"
  name "fake"
  provider "fake"
  tokenizer "DiscourseAi::Tokenizer::OpenAiTokenizer"
  max_prompt_tokens 32_000
  api_key "fake"
  url "https://fake.test/"
end

Fabricator(:gemini_model, from: :llm_model) do
  display_name "Gemini"
  name "gemini-1.5-pro"
  provider "google"
  tokenizer "DiscourseAi::Tokenizer::OpenAiTokenizer"
  max_prompt_tokens 800_000
  url "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-pro-latest"
end

Fabricator(:bedrock_model, from: :anthropic_model) do
  url ""
  provider "aws_bedrock"
  api_key "asd-asd-asd"
  name "claude-3-sonnet"
  provider_params { { region: "us-east-1", access_key_id: "123456" } }
end

Fabricator(:bedrock_converse_model, from: :anthropic_model) do
  url ""
  provider "aws_bedrock_converse"
  api_key "asd-asd-asd"
  name "claude-3-sonnet"
  provider_params { { region: "us-east-1", access_key_id: "123456" } }
end

Fabricator(:nova_model, from: :llm_model) do
  display_name "Amazon Nova pro"
  name "amazon.nova-pro-v1:0"
  provider "aws_bedrock"
  tokenizer "DiscourseAi::Tokenizer::OpenAiTokenizer"
  max_prompt_tokens 300_000
  api_key "fake"
  url ""
  provider_params { { region: "us-east-1", access_key_id: "123456" } }
end

Fabricator(:cohere_model, from: :llm_model) do
  display_name "Cohere Command R+"
  name "command-r-plus"
  provider "cohere"
  api_key "ABC"
  url "https://api.cohere.ai/v1/chat"
end

Fabricator(:samba_nova_model, from: :llm_model) do
  display_name "Samba Nova"
  name "samba-nova"
  provider "samba_nova"
  api_key "ABC"
  url "https://api.sambanova.ai/v1/chat/completions"
end

Fabricator(:ollama_model, from: :llm_model) do
  display_name "Ollama llama 3.1"
  name "llama-3.1"
  provider "ollama"
  api_key "ABC"
  tokenizer "DiscourseAi::Tokenizer::Llama3Tokenizer"
  url "http://api.ollama.ai/api/chat"
  provider_params { { enable_native_tool: true } }
end

Fabricator(:mistral_model, from: :llm_model) do
  display_name "Mistral Large"
  name "mistral-large-latest"
  provider "mistral"
  api_key "ABC"
  tokenizer "DiscourseAi::Tokenizer::MistralTokenizer"
  url "https://api.mistral.ai/v1/chat/completions"
  provider_params { { disable_native_tools: false } }
end

Fabricator(:seeded_model, from: :llm_model) do
  id "-2"
  display_name "CDCK Hosted Model"
  name "cdck-hosted"
  provider "fake"
  api_key "DSC"
  tokenizer "DiscourseAi::Tokenizer::OpenAiTokenizer"
  url "https://cdck.test/"
end