mirror of
https://gh.wpcy.net/https://github.com/discourse/discourse.git
synced 2026-06-19 12:14:16 +08:00
## Summary
Image uploads delivered through the `aws_bedrock_converse` LLM provider
were rejected by Bedrock with `Could not process image` whenever an
agent / LLM had `vision_enabled` set to true.
Two related bugs are fixed:
### 1. `Dialects::Converse#upload_node`: base64 string passed where raw bytes expected
In `plugins/discourse-ai/lib/completions/dialects/converse.rb`, image
content was emitted as:
```ruby
source: { bytes: details[:base64] }
```
`details[:base64]` is the upload's base64-encoded string (as produced by
`UploadEncoder`), but `Aws::BedrockRuntime::Client#converse` expects
**raw bytes** on the `:bytes` key — the SDK then base64-encodes them on
the wire. Passing the already-base64-encoded string causes Bedrock to
receive **doubly-encoded** data, which it cannot decode into a valid
image. The fix decodes the string back to raw bytes with
`Base64.decode64(details[:base64])` before handing it to the SDK, so the
payload is encoded exactly once.
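The double-encoding failure can be reproduced without Bedrock. The sketch below is plain Ruby using only the standard `base64` library. `sdk_wire_encode` is a hypothetical stand-in for the encoding step the AWS SDK applies to `:bytes`; it is not an SDK method, and the sample bytes are illustrative.

```ruby
require "base64"

# Hypothetical stand-in for the base64 encoding the SDK applies to :bytes
# before sending the request. Not part of the AWS SDK.
def sdk_wire_encode(bytes)
  Base64.strict_encode64(bytes)
end

raw_bytes = "\x89PNG\r\n\x1a\nbinary".b
details = { base64: Base64.strict_encode64(raw_bytes) }

# Before the fix: the base64 string itself is encoded a second time, so the
# receiver decodes once and is left with base64 text instead of image bytes.
buggy_wire = sdk_wire_encode(details[:base64])
buggy_received = Base64.strict_decode64(buggy_wire)

# After the fix: decode to raw bytes first, so the payload is encoded once.
fixed_wire = sdk_wire_encode(Base64.decode64(details[:base64]))
fixed_received = Base64.strict_decode64(fixed_wire)

puts "buggy path yields image bytes: #{buggy_received == raw_bytes}"
puts "fixed path yields image bytes: #{fixed_received == raw_bytes}"
```

In the buggy path the receiver ends up holding the base64 text of the image, which is why it cannot be parsed as a PNG.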
### 2. `AwsBedrockConverse#perform_completion!`: JSON-logging fails on binary payloads
With raw bytes now flowing through `sdk_params`, the subsequent
`sdk_params.to_json` call (used to record the request in `start_log`)
raises `EncodingError` because PNG/JPEG bytes are not valid UTF-8. The
call is wrapped in `begin / rescue EncodingError` so the request can
still proceed; a placeholder string is recorded in the audit log instead
of the binary payload.
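The logging fallback can be sketched in plain Ruby. `loggable_json` and the placeholder text are hypothetical names chosen for illustration, not the ones used in the patch. Outside Rails, the standard `json` library may report invalid UTF-8 with `JSON::GeneratorError` rather than an `EncodingError` subclass, so this sketch rescues both; that is an assumption about the runtime, not a description of the actual change.

```ruby
require "json"

# Hypothetical placeholder recorded in the log when the request body
# cannot be serialized.
BINARY_PLACEHOLDER = "[request contains binary payload; body omitted]"

# Serializes params for logging and falls back to a placeholder when the
# payload contains bytes that are not valid UTF-8.
def loggable_json(params)
  params.to_json
rescue EncodingError, JSON::GeneratorError
  BINARY_PLACEHOLDER
end

text_params = { messages: [{ content: [{ text: "Describe: " }] }] }
image_params = {
  messages: [
    { content: [{ image: { format: "png", source: { bytes: "\x89PNG\r\n\x1a\n".b } } }] },
  ],
}

puts loggable_json(text_params)
puts loggable_json(image_params)
```

The request parameters themselves are left untouched; only the logged representation is replaced, so the binary payload still reaches the SDK.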
## Test plan
- A new spec case in
`plugins/discourse-ai/spec/lib/completions/dialects/converse_spec.rb`
asserts that `details[:base64]` is decoded back to raw bytes before
being emitted as `source: { bytes: ... }`. This guards against
regression.
- Verified end-to-end against `us.anthropic.claude-sonnet-4-6` via
Bedrock Converse on `ap-northeast-1` → `us-east-1` cross-region
inference profile: with this patch the model correctly describes
uploaded PNG attachments (a Loupe Browser version warning dialog)
instead of returning `Could not process image`.
## Reproduction (before the fix)
1. Configure an `aws_bedrock_converse` LLM in Discourse and assign it to
an `AiAgent` with `vision_enabled: true`.
2. Wire up `llm_triage` (or any path that goes through
`Dialects::Converse#upload_node`) to reply to a topic that contains an
image upload.
3. Observe:
`DiscourseAi::Completions::Endpoints::Base::CompletionFailed: The model
returned the following errors: Could not process image`
## Discovered while
Standing up a Discourse instance with Bedrock-backed AI as part of an
internal forum spike. Happy to iterate on the patch (e.g. tighten the
log fallback or extract a helper) if reviewers prefer a different shape.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Rafael Silva <xfalcox@gmail.com>
88 lines
3.2 KiB
Ruby
Vendored
# frozen_string_literal: true

RSpec.describe DiscourseAi::Completions::Dialects::Converse do
  fab!(:model, :bedrock_converse_model)

  before { enable_current_plugin }

  describe "#translate" do
    it "renders converted document uploads as text content blocks" do
      model.update!(allowed_attachment_types: ["docx"])
      converted_text = "Uploaded document: sample.docx (13 Bytes)\n\nConverted text"
      prompt =
        DiscourseAi::Completions::Prompt.new(
          nil,
          messages: [{ type: :user, content: ["Read this: ", { upload_id: 123 }] }],
        )

      allow(DiscourseAi::Completions::UploadEncoder).to receive(:encode).and_return(
        [
          {
            kind: :document,
            filename: "sample.docx",
            mime_type: "text/plain",
            text: converted_text,
            converted_from: "docx",
          },
        ],
      )

      translated = described_class.new(prompt, model).translate
      user_message = translated.messages.find { |msg| msg[:role] == "user" }

      expect(user_message[:content]).to eq([{ text: "Read this: " }, { text: converted_text }])
    end

    it "skips raw document uploads because Converse raw document support is not enabled" do
      model.update!(allowed_attachment_types: ["doc"])
      prompt =
        DiscourseAi::Completions::Prompt.new(
          nil,
          messages: [{ type: :user, content: ["Read this: ", { upload_id: 123 }] }],
        )

      allow(DiscourseAi::Completions::UploadEncoder).to receive(:encode).and_return(
        [
          {
            kind: :document,
            filename: "sample.doc",
            mime_type: "application/msword",
            base64: "cmF3IGRvYw==",
          },
        ],
      )

      translated = described_class.new(prompt, model).translate
      user_message = translated.messages.find { |msg| msg[:role] == "user" }

      expect(user_message[:content]).to eq([{ text: "Read this: " }])
      expect(user_message[:content]).not_to include(hash_including(image: anything))
      expect(user_message[:content]).not_to include(hash_including(document: anything))
    end

    it "passes raw bytes for image uploads, not the base64-encoded string" do
      model.update!(vision_enabled: true)
      raw_bytes = "\x89PNG\r\n\x1a\nbinary".b
      prompt =
        DiscourseAi::Completions::Prompt.new(
          nil,
          messages: [{ type: :user, content: ["Describe: ", { upload_id: 456 }] }],
        )

      allow(DiscourseAi::Completions::UploadEncoder).to receive(:encode).and_return(
        [{ kind: :image, mime_type: "image/png", base64: Base64.strict_encode64(raw_bytes) }],
      )

      translated = described_class.new(prompt, model).translate
      user_message = translated.messages.find { |msg| msg[:role] == "user" }
      image_block = user_message[:content].find { |c| c[:image] }

      expect(image_block).to be_present
      expect(image_block.dig(:image, :format)).to eq("png")
      # AWS SDK for Ruby expects raw bytes; it will base64-encode on the wire.
      # Passing the base64 string would cause double-encoding and Bedrock would
      # return "Could not process image".
      expect(image_block.dig(:image, :source, :bytes)).to eq(raw_bytes)
    end
  end
end