discourse/plugins/discourse-ai/spec/lib/completions/dialects/converse_spec.rb
Takao Yokoyama cea4780cb3
FIX: AI: AWS Bedrock Converse image uploads were doubly base64-encoded (#39880)
## Summary

Image uploads delivered through the `aws_bedrock_converse` LLM provider
were rejected by Bedrock with `Could not process image` whenever an
agent / LLM had `vision_enabled` set to true.

Two related bugs are fixed:

### 1. `Dialects::Converse#upload_node`: base64 string passed where raw bytes expected

In `plugins/discourse-ai/lib/completions/dialects/converse.rb`, image
content was emitted as:

```ruby
source: { bytes: details[:base64] }
```

`details[:base64]` is the upload's base64-encoded string (as produced by
`UploadEncoder`), but `Aws::BedrockRuntime::Client#converse` expects
**raw bytes** on the `:bytes` key; the SDK then base64-encodes them on
the wire. Passing the already-base64-encoded string causes Bedrock to
receive **doubly-encoded** data, which it cannot decode into a valid
image. Decoding back to raw bytes with
`Base64.decode64(details[:base64])` before building the request means
the payload is encoded exactly once on the wire.
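
To make the double-encoding concrete, here is a minimal sketch that uses only Ruby's `base64` library. It simulates the SDK rather than calling it: the single `strict_encode64` step standing in for the SDK's wire encoding is an assumption based on the description above, and all variable names are illustrative.

```ruby
require "base64"

# Stand-in for an uploaded PNG: a binary (ASCII-8BIT) string.
raw_bytes = "\x89PNG\r\n\x1a\nbinary".b

# What the upload encoder hands to the dialect: a base64 string.
stored_base64 = Base64.strict_encode64(raw_bytes)

# Buggy path: the base64 string is passed as :bytes, and the SDK
# (assumed here to encode exactly once) encodes it a second time.
wire_buggy = Base64.strict_encode64(stored_base64)
# The service decodes once and ends up with base64 text, not an image.
service_sees_buggy = Base64.strict_decode64(wire_buggy)

# Fixed path: decode to raw bytes first, so the wire carries one encoding.
fixed_payload = Base64.decode64(stored_base64)
wire_fixed = Base64.strict_encode64(fixed_payload)
service_sees_fixed = Base64.strict_decode64(wire_fixed)

puts "buggy path yields image bytes? #{service_sees_buggy == raw_bytes}"
puts "fixed path yields image bytes? #{service_sees_fixed == raw_bytes}"
```

In the buggy path the service recovers the base64 text itself, which does not start with a PNG signature; in the fixed path it recovers the original bytes.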

### 2. `AwsBedrockConverse#perform_completion!`: JSON-logging fails on binary payloads

With raw bytes now flowing through `sdk_params`, the subsequent
`sdk_params.to_json` call (used to record the request in `start_log`)
raises `EncodingError` because PNG/JPEG bytes are not valid UTF-8. The
call is wrapped in `begin / rescue EncodingError` so the request can
still proceed; a placeholder string is recorded in the audit log instead
of the binary payload.
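
The fallback described above can be sketched roughly as follows. This is not the plugin's code: `serialize_for_log`, `loggable_payload`, and the placeholder text are hypothetical, and the explicit `encode("UTF-8")` call stands in for whatever transcoding the real `to_json` path performs, so that the failure is reproducible with the standard library alone.

```ruby
require "json"

# Hypothetical serializer: transcodes every string value to UTF-8 before
# generating JSON. Binary strings containing non-ASCII bytes cannot be
# transcoded and raise Encoding::UndefinedConversionError, which is a
# subclass of EncodingError.
def serialize_for_log(params)
  utf8 = params.transform_values { |v| v.is_a?(String) ? v.encode("UTF-8") : v }
  JSON.generate(utf8)
end

# Hypothetical wrapper: log the JSON when possible, otherwise record a
# placeholder so the request itself can still go ahead.
def loggable_payload(params)
  serialize_for_log(params)
rescue EncodingError
  "[binary payload omitted from log]"
end

text_only = loggable_payload({ model_id: "example-model", prompt: "Describe this" })
with_image = loggable_payload({ model_id: "example-model", bytes: "\x89PNG\r\n\x1a\n".b })

puts text_only
puts with_image
```

Rescuing only `EncodingError` keeps the fallback narrow: other serialization failures still surface instead of being silently replaced by the placeholder.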

## Test plan

- A new spec case in
`plugins/discourse-ai/spec/lib/completions/dialects/converse_spec.rb`
asserts that `details[:base64]` is decoded back to raw bytes before
being emitted as `source: { bytes: ... }`. This guards against
regression.
- Verified end-to-end against `us.anthropic.claude-sonnet-4-6` via
Bedrock Converse on `ap-northeast-1` → `us-east-1` cross-region
inference profile: with this patch the model correctly describes
uploaded PNG attachments (a Loupe Browser version warning dialog)
instead of returning `Could not process image`.

## Reproduction (before the fix)

1. Configure an `aws_bedrock_converse` LLM in Discourse and assign it to
an `AiAgent` with `vision_enabled: true`.
2. Wire up `llm_triage` (or any path that goes through
`Dialects::Converse#upload_node`) to reply to a topic that contains an
image upload.
3. Observe the failure:
   `DiscourseAi::Completions::Endpoints::Base::CompletionFailed: The model returned the following errors: Could not process image`

## Discovered while

Standing up a Discourse instance with Bedrock-backed AI as part of an
internal forum spike. Happy to iterate on the patch (e.g. tighten the
log fallback or extract a helper) if reviewers prefer a different shape.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Rafael Silva <xfalcox@gmail.com>
2026-05-26 15:16:49 -03:00

# frozen_string_literal: true

RSpec.describe DiscourseAi::Completions::Dialects::Converse do
  fab!(:model, :bedrock_converse_model)

  before { enable_current_plugin }

  describe "#translate" do
    it "renders converted document uploads as text content blocks" do
      model.update!(allowed_attachment_types: ["docx"])
      converted_text = "Uploaded document: sample.docx (13 Bytes)\n\nConverted text"
      prompt =
        DiscourseAi::Completions::Prompt.new(
          nil,
          messages: [{ type: :user, content: ["Read this: ", { upload_id: 123 }] }],
        )

      allow(DiscourseAi::Completions::UploadEncoder).to receive(:encode).and_return(
        [
          {
            kind: :document,
            filename: "sample.docx",
            mime_type: "text/plain",
            text: converted_text,
            converted_from: "docx",
          },
        ],
      )

      translated = described_class.new(prompt, model).translate
      user_message = translated.messages.find { |msg| msg[:role] == "user" }

      expect(user_message[:content]).to eq([{ text: "Read this: " }, { text: converted_text }])
    end

    it "skips raw document uploads because Converse raw document support is not enabled" do
      model.update!(allowed_attachment_types: ["doc"])
      prompt =
        DiscourseAi::Completions::Prompt.new(
          nil,
          messages: [{ type: :user, content: ["Read this: ", { upload_id: 123 }] }],
        )

      allow(DiscourseAi::Completions::UploadEncoder).to receive(:encode).and_return(
        [
          {
            kind: :document,
            filename: "sample.doc",
            mime_type: "application/msword",
            base64: "cmF3IGRvYw==",
          },
        ],
      )

      translated = described_class.new(prompt, model).translate
      user_message = translated.messages.find { |msg| msg[:role] == "user" }

      expect(user_message[:content]).to eq([{ text: "Read this: " }])
      expect(user_message[:content]).not_to include(hash_including(image: anything))
      expect(user_message[:content]).not_to include(hash_including(document: anything))
    end

    it "passes raw bytes for image uploads, not the base64-encoded string" do
      model.update!(vision_enabled: true)
      raw_bytes = "\x89PNG\r\n\x1a\nbinary".b
      prompt =
        DiscourseAi::Completions::Prompt.new(
          nil,
          messages: [{ type: :user, content: ["Describe: ", { upload_id: 456 }] }],
        )

      allow(DiscourseAi::Completions::UploadEncoder).to receive(:encode).and_return(
        [{ kind: :image, mime_type: "image/png", base64: Base64.strict_encode64(raw_bytes) }],
      )

      translated = described_class.new(prompt, model).translate
      user_message = translated.messages.find { |msg| msg[:role] == "user" }
      image_block = user_message[:content].find { |c| c[:image] }

      expect(image_block).to be_present
      expect(image_block.dig(:image, :format)).to eq("png")
      # AWS SDK for Ruby expects raw bytes; it will base64-encode on the wire.
      # Passing the base64 string would cause double-encoding and Bedrock would
      # return "Could not process image".
      expect(image_block.dig(:image, :source, :bytes)).to eq(raw_bytes)
    end
  end
end