discourse/plugins/discourse-ai/lib/completions
Rafael dos Santos Silva 2e3b64fa74
FEATURE: extract text from ODT and ODS document uploads (#39711)
## Summary

Follow-up to #39634. Adds `OdtToText` and `OdsToText` converters so
OpenDocument text (`.odt`) and spreadsheet (`.ods`) attachments can be
embedded as text in LLM prompts, in line with the newly added DOCX/XLSX
support. Both formats are zip archives whose document body lives in a
single `content.xml`, so they reuse `Compression::SafeZipReader` and the
bounded Nokogiri parsing pattern from #39634, with no new external binaries.

- `OdtToText` walks the body's block-level children (paragraphs,
headings, lists, tables, frames, sections) and renders nested lists with
depth-aware bullet prefixes. Tables become tab-separated rows.
- `OdsToText` iterates sheets and rows, expanding
`table:number-columns-repeated` up to `MAX_COLUMNS` to avoid expansion
bombs from sparse trailing cells, and falls back to `office:value` /
`office:date-value` / `office:boolean-value` when no inline `<text:p>`
is present.
- `UploadEncoder.attachment_type_for` and `encode_document` dispatch
gain `odt` and `ods` cases.
- `ai-llm-attachment-types` `DEFAULT_CHOICES` lists `odt` next to `docx`
and `ods` next to `xlsx`.
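
The capped column expansion described for `OdsToText` can be sketched roughly as follows. This is a minimal illustration, not the plugin's code. Cells are modelled as plain hashes instead of Nokogiri nodes, the helper names `expand_row` and `row_to_tsv` are hypothetical, the `MAX_COLUMNS` value is an assumed demo value, and dropping trailing blank cells is an assumption about how a sparse tail might be handled.

```ruby
# Hypothetical sketch: names and values below are illustrative only.
MAX_COLUMNS = 8 # assumed small cap for the demo; the real limit is not stated here

# Each cell is modelled as a Hash:
#   :text   => inline text (stand-in for <text:p>), may be nil
#   :value  => typed fallback (stand-in for office:value and friends), may be nil
#   :repeat => stand-in for table:number-columns-repeated, defaults to 1
def expand_row(cells, max_columns: MAX_COLUMNS)
  out = []
  cells.each do |cell|
    remaining = max_columns - out.length
    break if remaining <= 0 # cap reached: ignore everything past it

    # Prefer inline text; fall back to the typed value when none is present.
    text = cell[:text] || cell[:value].to_s
    # Never repeat more times than the cap leaves room for.
    count = [[cell.fetch(:repeat, 1).to_i, 1].max, remaining].min
    count.times { out << text }
  end
  # Assumption: drop trailing blanks so a padded row does not emit empty columns.
  out.pop while out.last == ""
  out
end

# Render one row as a tab-separated line.
def row_to_tsv(cells, max_columns: MAX_COLUMNS)
  expand_row(cells, max_columns: max_columns).join("\t")
end
```

With this shape, a row such as `[{ text: "a" }, { value: 42 }, { text: "", repeat: 1_000_000 }]` performs at most `MAX_COLUMNS` appends, however large the repeat count is, which is the property the cap is meant to guarantee.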

## Test plan

- [x] `bin/rspec plugins/discourse-ai/spec/lib/completions/odt_to_text_spec.rb` (6 cases)
- [x] `bin/rspec plugins/discourse-ai/spec/lib/completions/ods_to_text_spec.rb` (6 cases)
- [x] `bin/rspec plugins/discourse-ai/spec/lib/completions/upload_encoder_spec.rb` (full
encoder suite incl. 4 new ODT/ODS integration cases)
- [x] `bin/lint` clean across all touched files
- [ ] Manual smoke: upload a real `.odt` and `.ods` to a topic, assign
an LLM with the new attachment types allowed, and verify the extracted
text appears in the prompt
2026-05-05 12:04:13 -03:00
dialects FEATURE: extract text from document uploads for LLM prompts (#39634) 2026-05-05 08:16:23 +10:00
endpoints DEV: update preset model versions (#39342) 2026-04-17 17:55:24 +10:00
anthropic_message_processor.rb FEATURE: add support for tracking write tokens and anthropic caching and Gemini Pro 3 (#36113) 2025-11-21 07:42:54 +11:00
cancel_manager.rb
converse_message_processor.rb FEATURE: Add AWS Bedrock Converse API provider (#38903) 2026-03-30 12:37:30 -03:00
doc_to_text.rb FEATURE: extract text from document uploads for LLM prompts (#39634) 2026-05-05 08:16:23 +10:00
docx_to_text.rb FEATURE: extract text from document uploads for LLM prompts (#39634) 2026-05-05 08:16:23 +10:00
execution_context.rb FEATURE: add agentic execution mode for AI personas (#38230) 2026-03-05 15:06:54 +11:00
json_stream_decoder.rb
json_streaming_parser.rb
json_streaming_tracker.rb FEATURE: Use evals to compare LLMs and Personas' prompts (#36027) 2025-11-18 10:39:52 -03:00
llm.rb FEATURE: Add AWS Bedrock Converse API provider (#38903) 2026-03-30 12:37:30 -03:00
llm_metric.rb DEV: Stop prometheus/ai spam in plugin backend specs (#39245) 2026-04-14 10:24:46 +02:00
llm_presets.rb DEV: Refresh default LLM presets and pricing (#39673) 2026-04-30 17:02:32 -03:00
nova_message_processor.rb
ods_to_text.rb FEATURE: extract text from ODT and ODS document uploads (#39711) 2026-05-05 12:04:13 -03:00
odt_to_text.rb FEATURE: extract text from ODT and ODS document uploads (#39711) 2026-05-05 12:04:13 -03:00
open_ai_message_processor.rb FEATURE: add conditional param visibility and new reasoning options (#38106) 2026-03-02 11:18:43 +11:00
open_ai_responses_message_processor.rb FIX: properly terminate chains of tool calls across multiple providers (#36750) 2025-12-19 07:00:51 +11:00
prompt.rb FEATURE: extract text from document uploads for LLM prompts (#39634) 2026-05-05 08:16:23 +10:00
prompt_messages_builder.rb FEATURE: extract text from document uploads for LLM prompts (#39634) 2026-05-05 08:16:23 +10:00
report.rb FEATURE: show conversation spending in debug modal (#39364) 2026-04-20 16:14:38 +10:00
rtf_to_text.rb FEATURE: extract text from document uploads for LLM prompts (#39634) 2026-05-05 08:16:23 +10:00
structured_output.rb FEATURE: support thinking summary on responses API (#36013) 2025-11-18 07:54:14 +11:00
thinking.rb FEATURE: support thinking summary on responses API (#36013) 2025-11-18 07:54:14 +11:00
token_usage_tracker.rb FEATURE: add agentic execution mode for AI personas (#38230) 2026-03-05 15:06:54 +11:00
tool_call.rb FEATURE: add support for tracking write tokens and anthropic caching and Gemini Pro 3 (#36113) 2025-11-21 07:42:54 +11:00
tool_definition.rb FEATURE: Add MCP server integration to AI agents (#38706) 2026-03-25 17:32:27 +11:00
tool_result.rb DEV: ToolResult#== returns nil instead of false for non-ToolResult comparisons (#37522) 2026-02-04 15:01:31 +08:00
upload_encoder.rb FEATURE: extract text from ODT and ODS document uploads (#39711) 2026-05-05 12:04:13 -03:00
xls_to_text.rb FEATURE: extract text from document uploads for LLM prompts (#39634) 2026-05-05 08:16:23 +10:00
xlsx_to_text.rb FEATURE: extract text from document uploads for LLM prompts (#39634) 2026-05-05 08:16:23 +10:00
xml_tag_stripper.rb
xml_tool_processor.rb