discourse/plugins/discourse-ai/lib
Rafael dos Santos Silva 2e3b64fa74
FEATURE: extract text from ODT and ODS document uploads (#39711)
## Summary

Follow-up to #39634. Adds `OdtToText` and `OdsToText` converters so
OpenDocument text (`.odt`) and spreadsheet (`.ods`) attachments can be
embedded as text in LLM prompts, in line with the newly added DOCX/XLSX
support. Both formats are zip archives that keep their document content
in a single `content.xml`, so they reuse `Compression::SafeZipReader` and
the bounded Nokogiri parsing pattern from #39634, with no new external
binaries.

- `OdtToText` walks the body's block-level children (paragraphs,
headings, lists, tables, frames, sections) and renders nested lists with
depth-aware bullet prefixes. Tables become tab-separated rows.
- `OdsToText` iterates sheets and rows, expanding
`table:number-columns-repeated` up to `MAX_COLUMNS` to avoid expansion
bombs from sparse trailing cells, and falls back to `office:value` /
`office:date-value` / `office:boolean-value` when no inline `<text:p>`
is present.
- `UploadEncoder.attachment_type_for` and the `encode_document` dispatch
each gain `odt` and `ods` cases.
- `ai-llm-attachment-types` `DEFAULT_CHOICES` lists `odt` next to `docx`
and `ods` next to `xlsx`.
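
The column-repeat cap and typed-value fallback described for `OdsToText` can be sketched roughly as follows. This is a minimal illustration, not the plugin's code: cells are modelled as plain hashes instead of Nokogiri nodes, and `cell_text`, `expand_row`, `row_to_line`, and the `MAX_COLUMNS` value of 8 are all hypothetical names and numbers chosen for the example.

```ruby
# Hypothetical sketch; names and the limit are illustrative, not taken from the plugin.
MAX_COLUMNS = 8

# A cell is modelled as a Hash:
#   text:   inline text (what <text:p> would hold), or nil
#   value:  typed fallback (what office:value and friends would hold), or nil
#   repeat: table:number-columns-repeated, defaulting to 1

# Prefer inline text; fall back to the typed value when no text is present.
def cell_text(cell)
  text = cell[:text]
  return text unless text.nil? || text.empty?
  cell[:value].to_s
end

# Expand repeated cells, never emitting more than max_columns entries per row.
def expand_row(cells, max_columns: MAX_COLUMNS)
  row = []
  cells.each do |cell|
    remaining = max_columns - row.length
    break if remaining <= 0

    # Clamp the declared repeat count to the space left in the row.
    count = [[cell.fetch(:repeat, 1), 1].max, remaining].min
    text = cell_text(cell)
    count.times { row << text }
  end

  # Drop trailing blanks so a sparse filler cell leaves no empty columns.
  row.pop while !row.empty? && row.last.empty?
  row
end

# Render one row as a tab-separated line.
def row_to_line(cells)
  expand_row(cells).join("\t")
end
```

With this shape, a trailing filler cell that declares a million repeats costs at most `MAX_COLUMNS` iterations and is then trimmed, while a cell with only a typed value still contributes its text.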

## Test plan

- [x] `bin/rspec plugins/discourse-ai/spec/lib/completions/odt_to_text_spec.rb`: 6 cases
- [x] `bin/rspec plugins/discourse-ai/spec/lib/completions/ods_to_text_spec.rb`: 6 cases
- [x] `bin/rspec plugins/discourse-ai/spec/lib/completions/upload_encoder_spec.rb`: full
encoder suite incl. 4 new ODT/ODS integration cases
- [x] `bin/lint` clean across all touched files
- [ ] Manual smoke: upload a real `.odt` and `.ods` to a topic, assign
an LLM with the new attachment types allowed, and verify the extracted
text appears in the prompt
2026-05-05 12:04:13 -03:00
..
agents FIX: Ensure that the current user's guardian is used when running AI tools tests. (#38676) 2026-04-29 16:25:52 -03:00
ai_bot FEATURE: extract text from document uploads for LLM prompts (#39634) 2026-05-05 08:16:23 +10:00
ai_helper DEV: AI persona to agent migration (#38319) 2026-03-10 15:59:45 +11:00
ai_moderation DEV: Stop prometheus/ai spam in plugin backend specs (#39245) 2026-04-14 10:24:46 +02:00
ai_tool_scripts FEATURE: Add category definition filters and AI filterTopics binding (#39478) 2026-04-23 17:07:07 +10:00
automation FEATURE: extract text from document uploads for LLM prompts (#39634) 2026-05-05 08:16:23 +10:00
completions FEATURE: extract text from ODT and ODS document uploads (#39711) 2026-05-05 12:04:13 -03:00
configuration DEV: Fix assigned but unused variable Prism warnings (#39436) 2026-04-22 12:42:14 +02:00
database DEV: Clean up scope resolution operators in plugins (#34979) 2025-09-30 14:36:34 +02:00
discord/bot DEV: AI persona to agent migration (#38319) 2026-03-10 15:59:45 +11:00
discover DEV: AI persona to agent migration (#38319) 2026-03-10 15:59:45 +11:00
embeddings DEV: Tidy plugin API key scope resource names (#38640) 2026-03-17 13:03:42 +11:00
inference DEV: Make persona image tools platform agnostic (#36195) 2025-11-26 09:36:24 -08:00
inferred_concepts DEV: Fix assigned but unused variable Prism warnings (#39436) 2026-04-22 12:42:14 +02:00
mcp FEATURE: Add advanced OAuth options for MCP servers (#38913) 2026-04-01 08:47:23 +11:00
sentiment DEV: Replace Ruby numbered parameters by it where applicable (#37810) 2026-02-13 13:59:07 +01:00
summarization DEV: Fix assigned but unused variable Prism warnings (#39436) 2026-04-22 12:42:14 +02:00
tasks DEV: Remove old Discourse AI rake tasks (#36384) 2025-12-02 14:34:08 -03:00
translation PERF: Lazy-load translation progress chart on admin AI translations page (#39458) 2026-04-23 11:31:24 -07:00
utils DEV: AI persona to agent migration (#38319) 2026-03-10 15:59:45 +11:00
ai_bot.rb DEV: AI persona to agent migration (#38319) 2026-03-10 15:59:45 +11:00
automation.rb DEV: AI persona to agent migration (#38319) 2026-03-10 15:59:45 +11:00
embeddings.rb
engine.rb DEV: Clean up scope resolution operators in plugins (#34979) 2025-09-30 14:36:34 +02:00
guardian_extensions.rb DEV: Fix assigned but unused variable Prism warnings (#39436) 2026-04-22 12:42:14 +02:00
multisite_hash.rb
post_extensions.rb
summarization.rb DEV: AI persona to agent migration (#38319) 2026-03-10 15:59:45 +11:00
topic_extensions.rb
translation.rb DEV: AI persona to agent migration (#38319) 2026-03-10 15:59:45 +11:00