discourse/plugins/discourse-ai/spec/lib/completions
Rafael dos Santos Silva 2e3b64fa74
FEATURE: extract text from ODT and ODS document uploads (#39711)
## Summary

Follow-up to #39634. Adds `OdtToText` and `OdsToText` converters so
OpenDocument text (`.odt`) and spreadsheet (`.ods`) attachments can be
embedded as text in LLM prompts, in line with the newly added DOCX/XLSX
support. Both formats are zip archives that keep the document body in a
single `content.xml`, so they reuse `Compression::SafeZipReader` and the
bounded Nokogiri parsing pattern from #39634, with no new external binaries.

- `OdtToText` walks the body's block-level children (paragraphs,
headings, lists, tables, frames, sections) and renders nested lists with
depth-aware bullet prefixes. Tables become tab-separated rows.
- `OdsToText` iterates sheets and rows, expanding
`table:number-columns-repeated` up to `MAX_COLUMNS` to avoid expansion
bombs from sparse trailing cells, and falls back to `office:value` /
`office:date-value` / `office:boolean-value` when no inline `<text:p>`
is present.
- `UploadEncoder.attachment_type_for` and `encode_document` dispatch
gain `odt` and `ods` cases.
- `ai-llm-attachment-types` `DEFAULT_CHOICES` lists `odt` next to `docx`
and `ods` next to `xlsx`.
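
The capped column expansion and typed-value fallback described for `OdsToText` can be sketched roughly as follows. This is an illustration under assumptions, not the converter's actual code: cells are modelled as plain hashes rather than Nokogiri nodes, the `MAX_COLUMNS` value is made up, and `cell_text` / `expand_row` are hypothetical helper names.

```ruby
# Hypothetical cap; the real MAX_COLUMNS value is not stated in this summary.
MAX_COLUMNS = 8

# Each cell is a plain Hash standing in for a parsed <table:table-cell>:
#   :text     - inline <text:p> content, or nil when absent
#   :value    - stand-in for office:value / office:date-value / office:boolean-value
#   :repeated - stand-in for table:number-columns-repeated (defaults to 1)
def cell_text(cell)
  text = cell[:text].to_s
  return text unless text.empty?

  # No inline text: fall back to the typed value attribute.
  cell[:value].to_s
end

def expand_row(cells, max_columns: MAX_COLUMNS)
  row = []
  cells.each do |cell|
    remaining = max_columns - row.length
    break if remaining <= 0

    # Clamp the repeat count to [1, remaining] so a huge repeat on a
    # sparse trailing cell cannot grow the row past the cap.
    repeat = [[cell.fetch(:repeated, 1).to_i, 1].max, remaining].min
    row.concat([cell_text(cell)] * repeat)
  end
  row
end
```

With this sketch, a row ending in a cell repeated 16,384 times still yields at most `MAX_COLUMNS` entries, because the repeat count is clamped before any array is allocated.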

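The `OdtToText` rendering rules (depth-aware bullets for nested lists, tab-separated table rows) might look like the sketch below. The node shape, the two-space indent, the `- ` bullet, and the `render_list` / `render_table` names are all assumptions for illustration; the real converter walks Nokogiri nodes from `content.xml`.

```ruby
# Hypothetical node shape: a list is an Array of items, each a Hash with
# :text and an optional :children list of the same shape.
def render_list(items, depth = 0)
  indent = "  " * depth # assumed prefix: two spaces per nesting level
  items.flat_map do |item|
    line = "#{indent}- #{item[:text]}"
    # Recurse into nested lists one level deeper.
    [line, *render_list(item.fetch(:children, []), depth + 1)]
  end
end

# A table is modelled as an Array of rows, each an Array of cell strings.
def render_table(rows)
  rows.map { |cells| cells.join("\t") }
end
```

Returning arrays of lines rather than one joined string keeps the two helpers easy to compose when a body mixes paragraphs, lists, and tables.
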
## Test plan

- [x] `bin/rspec plugins/discourse-ai/spec/lib/completions/odt_to_text_spec.rb` (6 cases)
- [x] `bin/rspec plugins/discourse-ai/spec/lib/completions/ods_to_text_spec.rb` (6 cases)
- [x] `bin/rspec plugins/discourse-ai/spec/lib/completions/upload_encoder_spec.rb` (full
encoder suite, including 4 new ODT/ODS integration cases)
- [x] `bin/lint` clean across all touched files
- [ ] Manual smoke: upload a real `.odt` and `.ods` to a topic, assign
an LLM with the new attachment types allowed, and verify the extracted
text appears in the prompt
2026-05-05 12:04:13 -03:00
dialects FEATURE: extract text from document uploads for LLM prompts (#39634) 2026-05-05 08:16:23 +10:00
endpoints DEV: update preset model versions (#39342) 2026-04-17 17:55:24 +10:00
anthropic_message_processor_spec.rb FEATURE: support thinking summary on responses API (#36013) 2025-11-18 07:54:14 +11:00
cancel_manager_spec.rb FEATURE: Add MCP server integration to AI agents (#38706) 2026-03-25 17:32:27 +11:00
doc_to_text_spec.rb FEATURE: extract text from document uploads for LLM prompts (#39634) 2026-05-05 08:16:23 +10:00
docx_to_text_spec.rb FEATURE: extract text from document uploads for LLM prompts (#39634) 2026-05-05 08:16:23 +10:00
json_stream_decoder_spec.rb
llm_metric_spec.rb FEATURE: Add Prometheus metrics for LLM API calls (#35636) 2025-10-28 14:10:42 -03:00
llm_spec.rb FEATURE: gate temperature/top_p behind setting (#38479) 2026-03-12 07:40:29 +11:00
ods_to_text_spec.rb FEATURE: extract text from ODT and ODS document uploads (#39711) 2026-05-05 12:04:13 -03:00
odt_to_text_spec.rb FEATURE: extract text from ODT and ODS document uploads (#39711) 2026-05-05 12:04:13 -03:00
open_ai_message_processor_spec.rb FIX: clear partial flag when streaming tool calls finish (#35605) 2025-10-27 12:47:14 +11:00
prompt_messages_builder_spec.rb FEATURE: extract text from document uploads for LLM prompts (#39634) 2026-05-05 08:16:23 +10:00
prompt_spec.rb FEATURE: extract text from document uploads for LLM prompts (#39634) 2026-05-05 08:16:23 +10:00
report_spec.rb DEV: Adjustments in usage page for LLM's with credit allocations (#36566) 2025-12-09 09:55:29 -08:00
rtf_to_text_spec.rb FEATURE: extract text from document uploads for LLM prompts (#39634) 2026-05-05 08:16:23 +10:00
structured_output_spec.rb FIX: Harden JSON streaming tracker for arrays of objects (#36047) 2025-11-14 15:46:06 -03:00
token_usage_tracker_spec.rb FEATURE: add agentic execution mode for AI personas (#38230) 2026-03-05 15:06:54 +11:00
tool_definition_spec.rb FEATURE: improve image tool presets and tool editor (#38166) 2026-03-04 07:21:31 +11:00
upload_encoder_spec.rb FEATURE: extract text from ODT and ODS document uploads (#39711) 2026-05-05 12:04:13 -03:00
xls_to_text_spec.rb FEATURE: extract text from document uploads for LLM prompts (#39634) 2026-05-05 08:16:23 +10:00
xlsx_to_text_spec.rb FEATURE: extract text from document uploads for LLM prompts (#39634) 2026-05-05 08:16:23 +10:00
xml_tag_stripper_spec.rb
xml_tool_processor_spec.rb