mirror of
https://gh.wpcy.net/https://github.com/discourse/discourse.git
synced 2026-05-07 02:22:49 +08:00
## Summary

Follow-up to #39634. Adds `OdtToText` and `OdsToText` converters so OpenDocument text (`.odt`) and spreadsheet (`.ods`) attachments can be embedded as text in LLM prompts, in line with the newly added DOCX/XLSX support.

Both formats are zip archives whose document body lives in a single `content.xml`, so they reuse `Compression::SafeZipReader` and the bounded Nokogiri parsing pattern from #39634, with no new external binaries.

- `OdtToText` walks the body's block-level children (paragraphs, headings, lists, tables, frames, sections) and renders nested lists with depth-aware bullet prefixes. Tables become tab-separated rows.
- `OdsToText` iterates sheets and rows, expanding `table:number-columns-repeated` only up to `MAX_COLUMNS` so that sparse trailing cells cannot cause an expansion bomb. When a cell has no inline `<text:p>`, it falls back to `office:value` / `office:date-value` / `office:boolean-value`.
- `UploadEncoder.attachment_type_for` and the `encode_document` dispatch gain `odt` and `ods` cases.
- `ai-llm-attachment-types` `DEFAULT_CHOICES` lists `odt` next to `docx` and `ods` next to `xlsx`.

## Test plan

- [x] `bin/rspec plugins/discourse-ai/spec/lib/completions/odt_to_text_spec.rb`: 6 cases
- [x] `bin/rspec plugins/discourse-ai/spec/lib/completions/ods_to_text_spec.rb`: 6 cases
- [x] `bin/rspec plugins/discourse-ai/spec/lib/completions/upload_encoder_spec.rb`: full encoder suite, including 4 new ODT/ODS integration cases
- [x] `bin/lint` clean across all touched files
- [ ] Manual smoke: upload a real `.odt` and `.ods` to a topic, assign an LLM with the new attachment types allowed, and verify the extracted text appears in the prompt

| Name | | |
|---|---|---|
| dialects | ||
| endpoints | ||
| anthropic_message_processor_spec.rb | ||
| cancel_manager_spec.rb | ||
| doc_to_text_spec.rb | ||
| docx_to_text_spec.rb | ||
| json_stream_decoder_spec.rb | ||
| llm_metric_spec.rb | ||
| llm_spec.rb | ||
| ods_to_text_spec.rb | ||
| odt_to_text_spec.rb | ||
| open_ai_message_processor_spec.rb | ||
| prompt_messages_builder_spec.rb | ||
| prompt_spec.rb | ||
| report_spec.rb | ||
| rtf_to_text_spec.rb | ||
| structured_output_spec.rb | ||
| token_usage_tracker_spec.rb | ||
| tool_definition_spec.rb | ||
| upload_encoder_spec.rb | ||
| xls_to_text_spec.rb | ||
| xlsx_to_text_spec.rb | ||
| xml_tag_stripper_spec.rb | ||
| xml_tool_processor_spec.rb | ||
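
The column-repeat cap described for `OdsToText` in the summary can be illustrated with a small, self-contained sketch. This is not the plugin's code: the `expand_row` helper, the hash-based cell representation, the `MAX_COLUMNS` value of 8, and the trimming of empty trailing cells are all assumptions made for illustration. The real converter parses `content.xml` with Nokogiri, which is omitted here.

```ruby
# Hypothetical cap; the actual MAX_COLUMNS value is not stated in the summary.
MAX_COLUMNS = 8

# Each cell is a plain hash standing in for a parsed <table:table-cell>:
#   :text   - inline <text:p> content, or nil when absent
#   :value  - stand-in for office:value / office:date-value / office:boolean-value
#   :repeat - stand-in for table:number-columns-repeated (defaults to 1)
def expand_row(cells, max_columns: MAX_COLUMNS)
  row = []

  cells.each do |cell|
    remaining = max_columns - row.length
    break if remaining <= 0

    # Prefer inline text; otherwise fall back to the typed value attribute.
    text = cell[:text] || cell[:value].to_s

    # Clamp the repeat count to at least 1 and at most the remaining budget,
    # so a huge repeat attribute can never allocate more than max_columns cells.
    repeat = [[cell.fetch(:repeat, 1).to_i, 1].max, remaining].min
    repeat.times { row << text }
  end

  # Assumption: drop empty trailing cells so sparse padding adds no trailing tabs.
  row.pop while row.last == ""

  row.join("\t")
end

puts expand_row([{ text: "a" }, { value: 42 }, { repeat: 1_000_000 }])
```

Clamping the repeat count before allocating anything keeps the work per row bounded by `max_columns`, however large the attribute in the file is.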