discourse/plugins/discourse-ai/spec/lib at latest - Discourse/discourse - 菲码源库 feiCode.com

Discourse/discourse

mirror of https://gh.wpcy.net/https://github.com/discourse/discourse.git synced 2026-05-05 11:23:47 +08:00


Sam fa54f62348 FEATURE: extract text from document uploads for LLM prompts (#39634)	2026-05-05 08:16:23 +10:00

Document attachments (doc, docx, xls, xlsx, rtf, csv, md, txt) are now converted to text before being included in LLM prompts, instead of being forwarded as raw base64 payloads. PDFs remain the only format sent as a raw upload, capped at 10MB.

New converters under lib/completions:
- DocToText shells out to antiword
- DocxToText parses OOXML directly with size and depth limits
- XlsToText shells out to xls2csv
- XlsxToText parses OOXML and shared strings into CSV-style text
- RtfToText is a custom RTF tokenizer with destination/group handling

Plain text formats (csv, md, txt) are read with a 1MB byte cap and UTF-8 normalization. Extracted text is truncated to 100k characters, with a preamble noting the original filename and size.

Dialect trimming now uses token-aware truncation against a per-message budget so large extracted documents collapse cleanly under the prompt limit, rather than the previous step-based slicing of raw content.

Other changes:
- LlmModel.normalize_attachment_types is shared with UploadEncoder and collapses "markdown" to "md" so the canonical extension is consistent across model config, UI defaults, and encoder output
- ai-llm-attachment-types adds csv, xls, xlsx to the default choices
- Locale strings clarify that vision controls images and allowed_attachment_types controls documents

Co-authored-by: Rafael Silva <xfalcox@gmail.com>
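
The plain-text handling described in the commit message (1MB byte cap, UTF-8 normalization, 100k-character truncation with a preamble, and collapsing "markdown" to "md") can be sketched roughly as below. This is an illustration only, not the plugin's code: the constant names, method names, and preamble wording are all hypothetical, and the real LlmModel.normalize_attachment_types may behave differently.

```ruby
require "stringio"

# Hypothetical constants mirroring the limits named in the commit message.
MAX_PLAIN_TEXT_BYTES = 1024 * 1024 # 1MB byte cap for csv/md/txt
MAX_EXTRACTED_CHARS = 100_000      # character cap on extracted text

# Read at most max_bytes from an IO and return valid UTF-8.
# (Assumed helper; not the plugin's actual reader.)
def read_plain_text(io, max_bytes: MAX_PLAIN_TEXT_BYTES)
  raw = io.read(max_bytes) || ""
  text = raw.dup.force_encoding(Encoding::UTF_8)
  # scrub replaces invalid byte sequences with U+FFFD, including a
  # multi-byte character that the byte cap happened to cut in half.
  text.scrub
end

# Truncate to max_chars and prepend a note about the source file.
# The preamble format here is invented for illustration.
def truncate_with_preamble(text, filename:, byte_size:, max_chars: MAX_EXTRACTED_CHARS)
  body = text.length > max_chars ? text[0, max_chars] : text
  "[Extracted from #{filename} (#{byte_size} bytes)]\n#{body}"
end

# Canonicalize a list of attachment extensions so "markdown" and "md"
# map to the same entry. (Hypothetical stand-in for the shared normalizer.)
def canonical_attachment_types(types)
  Array(types)
    .map { |t| t.to_s.strip.downcase.delete_prefix(".") }
    .map { |t| t == "markdown" ? "md" : t }
    .reject(&:empty?)
    .uniq
end
```

Capping by bytes before normalizing keeps memory bounded for large uploads, at the cost of possibly replacing one trailing character when the cap lands inside a multi-byte sequence.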
..
agents	FEATURE: Add category definition filters and AI filterTopics binding (#39478)	2026-04-23 17:07:07 +10:00
automation	FEATURE: extract text from document uploads for LLM prompts (#39634)	2026-05-05 08:16:23 +10:00
completions	FEATURE: extract text from document uploads for LLM prompts (#39634)	2026-05-05 08:16:23 +10:00
discord/bot	DEV: AI persona to agent migration (#38319)	2026-03-10 15:59:45 +11:00
discourse_automation	FIX: keep silent AI triage silent on credit exhaustion (#38738)	2026-03-20 11:34:07 -03:00
inference	FEATURE: Add support for matryoshka in Gemini embeddings (#34145)	2025-08-07 15:35:08 -03:00
inferred_concepts	DEV: AI persona to agent migration (#38319)	2026-03-10 15:59:45 +11:00
mcp	FEATURE: Add advanced OAuth options for MCP servers (#38913)	2026-04-01 08:47:23 +11:00
modules	FEATURE: extract text from document uploads for LLM prompts (#39634)	2026-05-05 08:16:23 +10:00
translation	PERF: Lazy-load translation progress chart on admin AI translations page (#39458)	2026-04-23 11:31:24 -07:00
utils	DEV: Restore/remove/fix/update AI-related specs (#39539)	2026-04-27 00:20:33 +02:00
guardian_extensions_spec.rb	SECURITY: Force regeneration for edit-outdated summaries and block stale fallback	2026-03-31 15:12:45 +01:00
