discourse/plugins/discourse-ai/spec/evals at main - Discourse/discourse

Discourse/discourse

mirror of https://gh.wpcy.net/https://github.com/discourse/discourse.git synced 2026-06-19 02:05:37 +08:00

Natalie Tay fbcbdc46d8		2026-05-19 16:23:17 +08:00

FIX: Data explorer agent reliability for schema and plurality (#40152)

Reported through a few user tests, the agent was unreliable in four ways:

- used `tags.tag_name` instead of `tags.name`
- did not use `current_user_id` for "my posts" prompts
- treated plural nouns as singular
- used unparseable date defaults like "today"

A few issues:

- The `DbSchema` tool was returning a dense, comma-separated, one-line-per-table format that qwen was unable to deal with. It now returns one line per column, so schema accuracy, which was originally flaky, is now 5/5 passing on qwen and Gemini.
- The prompt was teaching the wrong thing: the `-- null boolean :opt_flag = #null` example made models use `#null` as a default value. We now have a "Parameter rules" section, ISO date examples that match the "no natural-language defaults" rule below them, explicit `current_user_id` guidance for first-person prompts, and a plural-noun rule that applies to each plural noun independently in the same prompt (e.g. "categories and tags" → BOTH list params, not one of each).
- The eval runner now captures `name` and `description` separately, not just `sql`. The description text is graded directly rather than grading the SQL string.

Tested against qwen 3.5 122B (our hosted model) + Gemini 3.1 Flash Lite (judge GPT-5.2): 20/20 each.

New eval cases ship in this PR: https://github.com/discourse/discourse-ai-evals/pull/18
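
The `DbSchema` formatting change mentioned in the commit message can be pictured with a minimal Ruby sketch. This is not the plugin's actual code: the `SCHEMA` sample data, the method names `dense_schema` and `line_per_column_schema`, and the exact output layout are all assumptions made for illustration. It only contrasts a one-line-per-table rendering with a one-line-per-column rendering.

```ruby
# Hypothetical illustration only; not the actual DbSchema tool code.
# SCHEMA is made-up sample data: table name => { column name => column type }.
SCHEMA = {
  "tags" => { "id" => "integer", "name" => "string" },
  "posts" => { "id" => "integer", "user_id" => "integer", "raw" => "text" },
}

# Dense format: one line per table, columns joined by commas.
def dense_schema(schema)
  schema.map do |table, columns|
    "#{table}: " + columns.map { |name, type| "#{name} #{type}" }.join(", ")
  end.join("\n")
end

# Line-per-column format: each column on its own line, qualified by its table.
def line_per_column_schema(schema)
  schema.flat_map do |table, columns|
    columns.map { |name, type| "#{table}.#{name} #{type}" }
  end.join("\n")
end

puts dense_schema(SCHEMA)
puts
puts line_per_column_schema(SCHEMA)
```

In the second rendering every column name appears next to its table name (for example `tags.name`), which is one plausible reason a model would be less likely to produce `tags.tag_name`.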
..
runners	FIX: Data explorer agent reliability for schema and plurality (#40152)	2026-05-19 16:23:17 +08:00
support	DEV: AI persona to agent migration (#38319)	2026-03-10 15:59:45 +11:00
agent_prompt_loader_spec.rb	DEV: AI persona to agent migration (#38319)	2026-03-10 15:59:45 +11:00
console_formatter_spec.rb	DEV: AI persona to agent migration (#38319)	2026-03-10 15:59:45 +11:00
eval_spec.rb	FEATURE: Run eval comparisons against a dataset (#36223)	2025-11-28 14:37:55 -03:00
features_spec.rb	DEV: AI persona to agent migration (#38319)	2026-03-10 15:59:45 +11:00
judge_spec.rb	FEATURE: add agentic execution mode for AI personas (#38230)	2026-03-05 15:06:54 +11:00
llm_repository_spec.rb
recorder_spec.rb	DEV: AI persona to agent migration (#38319)	2026-03-10 15:59:45 +11:00
workbench_compare_spec.rb	DEV: AI persona to agent migration (#38319)	2026-03-10 15:59:45 +11:00
workbench_spec.rb	FIX: Data explorer agent reliability for schema and plurality (#40152)	2026-05-19 16:23:17 +08:00
