discourse/plugins/discourse-ai/spec/evals
Natalie Tay fbcbdc46d8
FIX: Data explorer agent reliability for schema and plurality (#40152)
Reported through a few user tests, the agent was unreliable in four
ways:
- used `tags.tag_name` instead of `tags.name`
- did not use `current_user_id` for "my posts" prompts
- treated plural nouns as singular
- used unparseable date defaults like "today"
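
To illustrate the last failure, here is a minimal sketch of a check that tells a parseable ISO 8601 date default apart from a natural-language one. The helper name `iso_date_default?` is hypothetical and is not part of this change or of the plugin; it only relies on Ruby's standard `Date.iso8601`.

```ruby
require "date"

# Hypothetical helper (not from the plugin): returns true when the given
# default value parses as an ISO 8601 date, false otherwise.
def iso_date_default?(value)
  Date.iso8601(value)
  true
rescue ArgumentError, TypeError
  # Date.iso8601 raises Date::Error (an ArgumentError) for strings such as
  # "today", and TypeError for non-string input such as nil.
  false
end

iso_date_default?("2026-05-19") # => true
iso_date_default?("today")      # => false
```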

A few issues:
- `DbSchema` tool was returning a dense one-line-per-table, comma-separated
listing that qwen was unable to deal with. It is now line-per-column, so
schema accuracy, originally flaky, is now 5/5 passing on qwen and Gemini.
- The prompt was teaching the wrong thing: the `-- null boolean
:opt_flag = #null` example made models use `#null` as a default value.
We now have a "Parameter rules" section, ISO date examples that match
the "no natural-language defaults" rule below them, explicit
`current_user_id` guidance for first-person prompts, and a plural-noun
rule that applies to each plural noun independently in the same prompt
(e.g. "categories and tags" → BOTH list params, not one of each).
- Eval runner now captures `name` and `description` separately, not just
`sql`. The description text is graded directly rather than grading the
SQL string.
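
The `DbSchema` formatting change in the first item can be sketched roughly as follows. This is an illustration only: the `SCHEMA` hash, both method names, and the exact output strings are assumptions, since the tool's real input and output formats are not shown here.

```ruby
# Hypothetical input: table name => list of [column, type] pairs.
SCHEMA = {
  "tags" => [["id", "integer"], ["name", "string"]],
  "posts" => [["id", "integer"], ["user_id", "integer"]],
}.freeze

# Dense form (assumed shape of the old output): one line per table,
# with all columns joined by commas.
def dense_schema(schema)
  schema
    .map { |table, cols| "#{table}(#{cols.map { |name, type| "#{name} #{type}" }.join(", ")})" }
    .join("\n")
end

# Line-per-column form (assumed shape of the new output): every column
# gets its own line, qualified by its table name.
def line_per_column_schema(schema)
  schema
    .flat_map { |table, cols| cols.map { |name, type| "#{table}.#{name} #{type}" } }
    .join("\n")
end

puts line_per_column_schema(SCHEMA)
# tags.id integer
# tags.name string
# posts.id integer
# posts.user_id integer
```

With each column on its own line, a name such as `tags.name` appears verbatim in the context, which is the kind of lookup the dense form made harder.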

Tested against qwen 3.5 122B (our hosted model) + Gemini 3.1 Flash Lite
(judge GPT-5.2): 20/20 each. New eval cases ship in this PR
https://github.com/discourse/discourse-ai-evals/pull/18
2026-05-19 16:23:17 +08:00
..
runners FIX: Data explorer agent reliability for schema and plurality (#40152) 2026-05-19 16:23:17 +08:00
support DEV: AI persona to agent migration (#38319) 2026-03-10 15:59:45 +11:00
agent_prompt_loader_spec.rb DEV: AI persona to agent migration (#38319) 2026-03-10 15:59:45 +11:00
console_formatter_spec.rb DEV: AI persona to agent migration (#38319) 2026-03-10 15:59:45 +11:00
eval_spec.rb FEATURE: Run eval comparisons against a dataset (#36223) 2025-11-28 14:37:55 -03:00
features_spec.rb DEV: AI persona to agent migration (#38319) 2026-03-10 15:59:45 +11:00
judge_spec.rb FEATURE: add agentic execution mode for AI personas (#38230) 2026-03-05 15:06:54 +11:00
llm_repository_spec.rb
recorder_spec.rb DEV: AI persona to agent migration (#38319) 2026-03-10 15:59:45 +11:00
workbench_compare_spec.rb DEV: AI persona to agent migration (#38319) 2026-03-10 15:59:45 +11:00
workbench_spec.rb FIX: Data explorer agent reliability for schema and plurality (#40152) 2026-05-19 16:23:17 +08:00