
Discourse AI Plugin

Plugin Summary

For more information, please see: https://meta.discourse.org/t/discourse-ai/259214?u=falco

Evals

The directory evals contains AI evals for the Discourse AI plugin. You may create a local config by copying config/eval-llms.yml to config/eval-llms.local.yml and modifying the values.
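For example, from the plugin root (the mkdir/touch lines only make this sketch self-contained if the real files are not present):

```shell
# Optional: create a local override for the eval LLM config.
# Values in eval-llms.local.yml take precedence for your local runs.
mkdir -p config
[ -f config/eval-llms.yml ] || touch config/eval-llms.yml
cp config/eval-llms.yml config/eval-llms.local.yml
```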

To run them use:

cd evals
./run --help

Usage: evals/run [options]
    -e, --eval NAME                  Name of the evaluation to run
    -m, --models NAME                Models to evaluate (comma separated, defaults to all)
    -l, --list                       List eval ids
        --list-models                List configured LLMs
        --list-features              List feature keys available to evals
        --list-agents                List agent definitions under evals/agents
    -f, --feature KEY                Filter evals by feature (module_name:feature_name)
    -j, --judge NAME                 LLM config used as a judge (defaults to gpt-4o when available)
        --agent-keys KEYS            Comma-separated list of agent keys (or repeat the flag) to run sequentially
        --compare MODE               Run comparisons (MODE: agents or llms)
        --dataset PATH               Path to a CSV dataset file (requires --feature)

To run evals you will need to configure API keys in your environment:

OPENAI_API_KEY=your_openai_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key
GEMINI_API_KEY=your_gemini_api_key

Custom agents for evals

Eval runs can swap the built-in agents with YAML definitions stored in plugins/discourse-ai/evals/agents. Use --list-agents to discover available entries; the special key default always refers to the built-in agent prompt. Pass --agent-keys key1,key2 (or repeat --agent-keys key) to apply them:

./run --eval simple_summarization --models gpt-4o-mini --agent-keys topic_summary_eval,another_prompt

Each agent file only needs a system_prompt (and an optional description). When specified, that prompt replaces the default system prompt of whichever agent the eval runner would normally use. Pass multiple keys (including default) to rerun the same evals with different prompts without restarting the CLI. Add new files under that directory to compare alternate prompts without touching the database.
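For instance, a hypothetical evals/agents/topic_summary_eval.yml (the filename and prompt text here are invented for illustration) could look like:

```yaml
# Hypothetical agent definition. Per the README, only system_prompt
# is required; description is optional.
description: "Alternate summarization prompt for eval runs"
system_prompt: |
  You summarize forum topics. Be concise, keep key decisions,
  and answer in the topic's original language.
```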

When running agent comparisons (--compare agents) the CLI automatically prepends the built-in default agent so you can benchmark your YAML prompts against the stock behavior. Non-comparison runs still execute only the agents you list.
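The agent-selection behavior described above can be sketched roughly as follows (illustrative Ruby, not the plugin's actual code):

```ruby
# --compare agents prepends the built-in "default" agent so YAML
# prompts are always benchmarked against stock behavior; plain runs
# use exactly the keys you list.
def effective_agents(agent_keys, compare_mode)
  return agent_keys unless compare_mode == "agents"
  (["default"] + agent_keys).uniq
end

puts effective_agents(["topic_summary_eval"], "agents").inspect
# ["default", "topic_summary_eval"]
puts effective_agents(["topic_summary_eval"], nil).inspect
# ["topic_summary_eval"]
```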

Dataset-driven evals

Supply --dataset path/to/file.csv along with --feature module:feature_name to generate eval cases from a CSV instead of YAML files. Each row must include content and expected_output columns; rows are converted into individual eval ids (prefixed with the dataset filename) that reuse the selected feature's runner. Example:

./run --dataset evals/datasets/spam.csv --feature spam:inspect_posts --models gpt-4o-mini
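Conceptually, the row-to-case conversion works like the sketch below (the method name, id scheme, and hash shape are assumptions for illustration; only the column names come from the README):

```ruby
require "csv"

# Hypothetical sketch: map each dataset row to an eval case whose id
# is prefixed with the dataset filename, as described above.
def dataset_eval_cases(path, csv_text)
  prefix = File.basename(path, ".csv")
  CSV.parse(csv_text, headers: true).each_with_index.map do |row, i|
    unless row["content"] && row["expected_output"]
      raise ArgumentError, "rows need content and expected_output columns"
    end
    { id: "#{prefix}-#{i + 1}", input: row["content"], expected: row["expected_output"] }
  end
end

cases = dataset_eval_cases(
  "evals/datasets/spam.csv",
  "content,expected_output\nBuy now!!!,spam\nHello team,not_spam\n"
)
cases.each { |c| puts c[:id] }
# spam-1
# spam-2
```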

Comparison matrix

Use the --compare flag to ask the CLI to judge multiple runs together:

  • --compare agents: requires a single --models value and at least one agent key (the built-in default agent is implicitly added). Each eval is executed for every agent; the judge LLM scores them side by side and announces the winner plus individual ratings.
  • --compare llms: requires at least two --models and exactly one agent (default unless you pass --agent-keys custom_agent). Every eval runs once per model and the judge compares the outputs from each LLM. Logs include the agent key (or default) so you can correlate recordings.

Both modes reuse the rubric declared under the eval's judge block and stream the comparison summary to STDOUT. The structured log files are still written for each underlying run, so you can drill into the raw outputs if the judge's reasoning needs inspection.