mirror of
https://gh.wpcy.net/https://github.com/discourse/discourse.git
synced 2026-06-19 04:03:45 +08:00
Reported through a few user tests, the agent was unreliable in three ways: - `tags.tag_name` instead of `tags.name` - did not use `current_user_id` for "my posts" prompts - plural nouns as singular - and used unparse-able date defaults like "today". Few issues: - `DbSchema` tool was returning a dense one-line-per-table comma that qwen was unable to deal with. Now line-per-column so schema accuracy originally flaky is now 5/5 PASSING on qwen and Gemini. - The prompt was teaching the wrong thing where the `-- null boolean :opt_flag = #null` example made models use `#null` as a default value. We now have a "Parameter rules" section, ISO date examples that match the "no natural-language defaults" rule below them, explicit `current_user_id` guidance for first-person prompts, and a plural-noun rule that applies to each plural noun independently in the same prompt (e.g. "categories and tags" → BOTH list params, not one of each). - Eval runner now captures `name` and `description` separately, not just `sql`. The description text is graded directly rather than grading the SQL string. Tested against qwen 3.5 122B (our hosted model) + Gemini 3.1 Flash Lite (judge GPT-5.2): 20/20 each. New eval cases ship in this PR https://github.com/discourse/discourse-ai-evals/pull/18
106 lines
3 KiB
Ruby
Vendored
106 lines
3 KiB
Ruby
Vendored
# frozen_string_literal: true
|
|
|
|
module DiscourseAi
|
|
module Agents
|
|
module Tools
|
|
class DbSchema < Tool
|
|
def self.signature
|
|
{
|
|
name: name,
|
|
description: "Will load schema information for specific tables in the database",
|
|
parameters: [
|
|
{
|
|
name: "tables",
|
|
description:
|
|
"list of tables to load schema information for, comma separated list eg: (users,posts))",
|
|
type: "string",
|
|
required: true,
|
|
},
|
|
],
|
|
}
|
|
end
|
|
|
|
def self.name
|
|
"schema"
|
|
end
|
|
|
|
def tables
|
|
parameters[:tables]
|
|
end
|
|
|
|
def invoke
|
|
tables_arr = tables.split(",").map(&:strip).reject(&:empty?)
|
|
return { schema_info: "", tables: tables } if tables_arr.empty?
|
|
|
|
rows = DB.query(<<~SQL, tables_arr)
|
|
select table_name, column_name, data_type, is_nullable, character_maximum_length
|
|
from information_schema.columns
|
|
where table_schema = 'public'
|
|
and table_name in (?)
|
|
order by table_name, ordinal_position
|
|
SQL
|
|
|
|
by_table = rows.group_by(&:table_name)
|
|
missing_tables = tables_arr - by_table.keys
|
|
|
|
sections =
|
|
tables_arr.filter_map do |table_name|
|
|
cols = by_table[table_name]
|
|
next unless cols
|
|
|
|
lines =
|
|
cols.map do |row|
|
|
attrs = [simplify_type(row.data_type, row.character_maximum_length)]
|
|
attrs << "PK" if row.column_name == "id"
|
|
if (fkey = fkey_target(table_name, row.column_name))
|
|
attrs << "FK → #{fkey}"
|
|
end
|
|
attrs << "null" if row.is_nullable == "YES"
|
|
" #{row.column_name} #{attrs.join(", ")}"
|
|
end
|
|
|
|
"TABLE #{table_name}\n#{lines.join("\n")}"
|
|
end
|
|
|
|
sections << "TABLES NOT FOUND: #{missing_tables.join(", ")}" if missing_tables.any?
|
|
|
|
{ schema_info: sections.join("\n\n"), tables: tables }
|
|
end
|
|
|
|
protected
|
|
|
|
def description_args
|
|
{ tables: tables }
|
|
end
|
|
|
|
private
|
|
|
|
def simplify_type(type, max_len)
|
|
case type
|
|
when "character varying"
|
|
max_len ? "varchar(#{max_len})" : "varchar"
|
|
when "timestamp without time zone"
|
|
"timestamp"
|
|
when "double precision"
|
|
"double"
|
|
else
|
|
type
|
|
end
|
|
end
|
|
|
|
def fkey_target(table_name, column_name)
|
|
return nil if column_name == "id"
|
|
return nil unless column_name.end_with?("_id")
|
|
|
|
if defined?(DiscourseDataExplorer::DataExplorer) &&
|
|
DiscourseDataExplorer::DataExplorer.respond_to?(:fkey_info)
|
|
target = DiscourseDataExplorer::DataExplorer.fkey_info(table_name, column_name)
|
|
return target.to_s if target
|
|
end
|
|
|
|
nil
|
|
end
|
|
end
|
|
end
|
|
end
|
|
end
|