discourse/plugins/discourse-ai/lib/agents/tools/db_schema.rb
Natalie Tay fbcbdc46d8
FIX: Data explorer agent reliability for schema and plurality (#40152)
Reported through a few user tests, the agent was unreliable in three
ways:
- `tags.tag_name` instead of `tags.name`
- did not use `current_user_id` for "my posts" prompts
- plural nouns as singular
- and used unparse-able date defaults like "today".

Few issues:
- `DbSchema` tool was returning a dense one-line-per-table comma that
qwen was unable to deal with. Now line-per-column so schema accuracy
originally flaky is now 5/5 PASSING on qwen and Gemini.
- The prompt was teaching the wrong thing where the `-- null boolean
:opt_flag = #null` example made models use `#null` as a default value.
We now have a "Parameter rules" section, ISO date examples that match
the "no natural-language defaults" rule below them, explicit
`current_user_id` guidance for first-person prompts, and a plural-noun
rule that applies to each plural noun independently in the same prompt
(e.g. "categories and tags" → BOTH list params, not one of each).
- Eval runner now captures `name` and `description` separately, not just
`sql`. The description text is graded directly rather than grading the
SQL string.

Tested against qwen 3.5 122B (our hosted model) + Gemini 3.1 Flash Lite
(judge GPT-5.2): 20/20 each. New eval cases ship in this PR
https://github.com/discourse/discourse-ai-evals/pull/18
2026-05-19 16:23:17 +08:00

106 lines
3 KiB
Ruby
Vendored

# frozen_string_literal: true
module DiscourseAi
module Agents
module Tools
class DbSchema < Tool
def self.signature
{
name: name,
description: "Will load schema information for specific tables in the database",
parameters: [
{
name: "tables",
description:
"list of tables to load schema information for, comma separated list eg: (users,posts))",
type: "string",
required: true,
},
],
}
end
def self.name
"schema"
end
def tables
parameters[:tables]
end
def invoke
tables_arr = tables.split(",").map(&:strip).reject(&:empty?)
return { schema_info: "", tables: tables } if tables_arr.empty?
rows = DB.query(<<~SQL, tables_arr)
select table_name, column_name, data_type, is_nullable, character_maximum_length
from information_schema.columns
where table_schema = 'public'
and table_name in (?)
order by table_name, ordinal_position
SQL
by_table = rows.group_by(&:table_name)
missing_tables = tables_arr - by_table.keys
sections =
tables_arr.filter_map do |table_name|
cols = by_table[table_name]
next unless cols
lines =
cols.map do |row|
attrs = [simplify_type(row.data_type, row.character_maximum_length)]
attrs << "PK" if row.column_name == "id"
if (fkey = fkey_target(table_name, row.column_name))
attrs << "FK → #{fkey}"
end
attrs << "null" if row.is_nullable == "YES"
" #{row.column_name} #{attrs.join(", ")}"
end
"TABLE #{table_name}\n#{lines.join("\n")}"
end
sections << "TABLES NOT FOUND: #{missing_tables.join(", ")}" if missing_tables.any?
{ schema_info: sections.join("\n\n"), tables: tables }
end
protected
def description_args
{ tables: tables }
end
private
def simplify_type(type, max_len)
case type
when "character varying"
max_len ? "varchar(#{max_len})" : "varchar"
when "timestamp without time zone"
"timestamp"
when "double precision"
"double"
else
type
end
end
def fkey_target(table_name, column_name)
return nil if column_name == "id"
return nil unless column_name.end_with?("_id")
if defined?(DiscourseDataExplorer::DataExplorer) &&
DiscourseDataExplorer::DataExplorer.respond_to?(:fkey_info)
target = DiscourseDataExplorer::DataExplorer.fkey_info(table_name, column_name)
return target.to_s if target
end
nil
end
end
end
end
end