Mirror of https://gh.wpcy.net/https://github.com/discourse/discourse.git, synced 2026-06-19 04:03:45 +08:00

GitHub oneboxes and the discourse-github plugin talked to GitHub's REST and
GraphQL API with no rate-limit awareness. On busy instances this exhausted
GitHub's limits (60 requests/hour unauthenticated, 5000 authenticated), and
because there was no backoff, every render kept hitting GitHub and
re-failing -- which GitHub's docs warn can get an integration banned. The
recently added PR-status onebox multiplied the number of calls and made it
far worse.

GitHub access was also fragmented: the core onebox engines used OpenURI,
the discourse-github plugin used Octokit, and the discourse-ai bot tools
used FinalDestination::HTTP -- three HTTP stacks, three tokens, and
inconsistent (or entirely missing) error and rate-limit handling.

This introduces a single client, Discourse::GithubApi, that every GitHub
data-API request now flows through. It is built on Faraday with the
SSRF-safe FinalDestination adapter and:
- authenticates per token (Bearer) and returns plain string-keyed Hashes
  (get/post) or raw bodies (raw_get) -- one response shape, no
  Octokit/Sawyer
- only ever sends the access token to api.github.com and
  raw.githubusercontent.com, rejecting any other absolute URL, so a
  user-derived path can never leak a token to an arbitrary host
- backs off on rate limits both reactively (403/429) and proactively (when
  X-RateLimit-Remaining hits 0), honouring Retry-After / X-RateLimit-Reset,
  via a shared Redis flag (GithubRateLimit) keyed per token so each token's
  budget and the shared unauthenticated/IP budget back off independently
- short-circuits while backing off without ever sleeping, so onebox
  rendering and post baking degrade to a plain link instead of blocking a
  request
- caches ETags and sends If-None-Match, so unchanged resources return 304s
  that do not count against the rate limit

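The token-scoping rule in the second bullet can be illustrated with a small sketch. This is not the Discourse::GithubApi code. `resolve_github_url`, `UnsafeGithubUrl`, `TOKEN_HOSTS` and `API_BASE` are hypothetical names. The sketch assumes that relative paths are resolved against https://api.github.com *before* the host check, which is what stops a user-derived path from pointing the token at another host.

```ruby
require "uri"

# Hypothetical names; only the two allowed hosts come from the commit message.
TOKEN_HOSTS = %w[api.github.com raw.githubusercontent.com].freeze
API_BASE = "https://api.github.com/"

class UnsafeGithubUrl < StandardError
end

# Resolves a path (or absolute URL) against the API base and returns the URI
# only when it is HTTPS on an allowlisted host. Absolute URLs and
# protocol-relative "//host/..." paths resolve to their own host first, so
# they are rejected unless that host is on the list.
def resolve_github_url(path_or_url)
  uri = URI.join(API_BASE, path_or_url.to_s)
  unless uri.is_a?(URI::HTTPS) && TOKEN_HOSTS.include?(uri.host.to_s.downcase)
    raise UnsafeGithubUrl, "refusing to attach a GitHub token to #{uri.host.inspect}"
  end
  uri
end
```

For example, `resolve_github_url("/repos/discourse/discourse")` yields an api.github.com URI. `resolve_github_url("https://example.com/x")` raises before any request is built.
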
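The backoff behaviour in the third and fourth bullets can be sketched as follows. This illustration rests on stated assumptions and is not the GithubRateLimit implementation:

- the shared Redis flag is replaced by an in-memory hash (`RateLimitFlag`, hypothetical);
- the per-token identity is a hypothetical SHA-256 prefix of the token;
- a bare 403 counts as a rate limit only when GitHub's headers say so, because 403 also means "forbidden";
- the 60-second fallback is an arbitrary placeholder.

```ruby
require "digest"

# In-memory stand-in for the shared Redis flag (hypothetical); a real one
# would be a Redis key per identity that expires at the pause time.
class RateLimitFlag
  def initialize
    @paused_until = {}
  end

  def backing_off?(identity, now: Time.now)
    pause_until = @paused_until[identity]
    !pause_until.nil? && now < pause_until
  end

  def back_off!(identity, pause_until)
    @paused_until[identity] = pause_until
  end
end

class RateLimitedError < StandardError
end

# One budget per token, plus one shared budget for unauthenticated calls.
def identity_for(token)
  token ? "token:#{Digest::SHA256.hexdigest(token)[0, 16]}" : "unauthenticated"
end

# Returns the Time until which this identity should pause, or nil.
def backoff_until(status, headers, now: Time.now)
  h = headers.transform_keys(&:downcase)
  exhausted = h["x-ratelimit-remaining"] == "0" # proactive signal
  limited = status == 429 || exhausted || (status == 403 && h.key?("retry-after"))
  return nil unless limited

  retry_after = h["retry-after"].to_s
  if retry_after.match?(/\A\d+\z/)
    now + retry_after.to_i
  elsif h["x-ratelimit-reset"]
    Time.at(h["x-ratelimit-reset"].to_i)
  else
    now + 60 # placeholder when GitHub gives no hint
  end
end

# Never sleeps: while paused it raises immediately so the caller can fall
# back (for a onebox, to a plain link) instead of blocking the request.
def guarded_request(flag, identity, now: Time.now)
  raise RateLimitedError, identity if flag.backing_off?(identity, now: now)

  status, headers, body = yield
  pause_until = backoff_until(status, headers, now: now)
  flag.back_off!(identity, pause_until) if pause_until
  raise RateLimitedError, identity if pause_until && status >= 400
  body
end
```

Keying the flag by identity lets an exhausted token pause while the unauthenticated budget and every other token keep working. A successful response that reports zero remaining requests still sets the pause, which is the proactive half of the rule.
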
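The ETag bullet relies on GitHub's conditional requests. The client replays the stored ETag in If-None-Match, and a 304 reply means the cached body is still current. Below is a minimal sketch of that bookkeeping. `EtagCache` is hypothetical, and the HTTP call is injected as a block so the example runs offline. A real client would keep entries in a shared cache and read response headers case-insensitively.

```ruby
# Hypothetical conditional-request cache keyed by URL.
class EtagCache
  Entry = Struct.new(:etag, :body)

  def initialize
    @entries = {}
  end

  # The block receives the request headers and returns [status, headers, body].
  def get(url, &fetcher)
    cached = @entries[url]
    request_headers = {}
    request_headers["If-None-Match"] = cached.etag if cached

    status, headers, body = fetcher.call(request_headers)
    # 304 Not Modified: serve the stored body instead of the empty response.
    return cached.body if status == 304 && cached

    etag = headers["ETag"]
    @entries[url] = Entry.new(etag, body) if status == 200 && etag
    body
  end
end
```

Only the first fetch of an unchanged resource downloads a body. Later fetches send the ETag back and get a 304, which GitHub does not count against the rate limit.
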
Every caller was moved onto it:
- the 6 core GitHub onebox engines, via a slimmed Onebox::Mixins::GithubApi
  adapter that keeps their public methods and translates client errors back
  to the OpenURI::HTTPError vocabulary they already rescue (engines
  unchanged)
- the github_blob raw.githubusercontent.com fetch
- the discourse-github plugin (badges, linkback, permalinks, token
  validator), which no longer uses the octokit and sawyer gems (they stay in
  the Gemfile for the official discourse-code-review plugin, which still
  depends on them)
- the discourse-ai bot's GitHub tools (search code, diff, file content,
  search files)

Also adds a GithubOneboxBackoff admin problem check that surfaces while one
of the onebox token identities is backing off -- scoped to the tokens
resolved by Onebox::GithubAccess (each configured
github_onebox_access_tokens entry plus the unauthenticated client) so a
backoff on the AI bot or linkback token is not misattributed to onebox. Its
message points admins at the relevant setting with the {{setting:...}} link
marker, which problem-check messages now expand too.

Onebox token resolution is centralised in Onebox::GithubAccess, and the
onebox cache TTL for transient GitHub failures is shortened so they recover
quickly.

GitHub OAuth login, theme git-clone, the inbound webhook, and the Oneboxer
FinalDestination URL-resolution special-cases for github.com are
intentionally out of scope -- they are different concerns, not the
rate-limited data API.

216 lines, 6.6 KiB, Ruby (vendored)

# frozen_string_literal: true

module DiscourseAi
  module Agents
    module Tools
      class GithubFileContent < Tool
        def self.signature
          {
            name: name,
            description: "Retrieves the content of specified GitHub files",
            parameters: [
              {
                name: "repo_name",
                description: "The name of the GitHub repository (e.g., 'discourse/discourse')",
                type: "string",
                required: true,
              },
              {
                name: "file_paths",
                description:
                  "The file paths to retrieve. Append '#Lstart-Lend' (e.g., app/models/user.rb#L10-L25) to limit the returned lines",
                type: "array",
                item_type: "string",
                required: true,
              },
              {
                name: "branch",
                description:
                  "The branch or commit SHA to retrieve the files from (default: 'main')",
                type: "string",
                required: false,
              },
            ],
          }
        end

        def self.name
          "github_file_content"
        end

        def repo_name
          parameters[:repo_name]
        end

        def file_paths
          parameters[:file_paths]
        end

        def branch
          parameters[:branch]
        end

        def description_args
          paths = ""
          paths = file_paths.join(", ") if file_paths.is_a?(Array)
          { repo_name: repo_name, file_paths: paths, branch: branch || default_branch }
        end

        def invoke
          owner, repo = repo_name.to_s.split("/")

          if owner.blank? || repo.blank?
            return { error: "Invalid repo_name format. Expected 'owner/repo'." }
          end

          return { error: "file_paths cannot be empty." } if file_paths.blank?
          return { error: "file_paths must be an array." } if !file_paths.is_a?(Array)

          ref = branch || default_branch
          retrieved_entries = []
          missing_files = []

          parsed_file_requests.each do |file_request|
            file_path = file_request[:path]
            api_url =
              "https://api.github.com/repos/#{owner}/#{repo}/contents/#{file_path}?ref=#{ref}"

            begin
              file_data = github_client.get(api_url)
              content = ensure_utf8(Base64.decode64(file_data["content"].to_s))
              snippet =
                extract_requested_content(
                  content,
                  file_request[:start_line],
                  file_request[:end_line],
                )
              retrieved_entries << {
                path: file_path,
                content: snippet,
                start_line: file_request[:start_line],
                end_line: file_request[:end_line],
              }
            rescue Discourse::GithubApi::Error
              missing_files << file_request[:raw]
            end
          end

          result = {}
          unless retrieved_entries.empty?
            blob =
              retrieved_entries
                .map do |entry|
                  label = "File Path: #{entry[:path]}"
                  if entry[:start_line]
                    label += " (lines #{format_line_range(entry[:start_line], entry[:end_line])})"
                  end

                  "#{label}:\n#{entry[:content]}"
                end
                .join("\n")
            blob = ensure_utf8(blob)
            truncated_blob = truncate(blob, max_length: 20_000, percent_length: 0.3, llm: llm)
            result[:file_contents] = truncated_blob
          end

          result[:missing_files] = missing_files unless missing_files.empty?

          result.empty? ? { error: "No files found or retrieved." } : result
        end

        private

        def default_branch
          @default_branch ||= fetch_default_branch(repo_name)
        end

        def parsed_file_requests
          @parsed_file_requests ||=
            file_paths.map do |raw|
              start_line, end_line = extract_line_bounds(raw)

              {
                raw: raw,
                path: raw.sub(line_fragment_regex, ""),
                start_line: start_line,
                end_line: end_line || start_line,
              }
            end
        end

        def extract_line_bounds(raw)
          match = raw.match(line_fragment_regex)
          return nil, nil unless match

          start_line = positive_line_number(match[1])
          end_line = positive_line_number(match[2])
          end_line = start_line if start_line && end_line && end_line < start_line

          [start_line, end_line]
        end

        def positive_line_number(value)
          return if value.blank?

          number = value.to_i
          number.positive? ? number : nil
        end

        def line_fragment_regex
          /#L(\d+)(?:-L?(\d+))?\z/i
        end

        MAX_LINES = 200

        def extract_requested_content(content, start_line, end_line)
          normalized = content.gsub("\r\n", "\n")
          lines = normalized.split("\n")
          total_lines = lines.length

          if start_line.nil?
            if total_lines <= MAX_LINES
              return normalized
            else
              extracted = lines[0...MAX_LINES].join("\n")
              return(
                "#{extracted}\n\n[truncated — file has #{total_lines} lines, showing first #{MAX_LINES}. Use file_path#L#{MAX_LINES + 1}-L#{MAX_LINES + 200} to read more]"
              )
            end
          end

          if start_line > total_lines
            return(
              "Requested lines #{start_line}-#{end_line || start_line} exceed file length of #{total_lines}."
            )
          end

          final_end_line = [end_line || start_line, total_lines].min
          extracted = lines[(start_line - 1)..(final_end_line - 1)] || []
          result = extracted.join("\n")

          if final_end_line < total_lines
            result +=
              "\n\n[showing lines #{start_line}-#{final_end_line} of #{total_lines}. Use file_path#L#{final_end_line + 1}-L#{[final_end_line + 200, total_lines].min} to read more]"
          end

          result
        end

        def format_line_range(start_line, end_line)
          return start_line.to_s if start_line == end_line || end_line.nil?

          "#{start_line}-#{end_line}"
        end

        def ensure_utf8(text)
          return "" if text.nil?

          result = text.dup
          result.force_encoding(Encoding::UTF_8)
          return result if result.valid_encoding?

          result.encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "")
        end
      end
    end
  end
end
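
The `#Lstart-Lend` suffix accepted by the file_paths parameter is parsed by `extract_line_bounds` and `parsed_file_requests` above. The sketch below copies that logic out of the class so its edge cases can be run without Discourse. `parse_file_request` and `LINE_FRAGMENT` are illustrative names, and ActiveSupport's `blank?` is replaced with a plain nil/empty check.

```ruby
# Same pattern as line_fragment_regex: "#L10", "#L10-L25" or "#L10-25",
# anchored at the end of the path and matched case-insensitively.
LINE_FRAGMENT = /#L(\d+)(?:-L?(\d+))?\z/i

# Mirrors positive_line_number: nil for missing, zero or negative values.
def positive_line_number(value)
  return if value.nil? || value.empty?

  number = value.to_i
  number.positive? ? number : nil
end

# Mirrors extract_line_bounds plus the hash built in parsed_file_requests.
def parse_file_request(raw)
  match = raw.match(LINE_FRAGMENT)
  start_line = match && positive_line_number(match[1])
  end_line = match && positive_line_number(match[2])
  # A reversed range collapses to its start line.
  end_line = start_line if start_line && end_line && end_line < start_line

  {
    path: raw.sub(LINE_FRAGMENT, ""),
    start_line: start_line,
    end_line: end_line || start_line,
  }
end
```

Under these rules, `#L5-8` and `#l5-L8` are equivalent, and a reversed range such as `#L30-L10` collapses to line 30. `#L0` is stripped from the path but ignored, so the file is returned from the top and truncated at MAX_LINES.
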