discourse/lib/discourse_dev/browser_pageview_event.rb
Alan Guo Xiang Tan 437ab337d2
FEATURE: Add top countries and top referrers cards to the admin dashboard (#40215)
This commit adds two new cards to the redesigned admin dashboard's Site
Traffic section: top countries and top referrers, both sourced from
`browser_pageview_events`.

Key technical decisions:

1. Gate the cards on the `persist_browser_pageview_events` site setting.
The cards have no data source unless browser pageview events are being
persisted, so they are omitted from the dashboard when the setting is
disabled.

2. Normalize referrers at write time. A new `normalized_referrer` column
on `browser_pageview_events` is populated by
`BrowserPageviewReferrerInspector`, which strips scheme, `www.`, port,
fragment, trailing slashes, and common tracking query params. Doing this
at insert time avoids per-row string operations at query time.

3. Count browser pageviews by country and by referrer in two new report
concerns. `Reports::TopCountriesByBrowserPageviews` groups by
`country_code` and `Reports::TopReferrersByBrowserPageviews` groups by
`normalized_referrer`. Both compute share of total browser pageviews and
rank the top 5 in SQL. The country report drops MaxMind reserved codes
(unknown, anonymous proxy, satellite). The referrer report drops
same-host referrals. Both also exclude anonymous browser pageviews
(`user_id IS NULL`) when the `login_required` site setting is enabled,
since only logged-in browser pageviews are meaningful on a closed forum.

4. Fetch each report through the existing dashboard service.
`AdminDashboardSiteTraffic#build` returns one entry per card with a `{
rows:, error: }` shape, e.g.:

   ```ruby
   {
     top_countries: {
       rows: [
         { country_code: "US", count: 142, percent: 35 },
         { country_code: "GB", count: 89, percent: 22 }
       ],
       error: nil
     },
     top_referrers: {
       rows: [
         { normalized_referrer: "news.ycombinator.com/item?id=1", count: 47, percent: 12 },
         { normalized_referrer: "reddit.com/r/discourse", count: 31, percent: 8 }
       ],
       error: nil
     }
   }
   ```

On report failure, the entry has `rows: []` and `error: :timeout` (or another symbol).
This lets the UI render rows, error, or empty state independently.
Healthy responses are cached via `Report.find_cached`.
`SiteSetting.login_required` and `Discourse.current_hostname` flow into
`opts[:filters]` so toggling either invalidates the cache. Timeouts skip
the cache so the next request retries.

5. Use `Intl.DisplayNames` for country names instead of locale files.
`Intl.DisplayNames` is a built-in browser API that returns a localized
country name for an ISO 3166-1 alpha-2 code, avoiding ~250 translation
strings per locale.
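
The write-time normalization described in decision 2 can be sketched
roughly as follows. This is a minimal illustration using Ruby's standard
`uri` library, not the actual `BrowserPageviewReferrerInspector`: the
method name `normalize_referrer` and the `TRACKING_PARAMS` list are
assumptions made for the example, since the commit message does not list
the exact parameters that are stripped.

```ruby
require "uri"

# Hypothetical list of tracking params; the real list is not shown here.
TRACKING_PARAMS = %w[utm_source utm_medium utm_campaign utm_term utm_content fbclid gclid].freeze

# Hypothetical helper: returns "host/path?query" with the scheme, "www.",
# port, fragment, trailing slashes, and tracking params removed, or nil
# when the input is blank or cannot be parsed as a URL with a host.
def normalize_referrer(raw)
  return nil if raw.nil? || raw.strip.empty?

  uri = URI.parse(raw.strip)
  return nil if uri.host.nil?

  # uri.host excludes the port, and the fragment is simply never used.
  host = uri.host.downcase.sub(/\Awww\./, "")
  path = uri.path.to_s.sub(%r{/+\z}, "")

  # Keep only the query params that are not known tracking params.
  query = URI.decode_www_form(uri.query.to_s).reject { |key, _| TRACKING_PARAMS.include?(key) }

  normalized = host + path
  normalized += "?#{URI.encode_www_form(query)}" unless query.empty?
  normalized
rescue URI::InvalidURIError
  nil
end
```

Under these assumptions,
`normalize_referrer("https://www.news.ycombinator.com:443/item?id=1&utm_source=x#frag")`
yields `"news.ycombinator.com/item?id=1"`, and
`normalize_referrer("https://reddit.com/r/discourse/")` yields
`"reddit.com/r/discourse"`, so grouping by the stored value needs no
string work at query time.
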
2026-05-22 12:59:16 +08:00

# frozen_string_literal: true

require "discourse_dev"

module DiscourseDev
  class BrowserPageviewEvent
    DEFAULT_COUNT = 1500
    DEFAULT_RANGE = 3.months

    # Relative sampling weights per country code. The nil key yields
    # events without a country code.
    COUNTRY_WEIGHTS = {
      "US" => 40,
      "GB" => 15,
      "DE" => 10,
      "FR" => 8,
      "CA" => 8,
      "AU" => 5,
      "BR" => 5,
      "JP" => 4,
      "IN" => 3,
      "CN" => 2,
      nil => 5,
    }.freeze

    # Already-normalized referrers. The nil entries yield events without
    # a referrer (3 of the 13 equally likely samples).
    REFERRERS = [
      "news.ycombinator.com/item?id=42",
      "news.ycombinator.com/item?id=99",
      "news.ycombinator.com",
      "reddit.com/r/discourse",
      "reddit.com/r/programming",
      "twitter.com/discourse",
      "google.com",
      "github.com/discourse/discourse",
      "facebook.com",
      "m.facebook.com",
      nil,
      nil,
      nil,
    ].freeze

    def initialize(count: DEFAULT_COUNT)
      @count = count
    end

    def populate!
      unless Discourse.allow_dev_populate?
        raise 'To run this rake task in a production site, set the value of `ALLOW_DEV_POPULATE` environment variable to "1"'
      end

      SiteSetting.persist_browser_pageview_events = true

      rows = build_rows
      ::BrowserPageviewEvent.insert_all(rows)
      mirror_to_application_requests(rows)

      puts "Enabled persist_browser_pageview_events and inserted #{rows.size} events."
      rows.size
    end

    def self.populate!(count: nil)
      new(count: count || DEFAULT_COUNT).populate!
    end

    private

    attr_reader :count

    def build_rows
      # Repeat each country code `weight` times so that `sample` picks it
      # in proportion to its weight.
      country_pool = COUNTRY_WEIGHTS.flat_map { |code, weight| [code] * weight }

      # Mix in nil user ids (one per four users, at least one) so that
      # some events are anonymous.
      user_ids = ::User.real.limit(20).pluck(:id)
      user_id_pool = user_ids + ([nil] * [user_ids.size / 4, 1].max)

      Array.new(count) do
        normalized = REFERRERS.sample

        {
          url: "https://forum.example.com/t/sample-topic/#{rand(1000)}",
          ip_address: "192.0.2.#{rand(1..254)}",
          user_agent: "Mozilla/5.0 (X11; Linux x86_64) Chrome/123",
          session_id: SecureRandom.hex(16),
          country_code: country_pool.sample,
          normalized_referrer: normalized,
          referrer: normalized ? "https://#{normalized}" : nil,
          user_id: user_id_pool.sample,
          # Random timestamp within the last DEFAULT_RANGE.
          created_at: DEFAULT_RANGE.ago + rand(DEFAULT_RANGE.to_i).seconds,
        }
      end
    end

    # Group the events by day and request type, then record each group's
    # size through ApplicationRequest.write_cache!.
    def mirror_to_application_requests(rows)
      rows
        .group_by do |row|
          req_type = row[:user_id] ? :page_view_logged_in_browser : :page_view_anon_browser
          [row[:created_at].to_date, req_type]
        end
        .each do |(date, req_type), grouped|
          ::ApplicationRequest.write_cache!(req_type, grouped.size, date)
        end
    end
  end
end
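
The two techniques the generator relies on, weighted sampling from an
expanded pool and counting rows per day and request type, can be tried
outside a Discourse checkout with plain Ruby. The sketch below is an
illustration only: `build_pool` and `daily_counts` are hypothetical helper
names, the weights are made up, the request types are shortened to
`:logged_in` and `:anon`, and plain `Date` values stand in for the
`created_at` timestamps.

```ruby
require "date"

# Expand a { value => weight } hash into a flat pool so that Array#sample
# picks each value with probability weight / total weight.
def build_pool(weights)
  weights.flat_map { |value, weight| [value] * weight }
end

# Count rows per [date, request type] pair, mirroring the grouping key
# used before the counts are written to ApplicationRequest.
def daily_counts(rows)
  rows
    .group_by { |row| [row[:created_at], row[:user_id] ? :logged_in : :anon] }
    .transform_values(&:size)
end

rng = Random.new(42) # fixed seed so repeated runs sample the same rows
country_pool = build_pool({ "US" => 3, "GB" => 1, nil => 1 })

rows =
  Array.new(200) do
    {
      country_code: country_pool.sample(random: rng),
      user_id: [1, 2, nil].sample(random: rng),
      created_at: Date.new(2026, 1, 1) + rng.rand(3),
    }
  end

counts = daily_counts(rows)
counts.each { |(date, type), size| puts "#{date} #{type}: #{size}" }
```

Here `country_pool` is `["US", "US", "US", "GB", nil]`, so `"US"` is drawn
about three times as often as `"GB"`. The per-group sizes in `counts`
always add up to the number of generated rows, which is the property that
keeps the mirrored totals consistent with the inserted events.
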