wp-update-server-plugin/README.md
David Stone 68f632fda0
feat: add site discovery to scrape subsites and network health from discovered domains (#10)
Implements issue #4. Adds three new classes:

- Site_Discovery_Table: wp_wu_site_discovery DB table with upsert, pending
  queue, health score distribution, and summary stats queries.
- Site_Discovery_Scraper: daily WP-Cron scraper (batch of 20) that fetches
  each domain's homepage, checks robots.txt, detects SSL, production status,
  checkout pages, subsite count, network type, and UM version. Computes a
  0-100 health score from those signals.
- Site_Discovery_Admin: sub-page under Telemetry showing key questions
  answered (production networks with 10+ subsites, networks with checkout),
  health score distribution, signal summary, and a sortable domain table.
  Includes a manual 'Run Scraper Now' button.

Also updates README with feature documentation and privacy/ethics notes.
2026-03-27 13:16:53 -06:00

48 lines
1.8 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

## WP Update Server Plugin
This project creates an update server to enable automatic updates for WordPress plugins which are Woocommerce downloadable products.
## Requires
* Woocommerce
* [WP OAuth Server](https://wordpress.org/plugins/oauth2-provider/)
## Features
### Telemetry Dashboard
Opt-in telemetry from Ultimate Multisite installations. Tracks PHP/WP/plugin versions, network types, active addons, and error reports.
### Site Discovery (Network Health)
Background scraper that discovers network health signals from domains found via passive install tracking. Runs daily via WP-Cron and processes up to 20 domains per batch.
**What it detects:**
- Whether the site is live (HTTP 2xx)
- Whether it is a production domain (not staging/dev/local)
- SSL/HTTPS availability
- Presence of a checkout or registration page (UM shortcodes/blocks)
- Estimated subsite count (subdirectory path probing + HTML signals)
- Network type: subdomain vs subdirectory
- Ultimate Multisite version (from asset URLs or meta tags)
**Health score (0100):**
| Signal | Points |
|---|---|
| Site is live | +20 |
| Production domain | +15 |
| Has SSL | +10 |
| Has checkout page | +15 |
| Detected subsites > 5 | +20 |
| Detected subsites > 50 | +10 |
| UM version fingerprinted | +10 |
**Privacy and ethics:**
- Only scrapes publicly accessible pages (homepage, `/register/`)
- Respects `robots.txt` — skips sites that disallow `UltimateMultisiteBot`
- User-Agent: `UltimateMultisiteBot/1.0 (+https://ultimatemultisite.com/bot)`
- 10-second timeout per request; unresponsive sites are marked `failed` and retried next run
- Stores only aggregate signals, not page content
**Dashboard answers:**
- How many production networks have 10+ subsites?
- How many networks have a checkout page (are selling to customers)?
- Distribution of health scores across all discovered domains