discourse/migrations/core/spec/spec_setup.rb
Gerhard Schlager 420bced604
MT: Split converter steps into source and processor roles (#40816)
Previously, a converter `ProgressStep` was a single instance that
spanned the process boundary: `items` ran in the main process while
`process_item` ran in forked workers. Instance variables behaved
differently depending on where they were set — state set in `items`
never reached a worker, state set in `execute` was only a copy-on-write
snapshot, and state mutated in `process_item` accumulated per worker
and never came back. It was easy to write a step that looked correct
but lost data silently, and with the planned thread-based worker for
JRuby the same code would turn into a data race.

This change replaces the `items` / `process_item` contract with two
separate role objects, defined with a small DSL: a `source` that
enumerates the items in the main process (`items`, `max_progress`) and
a `processor` that handles one item at a time in a worker, with a
`setup` hook that runs once per worker after the fork. `helpers` mixes
pure functions into both roles. Because the roles are separate objects,
source state is structurally out of reach for the processor — a
cross-boundary call raises a `NameError` instead of reading state that
never crossed the fork. The worker lifecycle is now an explicit
`setup` -> `run` -> `cleanup`, so a future `ThreadWorker` on JRuby can
call the same hooks at thread start without changes to this code.

Also in this change:

- `Step` and `ProgressStep` are siblings below a minimal `StepBase`
  now; the progress step itself is only a coordinator without per-step
  state.
- Writes to the IntermediateDB during processor `setup` raise at the
  write site (`SetupGuard`), in serial and parallel mode alike. Before,
  such writes were silently discarded in parallel mode.
- A crashed worker process raises `Worker::CrashedError` in the parent
  instead of silently dropping the items that were queued for it.
- `ParallelJob` no longer registers a `ForkManager.after_fork_child`
  callback; `Worker` calls `job.setup` at the top of its fork block
  instead, and the unused `after_fork_child` API and `run_once` option
  were removed from `ForkManager`.
- The `SiteSettings` step queried the source DB in `execute` and kept
  the result in an instance variable for `process_item`, which only
  worked because of the fork snapshot. The uploads lookup is now part
  of the `items` query. Moving it also surfaced a latent bug: the
  `::int` casts relied on `AND`/`OR` short-circuiting, but PostgreSQL
  doesn't guarantee the evaluation order — with a production-sized
  `uploads` table the planner builds index probes from the casts and
  the query fails on the first non-numeric setting value. The casts are
  now guarded with `CASE`.
- The MRI performance path is untouched: same Oj/pipe IPC, same single
  DB writer thread, same fork batching.

New specs cover the role isolation, the worker lifecycle, and
serial/parallel parity (the same fixture step runs in both modes and
has to produce identical inserts), and the gem spec suites verify
partial doubles now. The IntermediateDB output is unchanged — running
the converter against the same source database before and after this
change produces identical contents.
2026-06-12 20:54:31 +02:00

59 lines
2.3 KiB
Ruby
Vendored

# frozen_string_literal: true
require "fileutils"
require "tempfile"
require "tmpdir"
# Shared bootstrap for the four migration gems' spec suites. Each gem's
# spec_helper delegates here so the Rails boot, i18n, mocking, and support
# wiring live in one place.
module MigrationsSpecSetup
CORE_SPEC_DIR = __dir__
# @param gem [String] the gem entrypoint to require (e.g. "migrations-core")
# @param spec_dir [String] the calling gem's spec directory (__dir__)
def self.call(gem:, spec_dir:)
boot_rails(spec_dir) if ENV["MIGRATIONS_RAILS"]
require gem
Migrations.enable_i18n
Migrations.apply_global_config
require "rspec-multi-mock"
# Shared support (matchers, helpers, shared examples) lives in the core gem;
# also load the calling gem's own support, if any.
load_support(CORE_SPEC_DIR)
load_support(spec_dir) unless spec_dir == CORE_SPEC_DIR
RSpec.configure do |config|
config.mock_with MultiMock::Adapter.for(:rspec, :mocha)
# Partial stubs on real objects must name a method that actually exists,
# just like `instance_double`/`class_double` already enforce. Set on the
# global rspec-mocks config because `mock_with` receives the MultiMock
# adapter here, not the `:rspec` adapter that normally exposes this
# setting. The gem suites are mocha-free; Discourse core's own suite
# never loads this file.
RSpec::Mocks.configuration.verify_partial_doubles = true
# Specs tagged `:rails` need a booted Rails environment (live DB
# introspection, plugin manifests). They run in the Rails integration job,
# not the isolated gem suite.
config.filter_run_excluding(:rails) unless ENV["MIGRATIONS_RAILS"]
end
end
# Discourse resolves some autoload-ignore paths (e.g. `lib/freedom_patches` in
# config/initializers/000-zeitwerk.rb) relative to the working directory, so
# boot the host harness with the cwd at the application root — the same way the
# `disco` binary does before loading the Rails environment.
def self.boot_rails(spec_dir)
rails_root = File.expand_path("../../..", spec_dir)
Dir.chdir(rails_root) { require File.join(rails_root, "spec", "rails_helper") }
end
def self.load_support(dir)
Dir[File.join(dir, "support", "**", "*.rb")].sort.each { |file| require file }
end
end