mirror of
https://gh.wpcy.net/https://github.com/discourse/discourse.git
synced 2026-06-18 22:48:32 +08:00
Previously, a converter `ProgressStep` was a single instance that spanned the process boundary: `items` ran in the main process while `process_item` ran in forked workers. Instance variables behaved differently depending on where they were set — state set in `items` never reached a worker, state set in `execute` was only a copy-on-write snapshot, and state mutated in `process_item` accumulated per worker and never came back. It was easy to write a step that looked correct but lost data silently, and with the planned thread-based worker for JRuby the same code would turn into a data race. This change replaces the `items` / `process_item` contract with two separate role objects, defined with a small DSL: a `source` that enumerates the items in the main process (`items`, `max_progress`) and a `processor` that handles one item at a time in a worker, with a `setup` hook that runs once per worker after the fork. `helpers` mixes pure functions into both roles. Because the roles are separate objects, source state is structurally out of reach for the processor — a cross-boundary call raises a `NameError` instead of reading state that never crossed the fork. The worker lifecycle is now an explicit `setup` -> `run` -> `cleanup`, so a future `ThreadWorker` on JRuby can call the same hooks at thread start without changes to this code. Also in this change: - `Step` and `ProgressStep` are siblings below a minimal `StepBase` now; the progress step itself is only a coordinator without per-step state. - Writes to the IntermediateDB during processor `setup` raise at the write site (`SetupGuard`), in serial and parallel mode alike. Before, such writes were silently discarded in parallel mode. - A crashed worker process raises `Worker::CrashedError` in the parent instead of silently dropping the items that were queued for it. - `ParallelJob` no longer registers a `ForkManager.after_fork_child` callback; `Worker` calls `job.setup` at the top of its fork block instead, and the unused `after_fork_child` API and `run_once` option were removed from `ForkManager`. - The `SiteSettings` step queried the source DB in `execute` and kept the result in an instance variable for `process_item`, which only worked because of the fork snapshot. The uploads lookup is now part of the `items` query. Moving it also surfaced a latent bug: the `::int` casts relied on `AND`/`OR` short-circuiting, but PostgreSQL doesn't guarantee the evaluation order — with a production-sized `uploads` table the planner builds index probes from the casts and the query fails on the first non-numeric setting value. The casts are now guarded with `CASE`. - The MRI performance path is untouched: same Oj/pipe IPC, same single DB writer thread, same fork batching. New specs cover the role isolation, the worker lifecycle, and serial/parallel parity (the same fixture step runs in both modes and has to produce identical inserts), and the gem spec suites verify partial doubles now. The IntermediateDB output is unchanged — running the converter against the same source database before and after this change produces identical contents.
59 lines
2.3 KiB
Ruby
Vendored
59 lines
2.3 KiB
Ruby
Vendored
# frozen_string_literal: true
|
|
|
|
require "fileutils"
|
|
require "tempfile"
|
|
require "tmpdir"
|
|
|
|
# Shared bootstrap for the four migration gems' spec suites. Each gem's
|
|
# spec_helper delegates here so the Rails boot, i18n, mocking, and support
|
|
# wiring live in one place.
|
|
module MigrationsSpecSetup
|
|
CORE_SPEC_DIR = __dir__
|
|
|
|
# @param gem [String] the gem entrypoint to require (e.g. "migrations-core")
|
|
# @param spec_dir [String] the calling gem's spec directory (__dir__)
|
|
def self.call(gem:, spec_dir:)
|
|
boot_rails(spec_dir) if ENV["MIGRATIONS_RAILS"]
|
|
|
|
require gem
|
|
Migrations.enable_i18n
|
|
Migrations.apply_global_config
|
|
|
|
require "rspec-multi-mock"
|
|
|
|
# Shared support (matchers, helpers, shared examples) lives in the core gem;
|
|
# also load the calling gem's own support, if any.
|
|
load_support(CORE_SPEC_DIR)
|
|
load_support(spec_dir) unless spec_dir == CORE_SPEC_DIR
|
|
|
|
RSpec.configure do |config|
|
|
config.mock_with MultiMock::Adapter.for(:rspec, :mocha)
|
|
|
|
# Partial stubs on real objects must name a method that actually exists,
|
|
# just like `instance_double`/`class_double` already enforce. Set on the
|
|
# global rspec-mocks config because `mock_with` receives the MultiMock
|
|
# adapter here, not the `:rspec` adapter that normally exposes this
|
|
# setting. The gem suites are mocha-free; Discourse core's own suite
|
|
# never loads this file.
|
|
RSpec::Mocks.configuration.verify_partial_doubles = true
|
|
|
|
# Specs tagged `:rails` need a booted Rails environment (live DB
|
|
# introspection, plugin manifests). They run in the Rails integration job,
|
|
# not the isolated gem suite.
|
|
config.filter_run_excluding(:rails) unless ENV["MIGRATIONS_RAILS"]
|
|
end
|
|
end
|
|
|
|
# Discourse resolves some autoload-ignore paths (e.g. `lib/freedom_patches` in
|
|
# config/initializers/000-zeitwerk.rb) relative to the working directory, so
|
|
# boot the host harness with the cwd at the application root — the same way the
|
|
# `disco` binary does before loading the Rails environment.
|
|
def self.boot_rails(spec_dir)
|
|
rails_root = File.expand_path("../../..", spec_dir)
|
|
Dir.chdir(rails_root) { require File.join(rails_root, "spec", "rails_helper") }
|
|
end
|
|
|
|
def self.load_support(dir)
|
|
Dir[File.join(dir, "support", "**", "*.rb")].sort.each { |file| require file }
|
|
end
|
|
end
|