## Overview
This PR introduces comprehensive search functionality for chat messages,
enabling users to search through their chat history both globally across
all accessible channels and within specific channels.
### Search Capabilities
**All-Channel Search**: When no channel is specified, users can search
across all channels they have access to. The search respects channel
permissions through `ChannelFetcher.all_secured_channel_ids`, ensuring
users only see results from channels they can view.
**Per-Channel Search**: Users can scope their search to a specific
channel by providing a `channel_id` parameter, useful for finding
messages within a particular conversation context.
**Search Features**:
- Full-text search using PostgreSQL's tsvector/tsquery
- Advanced filters: `@username` to filter by author, `#channel` to
filter by channel slug
- Sort options: relevance (default) or latest
- Pagination support
- Search data weighted by relevance
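As a rough illustration of the filter syntax, the `@username` and `#channel` tokens can be split out of the raw query before the remainder is handed to tsquery. The parser below is a hypothetical sketch, not Discourse's actual implementation:

```ruby
# Hypothetical sketch of splitting advanced filters out of a raw query string.
# Names and behaviour are illustrative only, not the actual Discourse parser.
def parse_chat_search(raw)
  usernames = []
  channels = []
  terms = []

  raw.split.each do |token|
    case token
    when /\A@(\w+)\z/ then usernames << $1    # @username -> author filter
    when /\A#([\w-]+)\z/ then channels << $1  # #channel  -> channel slug filter
    else terms << token                       # everything else feeds the tsquery
    end
  end

  { term: terms.join(" "), usernames: usernames, channels: channels }
end

p parse_chat_search("deploy notes @sam #site-ops")
```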
## Site Setting: `chat_search_enabled`
This feature is gated behind the `chat_search_enabled` site setting,
which is currently:
- **Default**: `false`
- **Hidden**: `true`
- **Client-accessible**: `true`
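In Discourse plugin convention, those three flags would be declared in the chat plugin's `config/settings.yml` roughly like this (a sketch of the shape, not the exact file):

```yaml
chat:
  chat_search_enabled:
    default: false   # Phase 1: feature off
    hidden: true     # not shown in the admin settings UI yet
    client: true     # exposed to the frontend once enabled
```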
### Deployment Strategy
Due to the need for chat messages to be indexed before search becomes
useful, we're implementing a two-phase deployment:
**Phase 1 (Initial Merge)**:
- `chat_search_enabled` remains `false` and hidden
- The `register_search_index` enabled proc defaults to `true` instead of
reading the `chat_search_enabled` setting
- This allows the reindexing infrastructure to begin indexing existing
chat messages even though the UI is not yet shown
**Wait Period**:
- Wait at least one week after Phase 1 deployment
- `Jobs::ReindexSearch` runs every 2 hours and will progressively index
all chat messages
- This ensures most sites have a significant portion of their chat history indexed
**Phase 2 (Follow-up Merge)**:
- Set `chat_search_enabled` default to `true` and unhide it
- Update the `register_search_index` enabled proc to read the
`chat_search_enabled` setting instead of defaulting to `true`
- Users can now access search with pre-indexed data
**Rationale**: Without this phased approach, users would see the search
UI immediately but receive no results until the reindexing job runs,
creating a confusing experience. By pre-indexing while the UI is hidden,
we ensure search works immediately when enabled.
## New Plugin API: `register_search_index`
This PR introduces a new plugin API that allows plugins to register
custom search indexes that integrate seamlessly with Discourse's search
infrastructure.
### API Signature
```ruby
register_search_index(
  model_class:,                # The ActiveRecord model to index
  search_data_class:,          # The model for storing search data
  index_version:,              # Version number for re-indexing
  search_data:,                # Proc that returns weighted search data
  load_unindexed_record_ids:,  # Proc that finds records needing indexing
  enabled:                     # Optional proc to enable/disable (default: -> { true })
)
```
### How It Works
**Integration with SearchIndexer**: When `SearchIndexer.index(obj)` is
called, it checks registered search handlers for the object's type. If a
handler matches, it:
1. Calls the `search_data` proc with the object and an `IndexerHelper`
instance
2. Receives weighted search data (`:a_weight`, `:b_weight`, `:c_weight`,
`:d_weight`)
3. Updates the corresponding search data table with PostgreSQL's
tsvector
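Conceptually, the weighted fields map onto PostgreSQL `setweight` calls concatenated into a single tsvector. The SQL-building helper below is only a sketch; the real indexer uses bound parameters and the locale's text search configuration, not naive quoting:

```ruby
# Sketch: turn { a_weight: ..., d_weight: ... } into a tsvector SQL expression.
# Naive quoting for illustration only -- real code must use bound parameters.
def quote(text)
  "'#{text.gsub("'", "''")}'"
end

def tsvector_sql(weighted_data)
  weighted_data.map do |key, text|
    label = key.to_s[0].upcase # :a_weight -> "A", :d_weight -> "D"
    "setweight(to_tsvector('english', #{quote(text)}), '#{label}')"
  end.join(" || ") # || concatenates tsvectors, preserving per-field weights
end

puts tsvector_sql(a_weight: "hello world", d_weight: "scrubbed cooked text")
```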
**Integration with Jobs::ReindexSearch**: The scheduled job (runs every
2 hours) calls `rebuild_registered_search_handlers`, which:
1. Iterates through all registered search handlers
2. Skips handlers where `enabled` proc returns `false`
3. Calls `load_unindexed_record_ids` to find records needing indexing
4. Indexes up to `limit` records per handler (default: 10,000)
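The steps above can be sketched as a standalone loop. `Handler`, `FakeModel`, and `FakeIndexer` are stand-ins so the sketch runs on its own; the real job reads registered handlers from `DiscoursePluginRegistry`:

```ruby
# Standalone sketch of the reindex loop; Handler, FakeModel, and FakeIndexer
# are stand-ins for illustration, not Discourse classes.
Handler = Struct.new(:model_class, :index_version, :enabled,
                     :load_unindexed_record_ids, keyword_init: true)

class FakeModel
  attr_reader :id

  def initialize(id)
    @id = id
  end

  # Pretend only record 1 exists, so id 999 simulates a row that vanished.
  def self.find_by(id:)
    id == 1 ? new(id) : nil
  end
end

class FakeIndexer
  def self.indexed
    @indexed ||= []
  end

  def self.index(record, force:)
    indexed << record.id
  end
end

def rebuild_registered_search_handlers(handlers, limit: 10_000, indexer:)
  handlers.each do |handler|
    next unless handler.enabled.call # skip disabled handlers

    # ask the handler which records still need indexing
    ids = handler.load_unindexed_record_ids.call(
      limit: limit,
      index_version: handler.index_version,
    )

    # index up to `limit` records, skipping ones that no longer exist
    ids.each do |id|
      record = handler.model_class.find_by(id: id)
      indexer.index(record, force: true) if record
    end
  end
end

handlers = [
  Handler.new(model_class: FakeModel, index_version: 1,
              enabled: -> { true }, load_unindexed_record_ids: ->(**_args) { [1, 999] }),
  Handler.new(model_class: FakeModel, index_version: 1,
              enabled: -> { false }, load_unindexed_record_ids: ->(**_args) { [1] }),
]

rebuild_registered_search_handlers(handlers, indexer: FakeIndexer)
```

Only record 1 from the first handler ends up indexed: the second handler is disabled, and id 999 resolves to no record.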
### Chat Implementation Example
```ruby
register_search_index(
  model_class: Chat::Message,
  search_data_class: Chat::MessageSearchData,
  index_version: 1,
  search_data:
    proc do |message, indexer_helper|
      {
        a_weight: message.message,
        d_weight: indexer_helper.scrub_html(message.cooked)[0..600_000],
      }
    end,
  load_unindexed_record_ids:
    proc do |limit:, index_version:|
      Chat::Message
        .joins("LEFT JOIN chat_message_search_data ON chat_message_id = chat_messages.id")
        .where(
          "chat_message_search_data.locale IS NULL OR
          chat_message_search_data.locale != ? OR
          chat_message_search_data.version != ?",
          SiteSetting.default_locale,
          index_version,
        )
        .order("chat_messages.id ASC")
        .limit(limit)
        .pluck(:id)
    end,
)
```
Co-authored-by: Martin Brennan <mjrbrennan@gmail.com>
Co-authored-by: Loïc Guitaut <5648+Flink@users.noreply.github.com>
Spec for `Jobs::ReindexSearch`:

```ruby
# frozen_string_literal: true

RSpec.describe Jobs::ReindexSearch do
  subject(:job) { described_class.new }

  let(:locale) { "fr" }

  before do
    SearchIndexer.enable
    Jobs.run_immediately!
  end

  # This works since test db has a small record less than limit.
  # Didn't check `topic` because topic doesn't have posts in fabrication
  # thus no search data
  %w[post category user].each do |m|
    it "should rebuild `#{m}` when default_locale changed" do
      SiteSetting.default_locale = "en"
      model = Fabricate(m.to_sym)
      SiteSetting.default_locale = locale
      job.execute({})
      expect(model.public_send("#{m}_search_data").locale).to eq locale
    end

    it "should rebuild `#{m}` when INDEX_VERSION changed" do
      model = Fabricate(m.to_sym)
      # so that search data can be reindexed
      search_data = model.public_send("#{m}_search_data")
      search_data.update!(version: 0)
      model.reload

      job.execute({})
      expect(model.public_send("#{m}_search_data").version).to eq(
        "SearchIndexer::#{m.upcase}_INDEX_VERSION".constantize,
      )
    end
  end

  describe "rebuild_posts" do
    class FakeIndexer
      def self.index(post, force:)
        get_posts.push(post)
      end

      def self.posts
        get_posts
      end

      def self.reset
        get_posts.clear
      end

      private

      def self.get_posts
        @posts ||= []
      end
    end

    after { FakeIndexer.reset }

    it "should not reindex posts that belong to a deleted topic or have been trashed" do
      post = Fabricate(:post)
      post2 = Fabricate(:post)
      post3 = Fabricate(:post)
      PostSearchData.delete_all
      post2.topic.trash!
      post3.trash!

      job.rebuild_posts(indexer: FakeIndexer)

      expect(FakeIndexer.posts).to contain_exactly(post)
    end

    it "should not reindex posts with a developmental version" do
      Fabricate(:post, version: SearchIndexer::POST_INDEX_VERSION + 1)

      job.rebuild_posts(indexer: FakeIndexer)

      expect(FakeIndexer.posts).to eq([])
    end

    it "should not reindex posts with empty raw" do
      post = Fabricate(:post)
      post.post_search_data.destroy!

      post2 = Fabricate.build(:post, raw: "", post_type: Post.types[:small_action])
      post2.save!(validate: false)

      job.rebuild_posts(indexer: FakeIndexer)

      expect(FakeIndexer.posts).to contain_exactly(post)
    end
  end

  describe "rebuild_registered_search_handlers" do
    let(:plugin) { Plugin::Instance.new }

    let(:mock_model_class) do
      Class.new do
        def self.name
          "TestModel"
        end

        def self.table_name
          "test_models"
        end

        def self.find_by(id:)
          return nil unless id == 1
          new(id: id)
        end

        def initialize(id:)
          @id = id
        end

        attr_reader :id
      end
    end

    let(:mock_search_data_class) { double }

    after { DiscoursePluginRegistry.reset! }

    it "rebuilds records for registered search handlers" do
      indexer = mock("SearchIndexer")
      indexer.expects(:index).once

      plugin.register_search_index(
        model_class: mock_model_class,
        search_data_class: mock_search_data_class,
        index_version: 1,
        search_data: ->(record, indexer_helper) { { a_weight: "test" } },
        load_unindexed_record_ids: ->(**_args) { [1] },
      )

      job.rebuild_registered_search_handlers(indexer: indexer)
    end

    it "skips disabled handlers" do
      indexer = mock("SearchIndexer")
      indexer.expects(:index).never

      plugin.register_search_index(
        model_class: mock_model_class,
        search_data_class: mock_search_data_class,
        index_version: 1,
        search_data: ->(record, indexer_helper) { { a_weight: "test" } },
        load_unindexed_record_ids: ->(**_args) { [1] },
        enabled: -> { false },
      )

      job.rebuild_registered_search_handlers(indexer: indexer)
    end

    it "passes limit and index_version to load_unindexed_record_ids" do
      indexer = mock("SearchIndexer")
      received_args = nil

      plugin.register_search_index(
        model_class: mock_model_class,
        search_data_class: mock_search_data_class,
        index_version: 2,
        search_data: ->(record, indexer_helper) { { a_weight: "test" } },
        load_unindexed_record_ids:
          lambda do |**args|
            received_args = args
            []
          end,
      )

      job.rebuild_registered_search_handlers(limit: 5000, indexer: indexer)

      expect(received_args[:limit]).to eq(5000)
      expect(received_args[:index_version]).to eq(2)
    end

    it "handles missing records gracefully" do
      indexer = mock("SearchIndexer")
      indexer.expects(:index).never

      plugin.register_search_index(
        model_class: mock_model_class,
        search_data_class: mock_search_data_class,
        index_version: 1,
        search_data: ->(record, indexer_helper) { { a_weight: "test" } },
        load_unindexed_record_ids: ->(**_args) { [999] },
      )

      expect { job.rebuild_registered_search_handlers(indexer: indexer) }.not_to raise_error
    end
  end

  describe "#execute" do
    it "should clean up topic_search_data of trashed topics" do
      topic = Fabricate(:post).topic
      topic2 = Fabricate(:post).topic

      [topic, topic2].each { |t| SearchIndexer.index(t, force: true) }

      freeze_time(1.day.ago) { topic.trash! }

      expect { job.execute({}) }.to change { TopicSearchData.count }.by(-1)
      expect(Topic.pluck(:id)).to contain_exactly(topic2.id)
      expect(TopicSearchData.pluck(:topic_id)).to contain_exactly(topic2.topic_search_data.topic_id)
    end

    it "should clean up post_search_data of posts with empty raw or posts from trashed topics" do
      post = Fabricate(:post)
      post2 = Fabricate(:post, post_type: Post.types[:small_action])
      post2.raw = ""
      post2.save!(validate: false)
      post3 = Fabricate(:post)
      post3.topic.trash!
      post4, post5, post6 = nil

      freeze_time(1.day.ago) do
        post4 = Fabricate(:post)
        post4.topic.trash!

        post5 = Fabricate(:post)
        post6 = Fabricate(:post, topic_id: post5.topic_id)
        post6.trash!
      end

      expect { job.execute({}) }.to change { PostSearchData.count }.by(-3)
      expect(Post.pluck(:id)).to contain_exactly(post.id, post2.id, post3.id, post4.id, post5.id)
      expect(PostSearchData.pluck(:post_id)).to contain_exactly(post.id, post3.id, post5.id)
    end
  end
end
```