discourse/plugins/discourse-ai/lib/embeddings
Roman Rizzi a27e20c300
PERF: Speed up embedding text preparation. (#33791)
When collecting text for vectorizing a topic, we iterate over as many
posts as fit within the context window, parsing each one's cooked
attribute with a separate Nokogiri call. We noticed this approach doesn't
scale well with larger contexts.

Instead, we'll collect as much unparsed cooked text as we can, then
parse it all in a single Nokogiri call.

I ran this a hundred times in a benchmark, and the perf gains are
significant:

```
                               user     system      total        real
prepare_target_text:     114.887620   3.731693 118.619313 (118.952465)
prepare_target_text_bis:  10.264950   0.186204  10.451154 ( 10.465957)
```

Tried running it 1k times, but the old method took too long.
2025-07-23 13:52:48 -03:00
strategies
entry_point.rb
schema.rb
semantic_related.rb
semantic_search.rb
semantic_topic_query.rb
vector.rb