mirror of synced 2026-05-22 19:03:14 +00:00

1454 Commits

Author SHA1 Message Date
Jarek Radosz c312b3184a DEV: Mention the core move in readme (#1514) 2025-07-22 11:08:32 +01:00
Jarek Radosz f231aad8b5 DEV: Disable the plugin by default (#1511)
…and preserve the current setting on existing sites
2025-07-22 11:05:52 +01:00
Alan Guo Xiang Tan cc77e73cfd PERF: Create index ON ai_topics_embeddings(topic_id, model_id) (#1513)
This index helps to speed up queries that join the `topics` table
against the `ai_topics_embeddings` table on the `topic_id` column. A
number of queries also filter on `ai_topics_embeddings.model_id`, so
we include that column in the index as well.
2025-07-22 16:46:56 +08:00
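A minimal sketch of the composite index described above, using Python's built-in `sqlite3` as a stand-in for PostgreSQL (the `CREATE INDEX` statement is the same in both). The simplified schema and the index name are hypothetical. It confirms the column order: `topic_id` leads, so joins on `topic_id` alone can still use the index, and `model_id` comes second for queries that also filter by model.

```python
import sqlite3

INDEX_NAME = "index_ai_topics_embeddings_on_topic_id_and_model_id"  # hypothetical name

conn = sqlite3.connect(":memory:")
# Simplified stand-in schema; the real tables have more columns.
conn.execute("CREATE TABLE topics (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute(
    "CREATE TABLE ai_topics_embeddings "
    "(topic_id INTEGER, model_id INTEGER, embeddings BLOB)"
)
# topic_id leads, so joins on topic_id alone can use the index;
# model_id second serves queries that also filter by model.
conn.execute(f"CREATE INDEX {INDEX_NAME} ON ai_topics_embeddings(topic_id, model_id)")

# PRAGMA index_info yields (seqno, cid, name) rows in index-column order.
columns = [row[2] for row in conn.execute(f"PRAGMA index_info('{INDEX_NAME}')")]
print(columns)  # ['topic_id', 'model_id']
```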
Natalie Tay 1d88ca44f0 DEV: Use core's locale normalizer (#1509)
We've moved the class to core in discourse/discourse#33702, so this follows up with a deletion in this repo.
2025-07-22 10:09:03 +08:00
Roman Rizzi 6059b6e111 FIX: Customizable max_output_tokens for AI triage. (#1510)
We enforced a hard limit of 700 tokens in this script, which is not enough when using thinking models, which can quickly use all of them.

A temporary solution could be bumping the limit, but there is no guarantee we won't hit it again, and it's hard to find one value that fits all scenarios. Another alternative could be removing it and relying on the LLM config's `max_output_token`, but if you want different rules and want to assign different limits, you are forced to duplicate the config each time.

Considering all this, we are adding a dedicated field for this in the triage script, giving you an easy way to tweak it to your needs. If empty, no limit is applied.
2025-07-21 15:36:39 -03:00
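A minimal Python sketch of the "empty means no limit" rule for the new triage field. The helper name and the `max_tokens` key are hypothetical illustrations, not the plugin's actual API; the point is that an empty field omits the limit entirely instead of falling back to a hard-coded value such as 700.

```python
from typing import Optional, Union


def build_triage_llm_params(max_output_tokens: Optional[Union[str, int]]) -> dict:
    """Build extra LLM call params for one triage run (hypothetical helper)."""
    params: dict = {}
    # An empty field means no limit is applied: omit the key entirely.
    if max_output_tokens is None or str(max_output_tokens).strip() == "":
        return params
    limit = int(max_output_tokens)
    if limit <= 0:
        raise ValueError("max_output_tokens must be a positive integer")
    params["max_tokens"] = limit
    return params


print(build_triage_llm_params(""))      # {}
print(build_triage_llm_params("2000"))  # {'max_tokens': 2000}
```

Keeping the limit per script rather than per LLM config means two triage rules can share one LLM config while using different budgets.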
Natalie Tay 5d80a34589 FIX: Add a max token limit based on the text to be translated (#1507)
We're seeing that some LLMs are using 65000+ tokens for raw text that is only 10-1000 characters long.

This PR passes a max token limit to the LLM API for each translation, based on the length of the text.
2025-07-17 17:47:15 +08:00
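The log does not give the actual formula, so this is only a sketch of the idea: scale the output budget with the input length and clamp it. The multiplier, floor, and ceiling below are hypothetical values chosen for illustration.

```python
def translation_max_tokens(
    text: str,
    tokens_per_char: float = 2.0,  # hypothetical multiplier
    floor: int = 100,              # hypothetical minimum for very short text
    ceiling: int = 10_000,         # hypothetical hard cap
) -> int:
    """Scale the output-token budget with the length of the source text."""
    estimate = int(len(text) * tokens_per_char)
    return max(floor, min(ceiling, estimate))


print(translation_max_tokens("Hello"))     # 100 (floor)
print(translation_max_tokens("x" * 1000))  # 2000
```

A character-based estimate avoids running a tokenizer per request; the floor keeps very short strings from getting a budget too small to answer at all.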
Natalie Tay 8630bc145e DEV: Set a min and max for translations backfill (#1508)
Since we use the value for `days.ago` and are seeing `PG::DatetimeFieldOverflow: ERROR: timestamp out of range: "5473790-07-13 08:43:28.497823 BC"`, this sets limits on the site setting.
2025-07-17 17:43:05 +08:00
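A small Python sketch of why the setting needs bounds: subtracting an unbounded number of days from "now" overflows the representable date range, much as it did in PostgreSQL. The bounds below are hypothetical; the log does not state the real minimum and maximum.

```python
from datetime import datetime, timedelta

# Hypothetical bounds; the real site setting's min/max are not stated here.
MIN_BACKFILL_DAYS = 0
MAX_BACKFILL_DAYS = 36_500  # roughly 100 years


def clamp_backfill_days(value: int) -> int:
    """Keep the setting inside a range that produces a valid timestamp."""
    return max(MIN_BACKFILL_DAYS, min(MAX_BACKFILL_DAYS, value))


def backfill_cutoff(value: int, now: datetime) -> datetime:
    # Python analogue of Ruby's `value.days.ago`, using the clamped value.
    return now - timedelta(days=clamp_backfill_days(value))


now = datetime(2025, 7, 17)
print(backfill_cutoff(10**12, now).year)  # 1925

# Without clamping, Python fails much like PostgreSQL did:
try:
    now - timedelta(days=999_999_999)
except OverflowError as exc:
    print("overflow:", exc)
```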
Natalie Tay ad6a8cb812 DEV: Switch translations out of structured output as it returns only a single value (#1503)
Since translations only require a single key back, there is little point in using structured output. This PR also includes some prompt updates dealing with quotes, details, and code.

Related: #1502

This does mean reverting discourse/discourse-translator#257, but we can see how it goes.
2025-07-16 13:38:00 +08:00
Roman Rizzi 06743d1939 FIX: Fix embeddings to use the old OpenAI tokenizer (#1506) 2025-07-15 14:44:11 -03:00
Kris 67664029e5 FIX: prevent crash in "all" filter on features page (#1505)
* FIX: prevent crash in "all" filter on features page

* simplify
2025-07-15 12:51:46 -04:00
Sam 1b16fc876c FIX: avoid using structured outputs for report runs (#1502)
Structured outputs are prone to formatting issues, especially around newlines
and custom pieces of text that need escaping.

This avoids using them for automation reports.

In particular, before this fix, o4-mini based reports were broken.
2025-07-15 13:14:52 +10:00
Discourse Translator Bot 8c069490f0 Update translations (#1498) 2025-07-15 07:28:15 +10:00
Roman Rizzi 6a4dbd8126 FEATURE: Use Personas in Automation's llm_report script (#1499) 2025-07-14 12:47:21 -03:00
Roman Rizzi 89bcf9b1f0 FIX: Process successfully generated embeddings even if some failed (#1500) 2025-07-10 17:51:01 -03:00
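One way to read #1500: instead of discarding a whole batch when one request fails, keep the vectors that did come back. This is a minimal Python sketch, assuming a failed request is represented as `None` keyed by a hypothetical topic id; the plugin's actual data structures are not shown in this log.

```python
from typing import Dict, List, Optional, Tuple

Vector = List[float]


def split_embedding_results(
    results: Dict[int, Optional[Vector]],
) -> Tuple[Dict[int, Vector], List[int]]:
    """Separate successful embeddings from failed ones (failures are None)."""
    succeeded = {tid: vec for tid, vec in results.items() if vec is not None}
    failed = [tid for tid, vec in results.items() if vec is None]
    return succeeded, failed


ok, failed = split_embedding_results({1: [0.1, 0.2], 2: None, 3: [0.3, 0.4]})
print(sorted(ok), failed)  # [1, 3] [2]
```

The successes can be stored right away, while the failed ids can be logged or left for a later backfill run.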
Martin Brennan 6b5ea38644 DEV: Improve AI bot conversation submit upload handling (#1497)
Try to fix a flaky spec in /ai_bot/homepage_spec.rb by using Ember
data, rather than inspecting the DOM directly, to check whether there
are any in-progress uploads.

Also add the missing translation for the in-progress uploads warning.
2025-07-10 17:06:39 +10:00
Natalie Tay d54cd1f602 DEV: Normalize locales that are similar (e.g. en and en_GB) so they do not get translated (#1495)
This commit
- normalizes locales like en_GB and variants to en. With this, the feature will not translate en_GB posts to en (or similarly pt_BR to pt_PT)
- consolidates whether the feature is enabled in `DiscourseAi::Translation.enabled?`
- similarly for backfill in `DiscourseAi::Translation.backfill_enabled?`
  - turns off backfill if `ai_translation_backfill_max_age_days` is 0, so the setting does what its name says. Set it to a high number to backfill everything
2025-07-09 22:21:51 +08:00
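A minimal Python sketch of the two rules above, assuming the simplest normalization (keep only the base language code). The function names are hypothetical; the real normalizer is Ruby and has since moved to core.

```python
def normalize_locale(locale: str) -> str:
    """Collapse regional variants to a base language: en_GB -> en, pt-BR -> pt."""
    return locale.replace("-", "_").split("_")[0].lower()


def needs_translation(post_locale: str, target_locale: str) -> bool:
    # Only translate when the base languages actually differ.
    return normalize_locale(post_locale) != normalize_locale(target_locale)


def backfill_enabled(translation_enabled: bool, max_age_days: int) -> bool:
    # A max age of 0 days disables backfill; a large value backfills everything.
    return translation_enabled and max_age_days > 0


print(needs_translation("en_GB", "en"))     # False
print(needs_translation("pt_BR", "pt_PT"))  # False
print(needs_translation("de", "en"))        # True
```

Collapsing every variant to its base code is a simplifying assumption: some pairs, such as zh_CN (Simplified) and zh_TW (Traditional), differ in script, so a production normalizer may need to keep them apart.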
Keegan George cf7288e1bf DEV: Surface topic_id in errors of summarization backfill (#1493)
This update ensures that the `topic_id` related to the error when summarizing is surfaced in the logs, which should help track down the reason for the errors.
2025-07-09 06:39:36 -07:00
Discourse Translator Bot 027d7f1199 Update translations (#1492) 2025-07-09 15:27:19 +02:00
Keegan George 625442af3c FIX: title suggestions should return 5 unique titles (#1491)
This update fixes a regression from https://github.com/discourse/discourse-ai/pull/1484, which caused AI helper title suggestions to begin suggesting numerous non-unique titles because it was looping through structured responses incorrectly.
2025-07-08 06:30:09 -07:00
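The regression came from iterating over structured responses incorrectly; this Python sketch only illustrates the intended contract: at most five distinct, non-empty titles in their original order. The helper name and the case-insensitive comparison are assumptions.

```python
from typing import Iterable, List


def unique_titles(candidates: Iterable[str], limit: int = 5) -> List[str]:
    """Return up to `limit` distinct, non-empty titles, preserving order."""
    seen = set()
    titles: List[str] = []
    for title in candidates:
        cleaned = title.strip()
        key = cleaned.casefold()  # treat "Title" and "title" as duplicates
        if cleaned and key not in seen:
            seen.add(key)
            titles.append(cleaned)
        if len(titles) == limit:
            break
    return titles


print(unique_titles(["A", "a ", "B", "", "C", "D", "E", "F"]))
# ['A', 'B', 'C', 'D', 'E']
```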
Natalie Tay 699ea3f501 DEV: Add backtrace to logs (#1489) 2025-07-08 10:39:43 +08:00
Natalie Tay 56f025cf44 FIX: Localize description excerpts as they have limits (#1490) 2025-07-08 10:36:41 +08:00
Rafael dos Santos Silva 7357280e88 FEATURE: Add old OpenAI tokenizer to embeddings (#1487) 2025-07-07 15:07:27 -03:00
Natalie Tay 6f8960e549 FIX: Pass topic to context (#1488) 2025-07-07 14:59:48 +08:00
Rafael dos Santos Silva 6247906c13 FEATURE: Seamless embedding model upgrades (#1486) 2025-07-04 16:44:03 -03:00
Sam ab5edae121 FIX: make AI helper more robust (#1484)
* FIX: make AI helper more robust

- If JSON is broken for structured output then lean on a more forgiving parser
- Gemini 2.5 Flash does not support temperature, so support opting out of it
- Evals for the assistant were broken; fix the interface
- Add some missing LLMs
- Translator was not mapped correctly to the feature; fix that
- Don't mix XML into the translator prompt

* lint

* correct logic

* simplify code

* implement best-effort JSON parsing directly in the structured output object
2025-07-04 14:47:11 +10:00
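A minimal Python sketch of best-effort JSON parsing as a generic technique; it is not the plugin's Ruby parser, and the fallback order is an assumption. It tries the raw response, then the contents of a code fence, then the outermost `{...}` span, and for each one retries with `strict=False`, which lets Python's `json` module accept raw control characters such as unescaped newlines inside strings.

```python
import json
import re
from typing import Any, Iterator, Optional

# Matches a Markdown code fence (three backticks, optional "json" tag).
_FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def _candidates(raw: str) -> Iterator[str]:
    yield raw  # 1. the response exactly as returned
    fenced = _FENCE.search(raw)
    if fenced:
        yield fenced.group(1)  # 2. the contents of a code fence
    start, end = raw.find("{"), raw.rfind("}")
    if start != -1 and end > start:
        yield raw[start : end + 1]  # 3. the outermost {...} span


def best_effort_json(raw: str) -> Optional[Any]:
    """Try strict parsing first, then progressively more forgiving fallbacks."""
    for candidate in _candidates(raw):
        # strict=False also accepts raw control characters (e.g. newlines)
        # inside strings, a common way LLM output breaks JSON.
        for strict in (True, False):
            try:
                return json.loads(candidate, strict=strict)
            except json.JSONDecodeError:
                continue
    return None


print(best_effort_json('Here you go: {"title": "Hi"} Enjoy!'))  # {'title': 'Hi'}
```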
Joffrey JAFFEUX 0f904977a4 FIX: correctly translate automation name (#1485)
* FIX: correctly translate automation name

The key is persona, not name.

* simplify
2025-07-04 13:01:45 +10:00
Natalie Tay 2b9a4f9232 FIX: Ignore captions and quotes when detecting locale and update prompts (#1483)
A more deterministic way of making sure the LLM detects the correct language (instead of relying on a prompt telling the LLM to ignore these elements) is to take the cooked HTML and remove unwanted elements.

In this commit
- we remove quotes, image captions, etc. and take only the remaining text, falling back to the unmodified cooked HTML
- and update prompts related to detection and translation
- /152465/12
2025-07-03 22:57:48 +08:00
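A minimal Python sketch of stripping quotes and captions from cooked HTML before locale detection, using the standard library's `html.parser`. The set of skipped tags is an assumption for illustration; Discourse's actual cooked markup for quotes and image captions may need more specific selectors.

```python
from html.parser import HTMLParser
from typing import List

# Hypothetical set of elements whose text should not influence detection.
SKIPPED_TAGS = {"aside", "blockquote", "figcaption"}


class _TextExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def text_for_locale_detection(cooked: str) -> str:
    parser = _TextExtractor()
    parser.feed(cooked)
    parser.close()
    text = " ".join(" ".join(parser.parts).split())
    # Fall back to the unmodified cooked HTML if nothing is left.
    return text or cooked


cooked = '<aside class="quote"><blockquote><p>Bonjour à tous</p></blockquote></aside><p>Hello everyone</p>'
print(text_for_locale_detection(cooked))  # Hello everyone
```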
moin-Jana 8b4f401a7b FIX: Typo in custom_tool_exists text (#1480) 2025-07-03 10:17:31 -03:00
Discourse Translator Bot 92e3615378 Update translations (#1479) 2025-07-02 22:36:46 +02:00
Rafael dos Santos Silva d792919ddf DEV: Move tokenizers to a gem (#1481)
Also renames the Mixtral tokenizer to Mistral.

See gem at github.com/discourse/discourse_ai-tokenizers


Co-authored-by: Roman Rizzi <roman@discourse.org>
2025-07-02 14:43:03 -03:00
Roman Rizzi 75fb37144f FEATURE: Use personas for generating hypothetical posts (#1482)
* FEATURE: Use personas for generating hypothetical posts

* Update prompt
2025-07-02 10:56:38 -03:00
Sam 40fa527633 FIX: cross talk when in ai helper (#1478)
Before this change, we reused channels for proofreading progress and
AI helper progress.

The new changeset ensures each POST to stream progress gets a dedicated
message bus channel.

This fixes a class of issues where the wrong information could be displayed
to end users on subsequent proofreading or helper calls.

* fix tests

* fix implementation (got to subscribe at 0)
2025-07-01 18:02:16 +10:00
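A toy Python sketch of the per-request channel idea. `ToyBus` is an in-memory stand-in rather than the MessageBus library, and the channel naming is hypothetical. Replaying from position 0 reflects one reading of "subscribe at 0": on a brand-new channel, a subscriber that attaches after the first message was published still receives it.

```python
import uuid
from collections import defaultdict
from typing import Dict, List


class ToyBus:
    """In-memory stand-in for a message bus: channel name -> published messages."""

    def __init__(self) -> None:
        self.channels: Dict[str, List[str]] = defaultdict(list)

    def publish(self, channel: str, message: str) -> None:
        self.channels[channel].append(message)

    def backlog(self, channel: str, last_id: int = 0) -> List[str]:
        # last_id=0 replays everything on the channel from the start.
        return self.channels[channel][last_id:]


def new_stream_channel(kind: str) -> str:
    # Hypothetical naming; the unique suffix is what prevents cross talk.
    return f"/discourse-ai/{kind}/stream/{uuid.uuid4().hex}"


bus = ToyBus()
proofread = new_stream_channel("ai-helper")
helper = new_stream_channel("ai-helper")
bus.publish(proofread, "Proofread text")
print(bus.backlog(helper))     # []
print(bus.backlog(proofread))  # ['Proofread text']
```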
moin-Jana 897f31e564 UX: Fix typo in bot.description text (#1474)
Introducing a typo isn't the right way to bypass the check that blocks the term "private messages".

Nice try, though ;)

I changed it to "personal messages".
2025-07-01 17:24:05 +10:00
Yuriy Kurant 8527279594 dev: removes messages section from sidebar (#1477) 2025-07-01 17:23:11 +10:00
Kris 4ad64ed3b6 DEV: replace sortBy with toSorted (#1476) 2025-06-30 16:41:59 -04:00
Roman Rizzi 5ca7d5f256 FIX: Strip uploads from msg when searching for rag fragments (#1475) 2025-06-30 15:03:17 -03:00
Natalie Tay a94daa14e2 FIX: Return no topics when embeddings is disabled (#1473)
When an invalid model is set for embeddings, topics fail to load even if embeddings are disabled.

Error:
## RuntimeError in TopicsController#show
Invalid embeddings selected model

This commit checks for valid settings before attempting to load related topics.
2025-06-30 17:45:04 +08:00
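A minimal Python sketch of the guard: validate the settings before touching the embeddings model, so a stale model id cannot break topic pages. All names and parameters here are hypothetical stand-ins for the plugin's Ruby settings and search code.

```python
from typing import Callable, List, Optional, Set


def related_topic_ids(
    embeddings_enabled: bool,
    selected_model_id: Optional[int],
    valid_model_ids: Set[int],
    search: Callable[[int], List[int]],
) -> List[int]:
    """Return related topic ids, or [] when settings are off or invalid."""
    # Check settings before touching the model, so a stale or invalid model id
    # cannot raise while rendering a topic.
    if not embeddings_enabled or selected_model_id not in valid_model_ids:
        return []
    return search(selected_model_id)


def exploding_search(model_id: int) -> List[int]:
    raise RuntimeError("Invalid embeddings selected model")


print(related_topic_ids(False, 99, {1}, exploding_search))  # []
```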
Kris 262bd8b145 UX: add filter to features page, update styles (#1471)
* UX: add filter to features page, update styles

* merge fix

* update toggle spec

* test fix
2025-06-30 09:26:53 +10:00
Roman Rizzi 57b00526f8 FIX: Clarify spam response expectations. (#1470) 2025-06-27 16:59:55 -03:00
Roman Rizzi 8d943fa29d FEATURE: Display spam module on features list. (#1469) 2025-06-27 14:18:01 -03:00
Roman Rizzi b35f9bcc7c FEATURE: Use Personas when scanning posts for spam (#1465) 2025-06-27 10:35:47 -03:00
Sam cc4e9e030f FIX: normalize keys in structured output (#1468)
* FIX: normalize keys in structured output

Previously, we did not validate the hash passed in to structured
outputs, whose keys could be either string-based or symbol-based.

This broke structured outputs for Gemini in some specific cases.

* comment out flake
2025-06-27 15:42:48 +10:00
Sam 73768ce920 FEATURE: Display bot in feature list (#1466)
- allows features to have multiple llms and multiple personas
- sorts module list
- adds Bot as a first class module
- fixes issue where search module was always configured
- some tests
2025-06-27 12:35:41 +10:00
Rafael dos Santos Silva a40e2d3156 FEATURE: Update OpenAI tokenizer to GPT-4o and later (#1467) 2025-06-26 15:26:09 -03:00
Kris 2fe99a0bec UX: add missing translation for uploads (#1464) 2025-06-25 11:36:00 -04:00
Sam 3e74f09d06 FEATURE: improve custom tool infra (#1463)
- Add support for `chain.streamCustomRaw(test)` that can be used to stream text from a JS tool direct to composer
- Add support for LLM params in `llm.generate`, which unlocks features like structured outputs
- Add `discourse.createStagedUser`, `discourse.createTopic` and `discourse.createPost` for content creation
2025-06-25 16:25:44 +10:00
Discourse Translator Bot 3cfc749fad Update translations (#1462) 2025-06-24 16:29:23 +02:00
Jarek Radosz 5735f063a3 FIX: A typo in bot filtration in ai-bot-header-icon (#1455)
* FIX: A typo in bot filtration in ai-bot-header-icon

* FIX: Show header icon when there's only one persona with a default LLM set

---------

Co-authored-by: Roman Rizzi <rizziromanalejandro@gmail.com>
2025-06-24 10:51:07 -03:00
Kris 757c93e514 UX: make topic list gists link to the topic (#1459) 2025-06-24 09:14:18 -04:00
Natalie Tay 4c1cd5d819 UX: Align llm button in ai features (#1461) 2025-06-24 17:23:58 +08:00