// content groups

Only the stories that matter

Raw source feeds produce noise. Content groups let you define precisely which stories pass through your pipeline — using keyword rules, deduplication, and real-time relevance scoring to surface only high-value content.

Keyword rules

Define keyword groups that match articles by topic. Each content group supports three types of keyword configuration:

Positive keywords

Articles must contain at least one of these terms to be considered. Use them to define your topic focus — "AI regulation", "semiconductor", "central bank policy".

Negative keywords

Articles containing any of these terms are excluded. Filter out irrelevant subtopics, competitors, or content categories you don't cover.

Relevance thresholds

Set the minimum relevance score an article must meet. The score combines keyword match strength with Google Trends data, so only high-relevance content proceeds to rewriting.
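To make the positive and negative keyword rules concrete, here is a minimal sketch in Python. It is an illustration only, not Newsmill's implementation: the `ContentGroup` class, the `passes_keyword_rules` function, the case-insensitive substring matching, and the "sponsored" negative keyword are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ContentGroup:
    """Hypothetical container for one content group's keyword rules."""
    positive: list[str]
    negative: list[str] = field(default_factory=list)


def passes_keyword_rules(text: str, group: ContentGroup) -> bool:
    """Return True if text has at least one positive term and no negative term."""
    haystack = text.lower()
    # Any negative keyword excludes the article outright.
    if any(term.lower() in haystack for term in group.negative):
        return False
    # At least one positive keyword is required for the article to be considered.
    return any(term.lower() in haystack for term in group.positive)


# Example group: positive terms from the text above, negative term assumed.
group = ContentGroup(
    positive=["AI regulation", "semiconductor", "central bank policy"],
    negative=["sponsored"],
)

print(passes_keyword_rules("New semiconductor export rules announced", group))
print(passes_keyword_rules("Sponsored: semiconductor stocks to buy", group))
```

In this sketch the negative check runs first, so an excluded term always wins over a positive match.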

Deduplication

When multiple sources cover the same story, you don't want five versions of the same article in your pipeline. Newsmill uses cosine-similarity deduplication with a 70% threshold over a 24-hour rolling window.

Every article is converted into a vector embedding using OpenAI's text-embedding-3-small model. Each incoming article is compared against content that entered your pipeline in the previous 24 hours. If the cosine similarity exceeds 0.70 (70%), the article is flagged as a near-duplicate and skipped. This catches not just identical articles but also rewrites, syndicated copies, and minor variations of the same story.
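The deduplication check described above can be sketched in a few lines of Python. This is a simplified illustration, not Newsmill's code: the function names and the in-memory list of recent articles are hypothetical, and the tiny three-dimensional vectors stand in for real embeddings, which would come from the embedding model.

```python
import math
from datetime import datetime, timedelta

SIMILARITY_THRESHOLD = 0.70    # the 70% threshold described above
WINDOW = timedelta(hours=24)   # the 24-hour rolling window


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def is_near_duplicate(embedding, seen_at, recent):
    """Check an incoming article against (embedding, timestamp) pairs in `recent`."""
    cutoff = seen_at - WINDOW
    for other, timestamp in recent:
        if timestamp < cutoff:
            continue  # older than 24 hours, outside the rolling window
        if cosine_similarity(embedding, other) > SIMILARITY_THRESHOLD:
            return True
    return False


# Toy data: one article from 2 hours ago, one from 30 hours ago.
now = datetime(2024, 1, 2, 12, 0)
recent = [
    ([1.0, 0.0, 0.0], now - timedelta(hours=2)),
    ([0.0, 1.0, 0.0], now - timedelta(hours=30)),
]

print(is_near_duplicate([0.9, 0.1, 0.0], now, recent))  # close to the 2-hour-old article
print(is_near_duplicate([0.0, 1.0, 0.0], now, recent))  # matches only the expired article
```

The second call shows the effect of the window: an identical vector is not flagged once the earlier article is more than 24 hours old.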

Relevance scoring

Keyword matching alone isn't enough. Newsmill integrates with Google Trends to score each article's keywords for real-time relevance. An article about a topic that's trending right now scores higher than one about a topic with flat interest.

The final relevance score combines keyword match strength with Google Trends data. You set the threshold, and only articles that exceed it move forward to rewriting and publishing. This keeps your output focused on what your audience cares about right now, not just what matches a keyword list.
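One way to picture how two signals become a single score is a weighted blend, sketched below in Python. The page does not state how Newsmill actually combines the signals, so the formula, the `trend_weight` parameter, and both function names are assumptions. Both inputs are assumed to be already normalized to the range 0 to 1 (Google Trends reports interest on a 0 to 100 scale, so a raw value would be divided by 100 first).

```python
def relevance_score(keyword_strength: float,
                    trend_interest: float,
                    trend_weight: float = 0.5) -> float:
    """Blend two signals in [0, 1] into one score in [0, 1] (assumed formula)."""
    for name, value in (("keyword_strength", keyword_strength),
                        ("trend_interest", trend_interest),
                        ("trend_weight", trend_weight)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return (1.0 - trend_weight) * keyword_strength + trend_weight * trend_interest


def passes_threshold(score: float, threshold: float) -> bool:
    """Only articles whose score exceeds the threshold move forward."""
    return score > threshold


# Same keyword match strength, different trend interest.
trending = relevance_score(keyword_strength=0.8, trend_interest=0.4)
flat = relevance_score(keyword_strength=0.8, trend_interest=0.0)

print(passes_threshold(trending, threshold=0.5))
print(passes_threshold(flat, threshold=0.5))
```

With the same keyword match, the article on the trending topic clears a 0.5 threshold and the one on the flat topic does not, which is the behavior the section describes.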

This feature is included on every paid plan. See plans and pricing →

Ready to get started?

Sign up free and start filtering content in minutes.