
How We Built an AI News Aggregator with Self-Hosted BitNet

y0.exchange Team · February 28, 2026

The problem: too much noise, not enough signal

If you follow AI and crypto, you know the pain. Dozens of sources, hundreds of articles per day, and most of them are duplicates, filler, or simply irrelevant. We wanted a feed that surfaces only what actually matters, without manual curation.

That's why we built news.y0.exchange: an AI-powered news intelligence platform that aggregates 40+ trusted sources and runs every article through a two-stage AI pipeline before it reaches you.

The architecture: two AI models, two jobs

The core idea is simple — use a fast, cheap model for filtering, and a powerful model for enrichment.

Stage 1: BitNet b1.58 (self-hosted)

We run BitNet b1.58 2B on our own server. This is a language model from Microsoft Research whose weights are ternary (-1, 0, or +1), which works out to about 1.58 bits per weight and gives the model its name. It is lightweight enough to run inference in about 30 milliseconds per article on a modest CPU.

BitNet handles the first pass:

  • Relevance filtering — is this article actually about AI or crypto, or is it off-topic noise?
  • Sentiment detection — bullish, bearish, or neutral
  • Importance scoring — 0 to 10, how significant is this news?
  • Category classification — AI, crypto, DeFi, or AI-crypto crossover
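
To make the first pass concrete, here is a minimal sketch of how a Stage 1 verdict could be represented and validated. It assumes the model is prompted to answer with a small JSON object; the field names, the `Stage1Result` type, and the `parseStage1` helper are hypothetical and are not taken from our codebase.

```typescript
// Hypothetical shape of a Stage 1 verdict; field names are illustrative.
type Sentiment = "bullish" | "bearish" | "neutral";
type Category = "ai" | "crypto" | "defi" | "ai-crypto";

interface Stage1Result {
  relevant: boolean;
  sentiment: Sentiment;
  importance: number; // score from 0 to 10
  category: Category;
}

const SENTIMENTS: string[] = ["bullish", "bearish", "neutral"];
const CATEGORIES: string[] = ["ai", "crypto", "defi", "ai-crypto"];

// Parse the model's raw text output. Returns null when the output is not
// usable, so a malformed generation never reaches the rest of the pipeline.
function parseStage1(raw: string): Stage1Result | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch (err) {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const d = data as Record<string, unknown>;

  if (typeof d.relevant !== "boolean") return null;
  if (typeof d.sentiment !== "string" || SENTIMENTS.indexOf(d.sentiment) === -1) return null;
  if (typeof d.category !== "string" || CATEGORIES.indexOf(d.category) === -1) return null;
  if (typeof d.importance !== "number" || !isFinite(d.importance)) return null;

  // Clamp the score into the documented 0 to 10 range.
  const importance = Math.min(10, Math.max(0, d.importance));

  return {
    relevant: d.relevant,
    sentiment: d.sentiment as Sentiment,
    category: d.category as Category,
    importance,
  };
}
```

Small models occasionally produce output that is not valid JSON or uses a label outside the expected set, so rejecting those cases up front (and retrying or dropping the article) keeps bad scores out of the feed.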

The beauty of self-hosting BitNet is the economics. There are no API calls, no per-token costs, and no rate limits, so the entire daily volume is processed at no marginal cost. We parse 40+ RSS feeds every 10 minutes, which adds up to hundreds of articles per day, and BitNet handles all of them without breaking a sweat.

Stage 2: Claude Sonnet (API)

Articles that score importance >= 4 in Stage 1 get promoted to Stage 2, where Claude Sonnet generates:

  • A concise AI summary of the article
  • Key takeaways as bullet points
  • Additional tags and metadata
  • An actionability flag — is this something you should act on?
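
The hand-off between the two stages boils down to a threshold check. The sketch below shows one way to express it; the `ScoredArticle` type and function names are hypothetical, and requiring `relevant` in addition to the importance score is an assumption on top of the `importance >= 4` rule stated above.

```typescript
// Hypothetical minimal view of an article after Stage 1.
interface ScoredArticle {
  url: string;
  relevant: boolean;
  importance: number; // Stage 1 score, 0 to 10
}

// From the text: articles with importance >= 4 are promoted.
const PROMOTION_THRESHOLD = 4;

// Assumption: an article must also be marked relevant to be promoted.
function shouldPromote(a: ScoredArticle, threshold: number = PROMOTION_THRESHOLD): boolean {
  return a.relevant && a.importance >= threshold;
}

// Return only the articles that should be sent to the Stage 2 model.
function selectForStage2(
  articles: ScoredArticle[],
  threshold: number = PROMOTION_THRESHOLD,
): ScoredArticle[] {
  return articles.filter((a) => shouldPromote(a, threshold));
}
```

Keeping the threshold as a parameter makes it easy to tune how much traffic reaches the paid API without touching the rest of the pipeline.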

This two-stage approach keeps API costs minimal. BitNet filters out ~60% of articles as low-importance or irrelevant, so Claude only processes the ones that matter. At roughly $0.005 per article, Stage 2 costs us a few dollars per month.
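
The cost reasoning above is simple arithmetic, and a small helper makes it easy to rerun with different numbers. This is a sketch only: the `CostInputs` shape and function name are hypothetical, and the daily volume used in the usage example is an illustrative value rather than a measured one.

```typescript
interface CostInputs {
  articlesPerDay: number;    // articles entering Stage 1 per day
  promotedShare: number;     // fraction that reaches Stage 2, between 0 and 1
  costPerArticleUsd: number; // API cost per enriched article
  daysPerMonth?: number;     // defaults to 30
}

// Monthly Stage 2 spend = daily volume x promoted share x unit cost x days.
function estimateMonthlyStage2CostUsd(i: CostInputs): number {
  const days = i.daysPerMonth === undefined ? 30 : i.daysPerMonth;
  return i.articlesPerDay * i.promotedShare * i.costPerArticleUsd * days;
}

// Usage: an illustrative 100 articles per day, with ~60% filtered out
// (so 40% promoted) at $0.005 each, comes to roughly 6 USD per month.
const monthlyUsd = estimateMonthlyStage2CostUsd({
  articlesPerDay: 100,
  promotedShare: 0.4,
  costPerArticleUsd: 0.005,
});
```

The estimate is linear in every input, so doubling the daily volume or the promoted share doubles the bill; it is worth recomputing with real counts from the queue.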

Smart deduplication: three layers deep

One of the biggest problems with news aggregation is duplicate content. The same story gets published by 10 outlets with slightly different titles. We handle this with three levels of deduplication:

  1. URL hash — SHA256 of the canonical URL catches exact duplicates
  2. Title trigram similarity — catches rewrites within a 48-hour window
  3. BitNet semantic check — catches articles that tell the same story with completely different wording

This means you never see the same news twice, even when multiple sources cover it.
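
The first two layers are cheap enough to sketch in a few lines. The example below hashes an already-canonical URL with SHA-256 and compares titles using Jaccard similarity over character trigrams within a 48-hour window. The Jaccard measure, the `0.6` threshold, and all type and function names are assumptions made for illustration; the third, semantic layer is not shown.

```typescript
import { createHash } from "crypto";

// Layer 1: SHA-256 of the canonical URL (canonicalization is assumed to
// have happened before this call).
function urlHash(canonicalUrl: string): string {
  return createHash("sha256").update(canonicalUrl).digest("hex");
}

// Split a normalized, padded title into its set of character trigrams.
function trigrams(title: string): Set<string> {
  const norm = title.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
  const padded = "  " + norm + " ";
  const grams = new Set<string>();
  for (let i = 0; i + 3 <= padded.length; i++) {
    grams.add(padded.slice(i, i + 3));
  }
  return grams;
}

// Layer 2: Jaccard similarity of the two trigram sets, from 0 to 1.
function trigramSimilarity(a: string, b: string): number {
  const ta = trigrams(a);
  const tb = trigrams(b);
  let shared = 0;
  ta.forEach((g) => {
    if (tb.has(g)) shared++;
  });
  return shared / (ta.size + tb.size - shared);
}

interface SeenArticle {
  title: string;
  publishedAt: Date;
}

const WINDOW_MS = 48 * 60 * 60 * 1000; // 48-hour window from the text
const TITLE_THRESHOLD = 0.6;           // hypothetical cut-off

// True when any recent article within the window has a similar enough title.
function isTitleDuplicate(
  candidate: SeenArticle,
  recent: SeenArticle[],
  threshold: number = TITLE_THRESHOLD,
): boolean {
  return recent.some((r) => {
    const gap = Math.abs(candidate.publishedAt.getTime() - r.publishedAt.getTime());
    return gap <= WINDOW_MS && trigramSimilarity(candidate.title, r.title) >= threshold;
  });
}
```

Ordering the checks from cheapest to most expensive means the model-based comparison only has to run on articles that survive the hash and trigram checks.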

Why self-hosted BitNet was the key decision

We considered using a single cloud API for everything, but the math didn't work. Processing 500+ articles per day through a cloud LLM would cost $50-100/month and introduce latency and rate-limit concerns.

BitNet b1.58 changed the equation:

  • Cost: $0 extra. It runs on the same VPS that hosts the rest of the stack
  • Latency: ~30ms per article, with no network round-trip to an external API
  • Privacy: Article content never leaves our infrastructure during filtering
  • Reliability: No dependency on external API uptime for the critical filtering stage
  • No GPU required. BitNet's ternary (1.58-bit) weights run efficiently on CPU

The model is small enough that it doesn't compete for resources with MongoDB, Redis, or the NestJS backend. It just sits there, processing articles as they arrive through the Bull queue.

The delivery layer

Filtered and enriched articles are served through multiple channels:

  • Web: news.y0.exchange with segment-based feeds (AI Pulse, Crypto Signal), full-text search, trending topics, and ticker tracking
  • Telegram: a daily digest at 8:00 AM UTC with the top 15 articles, sentiment indicators, and direct links
  • Email: daily and weekly newsletters via SendPulse
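
As an illustration of the Telegram digest, the sketch below picks the top 15 articles and renders them as plain text with a sentiment label and a link. Ranking by the Stage 1 importance score is an assumption, and the `DigestItem` type, `buildDigest` function, and output layout are hypothetical.

```typescript
interface DigestItem {
  title: string;
  url: string;
  importance: number; // Stage 1 score, 0 to 10
  sentiment: "bullish" | "bearish" | "neutral";
}

const DIGEST_SIZE = 15; // from the text: top 15 articles

// Sort a copy by importance (highest first), keep the top `size` items,
// and render each as a numbered entry followed by its link.
function buildDigest(items: DigestItem[], size: number = DIGEST_SIZE): string {
  const top = items
    .slice()
    .sort((a, b) => b.importance - a.importance)
    .slice(0, size);
  return top
    .map((it, i) => `${i + 1}. [${it.sentiment}] ${it.title}\n${it.url}`)
    .join("\n\n");
}
```

The function copies the input before sorting so the caller's array keeps its original order, which matters if the same list also feeds the web and email channels.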

The stack

  • Backend: NestJS + TypeScript, MongoDB, Redis, Bull queues
  • Frontend: Next.js 14 (App Router), Tailwind CSS, SWR
  • AI: Self-hosted BitNet b1.58 2B + Claude Sonnet API
  • Analytics: Self-hosted Umami
  • Infrastructure: Docker on Hetzner

What's next

We're working on real-time price correlation — when a news article mentions a token, we want to track how the price moves in the hours after publication. The goal is to eventually surface patterns: which sources and which types of news actually predict market movement.

Check it out at news.y0.exchange.