Catch Them Once, Block Everywhere: Telegram Anti-Spam

[ 01 / 10 ] The problem: one message behind

Run a crypto Telegram group with any traction and you meet the whole cast. The fake admin who DMs your newest members minutes after they join. The wallet-drainer link parked in a bio. The "support" account whose display name matches your moderator's, except one letter is a Cyrillic lookalike. The shill accounts that arrive in waves whenever a chart moves.

Standard moderation loses to all of them by design, because it is reactive. The spammer posts, someone reports, an admin wakes up and bans, the spammer rejoins on the next account from the farm. Every loop through that cycle starts with the damage already done. The ban is a receipt, not a defense.

So I built the defense to run before the first message instead of after it: verify every account at the door, remember every catch, and share that memory across every community the bot protects. It is live in front of 10+ communities right now, completely free. The bot is t.me/GTO_AntiSpam_Bot; the product page covers setup.

The stack is deliberately boring: Vercel serverless functions run the bot, Turso (libSQL) holds the data, Telethon reaches the parts of Telegram the Bot API pretends not to have, and a Telegram Mini App fronts the verification flow. Serverless dictated the architecture - every invocation starts from nothing, so all state lives in the database, and the Telethon StringSession is cached at module level so a warm function skips re-authentication. That constraint turned out to be the product: a bot whose entire memory is a database is a bot whose memory can be shared.
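As a rough illustration of that warm-start trick, here is a minimal Python sketch of module-level caching. The names get_client and connect are hypothetical; in the real bot the factory would presumably build a Telethon client from the stored StringSession, which is not shown here.

```python
from typing import Callable, Optional

# Module-level state survives between invocations only while the serverless
# container stays warm; a cold start re-imports the module and resets it.
_cached_client: Optional[object] = None


def get_client(connect: Callable[[], object]) -> object:
    """Return the cached client, calling connect() only on a cold start."""
    global _cached_client
    if _cached_client is None:
        _cached_client = connect()  # the expensive step: authentication
    return _cached_client
```

Anything that must outlive a cold start still has to live in the database; the cache only saves repeated work while the container is warm.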

[ 02 / 10 ] The verification gate: six layers before the door

A new member cannot post anything. They get a verification prompt, and behind it a gate that runs six checks in sequence:

Global blocklist. The account is cross-referenced against every flag any protected community has ever raised. Known spammers end here, instantly.

GeoIP. The Mini App captures the visitor's IP server-side and resolves the country through ip-api.com. Presets can block or allow countries outright.

Phone registration country. MTProto's PeerSettings exposes where the account's phone number was registered. The IP says where the VPN exits; the phone says where the account was born.

Display-name regex. "Admin", "official", "support" and their Unicode-lookalike spellings - the standard impersonation kit.

Bio scan. Shill keywords, t.me invite links, and the emoji patterns spam accounts decorate themselves with.

Telegram's own signals. The report_spam flag other users have already raised, plus recent name and photo changes from the same PeerSettings call. An aged account freshly re-skinned is a tell.
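The display-name layer only works if lookalike letters are folded before the regex runs. A minimal sketch of that idea, assuming a tiny hand-picked lookalike map and a simplified pattern; the map, the pattern and the function names are illustrative, not the bot's actual rules:

```python
import re
import unicodedata

# A small illustrative sample of Cyrillic and Greek letters that render like
# Latin ones. A production map would be far larger.
LOOKALIKES = str.maketrans({
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic ie
    "\u043e": "o",  # Cyrillic o
    "\u0440": "p",  # Cyrillic er
    "\u0441": "c",  # Cyrillic es
    "\u0456": "i",  # Cyrillic Ukrainian i
    "\u0455": "s",  # Cyrillic dze
    "\u03bf": "o",  # Greek omicron
    "\u03b1": "a",  # Greek alpha
})

# Substring match on purpose: "Admin_Help" should trip it too.
IMPERSONATION = re.compile(r"admin|official|support")


def fold_name(name: str) -> str:
    """Normalize a display name to plain lowercase Latin where possible."""
    # NFKC collapses fullwidth and styled letters; casefold removes case.
    folded = unicodedata.normalize("NFKC", name).casefold()
    return folded.translate(LOOKALIKES)


def looks_like_impersonation(name: str) -> bool:
    return IMPERSONATION.search(fold_name(name)) is not None
```

With this map, a name such as "\u0410dmin" (Cyrillic capital A followed by Latin "dmin") folds to "admin" and matches, while an ordinary name does not.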

The first failed check ends the conversation: instant ban, with the full metadata haul - names, bio, both countries, account age, which layer fired - written to the blocklist. Passing all six earns a single-use invite link created with member_limit=1, so a verified invite cannot be shared, resold, or replayed by the next account in the farm.
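The "first failed check ends the conversation" rule is a short-circuiting pipeline. A minimal sketch under stated assumptions: Candidate, GateResult, run_gate and the two toy checks are hypothetical names, and the real gate runs six layers rather than two.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class Candidate:
    user_id: int
    display_name: str = ""
    bio: str = ""
    ip_country: str = ""
    phone_country: str = ""


@dataclass
class GateResult:
    passed: bool
    failed_layer: Optional[str] = None
    reason: Optional[str] = None


# A check returns a reason string on failure, or None when the account passes.
Check = Callable[[Candidate], Optional[str]]


def run_gate(candidate: Candidate, checks: Sequence[Tuple[str, Check]]) -> GateResult:
    """Run the checks in order and stop at the first failure."""
    for layer, check in checks:
        reason = check(candidate)
        if reason is not None:
            return GateResult(False, layer, reason)
    return GateResult(True)


# Two toy layers with placeholder data.
BLOCKLIST = {1001}
BLOCKED_COUNTRIES = {"XX"}


def blocklist_check(c: Candidate) -> Optional[str]:
    return "already on the blocklist" if c.user_id in BLOCKLIST else None


def geoip_check(c: Candidate) -> Optional[str]:
    return "blocked IP country" if c.ip_country in BLOCKED_COUNTRIES else None


CHECKS = [("blocklist", blocklist_check), ("geoip", geoip_check)]
```

Because the result records which layer fired, the caller has what it needs to write the blocklist row before issuing the ban.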

[ 03 / 10 ] The preset system: a shared immune system

Communities disagree about what spam is. One group's banned keyword is another group's ticker; a country filter that protects a trading chat would strangle a global open-source project. So every rule above is configuration, not code. Each community runs on a preset: a JSON config with toggleable checks, country block and allow lists, the display-name regex, the bio keyword, emoji and link lists, and the inactivity threshold.
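To make "configuration, not code" concrete, here is a sketch of what such a preset might look like and how it could be loaded. Every field name and value below is a placeholder invented for illustration, and the rule that a non-empty allow list overrides the block list is an assumption, not something the bot is confirmed to do.

```python
import json
from dataclasses import dataclass, field

# Hypothetical preset; field names and values are placeholders.
EXAMPLE_PRESET_JSON = """
{
  "name": "example-preset",
  "checks": {"geoip": true, "phone_country": true, "display_name": true, "bio": false},
  "blocked_countries": ["XX"],
  "allowed_countries": [],
  "display_name_regex": "admin|official|support",
  "bio_keywords": ["example-keyword"],
  "bio_emojis": [],
  "bio_links": ["t.me/"],
  "inactivity_days": 90
}
"""


@dataclass
class Preset:
    name: str
    checks: dict = field(default_factory=dict)
    blocked_countries: list = field(default_factory=list)
    allowed_countries: list = field(default_factory=list)
    display_name_regex: str = ""
    bio_keywords: list = field(default_factory=list)
    bio_emojis: list = field(default_factory=list)
    bio_links: list = field(default_factory=list)
    inactivity_days: int = 0

    def enabled(self, check: str) -> bool:
        """A check runs only if the preset switches it on."""
        return bool(self.checks.get(check, False))

    def country_allowed(self, code: str) -> bool:
        # Assumption: a non-empty allow list takes precedence over the block list.
        if self.allowed_countries:
            return code in self.allowed_countries
        return code not in self.blocked_countries


def load_preset(raw: str) -> Preset:
    return Preset(**json.loads(raw))


preset = load_preset(EXAMPLE_PRESET_JSON)
```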

The line that matters: communities sharing a preset share a blocklist. Ban a spammer in one community and they are already banned in every sibling community, including the ones they have not gotten around to visiting yet. Presets are isolated from each other - an aggressive meme-coin config never leaks its rules into a quieter project - but inside a preset, every catch protects everyone.

[ The blocklist is a shared immune system. ]

One community's infection becomes every community's antibody. That is the entire product in a sentence; everything else is plumbing to make the sentence true.

[ 04 / 10 ] The database: one table doing the work

The plumbing is one table that earns its keep:

[ turso (libsql) - the table everything else orbits ]

CREATE TABLE IF NOT EXISTS preset_blocklist (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    preset_name TEXT    NOT NULL,
    user_id     INTEGER NOT NULL,
    username    TEXT,
    reason      TEXT,
    audited     INTEGER DEFAULT 0,   -- 1 = flagged by an external scan
    user_data   TEXT,                -- full JSON snapshot at flag time
    created_at  TEXT    DEFAULT CURRENT_TIMESTAMP,
    UNIQUE(preset_name, user_id)
);

The UNIQUE(preset_name, user_id) constraint makes INSERT OR IGNORE safe, so every path that flags accounts - the live gate, the cleanup passes, the external scans - can write blindly without checking what already exists. Idempotent writes, zero dedup logic.
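A small sketch of that write path, using Python's built-in sqlite3 module as a stand-in for Turso (libSQL is SQLite-compatible, so the same statements apply). The helper names open_db, flag and is_blocked are hypothetical.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS preset_blocklist (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    preset_name TEXT    NOT NULL,
    user_id     INTEGER NOT NULL,
    username    TEXT,
    reason      TEXT,
    audited     INTEGER DEFAULT 0,
    user_data   TEXT,
    created_at  TEXT    DEFAULT CURRENT_TIMESTAMP,
    UNIQUE(preset_name, user_id)
);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for the real database
    conn.execute(SCHEMA)
    return conn


def flag(conn, preset_name, user_id, reason, username=None, audited=False, user_data=None) -> bool:
    """Write blindly; return True only if this call created the row."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO preset_blocklist "
        "(preset_name, user_id, username, reason, audited, user_data) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (preset_name, user_id, username, reason, int(audited), json.dumps(user_data or {})),
    )
    conn.commit()
    return cur.rowcount == 1


def is_blocked(conn, preset_name, user_id) -> bool:
    """Lookup is keyed by preset, so sibling communities share hits."""
    row = conn.execute(
        "SELECT 1 FROM preset_blocklist WHERE preset_name = ? AND user_id = ?",
        (preset_name, user_id),
    ).fetchone()
    return row is not None
```

Calling flag twice for the same preset and user leaves one row, and the first reason wins; a lookup under a different preset name finds nothing, which is the isolation between presets.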

The audited flag separates a live catch at the verification gate from an account flagged by scanning, because those are different grades of evidence and the analytics treat them differently. user_data is the full JSON snapshot of everything known at flag time. It looked like hoarding when I added it; by the end of this post it is a threat-intelligence dataset.

Inactive members go to a separate inactive_removals table and never touch the blocklist. Inactivity is not malice, and a blocklist that cannot tell the difference stops being evidence.

[ 05 / 10 ] /clean: four passes over the sediment

The gate protects a community from now on. Existing groups carry years of sediment, so /clean is the admin command that re-litigates the whole membership in four passes, ordered by cost:

Pass 1 - quick checks. Blocklist hits and config-rule matches via O(1) set lookups. No API calls spent.
Pass 2 - bio scan. Telethon fetches full profiles with GetFullUserRequest and runs the same keyword, link and emoji rules as the gate.
Pass 3 - restricted users. Accounts Telegram itself has already limited.
Pass 4 - ban-list import. Pulls the group's existing ban list in through alphabetic search iteration - the same trick that breaks a much bigger ceiling below.
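Ordering passes by cost pays off when each pass only sees the members that earlier, cheaper passes did not flag. A minimal sketch of that idea; run_passes and the toy predicates are hypothetical, and whether /clean skips already-flagged members in exactly this way is an assumption.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A pass is a predicate: True means "flag this member".
Pass = Callable[[int], bool]


def run_passes(member_ids: Iterable[int], passes: List[Tuple[str, Pass]]) -> Dict[int, str]:
    """Run passes cheapest-first; later passes see only the survivors."""
    flagged: Dict[int, str] = {}
    remaining = list(member_ids)
    for name, should_flag in passes:
        survivors = []
        for uid in remaining:
            if should_flag(uid):
                flagged[uid] = name  # remember which pass caught the account
            else:
                survivors.append(uid)
        remaining = survivors
    return flagged
```

With a set-membership check first and a profile fetch second, the fetch runs only for members the set lookup did not already catch, so the API budget goes to accounts that still need a closer look.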

[ 06 / 10 ] /audit: scanning groups I do not admin

/clean needs admin rights. /audit does not. Point it at any public group and it scans up to 10,000 members, runs every check, and reports what is living there. I run it on communities before the bot ever protects them - and the 10,000 in that sentence gets fixed two sections from now.

It runs locally instead of on Vercel for a blunt reason: a Vercel function dies at 60 seconds and a real scan takes hours. Locally it can authenticate multiple real Telegram accounts through Telethon and walk the member list with steady pacing.

The war story is the entity cache. Telethon will accept a user ID for GetFullUserRequest and then fail anyway, because MTProto only lets a session reference users it has already seen. You must walk the group with GetParticipantsRequest first - populating the session's entity cache - and only then fetch bios. No documentation says this. No error message hints at it. It cost hours, and the fix is one request issued in the right order.

Rate limits turned out to be about rhythm, not volume. A steady pace of 1.0 requests per second per account has produced zero flood waits across 20,000+ bio fetches. Burst above roughly 1.2 requests per second and the penalties cost more than the speed bought. Telegram does not punish you for asking a lot; it punishes you for asking rudely.
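One way to enforce rhythm rather than volume is a pacer that spaces requests evenly and never banks idle time as burst credit. A minimal asyncio sketch; SteadyPacer is a hypothetical name, the rate is a parameter rather than a recommendation, and the injectable clock and sleep exist only so the behavior can be checked without waiting.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional


class SteadyPacer:
    """Space calls at least 1/rate seconds apart, with no catch-up bursts."""

    def __init__(
        self,
        rate_per_second: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
    ) -> None:
        self._interval = 1.0 / rate_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_slot: Optional[float] = None

    async def wait(self) -> None:
        now = self._clock()
        if self._next_slot is None or now >= self._next_slot:
            # First call, or we fell behind: go now, but restart the schedule
            # from this moment so lost time never turns into a burst.
            self._next_slot = now + self._interval
            return
        await self._sleep(self._next_slot - now)
        self._next_slot += self._interval
```

A scan loop would call `await pacer.wait()` before each profile fetch, with one pacer per authenticated account.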

[ 07 / 10 ] Scaling: throughput is a head count

One authenticated account sustains about 0.8 requests per second, and the arithmetic of a serious group gets ugly fast. The fix is horizontal: split the member list across parallel authenticated accounts.

[ time to bio-scan 10,000 members ]

Accounts   Aggregate rate     Time to scan   Speedup
1          ~0.8 requests/s    ~3.5 h         baseline
2          ~1.6 requests/s    ~1.7 h         2x
5          ~4.0 requests/s    ~42 min        5x
10         ~8.0 requests/s    ~21 min        10x

The bottleneck is per-account rate limiting, and per-account limits do not know about your other accounts. Throughput is a head count.
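The arithmetic behind those figures is members divided by aggregate rate, and the split itself can be as simple as dealing member IDs round-robin. A small sketch; scan_seconds and shard are hypothetical helper names, and the model assumes perfectly linear scaling with no flood waits.

```python
from typing import List


def scan_seconds(members: int, accounts: int, rate_per_account: float) -> float:
    """Idealized scan time: every account works at its own steady rate."""
    return members / (accounts * rate_per_account)


def shard(member_ids: List[int], accounts: int) -> List[List[int]]:
    """Deal member IDs round-robin so each account gets an equal share."""
    return [member_ids[i::accounts] for i in range(accounts)]


for n in (1, 2, 5, 10):
    seconds = scan_seconds(10_000, n, 0.8)
    print(f"{n:>2} accounts: {seconds / 60:.0f} min")
```

At 0.8 requests per second, one account needs 12,500 seconds for 10,000 members, which is the roughly 3.5 hours quoted above; ten accounts bring the same job to about 21 minutes.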

[ 08 / 10 ] Breaking the 10,000-member ceiling

GetParticipantsRequest hard-caps at 10,000 members per search query, which is where /audit's limit comes from and why six-figure groups looked permanently out of reach. But the cap is per query, and the search filter is a parameter.

So the scanner iterates the filter: the empty string, then "a" through "z", then digits, then Unicode ranges for the alphabets crypto Telegram actually writes names in. Every query returns its own capped slice of matching members; dedupe the union by user_id and the ceiling moves. A 100,000-member group yields 40,000+ unique members this way, and realistically 50,000 to 80,000 once parallel sessions split the alphabet between them.
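The iterate-and-dedupe loop is small enough to sketch. Here fetch stands in for whatever issues the capped participant query for one search term; the fake version below is a toy that matches substrings and caps at 8 results so the effect is visible without touching Telegram. All names are hypothetical, and how Telegram actually matches a search term against names is not modeled.

```python
import string
from typing import Callable, Dict, Iterable, List


def search_terms(extra: Iterable[str] = ()) -> List[str]:
    """Empty query first, then a-z, then 0-9, then any extra alphabets."""
    return [""] + list(string.ascii_lowercase) + list(string.digits) + list(extra)


def collect_members(fetch: Callable[[str], List[dict]], terms: Iterable[str]) -> Dict[int, dict]:
    """Union the capped slices returned per term, deduplicated by user_id."""
    seen: Dict[int, dict] = {}
    for term in terms:
        for member in fetch(term):
            seen.setdefault(member["user_id"], member)
    return seen


def make_fake_fetch(members: List[dict], cap: int) -> Callable[[str], List[dict]]:
    """Toy stand-in for a capped search query (substring match, then truncate)."""
    def fetch(term: str) -> List[dict]:
        matches = [m for m in members if term in m["name"]]
        return matches[:cap]
    return fetch


# 50 fake members named a0..a9, b0..b9, ... e0..e9.
GROUP = [
    {"user_id": 10 * row + n, "name": f"{letter}{n}"}
    for row, letter in enumerate("abcde")
    for n in range(10)
]
fetch = make_fake_fetch(GROUP, cap=8)
single_query = fetch("")
everyone = collect_members(fetch, search_terms())
```

In this toy group a single empty query returns 8 members while the iterated union recovers all 50. Extra alphabets slot in through the extra argument, for example `[chr(c) for c in range(0x0430, 0x0450)]` for the basic Cyrillic lowercase letters.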

[ 09 / 10 ] Results: 3,800+ and counting

Where the system stands, after working through a run of established communities that started with @spx6900:

3,800+ flagged accounts on the global blocklist
2,800+ inactive accounts tracked in their separate table
4,400+ bios scanned and stored

Two patterns fell out of the data. Established, settled groups flag the least; newer, fast-growing groups flag the hardest, because spammers spend their effort where the growth is. And the pattern that named this post: spammers reuse accounts across communities. The same account working five groups gets caught in one and is pre-banned in the other four before it says a word there.

[ Catch them once, block them everywhere. ]

[ 10 / 10 ] The dashboard and the endgame

Hold 4,400+ bios and the full metadata of every catch, and you are no longer keeping a ban list - you are sitting on threat intelligence. So it gets the dashboard it deserves: a live, interactive visualization of the global blocklist at greatteacheronizuka.com/telegram-blacklist-visualization. Threat categories. Blocking velocity. Bio patterns. And a network map of how spam clusters connect across communities - the same accounts, bio templates and link farms surfacing in group after group, drawn as a graph instead of a hunch.

That map is the endgame. When a cluster surfaces in one community, every other community in the network already holds the antibodies; the accounts are flagged before they ever knock. Not reactive banning. Preemptive intelligence.

The bot is free and guarding 10+ communities as of this post. If yours should be next: t.me/GTO_AntiSpam_Bot.