Building ClickArmor — Every Evasion We Found and How We Beat It
ClickArmor was built by testing against real attack infrastructure — not synthetic samples. This post documents the evasion techniques we encountered across thousands of live ClickFix and credential phishing domains, and the detection engineering that followed.
Em Dash Flag Evasion
A ClickFix payload was discovered using an em dash (U+2014) instead of a standard hyphen before the -e encoded command flag: pOweRSHeLL —e [base64]. PowerShell accepts the em dash as a valid parameter prefix, but regex patterns matching -enc or -e only look for the ASCII hyphen U+002D.
The same payload also embedded double quotes throughout the Base64 blob — a"QBy"AG0A — which breaks contiguous pattern matching for [A-Za-z0-9+/=]{20,} runs. PowerShell silently strips these quotes during argument parsing.
Both techniques were countered by adding a normalization step that collapses em dash, en dash, figure dash, horizontal bar, and fullwidth hyphen to the ASCII hyphen, and a second pass that strips inline quotes embedded between alphanumeric characters.
Cyrillic Homoglyph Evasion
A ClickFix page rendered instructions that appeared to say "Press Windows Button + R" to a human reader, but the actual text used Cyrillic lookalike characters — Р (U+0420) instead of P, е (U+0435) instead of e, ѕ (U+0455) instead of s. The lure phrase regexes, which match ASCII /press\s+win(dows)?/i, passed right over the Cyrillic text.
The fix was a normalization layer applied before all page text analysis. It performs NFKD decomposition, strips combining diacritical marks and zero-width characters, normalizes exotic whitespace variants (U+2005 four-per-em space, etc.) to regular spaces, and maps 40+ Cyrillic and Greek homoglyphs to their ASCII equivalents. This single addition defeated the entire class of cross-script evasion.
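A minimal sketch of that normalization layer is below. The homoglyph table here covers only the characters mentioned above, not the full 40+ mappings, and the function name is illustrative.

```javascript
// Tiny excerpt of the homoglyph map (the real table has 40+ entries).
const HOMOGLYPHS = {
  "\u0420": "P", // Cyrillic Er
  "\u0435": "e", // Cyrillic ie
  "\u0455": "s", // Cyrillic dze
  "\u043E": "o", // Cyrillic o
  "\u0430": "a", // Cyrillic a
};

function normalizeText(text) {
  return text
    .normalize("NFKD")                                   // decompose accented forms
    .replace(/[\u0300-\u036F]/g, "")                     // strip combining diacritics
    .replace(/[\u200B-\u200D\uFEFF]/g, "")               // strip zero-width characters
    .replace(/[\u2000-\u200A\u202F\u205F\u3000]/g, " ")  // exotic spaces -> regular space
    .replace(/[\u0400-\u04FF\u0370-\u03FF]/g, (ch) => HOMOGLYPHS[ch] ?? ch);
}

// After normalization, the plain ASCII lure regex matches again:
const LURE = /press\s+win(dows)?/i;
```

The key property is ordering: decomposition and stripping happen before the homoglyph map, so a Cyrillic character carrying a combining mark still resolves to its bare ASCII equivalent.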
document.write() DOM Replacement
A compromised WordPress site used a multi-stage C2 loader: an obfuscated inline script fetched a payload from a remote server via XHR, then called document.open("text/html") followed by document.write() to replace the entire page with a ClickFix lure. This is a nuclear option — it destroys the MutationObserver, all injected UI, all event listeners on document, and all page-world clipboard hooks.
The content script's isolated world JavaScript context survives the replacement, but all its DOM references are orphaned. The fix was a heartbeat monitor that periodically checks whether a hidden marker element still exists. If the marker disappears, the system detects the DOM was replaced, resets all UI state flags, reattaches the MutationObserver to the new document, re-registers event listeners, attempts to re-inject clipboard hooks, and immediately re-scans. The existing lure detection layers then score the newly injected ClickFix content and re-show the warning.
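The heartbeat loop reduces to a small core. In this sketch the browser dependencies are injected so the logic stands alone; in the real content script, `markerExists` would query for the hidden marker element and `onDomReplaced` would perform the reset-and-rescan sequence described above. Names are illustrative.

```javascript
// Periodically verify the hidden marker survived; if not, the DOM was
// replaced via document.write() and the recovery path must run.
function startHeartbeat({ markerExists, onDomReplaced, intervalMs = 2000, schedule = setInterval }) {
  return schedule(() => {
    if (!markerExists()) {
      // Marker gone: reset UI state, reattach the MutationObserver to the
      // new document, re-register listeners, re-inject hooks, re-scan.
      onDomReplaced();
    }
  }, intervalMs);
}
```

Because the content script's isolated world survives `document.write()`, this interval keeps firing even after every DOM reference it previously held has been orphaned.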
CSP-Blocked Clipboard Hooks
ClickArmor's clipboard interception works by injecting a <script> tag into the page world at document_start to hook navigator.clipboard.writeText and document.execCommand('copy'). Sites with strict Content Security Policy headers block this inline script injection silently.
When the hooks can't install and a ClickFix page writes a malicious command to the clipboard via a click handler, the standard interception path fails entirely. The fix was an alternative clipboard monitoring approach: on pages scoring above the threat threshold, a capture-phase click listener watches for interactions with elements that look like copy or verification buttons, then reads clipboard contents after a short delay using navigator.clipboard.readText(). This bypasses the page-world hook requirement entirely — the extension reads what landed in the clipboard after the fact rather than intercepting the write call.
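The button heuristic at the heart of that fallback can be sketched as a pure classifier, with the browser wiring shown in comments. The phrase list and the `scoreClipboardPayload` helper are illustrative assumptions, not ClickArmor's actual identifiers.

```javascript
// Does the clicked element's label look like a copy/verification trigger?
const COPY_BUTTON = /\b(copy|verify|i am (not a robot|human)|fix( it)?)\b/i;

function looksLikeCopyTrigger(label) {
  return COPY_BUTTON.test(label);
}

// Browser-only wiring, only armed on pages already above the threat threshold:
//   document.addEventListener("click", (e) => {
//     const label = e.target.closest("button, [role=button], a")?.textContent ?? "";
//     if (!looksLikeCopyTrigger(label)) return;
//     setTimeout(async () => {
//       const text = await navigator.clipboard.readText(); // read after the fact
//       scoreClipboardPayload(text);                       // hypothetical scorer
//     }, 300);
//   }, true); // capture phase, so the page can't stopPropagation() first
```

Note the capture-phase registration: the listener runs before any page-world handler, so a malicious page cannot swallow the click before the extension sees it.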
Obfuscated Remote Script Loaders
A class of ClickFix pages was identified where the page source contains no malicious content at all. Instead, a heavily obfuscated JavaScript loader fetches the actual payload from a remote C2 server. The obfuscation techniques observed include _0x-style variable renaming (obfuscator.io output), Base64-encoded C2 URLs inside atob() calls, string array rotation with push/shift loops, parseInt shuffling chains, CSS body-hide tricks (opacity:0 with delayed fadeIn), and sessionStorage gating to only fire once per session.
A dedicated detection layer was built that scans all inline <script> blocks for these obfuscation signatures. Each signal is scored independently, and the combination of multiple signals — atob with a long Base64 string, URL decoded from Base64, _0x obfuscation, dynamic script injection, and body concealment — produces a high-confidence detection even though the actual ClickFix payload hasn't been delivered yet.
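Independent signal scoring over a script's source text can be sketched like this. Signal names, regexes, and weights are illustrative stand-ins for the real rule set.

```javascript
const SIGNALS = [
  { name: "obfuscator_io_vars", re: /_0x[0-9a-f]{4,}/i, weight: 2 },
  { name: "atob_long_b64",      re: /atob\s*\(\s*['"][A-Za-z0-9+/=]{40,}['"]/, weight: 3 },
  { name: "array_rotation",     re: /\.push\s*\(\s*\w+\.shift\s*\(\s*\)\s*\)/, weight: 2 },
  { name: "parseint_shuffle",   re: /parseInt\s*\([^)]*\)\s*\/\s*0x[0-9a-f]+/i, weight: 2 },
  { name: "body_hide",          re: /opacity\s*:\s*0|visibility\s*:\s*hidden/, weight: 1 },
  { name: "session_gate",       re: /sessionStorage\.(get|set)Item/, weight: 1 },
];

// Score each signal independently; the combination is what produces
// a high-confidence verdict before any payload has been delivered.
function scoreInlineScript(src) {
  const hits = SIGNALS.filter((s) => s.re.test(src));
  return { score: hits.reduce((n, s) => n + s.weight, 0), hits: hits.map((s) => s.name) };
}
```

No single signal is damning on its own: `sessionStorage` use is everywhere, and `atob` has legitimate uses. The weighting only crosses the alert threshold when several signals co-occur in one inline script.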
Fake Browser Update Progressive Reveal
A variant disguised as a browser update page used a progressive reveal technique. On initial load, only a benign-looking "Install Update" screen was visible. The malicious content — Win+R instructions, a PowerShell command, and a copy button — lived inside a hidden <div> with display: none. JavaScript progressively revealed steps 1 through 4 as the user clicked through the flow.
The page analyzer originally scanned document.body.innerText, which excludes display:none content. The fix was to also scan document.body.textContent, which includes hidden elements, and take the higher score between visible and hidden text analysis. A dedicated fake browser update detection layer was also added, catching version comparison UIs, fake download progress, and the critical "finalize update via command" pattern.
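The visible-versus-hidden scan reduces to taking the maximum of two scores. In this sketch the lure scorer is injected for illustration; in the browser the two inputs are `document.body.innerText` and `document.body.textContent`.

```javascript
// Score visible and full text independently and keep the higher result,
// so display:none content can't hide the lure from analysis.
function scorePageText(visibleText, fullText, scoreFn) {
  return Math.max(scoreFn(visibleText), scoreFn(fullText));
}

// Browser-side call:
//   scorePageText(document.body.innerText, document.body.textContent, lureScore);
```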
Server-Side Payload Delivery
One domain used a technique where the clipboard payload was never present in the page source. Instead, an AJAX call to api.php returned the payload only after the user clicked a fake CAPTCHA checkbox. The server response contained the payload with junk characters interspersed and a delimiter to strip them — a custom deobfuscation scheme designed to avoid Base64 patterns.
The clipboard hook catches the final execCommand('copy') call with the assembled string, but the page-level analyzer sees nothing suspicious in the static HTML. The defense here is multi-layered: the page's lure text (Win+R instructions, verification language) triggers the page-level warning and lowers the clipboard scoring threshold, while the clipboard hook intercepts the dynamically assembled payload when it's finally written.
nslookup DNS Payload Staging
Following Microsoft's disclosure of a new ClickFix technique, detection was added for attacks that abuse DNS lookups to stage payloads. The technique pipes nslookup output through findstr to extract encoded commands from DNS TXT records, then executes them. Both clean and caret-obfuscated variants (n^s^l^o^o^k^u^p, p^o^w^e^r^s^h^e^l^l) were covered.
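Handling both variants comes down to neutralizing the caret escapes before matching. The pattern below is an illustrative simplification of the rule, not the complete detection.

```javascript
// cmd.exe treats ^ as an escape character and drops it, so stripping
// carets recovers the command the shell will actually execute.
function stripCarets(cmd) {
  return cmd.replace(/\^/g, "");
}

// nslookup output piped into findstr to extract a staged payload.
const NSLOOKUP_STAGING = /nslookup\b[^|&\n]*\|\s*findstr\b/i;

function detectsDnsStaging(cmd) {
  return NSLOOKUP_STAGING.test(stripCarets(cmd));
}
```

Because normalization happens first, one regex covers both the clean and the caret-obfuscated form.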
Platform-Hosted Credential Phishing
A significant portion of credential phishing is hosted on legitimate platforms — Google Forms, Google Docs, Google Drawings, Google Sites, SharePoint, WordPress.com, Weebly, Wix, and dozens of other free hosting services. These domains are trusted by default, often whitelisted by security tools, and bypass domain reputation checks entirely.
The detection approach separates platforms into two tiers. Tier 1 platforms are pure form builders (Google Forms, Typeform, Jotform) where a password field is always suspicious. Tier 2 platforms are hosting services (WordPress.com, Netlify, Heroku) where legitimate apps have real login pages — these only fire when there are signals beyond a simple email-and-password form: brand impersonation, high-severity field combinations, or threatening language.
For Google's own platforms, the detection correctly allows cross-brand impersonation checks — a fake Microsoft login hosted on docs.google.com gets flagged, while Google's own legitimate services do not. Google Forms pages where the form questions request passwords are caught via deep DOM traversal that walks the nested container structure to find question heading text associated with input fields.
Brand Impersonation at Scale
Brand detection was validated by running the full PhishTank verified feed — 56,000+ URLs — through a lightweight HTTP extraction pipeline. The pipeline fetches each page's HTML, extracts title, meta tags, headings, form labels, and body text, and matches against a database of 137 brands with 318 aliases. On dedicated phishing domains with active credential forms, zero evasions were observed. The pipeline also identified 33 new brands being impersonated that were added to the detection database.
When a phishing page impersonates a known brand, the warning overlay identifies the brand by name and tells the user what the real domain should be — turning a generic "suspicious form" alert into an actionable "This page is pretending to be Microsoft. The real site is login.microsoftonline.com, not this domain."
Validation Pipeline
The testing infrastructure consists of a three-stage automated pipeline. Stage 1 performs lightweight HTTP fetches to filter dead domains and extract form indicators. Stage 2 launches a real browser with ClickArmor loaded, visits each URL, and polls for detection signals — the data-cg-page-threat attribute confirms the scan completed, and data-cg-banner-shown and data-cg-overlay-shown confirm the warning UI was rendered. Stage 3 screenshots every miss for manual review and categorization.
The pipeline is designed to produce honest numbers. Misses are classified into actionable categories — true misses (detection gap, needs a fix), dead pages (taken down since the feed was published), platform-hosted (legitimate domain, different detection problem), and generic webmail (no brand to match). This prevents inflated detection rates from counting dead pages as successes.
Multi-Stage C2 Loader Detection
A class of attacks was discovered that chains multiple loader stages before the ClickFix lure ever appears. The initial page contains a data:text/javascript;base64 script src that decodes to a first-stage loader. That loader double-decodes a Base64 array of fallback C2 server URLs, uses cascading onerror callbacks so if one server is down the next fires automatically, and ultimately fetches a final payload from a remote server. The final payload uses document.write() to replace the entire page with the ClickFix lure. A sessionStorage counter gates the attack to fire only a limited number of times per session, so researchers see nothing on revisit.
Detection was added as a dedicated layer that scans inline script blocks for the specific combination of signals: data:text/javascript;base64 script sources, nested atob() calls with URL arrays, onerror cascading fallback patterns, XHR response piped into script.textContent injection, junk comment flooding (where over 75% of a script file is fake comments to obscure the real code), and sessionStorage counter gating. The heartbeat recovery mechanism already in place then handles the document.write() DOM replacement that follows.
Embedded Payload Detection
Some ClickFix pages embed the malicious command directly in DOM elements — inside <code>, <pre>, <textarea>, <input>, or data-clipboard-text attributes — rather than dynamically writing it to the clipboard via JavaScript. The page-level scanner only checked visible text for lure phrases, not for actual command payloads sitting in the DOM waiting to be copied.
A new detection layer was added that scans these DOM elements directly for command patterns: PowerShell download cradles, mshta remote execution, cmd wrappers, curl drops, and LOLBin invocations. When a match is found, the detector also checks for adjacent copy buttons. A fake browser update page with a PowerShell iwr|iex command inside a <code> element and a nearby "Copy" button scores high enough to trigger the warning before the user ever clicks.
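The command-pattern side of that layer can be sketched as a pure text matcher, with the DOM collection shown in comments. The patterns here are illustrative examples of the categories named above, not the exact production rules.

```javascript
const COMMAND_PATTERNS = [
  /\b(iwr|invoke-webrequest|invoke-expression|iex)\b/i,             // PowerShell cradle pieces
  /\bpowershell(\.exe)?\b[^\n]*-(e|enc|encodedcommand)\b/i,         // encoded command flag
  /\bmshta(\.exe)?\s+https?:/i,                                     // mshta remote execution
  /\bcmd(\.exe)?\s*\/c\b/i,                                         // cmd wrapper
  /\bcurl\b[^\n]*\s-o\b/i,                                          // curl drop to disk
];

function containsCommandPayload(text) {
  return COMMAND_PATTERNS.some((re) => re.test(text));
}

// Browser-side collection over the element types named above:
//   const texts = [...document.querySelectorAll("code, pre, textarea, input, [data-clipboard-text]")]
//     .map((el) => el.getAttribute?.("data-clipboard-text") ?? el.value ?? el.textContent ?? "");
//   const hit = texts.some(containsCommandPayload);
```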
Service Worker Message Delivery
A detection gap was identified where the clipboard scanner correctly scored a payload as malicious, but the block verdict never reached the content script to display the overlay. The root cause was a race condition in the service worker's async message handler. Each await in the handler is a yield point where the service worker can go idle, and if the message port closes before sendResponse executes, the content script receives undefined and silently bails.
The fix replaced the fire-and-forget async pattern with an explicit promise chain that holds a reference the runtime can track. The handler now wraps all logic in a named async function, chains .then(sendResponse) to ensure the response fires even if the service worker yields between storage operations, and includes a .catch fallback that sends an error response rather than letting the port close silently.
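The fixed wiring can be sketched as a small wrapper. `handleMessage` stands in for the real async verdict logic; the essential parts are returning `true` from the listener (which keeps the message port open across awaits) and routing every outcome, including errors, through `sendResponse`.

```javascript
function makeMessageListener(handleMessage) {
  return (message, sender, sendResponse) => {
    handleMessage(message, sender)
      .then(sendResponse)
      .catch((err) => sendResponse({ error: String(err) })); // never let the port close silently
    return true; // tell the runtime a response is coming asynchronously
  };
}

// Registration in the service worker:
//   chrome.runtime.onMessage.addListener(makeMessageListener(handleMessage));
```

Without the `return true`, Chrome closes the port as soon as the synchronous listener body finishes, and any `sendResponse` fired after an `await` is delivered to a dead port: the content script sees `undefined`.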
Google Forms Deep DOM Traversal
Google Forms renders question labels and input fields in a deeply nested container structure. The question heading text and the associated input element share a common ancestor only at a high level — typically 6 to 8 DOM levels up. The field context scanner originally walked only 3 parent levels looking for label text, which never reached the question heading on Google Forms pages.
The fix extended the parent walk to 10 levels and added structural awareness for Google Forms' specific DOM patterns: role="listitem" containers, data-params attributes where Google stores question metadata, and CSS classes used by Google's form rendering engine. The deep walk finds the question heading associated with each input field, enabling the form detector to catch Google Forms pages that request passwords, MFA codes, or other sensitive data through their question labels.
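The extended walk can be sketched as follows. It relies only on `parentElement`, `getAttribute`, and `textContent`, so it works on any node-like structure; the level limit and the `role="listitem"` heuristic are the parts taken from the description above, everything else is illustrative.

```javascript
// Walk up from an input element looking for the Google Forms question
// container, which carries the heading text associated with the field.
function findQuestionHeading(inputEl, maxLevels = 10) {
  let node = inputEl;
  for (let i = 0; i < maxLevels && node; i++) {
    node = node.parentElement;
    if (node && node.getAttribute && node.getAttribute("role") === "listitem") {
      return (node.textContent || "").trim();
    }
  }
  return null;
}
```

With the old 3-level limit the walk dies inside the inner wrappers; at 10 levels it reaches the `role="listitem"` container and recovers the question text.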
Contenteditable Scanning
Phishing pages hosted on Google Docs use contenteditable divs rather than standard <input> elements. The form detector scanned only <input>, <textarea>, and <select> elements, making Google Docs credential harvesting pages completely invisible to field-level scanning.
Detection was extended to scan contenteditable regions for sensitive data patterns in their visible text content. On platform hosts like docs.google.com, a standalone brand check was also added that triggers the blocking overlay based on brand impersonation signals in the page text alone — without requiring traditional form input elements to be present.
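The contenteditable path reduces to pattern matching over extracted text; the pattern list below is illustrative, and the DOM collection is sketched in a comment.

```javascript
const SENSITIVE_PATTERNS = [
  /\bpassword\b/i,
  /\b(mfa|2fa|one[- ]time|verification)\s+code\b/i,
  /\bsocial security\b/i,
];

function contenteditableHits(texts) {
  return texts.filter((t) => SENSITIVE_PATTERNS.some((re) => re.test(t)));
}

// Browser-side collection:
//   const texts = [...document.querySelectorAll("[contenteditable]")].map((el) => el.innerText);
```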
Tier-Based Platform Gating
Adding dozens of free hosting platforms to the detection database introduced a false positive problem: legitimate login pages on WordPress.com, Netlify, Heroku, and similar platforms were triggering warnings. A simple email-and-password form on a WordPress-hosted app is a normal login page, not a phishing form.
The fix was a two-tier classification. Tier 1 platforms are pure form builders — Google Forms, Typeform, Jotform — where a password field is always suspicious because these platforms have no legitimate reason to collect credentials. Tier 2 platforms are hosting services where real applications have real login pages. On Tier 2 platforms, detection only fires when there are additional signals beyond a basic login form: brand impersonation, high-severity field combinations like password paired with SSN, unusual field requests like former credentials or school names, or threatening page text.
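The gating decision itself is small. The tier tables and signal names below are illustrative stand-ins for the real platform database.

```javascript
// Tier 1: pure form builders — a password field alone is suspicious.
const TIER1 = new Set(["typeform.com", "jotform.com", "docs.google.com"]);
// Tier 2: hosting services — real apps have real login pages here.
const TIER2 = new Set(["wordpress.com", "netlify.app", "herokuapp.com"]);

function shouldWarn({ host, hasPasswordField, extraSignals = [] }) {
  if (TIER1.has(host)) return hasPasswordField;
  if (TIER2.has(host)) return hasPasswordField && extraSignals.length > 0; // need evidence beyond a login form
  return false; // not a tracked platform; other detection layers apply
}
```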
Standalone Credential Harvester Detection
Some phishing pages collect credentials without impersonating any specific brand and without matching any known sensitive field patterns. A page on a free hosting platform with three or more input fields and threatening language — "failure to verify will result in account closure" — is almost certainly a credential harvester, but it doesn't match any brand and its field labels may be generic.
A standalone detection path was added for pages on form platforms that combine multiple input fields with coercive text patterns: account termination threats, urgency deadlines, unauthorized access claims, and compliance language. This catches the long tail of credential phishing that falls outside brand matching and specific field label detection, while the platform tier gating prevents it from firing on legitimate forms.
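The coercive-language check combines a field-count floor with threat-phrase matching. The phrase list here is an illustrative subset; the three-field threshold is taken from the description above.

```javascript
const COERCION = [
  /failure to (verify|comply|respond)/i,
  /account (closure|termination|suspension)/i,                  // termination threats
  /will be (closed|suspended|terminated)/i,
  /within \d+\s+(hours?|days?)/i,                               // urgency deadlines
  /unauthori[sz]ed (access|activity|sign[- ]?in)/i,             // unauthorized access claims
];

function isLikelyHarvester({ onFormPlatform, inputCount, pageText }) {
  if (!onFormPlatform || inputCount < 3) return false;          // tier gating + field floor
  return COERCION.some((re) => re.test(pageText));
}
```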
Early Return Logic Bug
A single misplaced line silenced every detection layer on an entire class of fake browser update pages. In the lure phrase scanner, a let score = 0 declaration was positioned after the loop that incremented it. Accessing a let variable before its declaration throws a ReferenceError from the temporal dead zone — and because the lure scanner was the first layer called in the analysis chain, the uncaught exception crashed the entire page threat calculation. Every subsequent detection layer — fake CAPTCHA, fake errors, obfuscated loaders, embedded payloads — was never reached.
The fix was moving one line of code. The broader lesson was adding the analysis chain to the automated regression pipeline so that a crash in any single layer would surface immediately rather than silently disabling all downstream detection.
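A minimal repro of the failure class (not the scanner's actual code) shows why one misplaced declaration kills the whole chain. The `eval` gives the broken ordering its own scope so the error can be caught instead of crashing the example.

```javascript
let thrown = null;
try {
  // Increment before the `let` declaration: the binding exists but is in
  // its temporal dead zone, so the access throws rather than reading undefined.
  eval("score += 1; let score = 0;");
} catch (e) {
  thrown = e.name; // ReferenceError — and an uncaught one here takes every downstream layer with it
}
```

This is the difference from `var`, which would have silently hoisted to `undefined` and produced a wrong score instead of a crash; `let` turns the same mistake into an exception that, uncaught, disabled every layer behind it.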