Platform-Hosted Phishing — Detecting Credential Theft on Domains Your Security Stack Trusts
Google Forms collecting passwords. WordPress.com pages impersonating Microsoft 365. Google Drawings linking to credential harvesters. Weebly hosting fake bank logins. These are phishing pages on domains that every blocklist, URL filter, and reputation engine considers safe.
The Problem
Traditional phishing detection relies on domain reputation. A URL on docs.google.com or wordpress.com passes every reputation check because the domain is legitimate — it's the content that's malicious. Attackers exploit this by hosting fake login forms on free platforms where anyone can create content.
In the PhishTank dataset, Weebly alone hosted 1,243 phishing pages, 1,030 of which contained active credential forms. Google Docs, Sites, Drawings, and Forms collectively accounted for hundreds more. WordPress.com, Wix, Blogspot, and dozens of other free hosting services filled out the long tail.
How We Added Detection
Tier-Based Platform Gating
Extending detection to 45 platforms introduced a false-positive problem: legitimate login pages on WordPress apps and Netlify-hosted services began triggering warnings. A simple email-and-password form on a WordPress-hosted app is normal and should not raise an alert on its own.
The fix was a two-tier classification. Tier 1 platforms are pure form builders — Google Forms, Typeform, Jotform, SurveyMonkey, Carrd, Google Sites — where a password field is always suspicious because these platforms have no legitimate reason to collect credentials. Tier 2 platforms are hosting services — WordPress.com, Netlify, Heroku, Vercel, Wix — where real apps live. On Tier 2, detection only fires when there are additional signals beyond basic login: brand impersonation, high-severity field combinations, or threatening page text.
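The gating rule above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the hostname tables, the PageSignals fields, and the function names are all assumptions made for the example, and the real list of 45 platforms is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, abbreviated tier tables (illustrative hostnames only).
# Path-based platforms such as Google Forms under docs.google.com would
# need a URL-path check as well, which this sketch leaves out.
TIER1_FORM_BUILDERS = ("typeform.com", "jotform.com", "surveymonkey.com",
                       "carrd.co", "sites.google.com")
TIER2_HOSTING = ("wordpress.com", "netlify.app", "herokuapp.com",
                 "vercel.app", "wixsite.com")


@dataclass
class PageSignals:
    """Signals assumed to be produced by earlier scanning stages."""
    has_password_field: bool
    brand_impersonation: bool = False
    high_severity_fields: bool = False
    threatening_text: bool = False


def platform_tier(host: str) -> Optional[int]:
    """Return 1 or 2 for a known platform host, or None otherwise."""
    host = host.lower().rstrip(".")
    for tier, suffixes in ((1, TIER1_FORM_BUILDERS), (2, TIER2_HOSTING)):
        if any(host == s or host.endswith("." + s) for s in suffixes):
            return tier
    return None


def should_warn(host: str, signals: PageSignals) -> bool:
    """Apply the two-tier gate to one page."""
    tier = platform_tier(host)
    if tier is None:
        return False  # not a gated platform; other detection paths apply
    if tier == 1:
        # Form builders: a password field alone is enough.
        return signals.has_password_field
    # Hosting services: a basic login is normal, so require an extra signal.
    return (signals.brand_impersonation
            or signals.high_severity_fields
            or signals.threatening_text)
```

With these tables, a plain login form on myapp.netlify.app stays quiet, while the same form with a brand impersonation signal, or any password field on a Jotform page, produces a warning.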
Google Forms Deep DOM Traversal
Google Forms renders question labels and input fields in deeply nested containers — the heading text and the input element are typically 6–8 DOM levels apart. The field scanner originally walked 3 parent levels looking for associated label text, which never reached Google Forms' question headings. A 10-level deep walk was added with structural awareness for Google Forms' specific DOM patterns: role="listitem" containers, data-params question metadata attributes, and Google's form rendering CSS classes.
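To make the depth problem concrete, here is a small model of the ancestor walk. It is only a sketch: the Node class stands in for a real DOM element, label_text is a hypothetical field representing heading text found directly inside a node, and the real scanner's handling of data-params and CSS classes is not shown.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Simplified stand-in for a DOM element (assumption for this sketch)."""
    tag: str
    attrs: dict = field(default_factory=dict)
    label_text: str = ""  # heading text directly inside this node, if any
    parent: Optional["Node"] = None


def find_label(input_node: Node, max_depth: int = 10) -> Optional[str]:
    """Climb up to max_depth ancestors looking for associated label text."""
    node = input_node.parent
    for _ in range(max_depth):
        if node is None:
            return None
        if node.label_text:
            return node.label_text.strip()
        if node.attrs.get("role") == "listitem":
            # Question container boundary: stop here rather than climbing
            # into text that belongs to neighbouring questions.
            return None
        node = node.parent
    return None


def nested_input(container: Node, levels: int) -> Node:
    """Build an <input> wrapped in `levels` divs under `container`."""
    node = container
    for _ in range(levels):
        node = Node("div", parent=node)
    return Node("input", parent=node)


question = Node("div", attrs={"role": "listitem"},
                label_text="Enter your password")
field_node = nested_input(question, levels=6)  # heading is 7 ancestors up

shallow = find_label(field_node, max_depth=3)
deep = find_label(field_node, max_depth=10)
```

In this model the 3-level walk gives up inside the wrapper divs and returns None, while the 10-level walk reaches the role="listitem" container and returns the question heading.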
Contenteditable Scanning
Phishing hosted on Google Docs uses contenteditable divs instead of <input> elements. The form scanner was extended to scan contenteditable regions for sensitive data patterns in their visible text. On platform hosts like docs.google.com, a standalone check triggers the blocking overlay based on brand impersonation signals in page text alone — without requiring traditional form elements.
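A rough sketch of the text-scanning half of this step is shown below. It assumes the visible text of each contenteditable region has already been extracted into strings; the pattern names and regular expressions are illustrative guesses, not the scanner's real rule set.

```python
import re
from typing import Iterable, Set

# Hypothetical patterns for prompts that ask the reader to type secrets
# into an editable region (for example "Password: ______").
SENSITIVE_PATTERNS = {
    "password": re.compile(r"\b(?:password|passcode|pin)\b\s*[:_]", re.I),
    "card_number": re.compile(r"\b(?:card\s*number|cvv|cvc)\b", re.I),
    "ssn": re.compile(r"\b(?:ssn|social\s+security)\b", re.I),
}


def scan_editable_text(regions: Iterable[str]) -> Set[str]:
    """Return the names of sensitive patterns found in any region's text."""
    hits: Set[str] = set()
    for text in regions:
        for name, pattern in SENSITIVE_PATTERNS.items():
            if pattern.search(text):
                hits.add(name)
    return hits
```

A non-empty result would be one input to the blocking decision; the brand impersonation check on platform hosts is a separate signal and is not modelled here.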
Brand-Aware Overlays
When a phishing page impersonates a known brand, the warning overlay identifies it by name. Instead of a generic "suspicious form" alert, the user sees: "This page is pretending to be Microsoft. The real site is login.microsoftonline.com, not this domain." Brand detection covers 137 brands with 318 aliases, validated against the full PhishTank verified feed.
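The alias lookup and message construction might look something like this. The two-entry brand table, the alias lists, and the function names are assumptions for illustration (the real table covers 137 brands), and the sketch assumes the page has already been flagged on a platform host, so it does not check whether the page is the brand's own site.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Brand:
    name: str
    real_domain: str
    aliases: Tuple[str, ...]


# Hypothetical, heavily abbreviated brand table.
BRANDS = (
    Brand("Microsoft", "login.microsoftonline.com",
          ("microsoft", "microsoft 365", "office 365", "outlook")),
    Brand("PayPal", "paypal.com", ("paypal",)),
)


def detect_brand(page_text: str) -> Optional[Brand]:
    """Return the first brand whose alias appears as a whole word or phrase."""
    text = page_text.lower()
    for brand in BRANDS:
        for alias in brand.aliases:
            if re.search(r"\b" + re.escape(alias) + r"\b", text):
                return brand
    return None


def overlay_message(brand: Optional[Brand]) -> str:
    """Build the overlay text, falling back to a generic alert."""
    if brand is None:
        return "This page contains a suspicious form."
    return (f"This page is pretending to be {brand.name}. "
            f"The real site is {brand.real_domain}, not this domain.")
```

Keeping aliases separate from the display name lets a page that only says "Office 365" still produce a warning that names Microsoft and its real login domain.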
Standalone Credential Harvester Detection
Some phishing pages collect credentials without impersonating any brand and without matching known field patterns. A page on a form platform with 3+ input fields and threatening language — "failure to verify will result in account closure" — is caught by a standalone detection path that combines field count, platform context, and coercive text analysis.
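As a hedged sketch, the three conditions can be combined like this. The coercive-language patterns are invented examples rather than the production rules, and the function assumes the platform check and field count come from earlier stages.

```python
import re

MIN_FIELDS = 3  # "3+ input fields" from the description above

# Hypothetical coercive-language patterns.
COERCIVE_PATTERNS = [
    re.compile(r"failure to (?:verify|confirm|respond)", re.I),
    re.compile(r"account (?:closure|suspension|termination)", re.I),
    re.compile(r"(?:will be|has been) (?:suspended|closed|locked|disabled)", re.I),
    re.compile(r"within \d+ (?:hours|days)", re.I),
]


def is_standalone_harvester(on_form_platform: bool,
                            input_field_count: int,
                            page_text: str) -> bool:
    """Flag pages that pair several inputs with threatening language."""
    if not on_form_platform or input_field_count < MIN_FIELDS:
        return False
    return any(p.search(page_text) for p in COERCIVE_PATTERNS)
```

Requiring all three conditions keeps ordinary multi-field surveys from triggering: field count and platform context alone are not enough without coercive text.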
Results
Detection was tested against 56,000+ PhishTank verified phishing URLs using a 3-stage automated pipeline. On dedicated phishing domains with active credential forms, zero evasions were observed. Platform-hosted phishing on Google Docs, Forms, Sites, Drawings, SharePoint, WordPress.com, and Weebly is detected via cross-brand impersonation checks and deep DOM traversal.