When you publish a website that visitors can interact with, you have to worry about malicious HTML appearing on it. This can happen even if no one breaches your server. There are several sneaky ways others can add content that endangers your website's security. If you allow user input, you need to stay on guard against HTML injection. Filtering or sanitizing user input can make it safe, but doing a thorough job of it is tricky.
How HTML injection works
Back in 2009, some people saw "grills to cook babies" on the Sears website. This resulted in a lot of amusing discussions, and fortunately, it didn't cause anything worse than embarrassment. What happened was that Sears allowed URL parameters to fill in its category headers. Someone noticed that it would accept anything without validation.
The trick didn't change anything on the website. It only changed the way the page was displayed for the user who got the altered URL. Sears' publicists got confused and said that "someone visiting our site had defaced a limited number of product pages." Actually, the pages weren't altered, and anyone viewing them normally wouldn't see anything unusual.
We can call this approach client-side HTML injection. The content on the server isn't compromised, but what's displayed in the browser is. Someone after profit rather than fun could inject links and deceptive content into a site by distributing a specially crafted URL or POST request. Altered cookies are another possible attack path.
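To make the client-side case concrete, here is a minimal Python sketch. It assumes a hypothetical shop page that copies a `category` query parameter into its header; the domain, parameter, and function names are illustrative, not Sears' actual code. The unsafe version pastes the parameter into the markup, while the safe version passes it through the standard library's `html.escape`, so injected tags are displayed as harmless text.

```python
import html
from urllib.parse import parse_qs, urlsplit


def _category(url: str) -> str:
    # parse_qs percent-decodes the query string and turns "+" into spaces.
    params = parse_qs(urlsplit(url).query)
    return params.get("category", [""])[0]


def category_header_unsafe(url: str) -> str:
    # Vulnerable: whatever is in the URL becomes part of the page's HTML.
    return f"<h1>{_category(url)}</h1>"


def category_header_safe(url: str) -> str:
    # html.escape turns &, <, >, " and ' into entities, so the browser shows
    # the parameter as text instead of parsing it as markup.
    return f"<h1>{html.escape(_category(url))}</h1>"


# Hypothetical crafted link an attacker might distribute.
crafted = "https://shop.example/browse?category=%3Cb%3EGrills+to+cook+babies%3C%2Fb%3E"
print(category_header_unsafe(crafted))  # <h1><b>Grills to cook babies</b></h1>
print(category_header_safe(crafted))    # <h1>&lt;b&gt;Grills to cook babies&lt;/b&gt;</h1>
```

Nothing stored on the server changes in either case; only the page rendered for whoever follows the crafted link differs.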
The more common approach is server-side HTML injection. Here the attacker submits material that includes HTML, such as a blog comment, and the site stores it and shows it to other visitors. Some of that HTML may be legitimate, but some can have malicious aims. HTML injection is the most common way to perform cross-site scripting (XSS). XSS is its own category, so we'll focus here on other forms of injected HTML.
Sources of HTML injection include forum posts, comments, third-party ads, and previews of pages from other sites. Any site that accepts and displays input from untrusted parties could be at risk.
When people visit a familiar website, they let their guard down. They don't expect anything dangerous, and they'll accept even unusual information and requests from it.
Injected meta tags are especially dangerous. Depending on the browser, a meta tag can set cookies, redirect to another page, or change the security policy. The HTML specification allows these meta tags only in the HTML head, which is hard to attack, but not all browsers are strict about it, so an intruder might be able to get working meta tags into the HTML body.
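Because a browser may honor a meta tag wherever it appears, a check that only inspects the head won't catch one smuggled into a comment. The Python sketch below uses the standard library's `html.parser` to list every meta tag in a submitted fragment, regardless of position; the class and function names are hypothetical. It is meant for flagging suspicious submissions, not as a complete defense.

```python
from html.parser import HTMLParser


class MetaTagFinder(HTMLParser):
    """Records the attributes of every <meta> tag, wherever it appears."""

    def __init__(self) -> None:
        super().__init__()
        self.found: list = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META> matches too.
        # A self-closing <meta ... /> also arrives here, via the default
        # handle_startendtag implementation.
        if tag == "meta":
            self.found.append(dict(attrs))


def find_meta_tags(fragment: str) -> list:
    finder = MetaTagFinder()
    finder.feed(fragment)
    finder.close()
    return finder.found


comment = 'Great post! <META http-equiv="refresh" content="0; url=https://attacker.example/">'
print(find_meta_tags(comment))
# [{'http-equiv': 'refresh', 'content': '0; url=https://attacker.example/'}]
```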
Sites that accept comments and other user input try to display it in a designated box, but there are ways it can escape. If the added content can use unrestricted CSS, it could appear somewhere else in the page. Any part of the page's content could get a deceptive overlay. The invading content could add controls that look legitimate but perform dangerous actions.
The user might see something like "Exciting news! Click here to learn more!" in the middle of the usual content. Clicking on it starts the download of a dangerous file, opens a dangerous page, or asks for a username and password.
"Hacktivist" injection could add a glaring message in large flashing text, protesting the site. Less high-minded attackers might add offensive messages, making them look as if they come from the site owner.
The simplest way to prevent HTML injection is to keep any third-party content out of the site. This isn't quite as easy as it sounds. The first step is to exclude all user input. It's also necessary to guard against altered cookies and URLs: carelessly copying their parameters and content into the page allows client-side or server-side injection.
If accepting and posting user input is part of the site's purpose, defense becomes more complicated. The site can disallow all HTML tags and accept only plain text, but the server code has to be on the lookout for tricks. By using encoding mechanisms such as percent-encoding or HTML character references, devious users can insert tags into the input without ever typing a "<" character.
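Here is one way an encoding trick can slip past a plain-text policy, as a minimal Python sketch. It assumes a hypothetical pipeline in which a filter checks the raw submission for a literal "<" and some later step percent-decodes the input. The fix shown is to finish all decoding first and escape as the very last step before output; the function names are illustrative.

```python
import html
from urllib.parse import unquote


def naive_filter(raw: str) -> str:
    # Flawed: only rejects a literal "<" in the raw submission.
    if "<" in raw:
        raise ValueError("HTML tags are not allowed")
    return raw


def to_plain_text(raw: str) -> str:
    # Decode first, then escape as the last step before output, so no
    # later decoding can turn the text back into markup.
    return html.escape(unquote(raw))


# Percent-encoded link: it contains no literal "<" character.
submitted = "%3Ca href=%22https://attacker.example%22%3EClick here%3C/a%3E"

passed = naive_filter(submitted)  # the filter lets it through
print(unquote(passed))            # a later decoding step restores the tag
# <a href="https://attacker.example">Click here</a>

print(to_plain_text(submitted))
# &lt;a href=&quot;https://attacker.example&quot;&gt;Click here&lt;/a&gt;
```

HTML character references such as `&lt;` behave the same way if something calls `html.unescape` after the filter has run.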
Users like to be able to mark up their text, adding bold and italic typefaces and perhaps inline images and embedded videos. A site that supports this needs to do careful sanitization or filtering of the input. Some tags, such as em and strong, are safe. Others, such as script and applet, definitely aren't.
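The allowlist idea can be sketched in Python with the standard library's `html.parser`. This is illustrative only, assuming a hypothetical policy that permits just `em` and `strong`: allowed tags are re-emitted with all attributes stripped, every other tag is dropped, all text is escaped, and any tag the user leaves open is closed at the end.

```python
import html
from html.parser import HTMLParser

# Assumption for this sketch: only these formatting tags are allowed.
ALLOWED_TAGS = {"em", "strong"}


class AllowlistSanitizer(HTMLParser):
    """Keeps allowlisted tags (with no attributes) and escapes all text."""

    def __init__(self) -> None:
        # convert_charrefs=True decodes entities in text before handle_data,
        # and handle_data re-escapes them, so encoded tags stay inert.
        super().__init__(convert_charrefs=True)
        self.out: list = []
        self.open_tags: list = []

    def handle_starttag(self, tag, attrs):
        # Disallowed tags (script, img, iframe, ...) are dropped. Allowed ones
        # are re-emitted without attributes, so no onclick or style survives.
        if tag in ALLOWED_TAGS:
            self.open_tags.append(tag)
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        # Only close the innermost tag we opened; stray end tags are ignored.
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text, including the contents of dropped elements, is escaped.
        self.out.append(html.escape(data))

    # Comments, doctypes and processing instructions are ignored by default.


def sanitize(fragment: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(fragment)
    parser.close()
    # Close anything left open so formatting can't leak out of the comment box.
    for tag in reversed(parser.open_tags):
        parser.out.append(f"</{tag}>")
    return "".join(parser.out)


print(sanitize('<strong>Nice</strong> <script>alert(1)</script><img src=x onerror=alert(1)> <em>post'))
# <strong>Nice</strong> alert(1) <em>post</em>
```

Keeping the inner text of a dropped `script` element as escaped, visible text is a deliberate simplification; a production sanitizer would likely discard it. A hand-rolled sketch like this is no substitute for a maintained library and expert testing.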
It's not just tags that are risky. HTML attributes and CSS properties also need to be restricted. An attribute with an opening quote and no closing quote could swallow page content that should appear after it. The position CSS property can move user-provided content into places where it looks like part of the website. If links are allowed, they could lead to dangerous sites.
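For links, a minimal Python sketch of two attribute rules: allowlist the URL scheme, and always quote and escape the attribute value. The function names and the http/https-only policy are assumptions for illustration. Because the sketch never emits a `style` or event-handler attribute, user content can't set the position property through it, and escaping quotes keeps a stray `"` from ending the attribute early.

```python
import html
from typing import Optional
from urllib.parse import urlsplit

# Assumption for this sketch: links may only use these schemes.
ALLOWED_SCHEMES = {"http", "https"}


def safe_href(url: str) -> Optional[str]:
    """Return the URL if its scheme is allowlisted, otherwise None."""
    cleaned = url.strip()
    # urlsplit lowercases the scheme; relative URLs have an empty scheme
    # and are rejected too, which keeps the rule simple.
    if urlsplit(cleaned).scheme not in ALLOWED_SCHEMES:
        return None
    return cleaned


def render_link(url: str, text: str) -> str:
    href = safe_href(url)
    if href is None:
        # javascript:, data:, and anything else unexpected: show text only.
        return html.escape(text)
    # quote=True escapes " and ', so the value can't close the attribute
    # early and smuggle in extra attributes such as style.
    return f'<a href="{html.escape(href, quote=True)}">{html.escape(text)}</a>'


print(render_link("javascript:alert(1)", "click me"))
# click me
print(render_link('https://example.org/" style="position:fixed;top:0', "x"))
# <a href="https://example.org/&quot; style=&quot;position:fixed;top:0">x</a>
```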
Defeating all the tricks a determined attacker can try is difficult. The best approach is to use an HTML sanitization library with a good reputation. The OWASP Java HTML Sanitizer is one example, and it's available at no cost. Reputable CMS and blogging software includes HTML sanitization for user responses. Any page that accepts user input should undergo penetration testing by experts in injection techniques.
Website developers should avoid accepting raw input directly and writing their own filters. It's just too complicated, and the cost of a bug is high.
Monitoring and protection
Website security measures should include preventive ones. When malicious user input is stopped before it touches the Web server, the site is that much safer. The Quttera Web Application Firewall recognizes many kinds of suspicious inputs and blocks them. Domains that are known to be untrustworthy don't get through. A part of Quttera's ThreatSign website anti-malware system, it identifies vulnerabilities so that system managers can patch their software.
HTML injection exploits can be hard to spot, since the underlying content and server are intact. It's the content as presented that does the damage. A strong set of security measures is important to every website, but sites that allow user-contributed content have to redouble their protection. Unlocked doors need careful watching.