Web application security is a tricky matter, not only because of the inherent complexity and variety of the web-platform itself, but also because of the number of possible interactions between services, and the "layered" structure of the platform.
Fully understanding all the security implications, from TCP/IP to DNS to system limits to XSS and XHR, is probably out of reach for the vast majority of web-developers (of course, I'm not counting you among them, dear savvy reader :-) ).
This post presents a list of common security pitfalls and topics I've seen (and still see) regularly over the years in "common" web-development. While the most obvious (or most publicized) problems may now be well understood (say, injection), even experienced web-developers may still fall for the more obscure ones, which are potentially just as damaging, or even for corner cases of the known ones.
Cookies
Possibly one of the first widely abused things in web-applications, cookies are still (in most web-applications) a valuable piece of information that may very well carry enough to hijack a user session. While browsers are probably more careful with cookies nowadays than they used to be, it's still possible to take advantage of careless use.
Some basic rules:
- whenever possible, restrict a cookie to the very domain on which you are using it. If you are on app.mydomain.org, then restrict your cookie to that host, not to .mydomain.org. With the latter, you would be exposed if a vulnerability were found in another application on any subdomain of mydomain.org (that's cross-subdomain cooking).
- mark the cookie as secure whenever appropriate, so that it is only ever sent over HTTPS
- you may want to take advantage of the HttpOnly flag (originally an Internet Explorer extension, now supported by all major browsers) as a way to protect yourself from some XSS attack vectors, by not exposing your cookie to JavaScript
- avoid relying solely on cookies to authorize users to perform critical operations: if your session mechanism relies on a session token stored in a cookie, then you had better require something else for such operations (asking the user to confirm their password, a captcha, whatever)
- in the same train of thought, you should probably secure the session token carried in the cookie with a not-dull mechanism (create a token hash using a server secret, salt it with some of the UA properties and a nonce count, and renew the token value regularly). That makes a stolen cookie much harder to exploit, as knowing the token value alone will no longer be enough to hijack a session.
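The rules above can be sketched in a few lines. This is only an illustration in Python (the same idea carries over to PHP or anything else), and every name in it (SERVER_SECRET, sign_token, verify_token, session_cookie_header, the "session" cookie name) is made up for the example. It binds the token to the User-Agent and a counter with an HMAC, and emits a host-only cookie flagged Secure and HttpOnly. It raises the bar; it is not a guarantee.

```python
import hashlib
import hmac
import secrets
from http.cookies import SimpleCookie

# Hypothetical server-side secret; in practice, load it from configuration.
SERVER_SECRET = secrets.token_bytes(32)


def sign_token(token: str, user_agent: str, counter: int) -> str:
    """Return "token.counter.mac"; the MAC covers the token, the UA and the counter."""
    message = f"{token}|{user_agent}|{counter}".encode("utf-8")
    mac = hmac.new(SERVER_SECRET, message, hashlib.sha256).hexdigest()
    return f"{token}.{counter}.{mac}"


def verify_token(cookie_value: str, user_agent: str) -> bool:
    """Recompute the MAC for this request's User-Agent and compare in constant time."""
    try:
        token, counter, _mac = cookie_value.split(".")
        expected = sign_token(token, user_agent, int(counter))
    except ValueError:
        # Wrong number of parts, or a counter that is not an integer.
        return False
    return hmac.compare_digest(expected.encode("utf-8"), cookie_value.encode("utf-8"))


def session_cookie_header(value: str) -> str:
    """Build the Set-Cookie value: Secure, HttpOnly, and no Domain attribute."""
    cookie = SimpleCookie()
    cookie["session"] = value
    cookie["session"]["secure"] = True
    cookie["session"]["httponly"] = True
    cookie["session"]["path"] = "/"
    # Leaving Domain unset keeps the cookie bound to the exact host that set it.
    return cookie["session"].OutputString()
```

Renewing the token then amounts to issuing a new value with an incremented counter and refusing older counters on the server side, which this sketch leaves out.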
Sessions
Assuming you avoid basic horrors (a session based on a GET parameter token), you may very well still fall for session fixation.
Consider the following PHP code:
<?php
session_start();
echo "session id is: " . session_id();
?>
This is vulnerable to session fixation ("class 1", for "random token" session fixation). Try navigating to a page hosting that code, edit the value of the session cookie to "whatever", then navigate to that page again: the page will happily adopt the value you chose as the session id.
Why is session fixation bad? Because it makes the session token predictable. If evilman can set your session cookie to a value he knows, it becomes a lot easier for him to hijack your session once you log in on that site. And this is likely to happen (cross-site cooking, cross-subdomain cookies, possibly DNS-rebinding, or a vulnerability in the user-agent).
So, maybe you think it's enough to check whether a session already exists for the given token in order to prevent session fixation? Well, it isn't. EvilMan may very well initiate a valid session against your site, obtain a legit token (a simple request to your site), plant it in the user's cookie (using any of the methods above) and fixate the user on a token that is valid for his own existing session ("class 2" session fixation, for "existing legit token" session fixation).
Locking a session to the user's IP may come to mind in order to prevent that, although this is generally a bad idea (think about users with dynamic IPs, proxies, long-lived sessions). Salting the session with User-Agent characteristics may be a sounder basis. Or, and this is probably the best way to do it, change the session token when the user logs in and delete the unlogged session.
Anyway, cookie-based sessions are fundamentally insecure, and so are HTTP BASIC and form-based authentication. People looking for something more serious may instead use DIGEST authentication as the transport mechanism for the session token (possibly using the challenge or the opaque value), thus getting rid of cookies entirely as a means of transporting the session token. While DIGEST can certainly be abused as well (man in the middle), and doesn't magically fix XSS, its use definitely raises the bar when it comes to session hijacking (assuming you implement it well).
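To make the "change the token at login" advice concrete, here is a minimal sketch in Python. It assumes an in-memory session store, and SESSIONS, start_session and log_in are hypothetical names, not any framework's API. Never adopting an id the server did not issue covers "class 1"; issuing a fresh id at login and deleting the old session covers "class 2".

```python
import secrets
from typing import Optional

# Hypothetical in-memory session store: session id -> session data.
SESSIONS: dict = {}


def start_session(presented_id: Optional[str]) -> str:
    """Return a valid session id, never adopting one the server did not issue."""
    if presented_id in SESSIONS:
        return presented_id
    # Unknown (possibly attacker-chosen) id: ignore it and mint a random one.
    new_id = secrets.token_urlsafe(32)
    SESSIONS[new_id] = {"user": None}
    return new_id


def log_in(old_id: str, user: str) -> str:
    """On successful authentication, drop the unlogged session and issue a fresh id."""
    SESSIONS.pop(old_id, None)
    new_id = secrets.token_urlsafe(32)
    SESSIONS[new_id] = {"user": user}
    return new_id
```

Whatever token evilman managed to plant before login is worthless afterwards, since the authenticated session lives under an id he has never seen.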
Injections
I still consider not validating user input to be the #1 problem in web-applications.
Don't smile. Maybe you are protecting yourself from SQL injection. Maybe you are also protecting yourself from include injection, or direct code evaluation. Maybe you are not falling for "register globals" either (one way or the other). And maybe you think you are safe from XSS (be it type 1 or 2).
Then, do you still think it's enough to call addslashes() to protect yourself? Are you positive you are immune to invalid multi-byte characters used to eat escaping characters? To variable-width character encodings? Are you sure you are passing a character set as the optional parameter to htmlentities(), and that the connection character set mysql_real_escape_string() relies on is the one you think it is (set with mysql_set_charset(), since the function itself takes no charset argument)? That you always specify a charset in your HTTP headers, so that the UA doesn't pick one based on heuristics you might not control or understand?
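The "eaten escaping character" trick is easier to believe once you see the bytes. The sketch below, in Python, uses a toy byte-level escaper called naive_addslashes (a hypothetical stand-in written for this example, not PHP's actual implementation) and then reads the result the way a GBK connection would.

```python
def naive_addslashes(raw: bytes) -> bytes:
    """Prefix quotes and backslashes with a backslash, one byte at a time."""
    out = bytearray()
    for byte in raw:
        if byte in (0x27, 0x22, 0x5C):  # single quote, double quote, backslash
            out.append(0x5C)
        out.append(byte)
    return bytes(out)


# Attacker-supplied bytes: 0xBF followed by a single quote.
payload = b"\xbf\x27 OR 1=1"
escaped = naive_addslashes(payload)  # b"\xbf\x5c\x27 OR 1=1"

# Read as GBK, 0xBF 0x5C forms a single two-byte character: the backslash
# is swallowed by it and the quote that follows is bare again.
as_gbk = escaped.decode("gbk")
quote_is_escaped = "\\'" in as_gbk  # False: the escaping has been eaten
```

The escaper and the consumer simply disagree about where characters start and end, which is why the charset has to be stated explicitly on both sides.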
A note about exploiting URL publication
One very common mistake I see on sites that allow posting a URL is the use of a blacklist for schemes. Typically, such sites will prevent people from posting URLs that begin with file:// (and possibly a couple of other schemes), and will let the rest through.
So? What could possibly go wrong with a properly HTML-escaped URL that doesn't begin with file? Well, it could be a URI in a scheme that has been recently introduced, or that you weren't aware of (think of the "resource" protocol...). Typically, since people are not yet familiar enough with it IMHO, it could be a data URI: for example, a base64-encoded URI containing a JavaScript payload. Depending on the UA the visitor is using, this may have quite bad effects (there has been quite some discussion of the potential uses of this already).
Using blacklists for schemes is really a bad idea. One should really use a whitelist of allowed schemes instead.
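A whitelist check is short enough that there is little excuse for skipping it. Here is one possible sketch in Python using the standard urllib.parse module; ALLOWED_SCHEMES and is_allowed_url are names chosen for this example, and the list of schemes is an assumption to adapt to your own site.

```python
from urllib.parse import urlsplit

# Assumed whitelist; extend it only with schemes you have actually reviewed.
ALLOWED_SCHEMES = {"http", "https"}


def is_allowed_url(url: str) -> bool:
    """Accept only absolute URLs whose scheme is explicitly whitelisted."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        # urlsplit rejects some malformed inputs outright.
        return False
    # urlsplit lowercases the scheme, so "HTTP" and "http" are treated alike.
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)
```

Anything not recognized is refused by default, so a scheme invented next year fails closed instead of slipping through.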
Cross-site request forgery
The idea is pretty simple. While a user is logged in on targetsite.com, any request sent to that site from the victim's browser, including requests triggered by third-party pages, carries the user's session cookie and is authorized just as if the user had made it himself. Assuming there are forms on targetsite.com, evil.com may very well submit them, and the submissions will end up being processed.
People tend to think you can only CSRF GET requests, and that POST protects you. Well, it doesn't.
So, if there is a form on your website that lets the user change their password, and that form does a POST carrying only the "password" variable along with the session cookie, then you are doomed: any random website that the user visits at the same time can issue that very same request to your site at will.
I've been surprised by how misunderstood this problem is, and I think web-devs should be urged to pay attention to it: one must add an additional token to the form in order to validate/accept the submission. No exceptions! (on anything non-trivial)
Now, don't rest yet... as you may know, it's possible to sniff the user's navigation history using CSS and JavaScript, hence tokens passed in GET URLs can very well be brute-forced entirely on the client side... So, make that token random/unique, and favor POST.
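A form token along those lines could look like the following Python sketch. The session is assumed to be a plain dict held server-side, and issue_csrf_token, check_csrf_token and the "csrf_token" key are hypothetical names used only here. The token is random, tied to the session, and burned after one use.

```python
import hmac
import secrets


def issue_csrf_token(session: dict) -> str:
    """Store a fresh random token in the session and return it for a hidden form field."""
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return token


def check_csrf_token(session: dict, submitted: str) -> bool:
    """Accept the POST only if the submitted token matches; the token is single-use."""
    expected = session.pop("csrf_token", None)
    if not expected or not submitted:
        return False
    # Constant-time comparison on bytes, so non-ASCII input cannot raise.
    return hmac.compare_digest(expected.encode("utf-8"), submitted.encode("utf-8"))
```

A page on evil.com can still fire the POST, but it has no way to read the token out of your form, so the submission is rejected.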
Insecure policies
Another problem that may bite hard involves overly lax cross-domain policies.
Speaking of Flash, for example, it was at some point common to see crossdomain.xml authorization files allowing "*" to access the domain, so that any Flash application could retrieve content from the site (say, images).
What's wrong with that? Well, it also means that any Flash application on any website can send any request to your site and read the response, impersonating the user. Submit forms (if they are vulnerable to CSRF)... Retrieve any content that the user can view...
The same will go for cross-domain XHR (which is coming): wide-open policies will guarantee havoc...
Note that this one actually did hit a lot of high-profile websites in the past, including Adobe themselves IIRC.
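For cross-domain XHR, the equivalent of a tight crossdomain.xml is to grant read access only to origins you have listed yourself. The Python sketch below assumes the Access-Control-Allow-Origin response header is the mechanism in use; TRUSTED_ORIGINS and cross_origin_headers are hypothetical names, and the origin shown is just the example host from earlier in this post.

```python
from typing import Optional

# Hypothetical list of origins allowed to read responses cross-domain.
TRUSTED_ORIGINS = {"https://app.mydomain.org"}


def cross_origin_headers(origin: Optional[str]) -> dict:
    """Return the headers granting cross-domain read access, or none at all."""
    if origin in TRUSTED_ORIGINS:
        # Echo the one trusted origin instead of answering "*".
        return {"Access-Control-Allow-Origin": origin, "Vary": "Origin"}
    return {}
```

An unlisted origin gets no permission header, so the browser keeps its script from reading the response.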
Hosting user-generated content
Hosting user-generated files is tricky. Validating file extensions on the client side is a totally lame protection, very unlikely to protect you in any way. Well, let's assume you did the job well, that you won't fall for a symlink disclosing your server data (ping shad :-)), and that you at least won't execute server code in a user-submitted file.
Still, it's quite possible that some day somebody manages to work around your protections one way or another, and is able to send and retrieve, say, HTML, or a Flash application, from your server. Well, if you make this user content publicly available on the same host as your main application, that's a plain old persistent injection.
Put otherwise: never, ever, host user content on the same host as your site, no matter how secure you think your publication process is.
And even then, things may still be tricky. The insecure policies from above may bite you back if the crossdomain.xml on your main host allows the host that contains the user-generated content...
The DNS-rebinding threat
One last interesting point to end this list is DNS-rebinding attack vectors, which got a lot of attention lately. Technically, this approach defeats the same-origin policy (SOP) by forcing the user agent to rebind a domain name under "evil" control to the IP of "target".
The point being: if you manage to have the user visit your page (before rebinding), you can load Flash or JavaScript that will sit there waiting for the rebinding to occur, and then let your client-side code access whatever is on the target site.
There are tons of exploitation vectors for this, from fetching content from web servers on (theoretically) unreachable internal networks, to harvesting content from search engines using the user's IP (so as not to get blacklisted).
Mixing this with social engineering, you get an explosive cocktail. Way simpler than DNS cache poisoning or man-in-the-middle attacks. Now, would that be concretely exploitable on a random web server? Yes, absolutely, if the web server in question doesn't validate the Host header (and hence delivers content regardless of the domain name used to access it), unless of course SSL is on.
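Validating the Host header is cheap. After a rebinding, the browser still sends the attacker's domain name in Host, so a server that only answers for its own names refuses the request. A minimal Python sketch, where EXPECTED_HOSTS and is_expected_host are names assumed for the example:

```python
from typing import Optional

# Assumed set of names this server is meant to answer for.
EXPECTED_HOSTS = {"app.mydomain.org"}


def is_expected_host(host_header: Optional[str]) -> bool:
    """Return True only if the Host header names one of our own domains."""
    if not host_header:
        return False
    host = host_header.strip().lower()
    # Drop an optional numeric port, e.g. "app.mydomain.org:8080".
    name, _, port = host.rpartition(":")
    if name and port.isdigit():
        host = name
    # Tolerate the fully qualified form with a trailing dot.
    return host.rstrip(".") in EXPECTED_HOSTS
```

Most web servers can enforce the same thing in configuration, by not having a catch-all virtual host serve the real application.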
Conclusion
This list doesn't pretend to be complete, nor to be enough to secure a web-application from scratch (that should start with securing your kernel, distro, web-server, etc), nor even to be self-sufficient for understanding the described issues without additional reading (XSS in itself would deserve a whole book IMHO).
Its purpose (and I would consider it a success if it achieves it) is only to get web-developers to pay more attention to these points.
References and interesting reading
There are a lot of interesting blogs dealing with web-app security, with far more in-depth information about all this. Off the top of my head, I would mention:
- http://ha.ckers.org/
- http://blogs.msdn.com/dross/
- http://sirdarckcat.blogspot.com/
- http://shiflett.org/blog/
- http://securethoughts.com/