Validation: Semantic or Syntactic

Gunnar posts:

James Clark proposes another way to look at this:

Validity should be treated not as a property of a document but as a relationship between a document and a schema.
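To make Clark’s point concrete, here is a minimal Python sketch in which validity is computed from the pair (document, schema) rather than from the document alone. The dictionary-based schema format and the validate function are hypothetical illustrations, not XML Schema, RELAX NG, or any other real schema language.

```python
# Minimal sketch of validity as a relationship: validate() needs BOTH the
# document and the schema. The dict-based "schema" (field name -> Python type)
# is a hypothetical stand-in, not any real schema language.

def validate(document: dict, schema: dict) -> bool:
    """Return True if every field the schema names is present with the expected type.

    Fields in the document that the schema does not mention are ignored.
    """
    for field, expected_type in schema.items():
        if field not in document or not isinstance(document[field], expected_type):
            return False
    return True


order = {"item": "widget", "quantity": 3}

schema_v1 = {"item": str, "quantity": int}
schema_v2 = {"item": str, "quantity": int, "customer_id": int}

print(validate(order, schema_v1))  # True
print(validate(order, schema_v2))  # False: schema_v2 also requires customer_id
```

The same order is valid under one schema and invalid under the other, so asking whether the order is "valid" makes no sense until you say which schema you mean.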

From a security perspective, the validation relationship is between the document and the allowed characters (white list – strongest) or the disallowed characters (black list – weaker).
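The difference between the two kinds of list can be sketched in a few lines of Python. Both the allowed character set and the forbidden character set below are hypothetical examples, not recommendations for any particular field. The whitelist denies by default, so input nobody anticipated is rejected; the blacklist allows by default, so it rejects only what someone already thought of.

```python
import re

# Hypothetical whitelist: the only characters this field is expected to contain.
ALLOWED = re.compile(r"[A-Za-z0-9 _-]*")

# Hypothetical blacklist: characters someone has already decided are dangerous.
DISALLOWED = set("<>'\";")


def whitelist_ok(value: str) -> bool:
    """Accept only if every character is explicitly allowed (default deny)."""
    return ALLOWED.fullmatch(value) is not None


def blacklist_ok(value: str) -> bool:
    """Accept unless some character is explicitly forbidden (default allow)."""
    return not any(ch in DISALLOWED for ch in value)


for value in ["alice_smith", "alice<script>", "alice%00", "alice\r\n"]:
    print(repr(value), "whitelist:", whitelist_ok(value), "blacklist:", blacklist_ok(value))
```

Here "alice%00" and "alice\r\n" get past the blacklist only because nobody put "%" or control characters on it; the whitelist rejects them without needing to know why they might be dangerous.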

So which should it be, semantic or syntactic?

Gunnar continues:

The lists will evolve independently, either as the app evolves (in the case of a white list) by adding different types/values, or as attacks evolve (black list): 'hey, we blocked "\r" and "\n" but never thought about "\n\r".'

Ah, mere lists of characters aren’t enough. Already you have to move to ordered strings of characters, which starts to build toward semantics.
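Once the list holds ordered sequences instead of single characters, order starts to matter. In this hypothetical Python sketch the blacklist anticipates the CRLF pair "\r\n" but not the reversed "\n\r", and a naive sanitizer that deletes the blocked sequence can itself reassemble that sequence from the surrounding characters. The blocked list and both function names are illustrative assumptions.

```python
# Hypothetical blacklist of ordered sequences: the author anticipated the
# CRLF line ending "\r\n" (as used in header injection), but nothing else.
BLOCKED_SEQUENCES = ["\r\n"]


def sequence_blacklist_ok(value: str) -> bool:
    """Accept unless the value contains one of the blocked sequences."""
    return not any(seq in value for seq in BLOCKED_SEQUENCES)


def strip_blocked(value: str) -> str:
    """Naive sanitizer: a single str.replace pass per blocked sequence."""
    for seq in BLOCKED_SEQUENCES:
        value = value.replace(seq, "")
    return value


# The reversed pair is not on the list, so it is accepted.
print(sequence_blacklist_ok("Subject: hi\n\rBcc: x"))  # True

# Deleting the inner "\r\n" from "\r\r\n\n" leaves a brand-new "\r\n".
print(repr(strip_blocked("a\r\r\n\nb")))  # 'a\r\nb'
```

Both failures come from treating the input as a string of characters rather than asking what a downstream parser (a mail or HTTP header parser, say) will make of it.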

It’s not enough to prohibit the seven dirty words, or whatever the validation equivalent is, to keep out "SQL injection, LDAP injection, XPath injection, and so on." You have to deal with semantics as well.
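A character whitelist alone can still pass input that is dangerous because of what it means to the interpreter receiving it. This Python sketch uses the standard sqlite3 module and a hypothetical whitelist of letters, digits, spaces, and "="; the defense shown, a parameterized query, is a standard technique swapped in for illustration, not one prescribed in the posts quoted here.

```python
import re
import sqlite3

# Hypothetical whitelist for an id field: letters, digits, spaces, and "=".
ALLOWED = re.compile(r"[A-Za-z0-9 =]*")
user_input = "1 OR 1=1"
assert ALLOWED.fullmatch(user_input)  # syntactically "clean"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

# Unsafe: splicing the input into the SQL text lets SQLite parse it as SQL,
# so the WHERE clause becomes "id = 1 OR 1=1" and matches every row.
unsafe = conn.execute(
    "SELECT name FROM users WHERE id = " + user_input + " ORDER BY id"
).fetchall()

# Parameterized: the bound value is handed to SQLite as data, never parsed
# as SQL, so the whole string is compared against id and matches nothing.
safe = conn.execute(
    "SELECT name FROM users WHERE id = ? ORDER BY id", (user_input,)
).fetchall()

print(unsafe)  # [('alice',), ('bob',)]
print(safe)    # []
```

The parameterized query works because the database, not a character filter, decides what is code and what is data. The same idea has to be applied separately for each interpreter (LDAP filters, XPath expressions, and so on), which is exactly the semantic burden described above.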

Gunnar also says:

On the web all input is guilty until proven innocent; validation is the proof.

Well, good luck with that. Some cracker may think of something you didn’t, and for that matter whole new languages may crop up for crackers to exploit, languages you didn’t know about when you wrote your validation algorithm.

When validation fails, risk management needs to be ready.

-jsq