Stijn Moreels

28 April 2022


Extending F# FPrimitive With European Character Input Validation

F# FPrimitive is a great library for domain model validation and specification definitions. In practice, though, some gaps show up. Let's take a look at how we can extend the library with input validation for European characters.

The problem

When validating input fields, one has to come up with an ‘accept’ list and ask: “What kind of characters should we allow?”

Generally, it’s much safer to define what you accept than what you reject. We can already validate a lot before we even begin with the actual characters: origin, content length, syntax, schema, lexical content, encoding. In this blog post, we’ll look at where these steps belong and at the actual content of the input.

So, what’s the problem?

Imagine validating the name of a city or place, for example. For an American or British city, we could probably validate the name against the basic alphabet: A-Z and its lowercase variants. As a regular expression, this would be ^[A-Za-z]+$. This is already very strict and filters out special characters such as punctuation or other characters that could enable injection. However, if we want to validate a Swedish city such as Malmö, this pattern no longer works.
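As a quick illustration of that limitation, here is a minimal F# sketch that uses only the .NET Regex class. The names asciiLetters and isAsciiName are made up for this example.

```fsharp
open System.Text.RegularExpressions

// ASCII-only accept list: A-Z and its lowercase variants.
let asciiLetters = Regex("^[A-Za-z]+$")

// Hypothetical helper: true when the input holds only ASCII letters.
let isAsciiName (input: string) =
    not (isNull input) && asciiLetters.IsMatch input

printfn "%b" (isAsciiName "London")   // true
printfn "%b" (isAsciiName "London;")  // false: punctuation is rejected
printfn "%b" (isAsciiName "Malmö")    // false: ö is outside A-Za-z
```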

European regular expression pattern

I took it upon myself to research the special characters used across Europe. Some countries have a lot of them, others do not. Nordic countries have å, ä, ö, ø, while Southern countries have ñ, ú, é, á. We are dealing with a large and diverse set of characters, which makes things a bit harder for security experts, as all of these different ways of writing have to be taken into account.

All of these special characters combined, with capitals included, would result in this regular expression pattern:

^[A-Za-zÁáĂăÂâÅåÄäǞǟÃãĄąĀāÆæĆćĈĉĊċÇçĎďḐḑĐđÐðÉéÊêĚěËëĖėĘęĒēĞğĜĝĠġĢģĤĥĦħİıÍíÌìÎîÏïĨĩĮįĪīĲĳĴĵĶķĹĺĻļŁłĿŀŃńŇňÑñŅņŊŋÓóÒòÔôÖöȪȫŐőÕõȮȯØøǪǫŌōỌọŒœĸŘřŔŕŖŗſŚśŜŝŠšŞşṢṣȘșẞßŤťŢţȚțŦŧÚúÙùŬŭÛûŮůÜüŰűŨũŲųŪūŴŵÝýŶŷŸÿȲȳŹźŽžŻżÞþªº]+$

One might wonder if this impacts performance. The pattern is very strict: a single anchored character class with one quantifier, without nested quantifiers or alternations that could trigger excessive backtracking. Combined with .NET’s RegexOptions.Compiled, which trades a higher construction cost for faster matching, this keeps the validation efficient, provided the Regex instance is created once and reused.
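A minimal sketch of how such a pattern could be compiled once and reused. The character class here is an abbreviated stand-in for the full list, and the names letters, dollarAnchored and strictAnchored are assumptions for illustration. The sketch also shows a .NET detail worth knowing: without RegexOptions.Multiline, $ still matches just before a single trailing newline, whereas \z only matches at the very end of the input.

```fsharp
open System.Text.RegularExpressions

// Abbreviated accept list for illustration only; substitute the full list.
let letters = "A-Za-zÅåÄäÖöØøÆæÑñÚúÉéÁáÇçÜüß"

// Compiled pays its cost when the Regex is constructed, so build it once.
let options = RegexOptions.Compiled ||| RegexOptions.CultureInvariant

// Anchored with ^ and $, as in the pattern in this post.
let dollarAnchored = Regex("^[" + letters + "]+$", options)

// Variant anchored with \A and \z (an assumption, not the post's pattern).
let strictAnchored = Regex(@"\A[" + letters + @"]+\z", options)

printfn "%b" (dollarAnchored.IsMatch "Malmö")    // true
printfn "%b" (dollarAnchored.IsMatch "Malmö\n")  // true: $ tolerates one trailing newline
printfn "%b" (strictAnchored.IsMatch "Malmö\n")  // false
```

Whether a trailing newline matters depends on how the input is trimmed before validation, so treat the stricter anchors as an option rather than a requirement.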

It’s also worth mentioning that validation can go hand in hand with sanitization. We can validate multiple words and make sure that two consecutive space characters are rejected, but sanitization can also prepare the data for us. It could limit the number of space or dash characters, make everything lower-case, or make only the first letter upper-case. This is too project-specific to discuss here, but it should be taken into account when validating input.
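To make the idea concrete, here is one possible sanitization step, sketched with plain .NET calls. The function name sanitize and the chosen rules (trim, compose to Unicode form C, collapse runs of spaces and dashes) are assumptions for illustration, not a recommendation for every project.

```fsharp
open System.Text
open System.Text.RegularExpressions

let spaceRuns = Regex(" {2,}", RegexOptions.Compiled)
let dashRuns = Regex("-{2,}", RegexOptions.Compiled)

// Hypothetical sanitizer, meant to run before the accept-list validation.
let sanitize (input: string) : string =
    // Form C composes a base letter plus combining mark into one character,
    // so a decomposed "o" + diaeresis can match a precomposed "ö" in the list.
    let composed = input.Trim().Normalize(NormalizationForm.FormC)
    let singleSpaced = spaceRuns.Replace(composed, " ")
    dashRuns.Replace(singleSpaced, "-")

printfn "[%s]" (sanitize "  Saint   Étienne ")  // [Saint Étienne]
printfn "[%s]" (sanitize "Baden--Baden")        // [Baden-Baden]
```

Note that an accept list for multi-word names would also have to allow the space and dash characters that survive this step.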

Validation extensibility model

All theory aside, we should determine how to integrate this in our projects. This is a very generic, overarching validation. It could be widely re-used across projects, but it’s probably not generic enough to be included in third-party libraries. A common code space or an umbrella project could both be good places for these kinds of validations.

As an example, here’s how we could extend the F# FPrimitive library to include character validation on European characters:
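The following is a standalone sketch of the idea in plain F#. It deliberately makes no assumptions about FPrimitive's API, so the wiring into the library's specification builder is left out. The EuropeanSpec module, its validate function and the City type are hypothetical names, and the character class is an abbreviated stand-in for the full pattern.

```fsharp
open System.Text.RegularExpressions

module EuropeanSpec =
    // Abbreviated accept list for illustration; substitute the full list.
    let letters =
        Regex(@"\A[A-Za-zÅåÄäÖöØøÆæÑñÚúÉéÁáÇçÜüß]+\z", RegexOptions.Compiled)

    /// Hypothetical helper: Ok with the input when it only holds accepted letters.
    let validate (fieldName: string) (input: string) : Result<string, string> =
        if isNull input then
            Error (sprintf "%s should not be null" fieldName)
        elif letters.IsMatch input then
            Ok input
        else
            Error (sprintf "%s should only contain European letters" fieldName)

/// Domain type whose only entry point runs the validation.
type City =
    private
    | City of string
    static member create (input: string) : Result<City, string> =
        EuropeanSpec.validate "city" input |> Result.map City
    member this.Value =
        match this with
        | City value -> value

match City.create "Malmö" with
| Ok city -> printfn "accepted: %s" city.Value
| Error message -> printfn "rejected: %s" message
// accepted: Malmö
```

Keeping the reusable check in its own module while the domain type stays small mirrors the suggestion above of placing such validations in a common code space.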

Let’s also include a purely C# example. I’ll choose FluentValidation here, as it’s a very popular library:

Conclusion

Input validation is a very important topic in software security. It’s prone to errors and should be looked into very closely. Determining which kinds of input could occur is an important aspect of this. This post looked into the possibility of validating European words. It’s a mistake, however, to accept every kind of input or to rely on the most basic types that your programming language provides. Doing so doesn’t reflect the domain and carries risks on many levels.

Balancing how much to validate and how much to sanitize is a project-specific decision and should be discussed carefully. Allow as much as possible within the bounds of your domain, but be able to ‘help’ the input along with secure sanitization. That’s the sweet spot, in my opinion.

Thanks for reading,
Stijn

