“Scalability Rules: 50 Principles for Scaling Web Sites” review

Recently I decided to get into the habit of reading IT books regularly. To start with, I wanted to read something about building scalable architectures. I did some quick research on Amazon and chose Scalability Rules: 50 Principles for Scaling Web Sites by Martin L. Abbott and Michael T. Fisher. Based on comments and reviews, it was supposed to be more on the technical side. I was slightly disappointed in this respect. However, I think this is still a worthy read.

The book is divided into 13 chapters. Each of the chapters contains several rules. What struck me is that these rules are very diverse. We’ve got some very, very general advice that could be applied to any kind of software development (e.g. Don’t overengineer, Learn aggressively, Be competent). We’ve got stuff for CTOs or IT directors in large corporations (e.g. Have at least 3 data centers, Don’t rely on QA to find mistakes). There are also some specific, technical rules – what I was after in the first place. I’m not convinced that mixing these very different kinds of knowledge makes sense, since they are probably targeted at different audiences (which is even acknowledged by the authors in the first chapter).

Some of the rules felt like formalized common sense, backed with some war stories from the authors’ experience (e.g. AFK Cube). However, some of the stuff was indeed new to me. It was also interesting to see the bigger picture and the business side of things (potential business impact of failures, emphasis on the costs of different solutions, etc.).

I think the book is a great choice if you are a CTO of a SaaS startup or a freshly promoted Architect without prior experience of building scalable apps (having the experience would probably teach you much more than the book). If you are a Developer who wants to get some very specific, technical advice, then the book will serve well as an overview of topics that you should learn more deeply from other sources (such as database replication, caching, load balancing, alternative storage systems). Nevertheless, I think the book is a worthy read that will broaden your perspective.

Slick vs Anorm – choosing a DB framework for your Scala application

Scala doesn’t offer many DB access libraries. Slick and Anorm seem to be the most popular – both being available in the Play framework. Despite both serving the same purpose, they present completely different approaches. In this post I’d like to present some arguments that might help when choosing between these two.

What is Slick?

Slick is a Functional Relational Mapper. You might be familiar with Object Relational Mappers such as Hibernate. Slick embraces Scala’s functional elements and offers an alternative. Slick’s authors claim that the gap between relational data and functional programming is much smaller than the gap between relational data and object-oriented programming.

Slick allows you to write type-safe, SQL-like queries in Scala which are translated into SQL. You define mappings which translate query results into your domain classes (and the other way round for INSERT and UPDATE). Writing plain SQL is also allowed.

What is Anorm?

Anorm is a thin layer providing database access. It is in a way similar to Spring’s JDBC templates. In Anorm you write queries in plain SQL. You can define your own row parsers which translate query results into your domain classes. Anorm provides a set of handy macros for generating parsers. Additionally, it offers protection against SQL injection with prepared statements.

Anorm’s authors claim that SQL is the best DSL for accessing relational databases and that introducing another one is a mistake.

Blocking/non-blocking

Slick’s API is non-blocking. Slick queries return instances of the DBIO monad, which can later be transformed into a Future. There are many benefits of a non-blocking API, such as improved resilience under load. However, you will not notice these benefits unless your web application is handling thousands of concurrent connections.

Anorm, as a really thin layer, does not offer a non-blocking API.

Expressibility

Slick’s DSL is very expressive but it will always be less expressive than plain SQL. Anorm’s authors seem to have a point that re-inventing SQL is not easy. Some non-trivial queries are difficult to express and at times you will miss SQL. Obviously, you can always use the plain SQL API in Slick, but what’s the point of query type safety if not all of your queries are under control?

Anorm is as expressive as plain SQL. However, passing more exotic query parameters (such as arrays or UUIDs) might require spending some time reading the docs.

Query composability

One of the huge strengths of Slick is query composability. Suppose you had two very similar queries:

In Slick, it’s very easy to abstract the common part into a query.

In Anorm, all you can do is textual composition which can get really messy.

Inserts and updates

In Slick you can define two-way mappings between your types and SQL. Therefore, INSERTs are as simple as:

In Anorm you need to write your INSERTs and UPDATEs by hand, which is usually a tedious and error-prone task.

Code changes and refactoring

Another important feature of Slick is query type safety. It’s amazing when you are making changes to your data model. The compiler will always make sure that you don’t miss any query.

In Anorm nothing will help you detect typos or missing fields in your SQL, which will usually make you want to write unit tests for your data access layer.

Conclusion

Slick seems to be a great library packed with very useful features. Additionally, it will most likely save your ass if you need to perform many changes to your data model. However, my point is that it comes at a cost – writing Slick queries is not trivial and the learning curve is quite steep. And you risk that the query you have in mind is not expressible in Slick.

An interesting alternative is to use Slick’s plain SQL API – it gives you some of the benefits (e.g. non-blocking API) but without sacrificing expressiveness.

As always, it’s a matter of choosing the right tool for the job. I hope this article will help you weigh all the arguments.

SBT: how to build and deploy a simple SBT plugin?

A few weeks ago, when I was working on my pet project, I wanted to make it an SBT plugin. Since I had to spend some time studying the SBT docs, I decided to write a short tutorial explaining how to write and deploy an SBT plugin.

Make sure your project can be built with SBT

First of all, your project needs to be buildable with SBT. This is easy to achieve: any project that follows the expected directory structure can be built with SBT. Additionally, we are going to need a build.sbt file with the following contents at the top level:

Note that we are using Scala version 2.10 even though 2.11 is available at the time of writing. That’s because SBT 0.13 is built against Scala 2.10. You need to make sure that you are using matching versions, otherwise you might get compile errors.

Implement the SBT plugin

Our example plugin is going to add a new command to SBT. Firstly, let’s add the following imports:

Next, we need to extend the AutoPlugin class. Inside that class we need to create a nested object called autoImport. All SBT keys defined inside this object will be automatically imported into the project using this plugin. In our example we are defining a key for an input task, which is a way to define an SBT command that can accept command line arguments.

Now we need to add an implementation for this task:

And that’s it.

Test the SBT plugin locally

SBT lets us test our plugins locally very easily. Run the following command:

Now we need an example project that will use our plugin. Let’s create an empty project with the following directory structure:

Inside plugins.sbt, let’s put the following code:

Note that this information needs to match the organization, name and version defined in your plugin. Next, add the following lines to build.sbt:

Make sure that you use the fully qualified name of the plugin object. You can use Scala version older than 2.10 in the consumer project.

Now you can test your plugin. Run the following command:

Note the use of quotes – you are passing the whole command, along with its parameters to SBT.

Make it available to others

If you would like to make your plugin available to other users, you can use OSS Repository Hosting. They are hosting a public Maven repository for open source projects. Packages in this repository are automatically available to SBT users, without further configuration.

The whole procedure is well described here. One of the caveats for me was having to change the organization property to com.github.miloszpp (I host my project on GitHub). You can’t just use any string here because you need to own the domain – otherwise, you can use the GitHub prefix.

Scala-ts: Scala to TypeScript code generator

I started using TypeScript a few weeks ago at work. It turns out to be a great language which lets you avoid many problems caused by JavaScript’s dynamic typing, facilitates code readability and refactoring, and does that at a relatively small cost thanks to its modern, concise syntax.

Currently we are using TypeScript for writing the frontend part of a web application which communicates with a backend in Scala. The backend part exposes a REST API. One of the drawbacks of such a design is the need to write Data Transfer Object definitions for both backend and frontend and to make sure that they match each other (in terms of JSON serialization).

In other words, you need to define the types of objects being transferred between backend and frontend in both Scala and TypeScript.

Since this is a rather tedious job, I came up with an idea to write a simple code generation tool that can produce TypeScript class definitions based on Scala case classes.

I’ve put the project on Github. It’s also available via SBT and Maven.

Here is the link to the project: https://github.com/miloszpp/scala-ts

 

Issues with asynchronous IO in web applications

Building servers with non-blocking IO has become quite popular these days. Tests have shown that it does actually improve the scalability of web applications. However, my experience shows that it comes at a cost. In this post I am going to discuss some negative aspects of writing asynchronous code based on Scala’s Futures.

Stacktraces

Debugging exceptions in asynchronous programs is a pain. When issuing an asynchronous IO operation you provide a callback that should be executed when the operation returns. In most implementations, this callback might be executed on any thread (not necessarily the same thread that invoked the operation). Since the call stack is local to the thread, the stacktrace that you get when handling an exception is not very informative. It will not trace back to the servlet, so you may have a hard time figuring out what actually happened.

Thread-local variables

Some libraries use a mechanism called ThreadLocal variables (available in Java and C#; in Scala known as DynamicVariable). By definition, these libraries do not work well with asynchronous code, for the same reason that we get poor stacktraces.

I have already discussed one of such situations on my blog. Another one is Mapped Diagnostic Context from the Logback framework. MDC is a nice mechanism that allows you to attach additional information to your logs. Since the information is contextual, it will be available even to logs written from within external libraries. However, as one might expect, MDC is implemented with thread-local variables. Therefore, it doesn’t work well with Scala’s futures.

There is a way to get MDC working with Futures: write a custom ExecutionContext (Scala’s thread pool abstraction) that is aware of the contextual data and propagates it across threads.
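A minimal sketch of such a context-propagating ExecutionContext, using only the Scala standard library. A plain ThreadLocal map stands in for Logback’s MDC here, and the names context and PropagatingExecutionContext are hypothetical; a real version would copy the MDC context map instead.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ContextPropagation {
  // Stand-in for MDC: a thread-local map (an assumption made for illustration).
  val context = new ThreadLocal[Map[String, String]] {
    override def initialValue(): Map[String, String] = Map.empty
  }

  // Wraps another ExecutionContext and copies the submitting thread's context
  // onto whichever thread ends up running the task.
  final class PropagatingExecutionContext(underlying: ExecutionContext) extends ExecutionContext {
    override def execute(task: Runnable): Unit = {
      val captured = context.get() // read on the submitting thread
      underlying.execute(new Runnable {
        override def run(): Unit = {
          val previous = context.get()
          context.set(captured)
          try task.run()
          finally context.set(previous) // do not leak the context to unrelated tasks
        }
      })
    }

    override def reportFailure(cause: Throwable): Unit = underlying.reportFailure(cause)
  }

  // Sets a value on the calling thread and reads it back from inside a Future.
  def readInFuture(): Option[String] = {
    val pool = Executors.newFixedThreadPool(2)
    implicit val ec: ExecutionContext =
      new PropagatingExecutionContext(ExecutionContext.fromExecutor(pool))
    try {
      context.set(Map("requestId" -> "42"))
      Await.result(Future(context.get().get("requestId")), 5.seconds)
    } finally {
      context.remove()
      pool.shutdown()
    }
  }
}
```

The context is captured when the task is submitted, not when it runs, which is why the wrapper reads the thread-local inside execute rather than inside run.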

Missed exceptions

Unless you are very careful, it is quite easy to not wait for a Future to complete but instead to fork execution into two branches. When an exception is thrown in a Future that nobody is waiting for, it will most likely just go unnoticed.

The above code will compile. However, saveToDb will most likely be called before postData returns, since execution has been forked. Any exception thrown inside postData will most likely be missed. The correct way to write the above code would be:
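As a minimal sketch of both variants, assume hypothetical postData and saveToDb functions that return Futures (they are stand-ins for illustration, not a real API):

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scala.util.Try

object MissedExceptions {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Hypothetical stand-ins for the calls named in the text.
  def postData(payload: String): Future[String] =
    Future(throw new IllegalStateException("remote call failed"))

  def saveToDb(payload: String): Future[Unit] = Future.successful(())

  // Forked: the Future returned by postData is discarded,
  // so its failure is never observed by the caller.
  def forked(payload: String): Future[Unit] = {
    postData(payload)
    saveToDb(payload)
  }

  // Chained: saveToDb starts only after postData succeeds,
  // and a failure of postData propagates to the caller.
  def chained(payload: String): Future[Unit] =
    postData(payload).flatMap(_ => saveToDb(payload))

  // Helper that blocks for the result so the two outcomes can be compared.
  def outcome(f: Future[Unit]): Try[Unit] = Try(Await.result(f, 5.seconds))
}
```

With these stand-ins, the forked variant completes successfully even though postData failed, while the chained variant surfaces the exception.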

Caching

Caching gets more complicated in an asynchronous web application, unless the library you use for caching is designed to work with async code. One of the most common patterns in caching libraries is to let you provide a function that should be executed when a value in cache is missing. See the below example of Guava Cache:

If doThingsTheHardWay returned a Future (was asynchronous), then you would have to block the thread and wait for the result. Mixing blocking and non-blocking code is generally discouraged and may lead to undesirable situations such as deadlocks.

Code readability

Asynchronous code adds complexity. In Scala, you need to use all sorts of Future combinators such as flatMap, map or Future.sequence in order to get your code to compile. The issue is partially addressed by async/await language extensions/macros (available for example in Scala and C#), but it can still make your code less readable and harder to reason about.

Using Automapper to improve performance of Entity Framework

Entity Framework is an ORM technology widely used in the .NET world. It’s very convenient to use and lets you forget about SQL… well, at least until you hit performance issues.

Looking at the web applications I worked on, database access usually turned out to be the first thing to improve when optimizing application performance.

Navigation properties

The main goal of Entity Framework is to map an object graph to a relational database. Tables are mapped to classes. Relationships between tables are represented with navigation properties.

ef

The above example will be mapped to the following classes:

The highlighted lines declare navigation properties. Thanks to navigation properties, it’s very convenient to access details of Article’s Author. However, it comes at a cost. Imagine the following code in the view:

Assuming that ViewBag.Articles  is loaded with the below method, this code might turn out to be very slow.

Unfortunately, it will fire a separate SQL query to the database server for each element in the Articles collection. This is highly suboptimal and might result in long loading times.

Lazy and eager loading

The reason behind this behaviour is the default setting of Entity Framework which tells it to load navigation properties on demand. This is called lazy loading.

One can easily overcome this problem by enabling eager loading:

Eager loading will cause EF to pre-load all Authors for all selected Articles (effectively, performing a join).

This might work for simple use cases. But imagine that Author has 50 columns and you are only interested in one of them. Or Author is a superclass of a huge class hierarchy modelled as table-per-type. Then the query built by EF would become unnecessarily huge and it would result in transferring loads of unnecessary data.

Introducing DTO

One way to handle this situation is to introduce a new type which has all of Article’s properties but additionally has some of the related Author’s properties:

Now we can perform projection in the query. We will get a much smaller query and much less data transferred over the wire:

Automapper

We improved performance, but now the code looks much worse – it involves manual mapping of properties which is in fact trivial to figure out. What’s more, we would need to change this code every time we add or remove a field in the Article class.

A library called Automapper comes to the rescue. Automapper is a convention-based class mapping tool. Convention-based means that it relies on naming conventions of properties. For example, Author.FirstName is automatically mapped to AuthorFirstName. Isn’t that cool?

You can find it on NuGet. Once you add it to your solution, you need to create Automapper configuration:

Here we declare that Article should be mapped to ArticleDto, meaning that every property of Article should be copied to the property of ArticleDto with the same name.

Now, we need to replace the huge manual projection with Automapper’s ProjectTo call.

You need to add one more line:

And that’s it. You’ve just improved readability of your code and made it less fragile to changes.

Summary

Automapper is a very flexible tool. You don’t need to rely on naming conventions; you can easily declare your own mappings.

Additionally, we have used just one specific part of Automapper – Queryable Extensions, which work with ORMs. You can also use Automapper on regular collections or just on plain objects.

I believe the problem I highlighted here is just a symptom of a much broader issue of incompatibility of the relational and object-oriented worlds. Although Entity Framework tries to address the issue by allowing you to choose between eager and lazy loading, I don’t think it is a good solution. Classes managed by EF being elements of a public API are a big problem. As a user of such an interface you never know if a navigation property is loaded and whether accessing it will result in a DB query.

Therefore, I advocate the use of mapped DTOs. This approach reminds me slightly of an idea called Functional Relational Mapping adopted for example by the Slick framework for Scala. I believe it to be a great alternative to classic ORMs.


Asynchronous programming in Scala vs C#

In one of my recent posts I compared two different approaches that the authors of Scala and C# chose to solve the same problem. This post is based on the same idea but the problem being solved is asynchronous programming.

What’s asynchronous programming?

Let me explain by giving you an example. If you have ever used a web framework you might have been wondering how it handles multiple concurrent requests from different users. The traditional approach is to spawn a new thread (or get one from a thread pool) for every request that comes in and release it once the request is served.

The problem with this solution is that whenever those threads perform IO operations (such as talking to a database) they simply block and wait for the operation to finish. Therefore, we end up wasting precious CPU time by allowing our threads to be blocked on IO.

Instead of blocking threads on IO operations we could use an asynchronous database API. Such an API is non-blocking. However, running a database query using such an API requires you to provide a callback. The callback in this case would be a function that is invoked once the result is available.

So, in the asynchronous model your thread serves the request, runs some computations and, when it needs to call the database, it initiates the call and then switches to do some other, useful work. Some other thread will continue execution of your request when the database returns.

Figure: Example of asynchronous processing in a web framework

Asynchronous programming in C#

The biggest pain of writing programs in the asynchronous model is the necessity of callbacks. Fortunately, in C# we have lambda functions which allow us to write callbacks with ease. However, even with lambdas we can end up with a lot of nesting.

The key to asynchronous programming in C# is the Task class. Task represents a piece of work that can be either blocking or heavy on processor so it makes sense to run it asynchronously.

In the first line we create a task that fetches the Google main page. The task starts immediately on a thread from a default, global thread pool. Therefore, the call itself is not blocking. On the second line we attach a callback which defines what should happen once the result is fetched.

As I said, it is easy to introduce nesting with callbacks. What if we wanted to visit Facebook but only if we succeeded in fetching the Google page?

This code isn’t very readable. Also, if we wanted to visit more websites, we could end up with even more levels of nesting.

C# 5.0 introduced an excellent language feature that lets you write asynchronous code just as if it was synchronous: the async and await keywords. The above example can be rewritten as follows:

One caveat about async/await is that the method containing any await calls must itself be declared as async. Also, the return type of such a method must be Task or Task<T> (or void, which is meant for event handlers). Therefore, the asynchronous-ness always propagates upstream. This actually makes sense – otherwise you would need to synchronously wait for a task to finish at some point. Modern web frameworks such as ASP.NET MVC let you declare the methods that handle the incoming requests as asynchronous.

One more thing about C# tasks – with them executing stuff in parallel is incredibly easy.

Task.WhenAll creates a task that will be finished when all tasks from the provided array are finished.

Asynchronous programming in Scala

Let’s have a look at how Scala approaches the problem. One of the approaches to asynchronous programming is to use Futures. Future is a class that has very similar semantics to C#’s Task. Unfortunately, there is no built-in asynchronous HTTP client in Scala, but let’s assume we’ve got one and its interface looks like this:

We can write code that looks very similar to the C# example with flatMap:

flatMap invoked on a future takes a callback that will be executed once the result of that future is available. Since that callback must return a Future itself, we must return an empty future (Future.successful) in the else branch of our if.

When fetching the Facebook page, we use map instead of flatMap because we don’t want to start another future inside the callback.
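A minimal sketch of the flatMap version described above. The Response type and the get function are hypothetical stand-ins for the assumed HTTP client and perform no real network IO:

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object FlatMapExample {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Hypothetical response type and client call (assumptions for illustration).
  final case class Response(status: Int, body: String)
  def get(url: String): Future[Response] = Future(Response(200, s"<html>$url</html>"))

  // Fetch Facebook only if fetching Google succeeded.
  def fetchBoth(): Future[Option[String]] =
    get("http://google.com").flatMap { google =>
      if (google.status == 200)
        // map, not flatMap: the callback returns a plain value, not another Future
        get("http://facebook.com").map(facebook => Option(facebook.body))
      else
        // already-completed Future, needed because flatMap's callback must return a Future
        Future.successful(Option.empty[String])
    }

  def run(): Option[String] = Await.result(fetchBoth(), 5.seconds)
}
```

Await.result appears only so that the sketch can print or check a result; in a web framework you would return the Future itself.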

Again, the main issue with this code is that it is nested. Very similarly to how Scala handles nested null checks with Option monad, here we can again use the for-comprehension syntax to get rid of nesting!
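As a sketch of the for-comprehension form, again assuming a hypothetical Response type and get function in place of a real HTTP client:

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ForComprehensionExample {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Hypothetical response type and client call (assumptions for illustration).
  final case class Response(status: Int, body: String)
  def get(url: String): Future[Response] = Future(Response(200, s"<html>$url</html>"))

  // Each generator (<-) waits for the previous Future without visible nesting.
  def fetchBoth(): Future[Option[String]] =
    for {
      google <- get("http://google.com")
      facebook <- if (google.status == 200) get("http://facebook.com").map(r => Option(r.body))
                  else Future.successful(Option.empty[String])
    } yield facebook

  def run(): Option[String] = Await.result(fetchBoth(), 5.seconds)
}
```

The compiler rewrites the generators into flatMap and map calls, so this is the same computation as a nested flatMap chain, just written flat.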

As you might have expected, parallel processing is also supported with Futures:
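A minimal sketch using Future.sequence from the standard library, with the same kind of hypothetical get function standing in for a real HTTP client:

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ParallelExample {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Hypothetical response type and client call (assumptions for illustration).
  final case class Response(status: Int, body: String)
  def get(url: String): Future[Response] = Future(Response(200, s"<html>$url</html>"))

  def fetchAll(urls: List[String]): Future[List[String]] = {
    // All requests are started here, before any of them is waited for.
    val started: List[Future[Response]] = urls.map(get)
    // Future.sequence turns List[Future[A]] into Future[List[A]], preserving order.
    Future.sequence(started).map(_.map(_.body))
  }

  def run(): List[String] = Await.result(fetchAll(List("a", "b")), 5.seconds)
}
```

Future.sequence plays roughly the role that Task.WhenAll plays in C#: the resulting Future completes once every Future in the list has completed.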

An example of a web framework that supports asynchronous request handlers is Scalatra.

Conclusions

As you can see, C# and Scala approach asynchronous programming similarly. What I find interesting here is how Scala handles callback nesting with the generic mechanism of for-comprehension, while C# introduces a separate language feature for that. This is exactly the same pattern as in Option monad vs null-conditional operator. To be honest, I find async/await overall a bit more awesome – it really makes you feel as if you were writing synchronous code.

Update: as pointed out by Darren and Yann in comments, you can also do async/await in Scala thanks to this library. There is also a pending proposal to add it to the language that admits that it’s inspired by C#’s async/await syntax.

Conclusions after first four months of blogging

In this short post I name some random conclusions I had after the first four months of blogging. I hope this will be helpful for people who are considering starting their own programming blog (which I fully recommend to do!).

Total number of views: 16 000 

The number looks good to me although it gets interesting if we look at the distribution of views over different posts:

 

So, most of the views are due to my latest post, Scala’s Option monad versus null-conditional operator in C#. I submit most of my posts to Hacker News and this is also the main source of hits.
The conclusion here is that the title of the blog post really matters. I am yet to discover why this particular one caught attention, but my suspicion is that functional programming being a hot topic nowadays might be the reason.

Total number of posts: 10

This is much worse than what I aimed for (which is at least one post per week). The primary reason is lack of time, since writing a longer piece takes me at least 2 hours.

What I plan to do about it is to do more short posts explaining solutions to some interesting problems I encounter at work or while working on side projects (such as Accessing request parameters from inside a Future in Scalatra).

My opinion on Blogger

I chose Blogger following the advice of another programming blog. So far, I’m not totally happy with it and I kind of regret that I did not choose WordPress.

I once had a blog on WordPress for a while and what I liked there is that some of the traffic came from other WordPress users thanks to its Discover and Recommendations features. I thought a similar thing would happen here with Google+ but it’s not happening at all.

Additionally, the choice of free templates is much poorer, the built-in editor is not very convenient and the statistics module is less fancy.

Update: I decided to move the blog to WordPress because of the reasons mentioned above.

Scala’s Option monad versus null-conditional operator in C#

Today I will talk about an awesome feature of C# 6.0. We will see how it can help us understand monads in Scala!

Null-conditional operator

Imagine we have a nested data model and want to call some method on a property nested deeply inside an object graph. Let’s assume that Article does not have to have an Author, the Author does not have to have an Address and the address does not have to have a City (for example this data can be missing from our database).

This is very unsafe code since we are at risk of NullReferenceException. We have to introduce some null checks in order to avoid the exception.

Yuck! So much boilerplate code to do a very simple thing. It’s really unreadable and confusing.

Fortunately, C# 6.0 introduces the null-conditional operator. The new operator is written ?. and can be used instead of the regular . whenever it is possible that the value on the left can be null.

For example, the below piece can be read as “call ToUpper only if bob is not null; otherwise, just set bobUpper to null”.

Returning to our previous example, we can now safely write:

 

The Option type

As I explained in one of my previous posts, in Scala we avoid having null variables at all cost. However, we would still like to be able to somehow reflect the fact that a piece of data is optional. The Option[T] type can be used to explicitly mark a value as optional. For example, val bob with type Option[String] means that bob can either hold a String value or nothing:

Therefore, we can easily model the situation from the previous example as follows:

Notice how, compared to C#, Scala forces us to explicitly declare which field is and which field is not optional.

Now, let’s look at how we could implement printing article’s author’s city in lower case:

This naive approach is not a big improvement when compared to the C# version. However, Scala lets us do this much better:

Although this version is not as short as the one with C#’s null-conditional operator, it’s important that we got rid of the boilerplate nested if statements. What remained is a much more readable piece of code. This is an example of the for-comprehension syntax together with the monadic aspect of the Option type.
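A minimal sketch of the model and the for-comprehension discussed above. The field names are assumptions; only the optionality of author, address and city comes from the text:

```scala
object OptionModel {
  // Optional fields are marked explicitly with Option (field names are assumed).
  final case class Address(city: Option[String])
  final case class Author(name: String, address: Option[Address])
  final case class Article(title: String, author: Option[Author])

  // Each generator unwraps one Option; the result is None as soon as any step is missing.
  def cityLowerCase(article: Article): Option[String] =
    for {
      author  <- article.author
      address <- author.address
      city    <- address.city
    } yield city.toLowerCase
}
```

For example, an article whose author has an address with city "Warsaw" yields Some("warsaw"), while an article without an author yields None.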

The Option monad

Before I explain what exactly is going on in the above piece of code, let me talk more about methods of the Option type. Do you remember the map method of the List type? It took a function and applied it to every element of the list. Interestingly, Option also has the map method. Think of Option as a List that can have one (Some) or zero (None) elements. So, Option.map takes a function and, if there is a value inside the Option, it applies the function to the value. If there is no value inside the Option, map will simply return None.

Now, can we somehow use it with our initial problem? Let’s see:

I think it looks slightly better than the nested if approach. The problem with this is that the type of cityLowerCase is Option[Option[Option[String]]]. The actual result is deeply nested. What we would prefer to have is an Option[String]. There is a method similar to map which would give us exactly what we want – it’s called flatMap.

Option.flatMap takes a function that transforms an element inside the option to another option and returns the result of the transformation (which is a non-nested option). The equivalent for List is List.flatMap which takes a function that maps each element of the list to another list. At the end, it concatenates all of the returned lists.
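To make the difference in result types concrete, here is a small sketch that assumes hypothetical Article, Author and Address case classes with optional fields:

```scala
object MapVsFlatMap {
  // Hypothetical model (field names are assumptions for illustration).
  final case class Address(city: Option[String])
  final case class Author(name: String, address: Option[Address])
  final case class Article(title: String, author: Option[Author])

  // map wraps the result of every step, so the Options pile up.
  def nested(article: Article): Option[Option[Option[String]]] =
    article.author.map(_.address.map(_.city.map(_.toLowerCase)))

  // flatMap flattens each step; map is enough for the last, non-optional transformation.
  def flat(article: Article): Option[String] =
    article.author.flatMap(_.address).flatMap(_.city).map(_.toLowerCase)
}
```

The explicit return types are there only to show what the compiler infers in each case.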

The fact that Option[T] and List[T] have the flatMap method means that they can be easily composed. In Scala, every type with the flatMap method is a monad! In other words, a monad is any generic type with a type parameter which can be composed with other instances of this type (using the flatMap method). Now, back to for-comprehension. The nice syntax which allows us to avoid nesting in code is actually nothing more than syntactic sugar for flatMap and map. This:

…translates into this:
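As a minimal sketch of that translation, assuming hypothetical Article, Author and Address case classes, the two equivalent forms look like this:

```scala
object Desugared {
  // Hypothetical model (field names are assumptions for illustration).
  final case class Address(city: Option[String])
  final case class Author(name: String, address: Option[Address])
  final case class Article(title: String, author: Option[Author])

  // The for-comprehension form.
  def withFor(article: Article): Option[String] =
    for {
      author  <- article.author
      address <- author.address
      city    <- address.city
    } yield city.toLowerCase

  // The same computation written out by hand: every generator except the
  // last becomes flatMap, and the last one together with yield becomes map.
  def withFlatMap(article: Article): Option[String] =
    article.author.flatMap { author =>
      author.address.flatMap { address =>
        address.city.map(city => city.toLowerCase)
      }
    }
}
```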

For-comprehension works with any monad! Let’s look at an example with lists:

For each element in the first list we produce a list ranging from 1 to this element. At the end, we concatenate all of the resulting lists.
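A short sketch of such a list comprehension; the input list List(1, 2, 3) is an arbitrary choice for illustration:

```scala
object ListComprehension {
  // For each x, produce the numbers 1..x; flatMap concatenates the partial results.
  def ranges(xs: List[Int]): List[Int] =
    for {
      x <- xs
      y <- 1 to x
    } yield y

  val result: List[Int] = ranges(List(1, 2, 3)) // List(1, 1, 2, 1, 2, 3)
}
```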

Conclusion

My main point here is to show that both C# and Scala introduce some language elements to deal with deep nesting. C# has null-conditional operators which deal with nesting null checks inside if statements. Scala has a much more generic mechanism which allows us to avoid nesting with for-comprehension and flatMap. In the next post I will compare C#’s async keyword with Scala’s Future monad to show the similarities in how both languages approach the problem of nested code.

Accessing request parameters from inside a Future in Scalatra

Scalatra is an awesome, lightweight web framework for Scala. It’s perfect for building REST APIs. One of its less known features is support for asynchronous programming using Scala’s Futures.

By mixing in the FutureSupport trait one can easily make their servlet asynchronous. Once this trait is mixed-in into your servlet class, you can return Futures in your post and get handlers and Scalatra will automagically take care of them.

Recently I encountered a minor issue with Scalatra’s support for Futures – it is not possible to access params or request values from code inside a Future. The below code throws a NullPointerException.

Scalatra exposes access to contextual data such as the current user or request parameters via members such as params or request. These values are implemented as DynamicVariables. DynamicVariable is a Scala feature which allows a val to have different values in different scopes. The point is that the DynamicVariable implementation is based on Java’s ThreadLocal. Therefore, when executing code in a Future you may not rely on these values since you might be on another thread!

An obvious solution to this problem is to retrieve request parameters before entering the Future:

However, this is not always a very convenient solution. I came up with the following workaround:

Firstly, we take a copy of the current request. Later, inside the Future we tell Scalatra to substitute the request dynamic variable’s value with our copy. Therefore, the call to params will use the correct request and there will be no error.
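The mechanism can be reproduced without Scalatra. In the sketch below, currentUser is a hypothetical DynamicVariable standing in for Scalatra’s request value; it shows both the lost binding and the capture-and-rebind workaround:

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scala.util.DynamicVariable

object DynamicVariableDemo {
  // Hypothetical stand-in for a request-scoped value (not Scalatra's API).
  val currentUser = new DynamicVariable[String]("none")

  def run(): (String, String) = {
    val pool = Executors.newFixedThreadPool(1)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      // Warm up so the worker thread already exists before the value is bound
      // (DynamicVariable is backed by an InheritableThreadLocal).
      Await.result(Future(()), 5.seconds)

      currentUser.withValue("user-1") {
        // Runs on the worker thread, where the binding is not visible.
        val lost = Future(currentUser.value)

        // Workaround: copy the value on the calling thread, rebind it inside the Future.
        val captured = currentUser.value
        val restored = Future(currentUser.withValue(captured)(currentUser.value))

        (Await.result(lost, 5.seconds), Await.result(restored, 5.seconds))
      }
    } finally pool.shutdown()
  }
}
```

With a pre-existing pool thread, the first Future sees the default value and the second sees the rebound one.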

Update

As I recently learned, there is a much better way to solve this issue that is actually built into Scalatra. The way to go is using the AsyncResult  class. Our example would look like this:

AsyncResult is an abstract class. We create an instance of an anonymous type that extends it and overrides the is value. AsyncResult takes copies of the current request and response values when created and makes them available to code inside is.

You can find more information here.