Most web applications are fundamentally similar. They take input from users, process it and store it, and then show the processed result back to users as output. Such applications are usually called CRUD applications (create, read, update and delete). This means that the infrastructure (the core components working together) needed to support a typical product can be almost the same for the majority of startups. In this article I show what those core components are and how to design the infrastructure so that it is efficient, scalable (it grows along with the product) and cost-effective.
Here are the elements that most web applications need.
This is the place where the code is executed. There are many options to choose from, but going with one of the cloud providers like Microsoft (Azure VM) or Amazon (EC2) is typically the safe choice. They are well supported and easy to use, and when your product grows, it is easy enough to migrate to a more expensive but faster option from the same provider.
The dependencies of the application (a background processing worker, a cache database, an HTTP server and so on) have to be managed. You can either do it manually (which is error-prone and hard to scale or upgrade) or use containers (usually Docker containers), which is a bit harder at the beginning but more cost-effective in the long term. When using containers, you can think of dependencies as black boxes with clearly defined rules of cooperation.
It may be obvious, but it is still worth remembering that the code itself needs a programming language interpreter or virtual machine. The environment that the code needs (like Ruby or Node.js) is a major factor when choosing the web server: some technologies, like Ruby, are fairly platform-agnostic (you can run them almost everywhere), while others, like .NET, usually work better on a particular platform.
The data has to be stored somewhere. There are two major questions here:
Should the database be on the same machine as the application?
I’d argue that it shouldn’t. The effective machine configuration for a web server differs from that for a database, but the major reason is that it is hard to beat the quality of a solution designed specifically for databases, like Amazon RDS or Azure SQL. These services are managed by some of the best experts in database administration, and they are cheap enough that most products can afford them from day one. They let you scale performance as needed and create backups just by changing settings in the dashboard. Hosting the database on the same server as the application makes the application more prone to data loss in case of a server failure. Safety-wise, a database should always be stored on a separate machine (or service).
Should you use a relational or a non-relational database?
In order not to start a flame war, I’d say that the data in most applications can be organized into one or more tables (or "relations") of columns and rows, so choosing a relational database as a default is usually a safe choice. Choosing a non-relational database should be backed by research showing that it is a good fit for the particular use case.
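To make the "tables of columns and rows" idea concrete, here is a minimal sketch of the four CRUD operations against a relational table. It assumes Python's built-in `sqlite3` module with an in-memory database as a stand-in for a managed service; the `users` table and the function names are hypothetical, chosen only for illustration.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for a managed
# relational database. The "users" table is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)"
)


def create_user(email: str) -> int:
    """Create: insert a row and return its generated id."""
    cur = conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
    conn.commit()
    return cur.lastrowid


def read_user(user_id: int):
    """Read: return the (id, email) row, or None if it does not exist."""
    return conn.execute(
        "SELECT id, email FROM users WHERE id = ?", (user_id,)
    ).fetchone()


def update_user(user_id: int, email: str) -> None:
    """Update: change the email of an existing row."""
    conn.execute("UPDATE users SET email = ? WHERE id = ?", (email, user_id))
    conn.commit()


def delete_user(user_id: int) -> None:
    """Delete: remove the row."""
    conn.execute("DELETE FROM users WHERE id = ?", (user_id,))
    conn.commit()
```

Calling `create_user("a@example.com")` returns the new row's id, which can then be passed to `read_user`, `update_user` and `delete_user`. Moving to a separate database service would mostly mean swapping the connection, not the queries.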
A place where all the calculations that take more than a second to complete are processed. You can think about it like this: without a background processing worker, any calculation that is done in the application requires someone looking at the spinning circle in their browser, waiting for the page to load while it is being calculated. You don’t want that for calculations that take more than a second.
Should I have the background processing worker on a different machine than the application?
Probably yes, eventually. But you can start by having it on the same machine because if you’re using Docker, the migration (when needed) will be easy.
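The idea of handing slow work to a background worker can be sketched in a few lines. This is only an illustration under stated assumptions: it uses an in-process `queue.Queue` and a thread from the Python standard library in place of a real job system, and `slow_report`, `enqueue`, `jobs` and `results` are hypothetical names.

```python
import queue
import threading

# Jobs waiting to be processed, and finished results keyed by job id.
# In a real setup the queue would live outside the web process.
jobs = queue.Queue()
results = {}


def slow_report(n: int) -> int:
    """Stand-in for a calculation that would take more than a second."""
    return sum(i * i for i in range(n))


def worker() -> None:
    """Take jobs off the queue one by one and store their results."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel value: stop the worker
            jobs.task_done()
            break
        job_id, n = job
        results[job_id] = slow_report(n)
        jobs.task_done()


def enqueue(job_id: str, n: int) -> None:
    """Called from the request handler: returns immediately."""
    jobs.put((job_id, n))


# Start a single worker thread in the background.
worker_thread = threading.Thread(target=worker, daemon=True)
worker_thread.start()
```

The request handler only calls `enqueue(...)` and responds right away; the user can be shown the result later, once it appears in `results`. Because the handler and the worker communicate only through the queue, moving the worker to another machine changes where the queue lives, not how the application uses it.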
Nowadays databases are fast, but data that doesn’t have to be stored persistently is better kept in memory. Random Access Memory (RAM) takes nanoseconds to read from or write to, while hard drive access time is measured in milliseconds. RAM has two drawbacks: its contents are lost when the machine restarts, and it is more expensive than disk storage. That’s why only some of the data is kept in memory. For the data that qualifies, doing so greatly improves the speed of the application.
You can host an in-memory database like Redis on the same server as your application. When your application outgrows this solution, it is very easy to switch to an external service like AWS ElastiCache.
Email delivery is hard. Outsourcing the problem is the most cost-effective solution. It’s not expensive, and the time saved can be spent on far more valuable things. Not only will your emails get delivered, but you will also get a lot of useful statistics, and you will avoid having your address or domain blacklisted for spamming. Use an external service like SendGrid or Mailgun to send emails.
If you need to store a lot of static files, using a dedicated storage service is a must. It will be cheaper for you and faster for your users. Let your web server handle the business logic and let the storage service handle the delivery of static files. Solutions like AWS S3 or Azure Blobs also come with the benefit of redundant copies of the files. They are incredibly cheap and designed for extremely high durability (AWS, for example, advertises 99.999999999% durability for S3, which is a design target rather than an availability SLA).
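A common way to use such an in-memory store is the cache-aside pattern: look in the cache first and fall back to the slower source only on a miss. The sketch below assumes a plain Python dictionary in place of Redis; `TTLCache`, `get_or_load` and the `loader` callback are hypothetical names for illustration, not part of any library.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-process cache with expiry; a dict stands in for Redis."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[str], Any]) -> Any:
        """Return the cached value, or call loader(key) and cache the result."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the slow source is not touched
        value = loader(key)  # cache miss or expired: ask the slow source
        self._entries[key] = (now + self._ttl, value)
        return value
```

For example, `cache.get_or_load("user:1", load_user_from_db)` would query the database on the first call and answer from memory until the entry expires. Nothing here is persistent, which is exactly the trade-off described above.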
Most startups need one or two web servers to handle all the traffic. But even if the web servers are fast, the data still needs to travel over the wires to the users. That’s why it is important to have a CDN, which will help (almost automatically, without much configuration) to serve most of the content to a user from a location close to them. This will also help your product survive attacks such as DDoS. The most common solutions include Cloudflare and Amazon CloudFront.
This is a very wide topic. In the old days there was a log file (or files) stored on the server and accessed only by connecting to the server. Nowadays we have so many logging tools that you can, and should, log pretty much everything. When you have multiple machines, it is worth aggregating logs in a centralized tool, like Papertrail, Logmatic or the ELK Stack, so that you can browse through them with ease.
The first thing to know is whether the users of a web application are running into any problems. A service like Rollbar can be integrated into the application very easily and will catch errors and notify you whenever they occur. Adding an application analysis tool, like New Relic, will help you find bottlenecks within your application.
It’s crucial to record not only your application’s logs but also the server’s logs. Ideally, the service that aggregates all the logs can also analyze the data automatically and notify you whenever unexpected events take place.
A good logging infrastructure will not only notify you about a potential or existing problem, but also greatly decrease the time needed to fix it.
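Centralized log tools are generally easier to search when every machine emits logs in one structured format. The following is a minimal sketch of that idea, assuming Python's standard `logging` module and JSON lines as the format; `JsonFormatter`, `build_logger` and the `service` field are hypothetical names, and shipping the output to an aggregator is left out.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # "service" is a hypothetical field identifying the machine
            # or component; it is passed in through the `extra` argument.
            "service": getattr(record, "service", "unknown"),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name: str, stream=None) -> logging.Logger:
    """Return a logger writing JSON lines to `stream` (stderr by default)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]  # replace handlers to avoid duplicates
    logger.propagate = False
    return logger
```

A call such as `build_logger("web").info("order created", extra={"service": "web-1"})` then writes a single JSON line that an aggregating tool can filter by `service` or `level`.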
Having so many components may seem complicated at first, but in reality an experienced person can set them all up within two days (sometimes within a couple of hours). Using many external tools isn’t necessarily more expensive than trying to do all those things yourself.
Here’s a draft of the infrastructure that uses the services which we think are the best for most cases. You may need additional components, or you may not need some of those which I have included. It is a draft of an infrastructure that will suit most products because it is not only cost-effective but can also scale when your product grows. And if your product is not a typical CRUD application or has any special needs, this is an excellent starting point from which to adjust the infrastructure to the exact requirements of your product.
You can change particular providers for others, add additional web servers (in that case you’ll also need a load balancer and potentially an autoscaling mechanism) or an additional database. You know your infrastructure is well planned out if you can easily change a part of it without changing the whole concept.
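To illustrate what a load balancer does once there is more than one web server, here is a minimal sketch of round-robin selection, one of the simplest distribution strategies. It is only an illustration: `RoundRobinBalancer` and the server names are hypothetical, and a real load balancer would also handle health checks and failures.

```python
import itertools
from typing import Iterable


class RoundRobinBalancer:
    """Hand out servers in a fixed rotating order."""

    def __init__(self, servers: Iterable[str]) -> None:
        self._servers = list(servers)
        if not self._servers:
            raise ValueError("at least one server is required")
        # itertools.cycle repeats the list endlessly: a, b, a, b, ...
        self._cycle = itertools.cycle(self._servers)

    def next_server(self) -> str:
        """Return the server that should receive the next request."""
        return next(self._cycle)
```

With two servers, `RoundRobinBalancer(["web-1", "web-2"])` alternates between them on each `next_server()` call, so every machine receives roughly the same number of requests.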
We use the infrastructure I described (or one very similar) in most applications, and it has proved to be an excellent framework to start with and expand if needed. If you have any questions regarding this infrastructure, or you want us to create such an infrastructure for your product, don’t hesitate to give us a shout at: firstname.lastname@example.org.