Tableau on AWS – What Architecture?
I recognize a successful Tableau deployment when someone I don’t know asks for an account to see the latest dashboard, an insightful viz published by someone else I don’t know either. As the need for data grows, you want Tableau to keep up with the demand. You should consider an infrastructure that lets you scale as the need grows and that stays cost-efficient. That means being able to scale both your databases and your Tableau Server at the same time. A strong answer to this need for flexibility and robustness is to deploy both your DB and your Tableau Server on AWS. A full AWS approach is powerful, scalable, and cost-efficient. Let’s have a look at the different options you should consider.
Almost… But for now, let’s focus on the services that will help you set up your BI stack: a database and a Tableau instance. We will keep things simple and assume there is no need for an ETL tool. Let’s look at EC2, RDS, Redshift, and EMR, and their advantages and disadvantages when working with Tableau.
Option 1: Tableau Server + Database on EC2
AWS EC2 instances are simple virtual machines on which you can install anything you want. Let’s take a simple scenario where you install your DB on one EC2 instance and Tableau Server on another.
This type of architecture is easy to set up and will work for most companies that are starting with Tableau and a small DB. This type of deployment is fairly standard and similar to what you would see on-premises. Keep in mind that when you want to increase the performance of your DB or Tableau Server, you can simply move your EC2 machines to a larger instance type. There will be some downtime, but it should be short enough not to cause much trouble.
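To make the "put more hardware on your EC2 machines" step concrete, here is a minimal sketch of a stop / change type / start sequence. It assumes an EBS-backed instance and an object that behaves like a boto3 EC2 client; the helper name `resize_instance` and the instance ID in the usage comment are made up for illustration.

```python
def resize_instance(ec2, instance_id, new_type):
    """Stop an EC2 instance, change its instance type, and start it again.

    `ec2` is expected to behave like a boto3 EC2 client,
    for example boto3.client("ec2").
    """
    # The instance type can only be changed while the instance is stopped,
    # which is where the short downtime comes from.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={"Value": new_type},
    )

    # Bring the machine back up and wait until it is running again.
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    return new_type


# Hypothetical usage (needs boto3 and AWS credentials):
#   import boto3
#   resize_instance(boto3.client("ec2"), "i-0123456789abcdef0", "r3.2xlarge")
```

Passing the client in as a parameter keeps the function easy to dry-run against a stub before you point it at a real Tableau Server.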
The good side of having two EC2 instances talking to each other is the flexibility you have when picking your DB technology. If, like me, you are not a fan of MySQL, you can deploy any type of DB that Tableau can connect to and you are ready to go. You can pick two different instance sizes, for example a small instance for your DB and a larger one for your Tableau Server, and run most of your analytics on top of Tableau extracts. The main advantage is the speed of deployment: in a couple of minutes your Tableau Server is up and running and connected to your DB. All of that can scale up to a reasonable limit before you need to design a larger architecture.
The disadvantage of this kind of solution is the extra work later down the line, when you need to scale further or want a fancier setup. To add Disaster Recovery (DR) for your DB and/or Tableau instance, you have to launch more EC2 instances, add a load balancer, and configure all of this yourself. It may quickly become more complex than you initially planned.
Solution Total Cost (excluding any licences):
- Around 650 USD per month. AWS EC2 prices here.
- 2 x r3.xlarge, 30.5 GB RAM and 4 vCPUs each (push it to 8 cores for more comfort)
- Go for Linux on your DB instance, as it costs about half as much as a Windows machine.
Option 2: Tableau Server + DB on RDS
This architecture is quite similar to Option 1, but this time we are moving our DB to RDS. I would recommend this path if you are planning to use one of the DB engines supported by RDS, which include:
- SQL Server
- MariaDB
The architecture assumes that your Tableau Server queries a read replica of your DB, so that Tableau does not send queries to your production DB.
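As a rough sketch of how such a read replica could be requested with boto3, the snippet below builds the arguments for the RDS `create_db_instance_read_replica` call. The identifiers `prod-db` and `tableau-replica` and both helper names are assumptions for illustration, not part of any real setup.

```python
def build_replica_request(source_id, replica_id, instance_class=None):
    """Build keyword arguments for the RDS create_db_instance_read_replica call."""
    params = {
        "DBInstanceIdentifier": replica_id,       # name of the new read replica
        "SourceDBInstanceIdentifier": source_id,  # production DB to replicate
    }
    if instance_class is not None:
        # Optional: give the replica a different size than the source DB.
        params["DBInstanceClass"] = instance_class
    return params


def create_replica(params):
    """Send the request to AWS (needs boto3 and valid AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    rds = boto3.client("rds")
    return rds.create_db_instance_read_replica(**params)


# Hypothetical usage:
#   create_replica(build_replica_request("prod-db", "tableau-replica"))
```

Once the replica is available, you would point your Tableau data sources at the replica's endpoint rather than at the production one.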
Let’s avoid the classic RDS vs EC2 efficiency/cost debate. I would simply highlight the following points, which will help your Tableau Server:
- the read replication capabilities, which take some load off your DB. This is important from a Tableau point of view, as you can have one or more read replicas of your DB and point Tableau to those replicas. It is very simple to put in place and allows you to move heavy work away from your main DB. It is also feasible with several EC2 instances, but you would make your life complicated for no reason.
- automated backups are handled for you. You can restore your DB to any point in time, down to the second, within your backup retention period (up to 35 days), and backup storage up to the size of your database is free. You can also restore a full snapshot of your DB at any time.
- the simplicity of managing one RDS instance rather than several EC2 instances (no server security to manage, no installs, upgrades, patches, etc.).
- if your DB fails, AWS handles the failover for you: DR is included if you use the Multi-AZ option.
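The point-in-time restore mentioned in the list above can be sketched as follows. The helper validates the requested timestamp against a retention window that you pass in, then builds the arguments for the boto3 RDS call `restore_db_instance_to_point_in_time`. The function name and the instance identifiers are hypothetical. Note that a point-in-time restore creates a new DB instance; it does not overwrite the existing one.

```python
from datetime import datetime, timedelta, timezone


def build_restore_request(source_id, target_id, restore_time, retention_days, now=None):
    """Build keyword arguments for restore_db_instance_to_point_in_time.

    `retention_days` is the backup retention period configured on the
    source instance; it is supplied by the caller, not looked up here.
    """
    if restore_time.tzinfo is None:
        raise ValueError("restore_time must be timezone-aware (use UTC)")
    now = now or datetime.now(timezone.utc)
    if restore_time > now:
        raise ValueError("restore_time is in the future")
    if now - restore_time > timedelta(days=retention_days):
        raise ValueError("restore_time is older than the retention window")
    return {
        "SourceDBInstanceIdentifier": source_id,
        "TargetDBInstanceIdentifier": target_id,  # the restore creates a NEW instance
        "RestoreTime": restore_time,
    }


# Hypothetical usage (needs boto3 and AWS credentials):
#   import boto3
#   params = build_restore_request(
#       "prod-db", "prod-db-restored",
#       datetime(2024, 1, 9, 8, 30, 15, tzinfo=timezone.utc),
#       retention_days=30,
#   )
#   boto3.client("rds").restore_db_instance_to_point_in_time(**params)
```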
This architecture has some definite advantages over the EC2-only model. If one of the DB engines supported by RDS works for you, I would recommend it.
Solution Total Cost:
- Around 350 USD per month for your EC2 instance
- 1 x r3.xlarge, 30.5 GB RAM and 4 vCPUs (push it to 8 cores for more comfort)
- Around 250 USD per month for your RDS instance. Prices here.
Option 3: Tableau Server + Redshift
Redshift was created to answer complex, large, and numerous queries from Business Intelligence applications like Tableau. It is what you should use on AWS if you have total freedom of architecture. Its columnar storage model makes typical Tableau queries fast.
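To give an intuition for why a columnar layout suits Tableau-style aggregations, here is a toy illustration in plain Python. It is not how Redshift is implemented; the sample data and function names are invented. The point is only that a query such as "total sales per region" touches two columns and never has to read the others.

```python
# A tiny row-oriented table: every record carries all of its fields.
ROWS = [
    {"region": "EU", "product": "A", "sales": 120},
    {"region": "US", "product": "A", "sales": 200},
    {"region": "EU", "product": "B", "sales": 80},
    {"region": "US", "product": "B", "sales": 50},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_sales_by_region(columns):
    """Aggregate sales per region while reading only two of the columns."""
    totals = {}
    for region, sales in zip(columns["region"], columns["sales"]):
        totals[region] = totals.get(region, 0) + sales
    return totals


if __name__ == "__main__":
    print(sum_sales_by_region(to_columns(ROWS)))
```

With a wide table (dozens of columns) and a dashboard that only needs a few of them, skipping the unused columns is where most of the saving comes from.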
In this example, we have created a 6-node Redshift cluster that is available to our Tableau Server.
From my experience, Redshift performs really well with Tableau, and I have seen very successful deployments of this type. You can scale up to terabytes of data and get great performance for the ad-hoc queries typically sent by Tableau users. Redshift processes queries in parallel across its nodes (MPP), and you can add nodes for additional computing power with minimal downtime.
Keep in mind that a Redshift cluster runs in a single AZ, so you do not get DR for your DB out of the box. If your Redshift cluster fails and your dashboards are connected live to it, your analytics are down. For most deployments that would not be much of an issue, but some companies may need their analytics running 24/7. You will need to answer the following question: can I afford to lose my analytics for a couple of hours? If the answer is no, then forget Redshift.
Its very reasonable price relative to its efficiency will probably solve most problems when it comes to analytics at scale. This type of infrastructure is perfect for fast-growing start-ups, for existing slow deployments, and for larger companies that need to scale their data warehouse quickly and easily.
Solution Total Cost:
- Around 350 USD per month for the Tableau Server
- 1 x r3.xlarge, 30.5 GB RAM and 4 vCPUs (push it to 8 cores for more comfort)
- Around 250 USD per month for a 1-node deployment. My tip: use SSD…
Be mindful that some extra costs may occur when sending data back and forth between your different AWS services. If you want to dive deeper into Redshift, you should read this article.
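To compare the first three options side by side, here is a small sketch that adds up the monthly figures quoted in this article. It assumes that Redshift cost grows linearly with the number of nodes, and it ignores licences, data transfer, read replicas, and Multi-AZ. Treat the output as a rough order of magnitude, not a quote.

```python
# Monthly figures quoted in this article, in USD, licences excluded.
TWO_EC2 = 650        # Option 1: two r3.xlarge instances
TABLEAU_EC2 = 350    # one r3.xlarge for Tableau Server
RDS_INSTANCE = 250   # one RDS instance
REDSHIFT_NODE = 250  # one Redshift node


def option_costs(redshift_nodes=1):
    """Return the estimated monthly cost of each option in USD.

    Assumption: Redshift cost scales linearly with the number of nodes.
    """
    return {
        "Option 1 (EC2 + EC2)": TWO_EC2,
        "Option 2 (EC2 + RDS)": TABLEAU_EC2 + RDS_INSTANCE,
        "Option 3 (EC2 + Redshift)": TABLEAU_EC2 + REDSHIFT_NODE * redshift_nodes,
    }


if __name__ == "__main__":
    for name, cost in option_costs().items():
        print(f"{name}: ~{cost} USD/month, ~{cost * 12} USD/year")
```

Calling `option_costs(redshift_nodes=6)` shows how quickly a cluster like the 6-node example shifts the balance between the options.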
Option 4: Tableau with EMR
An EMR deployment answers a different question than “How do I create a strong infrastructure for Tableau on AWS?”. It answers something more like “How do I handle a tremendous amount of data with Tableau?”. Queries against petabyte-scale datasets are possible with Tableau but need the right infrastructure (you can forget about Tableau extracts here). You could consider Redshift with 128 nodes, but if you need special tuning on the DB side and more control over how compression and keys are handled, an EMR cluster can be the answer. This kind of solution requires strong Hadoop skills.
I will discuss how to deploy such solutions in a coming article, as this type of architecture needs a deeper explanation.
Tableau on AWS is awesome whatever architecture you pick…
Having Tableau and your DB sitting on AWS makes your deployment easy to manage, fast, scalable, and affordable. AWS offers several possible infrastructures, and each of them has advantages and disadvantages. In most scenarios I would recommend RDS over an EC2 instance for your DB. For more robustness and larger datasets, my recommendation is to consider Redshift early on. For humongous datasets (petabytes) or HA on the Tableau Server, let’s be realistic and accept that your infrastructure will be more complex than the architectures discussed here.
What are your tips when using Tableau on AWS?