Tableau on AWS – What Architecture?

You can recognise a successful Tableau deployment when someone you have never met requests an account to see the latest dashboard, where someone else (also unknown to you) has published an insightful viz. As the need for more data keeps growing, you want your Tableau to keep up with the demand. You should consider an infrastructure that lets you scale as the need grows while remaining cost efficient. That means being able to scale both your databases and your Tableau Server. A strong answer to this need for flexibility and robustness is to deploy both your DB and your Tableau Server on AWS. A full AWS approach is powerful, scalable and cost efficient. Let’s take a look at the different options you should consider.

For now, let’s focus on the services that will help you set up your BI stack: a database and a Tableau instance. We will keep things simple and assume no need for an ETL tool. Let’s look at EC2, RDS, Redshift and EMR, and their advantages and disadvantages when working with Tableau.

Option 1: Tableau Server + Database on EC2

AWS EC2 instances are simple virtual machines on which you can install anything you want. Let’s create a simple scenario where you install your DB on one EC2 instance and a Tableau Server on another.

[Figure: Cloudcraft architecture diagram, Tableau Server and database on two separate EC2 instances]

This type of architecture is easy to set up and will work for most companies that are starting out with Tableau and a small DB. It is fairly standard and similar to what you would see in an on-premise situation. Keep in mind that when you want to increase the performance of your DB or Tableau Server, you can simply move your EC2 machines to larger instance types. There will be some downtime, but it should be short enough not to cause much trouble.
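As a minimal sketch of that scale-up step, assuming you script it with boto3 (the AWS SDK for Python): the instance ID and the target type below are hypothetical placeholders, and the function takes the EC2 client as a parameter so it can be tried against a stub first. An EC2 instance has to be stopped before its type can be changed, which is where the downtime comes from.

```python
def resize_instance(ec2, instance_id, new_type):
    """Stop an EC2 instance, change its instance type, and start it again.

    `ec2` is expected to behave like a boto3 EC2 client
    (for example the object returned by boto3.client("ec2")).
    """
    # The instance type can only be changed while the instance is stopped.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # Switch to the larger (or smaller) instance type.
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={"Value": new_type},
    )

    # Bring the machine back and wait until it is running.
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])


# Example call (hypothetical instance ID; needs boto3 and AWS credentials):
#   import boto3
#   resize_instance(boto3.client("ec2"), "i-0123456789abcdef0", "r3.2xlarge")
```

The time between the stop and the second waiter returning is roughly the downtime your users would see, so it is worth running this outside business hours.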

The upside of having two EC2 instances talking to each other is the flexibility you have when picking your DB technology. If you are not a fan of MySQL (like me), you can deploy any type of DB that Tableau can connect to and you are good to go. You can also pick two different instance sizes, for example a small instance for your DB and a larger one for your Tableau Server, and run most of your analytics on top of Tableau extracts. The main advantage is speed of deployment: within a couple of minutes you have your Tableau Server up and running and connected to your DB. All of this can scale up to a reasonable limit before you need to design a larger architecture.

The disadvantage of this kind of solution is the extra work later down the line, when you need to scale up or want a fancier setup. To add Disaster Recovery (DR) for your DB and/or Tableau instance, you have to set up more EC2 instances, add a load balancer and configure all of this yourself. It may quickly become more complex than you initially planned.

Solution total cost (excluding any licences):

  • Around 650 USD per month (see the AWS EC2 pricing page).
    • 2 x r3.xlarge, 30.5 GiB RAM & 4 vCPUs each (push it to 8 cores for more comfort)
    • Go for Linux on your DB instance, as Linux instances cost about half as much as Windows ones.

 

Option 2: Tableau Server + DB on RDS

This architecture is quite similar to Option 1, but this time we move our DB to RDS. I recommend this path if you plan to use one of the DB engines supported by RDS, listed below:

  • Aurora
  • MySQL
  • Oracle
  • PostgreSQL
  • SQL Server
  • MariaDB

[Figure: Cloudcraft architecture diagram, Tableau Server on EC2 querying an RDS read replica]
This architecture assumes that your Tableau Server queries a read replica of your DB, so that Tableau does not send queries to your production DB.

Let’s avoid the classic RDS vs EC2 efficiency / cost debate. I would simply put forward the following points, all of which help your Tableau Server:

  • Read replicas take some load off your DB. This is important from a Tableau point of view, as you can have one or more read replicas of your DB and point Tableau towards them. It is very simple to put in place and moves heavy query work away from your main DB. It is also feasible with several EC2 instances, but you would make your life unnecessarily complicated.
  • Automated backups are handled for you. You can restore your DB to any point in time, down to the second, within your backup retention period (up to 35 days), at no extra charge as long as your backup storage stays within the size of your provisioned DB storage. You can also restore a full snapshot of your DB at any time.
  • Managing an RDS instance is simpler than managing several EC2 instances: there is no OS to secure, and no installation, upgrades or patches to handle yourself.
  • If your DB fails, AWS handles the failover for you. DR is included if you use the Multi-AZ option.
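To make the first two points concrete, here is a minimal sketch assuming you drive RDS through boto3. The instance identifiers and the instance class are hypothetical placeholders, and both helpers take the RDS client as a parameter so they can be exercised against a stub before touching a real account.

```python
from datetime import datetime, timezone


def create_tableau_replica(rds, source_id, replica_id, instance_class=None):
    """Create a read replica of `source_id` for Tableau to query.

    `rds` is expected to behave like a boto3 RDS client
    (for example the object returned by boto3.client("rds")).
    """
    params = {
        "DBInstanceIdentifier": replica_id,       # name of the new replica
        "SourceDBInstanceIdentifier": source_id,  # production DB to copy from
    }
    if instance_class is not None:
        # Optional: size the replica differently from the source DB.
        params["DBInstanceClass"] = instance_class
    return rds.create_db_instance_read_replica(**params)


def restore_to_point_in_time(rds, source_id, target_id, restore_time):
    """Restore `source_id` into a NEW instance as it was at `restore_time`."""
    # Sketch-level design choice: insist on an explicit time zone so the
    # restore point is unambiguous.
    if restore_time.tzinfo is None:
        raise ValueError("restore_time must be timezone-aware")
    return rds.restore_db_instance_to_point_in_time(
        SourceDBInstanceIdentifier=source_id,
        TargetDBInstanceIdentifier=target_id,
        RestoreTime=restore_time,
    )


# Example calls (hypothetical identifiers; need boto3 and AWS credentials):
#   import boto3
#   rds = boto3.client("rds")
#   create_tableau_replica(rds, "prod-db", "prod-db-tableau-replica")
#   restore_to_point_in_time(
#       rds, "prod-db", "prod-db-restored",
#       datetime(2016, 1, 15, 9, 30, 0, tzinfo=timezone.utc),
#   )
```

Once the replica is available, you would point the Tableau data source at the replica's endpoint rather than at the production endpoint. Note that a point-in-time restore creates a new instance; it does not overwrite the existing one.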

This architecture has some definite advantages over the EC2-only model. If you can use one of the DB engines supported by RDS, I would recommend it.

Solution Total Cost:

  • Around 350 USD per month for your EC2 instance
    • 1 x r3.xlarge, 30.5 GiB RAM & 4 vCPUs (push it to 8 cores for more comfort)
  • Around 250 USD per month for your RDS instance (see the AWS RDS pricing page).

 

Option 3: Tableau Server + Redshift

Redshift was created to answer the complex, large and numerous queries sent by Business Intelligence applications like Tableau. It is what you should use in AWS if you have total freedom of architecture. Its columnar storage model makes typical Tableau queries fast.

 

[Figure: Cloudcraft architecture diagram, Tableau Server connected to a 6-node Redshift cluster]

In the example above we have created a 6-node Redshift cluster that our Tableau Server can connect to.

In my experience, Redshift performs really well with Tableau and I have seen very successful deployments of this type. You can scale up to terabytes of data and get great performance for the ad-hoc queries typically sent by Tableau users. Redshift processes queries in parallel (MPP) and you can add nodes for additional computing power with no downtime.

Keep in mind that Redshift is available in only one AZ, so you cannot have a DR of your DB. If your Redshift cluster fails and your dashboards are connected live to it, your analytics are down. For most deployments that would not be too much of an issue, but some companies need their analytics running 24/7. You will need to answer the following question: can I afford to lose my analytics for a couple of hours? If the answer is no, then forget Redshift.

Its very reasonable price relative to its efficiency will probably solve most problems when it comes to analytics at scale. This type of infrastructure is perfect for fast-growing start-ups, for existing deployments that have become slow, and for larger companies that need to scale their data warehouse quickly and easily.

  • Around 350 USD per month for the Tableau Server
    • 1 x r3.xlarge, 30.5 GiB RAM & 4 vCPUs (push it to 8 cores for more comfort)
  • Around 250 USD per month for a 1-node Redshift deployment. My tip: use SSD…

Be mindful that extra costs may occur when moving data back and forth between your different AWS services. If you want to dive deeper into Redshift, you should read this article.
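To put the three cost estimates side by side, here is a small sketch of the arithmetic. It uses only the approximate monthly figures quoted in this article (licences and data transfer excluded). The one assumption I am adding is that every extra Redshift node costs the same as the single node quoted above, which is a simplification.

```python
# Approximate monthly figures quoted in this article, in USD.
TWO_EC2_INSTANCES = 650   # Option 1: DB + Tableau Server, both on EC2
TABLEAU_EC2 = 350         # Options 2 and 3: one EC2 instance for Tableau Server
RDS_INSTANCE = 250        # Option 2: one RDS instance
REDSHIFT_NODE = 250       # Option 3: one Redshift node


def monthly_costs(redshift_nodes=1):
    """Return the estimated monthly cost of each option in USD.

    Assumes (simplification) that Redshift cost grows linearly with the
    number of nodes.
    """
    if redshift_nodes < 1:
        raise ValueError("a Redshift cluster needs at least one node")
    return {
        "Option 1: EC2 + EC2": TWO_EC2_INSTANCES,
        "Option 2: EC2 + RDS": TABLEAU_EC2 + RDS_INSTANCE,
        "Option 3: EC2 + Redshift": TABLEAU_EC2 + REDSHIFT_NODE * redshift_nodes,
    }


for name, usd in monthly_costs(redshift_nodes=1).items():
    print(f"{name}: about {usd} USD per month")
```

With a single Redshift node, Options 2 and 3 land at the same estimate, so the choice between them is about workload and availability rather than price.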

 

Option 4: Tableau with EMR

An EMR deployment answers a different question. Rather than “How do I create a strong infrastructure for Tableau on AWS?”, it answers something like “How do I handle a tremendous amount of data with Tableau?”. Queries against petabyte datasets are possible with Tableau but need the right infrastructure (you can forget about Tableau extracts here). You could consider Redshift with 128 nodes, but if you need special tuning on the DB side and more control over how compression and keys are handled, an EMR cluster can solve that. This kind of solution requires strong Hadoop skills.

I will discuss how to deploy such a solution in an upcoming article, as this type of architecture requires a deeper explanation.

Tableau on AWS is awesome whatever architecture you choose…

Running Tableau and your DB on AWS makes your stack easy to manage, fast, scalable and affordable. AWS offers several possible infrastructures, each with its own advantages and disadvantages. In most scenarios I would recommend RDS over an EC2 instance for your DB. For more robustness and larger datasets, I would recommend considering Redshift early on. For humongous datasets (petabytes) or HA on the Tableau Server, let’s be realistic and acknowledge that your infrastructure will be more complex than the architectures discussed above.
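The recommendations in this article can be condensed into a rough decision helper. This is only a sketch of my rules of thumb, not a sizing tool: the function name, the three `data_scale` labels and the order of the checks are illustrative choices made for this example.

```python
# DB engines supported by RDS, as listed under Option 2.
RDS_ENGINES = {"aurora", "mysql", "oracle", "postgresql", "sql server", "mariadb"}

# Hypothetical labels for how much data Tableau has to query.
DATA_SCALES = {"small", "terabytes", "petabytes"}


def recommend_architecture(engine, data_scale, analytics_must_run_24_7=False):
    """Map a rough description of your needs to one of the four options."""
    if data_scale not in DATA_SCALES:
        raise ValueError(f"data_scale must be one of {sorted(DATA_SCALES)}")

    # Petabyte-scale data calls for the EMR route.
    if data_scale == "petabytes":
        return "Option 4: Tableau with EMR"

    # Redshift suits large datasets, unless you cannot afford to lose
    # your analytics for a couple of hours.
    if data_scale == "terabytes" and not analytics_must_run_24_7:
        return "Option 3: Tableau Server + Redshift"

    # Otherwise prefer RDS whenever it supports your DB engine.
    if engine.strip().lower() in RDS_ENGINES:
        return "Option 2: Tableau Server + DB on RDS"

    # Any other DB that Tableau can connect to goes on its own EC2 instance.
    return "Option 1: Tableau Server + Database on EC2"


print(recommend_architecture("PostgreSQL", "small"))
print(recommend_architecture("MySQL", "terabytes", analytics_must_run_24_7=True))
```

Treat the output as a starting point for discussion: budget, team skills and existing infrastructure will often weigh as much as data volume.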

What are your tips when using Tableau on AWS?