Want to run Python in Tableau? Meet TabPy!
Part II: Forecasting in Tableau using TabPy
Written by Eliana Lambrou - 11-06-2018
Making Predictions by executing Python code in Tableau "on the fly"
In the previous article (Want to run Python in Tableau? Meet TabPy!), we discussed how to install TabPy, and demonstrated some simple examples of how to run Python in Tableau. Some might think that the latter is nifty, but actually they are things that could have be done in Tableau without using Python. So why should we go through the trouble of downloading and configuring the Python server in Tableau, and the hassle of writing Python scripts? Let’s dig into a more complicated but much more fun example, where we can do things that Tableau normally cannot achieve. Ready? Let’s go!
Before getting our hands on Tableau, let’s discuss what we're going to do. We have a csv file called GermanCreditData (original data set can be found here), which consists of various features of bank customers who requested loans (e.g. age, employment status, personal status, sex etc), and if they were bad or good customers (i.e. whether or not they repaid their loan). The classification is then provided in the field called Class by two integers: 1 if they were good, and 2 if they were bad. The documentation on what these numbers represent can be found here: documentation.
Then, we create an artificial dataset with some customers and their features (predictions.csv).
Next we use the GermanCreditData dataset to train a model to predict if these customers will be good or bad based on their features.
For the Machine Learning Algorithm we will use a Decision Tree Classification, an algorithm that can be found in scikit-learn library that can be imported in Python. If you are not familiar with scikit-learn, then there is no cause for alarm. Just bear in mind that it includes many powerful machine learning algorithms that can be called upon to give predictive power to your data.
Steps to running a Machine Learning Algorithm from Python in Tableau:
1. Open a new Tableau Workbook.
2.Connect to TabPy server as in step 2 in the section “How to Use TabPy in Tableau in Part I”.
Just to remind you...you have to connect to tabpy_server by running the startup.bat file that you can find in the following directory:
C:\Users\<Username>\Anaconda\envs\Tableau-Python-Server\Lib\site-packages\tabpy_server and then connect to tabpy server in Tableau by selecting Help - Settings and Performance - Manage External Service Connection and specifying Server: localhost and Port: 9004.
3. For this section we will use the csv file shown in the figure above called predictions.csv that can be downloaded from here: predictions. Connect to this data source (predictions.csv).
4. Create a calculated field (named Prediction) that reads as below:
SCRIPT_INT(" import numpy as np from sklearn.tree import DecisionTreeClassifier import pandas as pd data=pd.DataFrame.from_csv(r'C:\Users\ElianaLambrou\Documents\Projects\ PythonInTableau\GermanCreditData.csv') X=data.drop('Class',axis=1) y=data['Class'] clf = DecisionTreeClassifier(random_state=2) clf.fit(X,y) X_pred=np.transpose(np.array([_arg1,_arg2,_arg3,_arg4,_arg5,_arg6, _arg7,_arg8,_arg9,_arg10,_arg11,_arg12,_arg13, _arg14, _arg15, _arg16, _arg17, _arg18, _arg19, _arg20])) pred = clf.predict(X_pred) return pred.tolist( ) ", SUM([Status of account]),SUM([Duration(months)]),SUM([Credit History]), SUM([Purpose]),SUM([Credit amount]), SUM([Savings account/bonds]), SUM([Employment]), SUM([Installment rate]), SUM([Personal status/sex]), SUM([Debtors/Guarantors]), SUM([Residence since]), SUM([Property]), SUM([Age]), SUM([Other installment plans]), SUM([Housing]), SUM([Existing credits at bank]), SUM([Job]), SUM([People liable to provide maintenance for]), SUM([Telephone]), SUM([Foreign worker]))
Let’s have a look at the code line-by-line:
• If you remember, the good and bad customers are indicated by the numbers 1 and 2 respectively. So this calculated field will return one of the two numbers (integers),, and therefore we call the Python script using SCRIPT_INT at the beginning.
• Then we import some libraries in python that we need in order for our code to work.
- numpy helps us bring the features of the customers we want to classify as good or bad in the correct form for our predictive model
- from sklearn (scikit-learn) we import the tree branch that contains Decision Tree Classifier algorithm
- pandas is a library that helps us easily read data and manipulate it
• Next we read the csv file, and split the columns into X (which are the features of each customer) and y (which is their classification – i.e., good or bad with 1 and 2 respectively).
• Then we initialize our classification algorithm (DecisionTreeClassifier in this case), with a specific random seed so every time we run it, we get the same result. Be aware that for the purpose of this article - as it does not focus on Machine Learning - we did not optimize the algorithm with extra parameters. But for better predictions one should indeed try to optimize it.
•Next, we fit the model using clf.fit(X,y)
• Finally, we compute the predictions for each customer based on the arguments that we are giving (in total 20 arguments) that you can see at the end of the calculated field, and we return them to Tableau.
5. Drag the field that we have just created (named Prediction) into the middle of the view (it should successfully compute the predictions, and the Prediction field should be added as Text on the MarksCard).
6. Drag CustomerID and CustomerName on the rows shelf. And here it is! We have predicted if a new customer is good or bad!
7. The predictions are there, but we can make them more beautiful! Change Marks Shelf to Shape, and put Prediction on Shape. Then click on Shape and choose green tick from KPI for 1, and red cross from KPI for 2.
8. You can even make this process more instructive by putting Customer Name (or Customer ID) on the Filter Shelf. Then you can choose your Customer and see if he/she will be a good or bad customer to approve a loan for, or not.
9. Save the workbook as Predictions.twbx for future reference.
Deploying and calling python functions in Tableau
All the above features are really nice and very powerful. However, usually when data analysts or data scientists are using Machine Learning, they write long scripts. So they generally prefer to write them in a text editor, or in an interactive web application that contains live code, such as jupyter notebook. So...how can we run a script that already exists in Tableau? Firstly, we need to create the functions and deploy them, so Tableau will recognize them as endpoints on 9004 port of our localhost. To do so, we follow the procedure written below, which uses jupyter notebook:
1. Go to C:\Users\ElianaLambrou\Anaconda\envs> activate Tableau-Python-Server
2. Make sure that TabPy is listening to port 9004 (i.e. you did the step when you run startup.bat)
3. Type ipython notebook (or jupyter notebook) to activate jupyter notebook
4. Create a new jupyter notebook (or open an existing one) from the top right corner as follows:
5. Type the following code - which in principle is what we used in the previous example - in Tableau, written as a Python function named germancreditcheck.
import tabpy_client client=tabpy_client.Client('http://localhost:9004/') def germancreditcheck(_arg1,_arg2,_arg3,_arg4,_arg5,_arg6,_arg7,_arg8, _arg9,_arg10,_arg11,_arg12,_arg13, _arg14, _arg15, _arg16, _arg17, _arg18, _arg19,_arg20): import numpy as np from sklearn.tree import DecisionTreeClassifier import pandas as pd data=pd.DataFrame.from_csv(r'C:\Users\ElianaLambrou\Documents\Projects\ PythonInTableau\GermanCreditData.csv') X=data.drop('Class',axis=1) y=data['Class'] clf = DecisionTreeClassifier(random_state=2) clf.fit(X,y) X_pred=np.transpose(np.array([_arg1,_arg2,_arg3,_arg4,_arg5,_arg6, _arg7,_arg8,_arg9,_arg10,_arg11,_arg12,_arg13, _arg14, _arg15, _arg16, _arg17, _arg18,_arg19, _arg20])) pred = clf.predict(X_pred) return pred.tolist() client_deploy(‘GermanCreditCheck’, germancreditcheck, 'Classifies bad or good according to the model trained by relevant dataset',override=True)
Let’s briefly explain what we are doing in this example. In the first two lines, we are importing the tabpy_client library, which allows us to deploy endpoints based on Python functions. We are also specifying the connection address (i.e. the localhost:9004 in this case). Then we define our function (germancreditcheck), and we return a list from it. Finally, we deploy the function, i.e. we specify the name with which we will call the function (‘GermanCreditCheck’), which function this name corresponds to (germancreditcheck), and a description of the function. Then we can add a few parameters, for example here we set override to True, so every time that you deploy the function with the same name then the software will overwrite it. If Jupyter is not your cup of tea then you can run germancreditcheck in your preferred Python kernel instead.
6. If you want to remove a function from your endpoints, you can call client.remove(‘<name of function when deployed>')
7. Type http://localhost:9004/endpoints on your browser to see all functions that you have deployed. If it is the first time that you are using TabPy in this way, you should only see “GermanCreditCheck”
8. To run any function that you have previously deployed, open Tableau (for comparison we opened the previously saved workbook - Predictions.twbx) and create a calculated field (Prediction 2) that reads as following:
SCRIPT_INT(" return tabpy.query('GermanCreditCheck',_arg1,_arg2,_arg3,_arg4,_arg5, _arg6,_arg7,_arg8,_arg9,_arg10,_arg11,_arg12,_arg13, _arg14, _arg15, _arg16,_arg17, _arg18, _arg19, _arg20)['response'] ", SUM([Status of account]),SUM([Duration(months)]),SUM([Credit History]), SUM([Purpose]),SUM([Credit amount]), SUM([Savings account/bonds]), SUM([Employment]), SUM([Installment rate]), SUM([Personal status/sex]), SUM([Debtors/Guarantors]), SUM([Residence since]), SUM([Property]), SUM([Age]), SUM([Other installment plans]), SUM([Housing]), SUM([Existing credits at bank]), SUM([Job]), SUM([People liable to provide maintenance for]), SUM([Telephone]), SUM([Foreign worker]))
All you do at this point is send a query to tabpy to return the value of the function GermanCreditCheck (which exists in the endpoints of your localhost) with the arguments given.
9. Drag Prediction 2 on Columns shelf. And here it is! You should see that the predictions match with what we had before! Ticks appear under 1 and crosses under 2!
Final Thoughts
With this sequel article, our guide on how to integrate Python in Tableau comes to an end. We’ve showed you how to install Python, how to run simple Python functions through Tableau, and how to deploy Python functions to call them via Tableau. Most importantly, via these, we demonstrated how forecasting in Tableau can be empowered via applied Machine Learning using Python as a platform!
If you own a company and you are currently using Tableau for your data analysis, or you are a data analyst who uses Tableau to meet the needs of your clients, please take a few minutes to try out TabPy! You can easily run powerful models in Python, predict the future, and take your business to the next level!
.