Database performance analysis using R

While R is a popular tool used by data scientists it is also a tool that can be used by anyone that collects any type of data. I am not an expert in R, but I have been able to use it to analyze the metrics data that we collect from servers. I find it very useful when doing a quick exploratory analysis to find trends in the data. After that, we can continue to do a deeper analysis of any specific area where we see an anomaly.

With cloud environments, it is very easy to configure servers based on the requirements needed by changing CPU, I/O and RAM capacity. If we are not sure what should be the preferred configuration, we can test multiple configurations and generate a set of metrics. We then analyze those metrics using R.

Another way is to analyze a potential change to an existing system. The key is to find the correct metrics to track. For example, we were considering adding full auditing to an existing database but wanted to know how much load it would put into the system. So, we created a sample load script and tested the load with: No auditing enabled and with auditing enabled. Then we just use R to compare the results. Below is a summary of the steps.

Metrics tracking

Use collectl (http://collectl.sourceforge.net) to track the metrics. You can use any other tool to collect metrics as long as it generates output in table format. For our example lets run this command.

Analyze data in R

Create an R script to import the data and generate plots. For our example we are interested in the I/O Requests per Second and CPU usage. We used a bar plot for I/O Requests and a line plot for CPU. We also used the ggplot2 library to generate the plots. You can find a copy of the R script in my Github repository (LINK).

Show the results

As you can see, for our sample test, adding full auditing to the database will increase the I/O by 6X and CPU by 5X approximately. This was just an example of a worst case scenario since we will not do auditing in the whole database. But, this gives you an idea on how to use R and any metrics tracking tool to measure performance on any system.