Database performance analysis using R
While R is a popular tool among data scientists, it can be used by anyone who collects any kind of data. I am not an expert in R, but I have been able to use it to analyze the metrics we collect from our servers. I find it very useful for quick exploratory analysis to spot trends in the data; after that, we can do a deeper analysis of any specific area where we see an anomaly.
With cloud environments, it is very easy to size servers to the requirements at hand by changing CPU, I/O and RAM capacity. If we are not sure which configuration is preferable, we can test several configurations, generate a set of metrics for each, and then analyze those metrics using R.
Another use is analyzing a potential change to an existing system; the key is to find the right metrics to track. For example, we were considering adding full auditing to an existing database but wanted to know how much load it would put on the system. So we created a sample load script and ran it twice: once with auditing disabled and once with auditing enabled. Then we used R to compare the results. Below is a summary of the steps.
Metrics tracking
Use collectl to capture CPU, disk and memory metrics while the load script runs:
collectl -scdm -i 15 -R 8m -P -f /tmp/sample-test.csv

-scdm : data to be collected (c: CPU; d: disk; m: memory)
-i    : take a sample every 15 seconds
-R    : run for 8 minutes
-P    : generate output in plot format
Analyze data in R
Create an R script to import the data and generate plots. For our test, we imported the plot files from both runs and compared the CPU and disk metrics side by side, as sketched below.
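Here is a minimal sketch of such a script. The file names (sample-noaudit.csv, sample-audit.csv) and the column names ([CPU]Totl%, [DSK]WriteKBTot) are assumptions for illustration; adjust them to match the header line of your own collectl plot files.

# Minimal sketch: compare two collectl plot files (no auditing vs. auditing).
# File names and column names below are assumptions; check your own headers.

read_collectl <- function(file) {
  # collectl -P writes whitespace-separated samples with a "#"-prefixed header line;
  # read the header separately so the column names can be kept.
  header <- scan(file, what = character(), nlines = 1, quiet = TRUE)
  header[1] <- sub("^#", "", header[1])
  data <- read.table(file, header = FALSE, comment.char = "#")
  names(data) <- header
  data
}

no_audit <- read_collectl("/tmp/sample-noaudit.csv")
audit    <- read_collectl("/tmp/sample-audit.csv")

# Plot total CPU and disk writes for both runs on common axes.
par(mfrow = c(2, 1))
plot(no_audit[["[CPU]Totl%"]], type = "l", col = "blue",
     ylim = range(c(no_audit[["[CPU]Totl%"]], audit[["[CPU]Totl%"]])),
     xlab = "Sample", ylab = "CPU %", main = "CPU: no auditing (blue) vs. auditing (red)")
lines(audit[["[CPU]Totl%"]], col = "red")
plot(no_audit[["[DSK]WriteKBTot"]], type = "l", col = "blue",
     ylim = range(c(no_audit[["[DSK]WriteKBTot"]], audit[["[DSK]WriteKBTot"]])),
     xlab = "Sample", ylab = "Write KB", main = "Disk writes: no auditing (blue) vs. auditing (red)")
lines(audit[["[DSK]WriteKBTot"]], col = "red")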
Show the results
As you can see, for our sample test, adding full auditing to the database increases I/O by roughly 6x and CPU by roughly 5x. This was a worst-case scenario, since we will not enable auditing on the entire database, but it gives you an idea of how to use R together with any metrics-collection tool to measure the performance of any system.
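For reference, multipliers like these can also be computed directly from the collected samples instead of being read off the plots. This short sketch reuses the data frames and the assumed column names from the script earlier.

# Approximate load multipliers: average with auditing divided by average without.
cpu_ratio <- mean(audit[["[CPU]Totl%"]]) / mean(no_audit[["[CPU]Totl%"]])
io_ratio  <- mean(audit[["[DSK]WriteKBTot"]]) / mean(no_audit[["[DSK]WriteKBTot"]])
cat(sprintf("CPU increase: %.1fx, disk write increase: %.1fx\n", cpu_ratio, io_ratio))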