Tuesday, December 7, 2010

Performance Counters and Scaling

To make the right decisions on scaling up or scaling down your Windows Azure instances you need to have good metrics. One of these metrics can be performance counters like CPU utilization, free memory etc. Performance counters can be added during startup of a worker or webrole or from a distant. Adding a perfcounter within the logic of the role itself is fairly easy.

public override bool OnStart()

var config = DiagnosticMonitor.GetDefaultInitialConfiguration();

// Adding CPU performance counters to the default diagnostic configuration
new PerformanceCounterConfiguration()
CounterSpecifier = @"\Processor(_Total)\% Processor Time",
//do the actual probe every 5 seconds.
SampleRate = TimeSpan.FromSeconds(5)

//transfer the gathered perfcount data to my storage account every minute
config.PerformanceCounters.ScheduledTransferPeriod = TimeSpan.FromMinutes(1);

// Start the diagnostic monitor with the modified configuration.
// DiagnosticsConnectionString contains my cloudstorageaccount settings
DiagnosticMonitor.Start("DiagnosticsConnectionString", config);
return base.OnStart();

From now on this role will gather perfcounters for ever giving you the ability to analyze the data.

How to add a performance counter from a distant?

//Create a new DeploymentDiagnosticManager for a given deployment ID
DeploymentDiagnosticManager ddm = new DeploymentDiagnosticManager(csa, this.PrivateID);

you need a storage account (where your WADPerformanceCounter table is) and the deploymentID of your deployment(multiple instances of a web and/or workerrole is possible since every instance is a running VM eventually).

//Get the role instance diagnostics manager for all instance of the a role
var ridm = ddm.GetRoleInstanceDiagnosticManagersForRole(RoleName);

//Create a performance counter for processor time
PerformanceCounterConfiguration pccCPU = new PerformanceCounterConfiguration();
pccCPU.CounterSpecifier = @"\Processor(_Total)\% Processor Time";
pccCPU.SampleRate = TimeSpan.FromSeconds(5);

//Create a performance counter for available memory
PerformanceCounterConfiguration pccMemory = new PerformanceCounterConfiguration();
pccMemory.CounterSpecifier = @"\Memory\Available Mbytes";
pccMemory.SampleRate = TimeSpan.FromSeconds(5);

//Set the new diagnostic monitor configuration for each instance of the role
foreach (var ridmN in ridm)
DiagnosticMonitorConfiguration dmc = ridmN.GetCurrentConfiguration();
//Add the new performance counters to the configuration

//Update the configuration

By applying the code above for a certain role (specified by RoleName), every instance of that role is being monitored. Both CPU and memory perf counters are sampled and written to the cloudstorageaccount provided. The data of every 5 seconds per instance is eventually written to storage.

So how make the right decisions on scaling? As you might notice your raw data in storage doesn't help you and only shows sparks. To discover trends or averages you need to apply e.g. smoothing to your data in order to get a readable graph. See the graph below to see the difference.

As you can see the "smoothed" graph is far more readable and shows you exactly how your roleinstance is performing and whether or not it needs to be scaled up/down.

Smoothing algorithms like simple or weighted moving average can help you make better decisions rather than examining the raw data from the WADPerformancecountersTable in your storage account. Implement this smoothing, have it run in a workerrole in the same datacenter and avoid massive bandwidth costs if you examine weeks or even months of performance counter data.

Good luck with it!