Tuesday, December 7, 2010

Performance Counters and Scaling

To make the right decisions about scaling your Windows Azure instances up or down you need good metrics. One source of such metrics is performance counters like CPU utilization, available memory etc. Performance counters can be added during startup of a web or worker role, or remotely from outside the role. Adding a performance counter within the logic of the role itself is fairly easy.

public override bool OnStart()
{
    var config = DiagnosticMonitor.GetDefaultInitialConfiguration();

    // Adding CPU performance counters to the default diagnostic configuration
    config.PerformanceCounters.DataSources.Add(
        new PerformanceCounterConfiguration()
        {
            CounterSpecifier = @"\Processor(_Total)\% Processor Time",
            // Do the actual probe every 5 seconds
            SampleRate = TimeSpan.FromSeconds(5)
        });

    // Transfer the gathered performance counter data to my storage account every minute
    config.PerformanceCounters.ScheduledTransferPeriod = TimeSpan.FromMinutes(1);

    // Start the diagnostic monitor with the modified configuration.
    // DiagnosticsConnectionString contains my cloud storage account settings
    DiagnosticMonitor.Start("DiagnosticsConnectionString", config);

    return base.OnStart();
}

From now on this role will gather performance counters continuously, giving you the ability to analyze the data.

How do you add a performance counter remotely?

//Create a new DeploymentDiagnosticManager for a given deployment ID
//(csa is your CloudStorageAccount; PrivateID holds the deployment ID, see below)
DeploymentDiagnosticManager ddm = new DeploymentDiagnosticManager(csa, this.PrivateID);

You need a storage account (the one holding your WADPerformanceCountersTable) and the deployment ID of your deployment. A deployment can contain multiple instances of a web and/or worker role, since every instance is eventually a running VM.
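
A minimal sketch of obtaining both from within a running role ("DiagnosticsConnectionString" is the setting used in OnStart above; an external management tool would instead read the deployment ID from the portal or the Service Management API):

// Sketch: inside a role, both values can be resolved from the service runtime.
CloudStorageAccount csa = CloudStorageAccount.Parse(
    RoleEnvironment.GetConfigurationSettingValue("DiagnosticsConnectionString"));
string deploymentId = RoleEnvironment.DeploymentId; // use this as PrivateID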

//Get the role instance diagnostic managers for all instances of a role
var ridm = ddm.GetRoleInstanceDiagnosticManagersForRole(RoleName);


//Create a performance counter for processor time
PerformanceCounterConfiguration pccCPU = new PerformanceCounterConfiguration();
pccCPU.CounterSpecifier = @"\Processor(_Total)\% Processor Time";
pccCPU.SampleRate = TimeSpan.FromSeconds(5);

//Create a performance counter for available memory
PerformanceCounterConfiguration pccMemory = new PerformanceCounterConfiguration();
pccMemory.CounterSpecifier = @"\Memory\Available Mbytes";
pccMemory.SampleRate = TimeSpan.FromSeconds(5);

//Set the new diagnostic monitor configuration for each instance of the role
foreach (var ridmN in ridm)
{
    DiagnosticMonitorConfiguration dmc = ridmN.GetCurrentConfiguration();

    //Add the new performance counters to the configuration
    dmc.PerformanceCounters.DataSources.Add(pccCPU);
    dmc.PerformanceCounters.DataSources.Add(pccMemory);

    //Update the configuration
    ridmN.SetCurrentConfiguration(dmc);
}

By applying the code above to a certain role (specified by RoleName), every instance of that role is monitored. Both the CPU and memory performance counters are sampled every 5 seconds per instance, and the gathered data is eventually written to the cloud storage account you provided.
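
Once the data lands in storage you can read it back with the table storage client. A minimal sketch, assuming an entity class shaped after the standard WAD table schema (the class name and query shape are illustrative, not from the original post):

// Sketch: a minimal entity matching common WADPerformanceCountersTable
// columns (the real table has more), using the StorageClient library.
public class PerfCounterEntity : TableServiceEntity
{
    public string RoleInstance { get; set; }
    public string CounterName { get; set; }
    public double CounterValue { get; set; }
}

var tableClient = csa.CreateCloudTableClient();
var context = tableClient.GetDataServiceContext();

// Pull the CPU samples; in a real implementation you would also filter
// on PartitionKey (tick-based) to limit the scan to a time window.
var cpuSamples = context.CreateQuery<PerfCounterEntity>("WADPerformanceCountersTable")
    .Where(e => e.CounterName == @"\Processor(_Total)\% Processor Time")
    .ToList();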

So how do you make the right decisions on scaling? As you might notice, the raw data in storage doesn't help you much and only shows spikes. To discover trends or averages you need to apply e.g. smoothing to your data in order to get a readable graph. See the graph below for the difference.

[Graph: raw performance counter data versus the smoothed version]
As you can see, the "smoothed" graph is far more readable and shows you exactly how your role instance is performing and whether or not it needs to be scaled up or down.

Smoothing algorithms like a simple or weighted moving average can help you make better decisions than examining the raw data from the WADPerformanceCountersTable in your storage account. Implement this smoothing in a worker role running in the same datacenter (see the sketch below) and you avoid massive bandwidth costs when you examine weeks or even months of performance counter data.
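
A minimal sketch of a simple moving average over a series of counter samples (method and variable names are illustrative):

// Sketch: smooth a series of counter values with a simple moving average
// over a fixed window; samples are assumed to be in chronological order.
public static List<double> SimpleMovingAverage(IList<double> samples, int window)
{
    var smoothed = new List<double>();
    double sum = 0;
    for (int i = 0; i < samples.Count; i++)
    {
        sum += samples[i];
        if (i >= window)
            sum -= samples[i - window];   // drop the sample falling out of the window
        if (i >= window - 1)
            smoothed.Add(sum / window);   // emit an average once the window is full
    }
    return smoothed;
}

// Example: average the CPU samples retrieved earlier over 12 samples
// (12 samples x 5 seconds = a 1-minute window).
// var trend = SimpleMovingAverage(cpuSamples.Select(e => e.CounterValue).ToList(), 12);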

Good luck with it!

Things to consider when migrating to Azure

Migrating existing .NET apps to Windows Azure is usually not much of a problem. It's easy to get it running; it's harder to have it scale. It's about rewriting and re-architecting your application to fully benefit from the concepts in Azure: storage (tables, blobs, queues), SQL Azure, AppFabric etc. Some pitfalls:

- Session state: to be scalable, store your session data in Table Storage (a provider for this is TableStorageSessionProvider). It's easy, it's scalable, it's cheap and it's always there!
- Separate frontend logic from background tasks and use queues to loosely couple them (see the sketch after this list). This keeps your application scalable and enables you to scale web and worker roles independently when needed.
- Where is my Active Directory? When moving to Azure and your app is e.g. an intranet application, you don't want users to need another login. There are two options to tackle this: first, use Active Directory Federation; second, use the latest Azure Connect. Azure Connect effectively extends your own datacenter by adding Azure nodes to it, which makes migration easier because you can set up connectivity between your on-premises assets and your Windows Azure assets.
- From SQL to SQL. Migrating a 100% on-premises SQL Server database to SQL Azure only requires you to change the connection string and make sure you don't use Windows authentication, since SQL Azure doesn't support it. The more difficult question: do I actually need SQL Azure, given that it's rather expensive compared to Table or Blob storage? Maybe even a lightweight Web Edition database will suffice, with the other storage options complementing your storage needs.
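
As an illustration of the queue-based decoupling mentioned above, a minimal sketch using the StorageClient library (the setting name, queue name and message content are assumptions):

// Sketch: the web role enqueues work...
CloudStorageAccount account = CloudStorageAccount.Parse(
    RoleEnvironment.GetConfigurationSettingValue("DataConnectionString"));
var queueClient = account.CreateCloudQueueClient();
CloudQueue queue = queueClient.GetQueueReference("tasks");
queue.CreateIfNotExist();
queue.AddMessage(new CloudQueueMessage("resize-image:blob-4711"));

// ...and the worker role (typically in its Run() loop) processes it.
CloudQueueMessage msg = queue.GetMessage();
if (msg != null)
{
    // Process the work item here; delete only after success so the message
    // becomes visible again if this instance crashes mid-processing.
    queue.DeleteMessage(msg);
}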



Part of migrating your application is changing the way you deploy it (from on-premises to Azure). The Windows Azure Service Management API is easy and straightforward to use and can help you incorporate deployment logic into your daily build process. Write your own build tasks to deploy your app to Staging, run your tests there, get the results, then suspend and delete the deployment again to save money. Since you are charged by the hour, plan your test runs in whole hours.
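
A minimal sketch of suspending a staging deployment through the Service Management REST API (subscription ID, service name and certificate file are placeholders you'd fill in; deleting works the same way with an HTTP DELETE against the same slot URL):

// Sketch: authenticate with a management certificate uploaded to your
// subscription and POST an UpdateDeploymentStatus operation.
string subscriptionId = "<your-subscription-id>";
string serviceName = "<your-hosted-service>";
var uri = new Uri(string.Format(
    "https://management.core.windows.net/{0}/services/hostedservices/{1}/deploymentslots/staging/?comp=status",
    subscriptionId, serviceName));

var request = (HttpWebRequest)WebRequest.Create(uri);
request.Method = "POST";
request.ContentType = "application/xml";
request.Headers.Add("x-ms-version", "2009-10-01");
request.ClientCertificates.Add(
    new X509Certificate2("management.pfx", "password")); // assumed certificate file

string body =
    "<UpdateDeploymentStatus xmlns=\"http://schemas.microsoft.com/windowsazure\">" +
    "<Status>Suspended</Status></UpdateDeploymentStatus>";
using (var writer = new StreamWriter(request.GetRequestStream()))
    writer.Write(body);

using (var response = (HttpWebResponse)request.GetResponse())
    Console.WriteLine("Request accepted: {0}", response.StatusCode); // expect 202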

Patterns & Practices published an excellent PDF about migrating your applications to the cloud.

I will blog some more about migration issues.