Monitoring is key to all modern IT operations, whether we host our resources in a local data center or in a public cloud. Looking back, we typically used a product such as System Center Operations Manager for application and OS monitoring, another product for our hardware, and perhaps a third for our network. Even when the goal was to use one product, we often ended up with several, with no connectivity between them.
In Azure, the approach is different from the start: we use multiple services from the beginning and then connect them in the visualization (dashboard) and reporting (workbook) layers. A significant difference compared to the local data center is that in Azure, all services and components are built to work together. Another key difference is that in Azure we can quickly test new monitoring features, scale up and down, and immediately support the monitoring of new business services.
In this blog post, we will walk through how to scope monitoring with Azure services. We will begin by setting the scope and expected outcome for the monitoring solution. Instead of saying we will monitor “everything,” we will create a scope based on what we need. Saying “we will monitor everything” often ends with an outdated, unnecessarily complex solution that doesn’t fulfill our requirements and doesn’t deliver value.
Before you start setting the monitoring scope, we recommend that you read the introduction to the cloud monitoring guide in the Microsoft Cloud Adoption Framework.
Setting the scope
To set the scope of monitoring, I often recommend breaking it down per application, because it is applications that the business uses, not hard disks or network segments. Support cases are often created per application, and the SLA is defined per business application. Once you have decided which application to focus on, ask the following questions:
- What is it that you need to monitor?
For example, let us say that Contoso has a web application consisting of a SQL Server database, an application server, and two web servers. Those components are what we need to monitor. All servers are running Windows Server 2012, and the database server is running SQL Server 2012. There are two critical services on the application server and four Windows events to look for in the Event Viewer. All servers also need standard performance monitoring. We must also make sure that all servers can reach each other on the network and that the web application URL is available from the Internet.
- What do you need to see/test to decide if the service is healthy?
In general, Contoso needs to see that the web site is available from the Internet. They also need to know that the application server can run SQL queries against the SQL Server database.
- What data do you need to collect to present what you need?
When deciding which data to collect, there are two ways to look at monitoring: a business perspective and a technical perspective.
- Business Perspective is how the users of the service look at it. In this example, the question is whether users can reach the web service from the Internet. The users don’t care if a hard disk is low on free space; they only care whether they can access the web service and have a pleasant experience. For the business (or SLA) perspective in this scenario, we only need to collect a URL check from the Internet.
- Technical Perspective is for the engineers operating and hosting the service. They care about all the small components of the service, everything that can affect the availability and performance of the service, for example, network connectivity, proactive SQL risks, server performance, events, log files, disk space, and so on.
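To make the business perspective concrete, here is a minimal sketch of a single URL check in plain Python. It is only an illustration of what such a check measures (did the site answer, with which status, and how fast); in Azure you would normally configure an availability test rather than write this yourself. The function name, the `CheckResult` type, the 10-second timeout, and the rule that any 2xx or 3xx status counts as available are all assumptions made for the example.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    url: str
    available: bool
    status: Optional[int]  # None when no HTTP response was received at all
    seconds: float         # time spent on the probe


def check_url(url, timeout=10.0, opener=urllib.request.urlopen):
    """Probe a URL once and report whether it answered with a 2xx/3xx status.

    `opener` is injectable so the check can be exercised without network access.
    """
    started = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        status = err.code
    except OSError:
        # DNS failure, refused connection, timeout: no answer at all.
        status = None
    elapsed = time.monotonic() - started
    available = status is not None and 200 <= status < 400
    return CheckResult(url, available, status, elapsed)
```

Calling `check_url("https://www.contoso.example")` from a machine outside the hosting environment would return one `CheckResult`; the host name is a placeholder. A single result like this is all the business perspective needs, while the technical perspective adds the component-level data described above.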
- What are your reporting requirements?
For the Contoso web application, the reporting requirements are the number of requests on the web site and also web site availability per month. These should be delivered by e-mail to service owners at the beginning of each month, showing the previous month.
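As a rough illustration of the availability figure in such a report, the sketch below turns a month of URL probe outcomes into a percentage. The probe interval (one every five minutes) and the function name are assumptions for the example, not something Contoso or Azure prescribes.

```python
def availability_percent(results):
    """Return the share of successful probes in a period, as a percentage.

    `results` is an iterable of booleans, one per probe (True = site answered).
    """
    results = list(results)
    if not results:
        raise ValueError("no probe results for the period")
    successful = sum(1 for ok in results if ok)
    return 100.0 * successful / len(results)


# Assumed example: a 30-day month probed every 5 minutes gives
# 30 * 24 * 12 = 8640 probes. Suppose 12 of them failed.
probes = [True] * 8628 + [False] * 12
print(f"{availability_percent(probes):.2f}%")  # prints "99.86%"
```

Twelve failed probes at a five-minute interval correspond to roughly one hour of downtime, which is why the monthly figure lands just under 99.9 percent.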
- Where do you need to see your monitoring?
For example, Contoso needs to visualize the status in a dashboard and also get a notification by e-mail if something is not working. The monitoring should also be integrated with the Contoso ITSM system to generate incidents if there is a critical error in the application.
- Do you need to integrate into any other systems?
Contoso is running ServiceNow as the ITSM tool and wants to generate an incident if there are any errors. They are also using a SIEM tool to collect security events from different systems.
You should now have a clear scope of what you are going to monitor, the components in scope, and how to visualize and report on them. The next step is to configure each component.
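One way to keep the answers to the scoping questions together is to write them down as a small data structure. The sketch below records the Contoso example from this post; the class and field names are hypothetical and chosen only for illustration, with one field per question.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MonitoringScope:
    application: str
    components: List[str]      # What do you need to monitor?
    health_checks: List[str]   # What shows that the service is healthy?
    business_data: List[str]   # Data for the business (SLA) perspective
    technical_data: List[str]  # Data for the technical perspective
    reports: List[str]         # Reporting requirements
    views: List[str]           # Where monitoring needs to be seen
    integrations: List[str]    # Other systems to integrate with

    def unanswered(self):
        """Return the names of the fields that still have no answer."""
        return [name for name, value in vars(self).items() if not value]


contoso = MonitoringScope(
    application="Contoso web application",
    components=["SQL Server database", "application server", "two web servers"],
    health_checks=["web site reachable from the Internet",
                   "application server can query SQL Server"],
    business_data=["URL check from the Internet"],
    technical_data=["network connectivity", "server performance",
                    "Windows events", "critical services", "disk space"],
    reports=["monthly request count", "monthly web site availability"],
    views=["dashboard", "e-mail notification"],
    integrations=["ServiceNow incidents", "SIEM security events"],
)
print(contoso.unanswered())  # prints "[]" because every question has an answer
```

An empty list from `unanswered()` is a simple signal that the scope is complete enough to start configuring each component.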
The following page will help you map your requirements to services in Azure.
What if we already have System Center Operations Manager (SCOM)?
If you already have System Center Operations Manager (SCOM) in place and it is working well, continue to use it, but add value with Azure services. SCOM has great capabilities, but it can be complex to consolidate the information into easy-to-use dashboards. Azure Monitor also provides some additional capabilities that are not covered by SCOM today, such as tracking changes across your virtual machines or viewing the status of updates.
The long-term vision can be to move all monitoring to native Azure services, but review each monitoring requirement separately. For example, if you must collect and analyze security logs, Security Center and Azure Monitor are often a better solution than SCOM. If you take a structured approach and move each monitoring component to Azure Monitor separately, as its capabilities come to match your requirements, one day it will all be in Azure. During the hybrid monitoring period, you will still fulfill all your monitoring requirements and use the best of both worlds for your monitoring solution.
Written with my friend and colleague, Anders.
Image found here, not my own, and source not found. If this image is yours, please let me know so I can remove or credit as desired.