SolarWinds NPM 11.0 Review

Whenever a new version of anything is released, I am excited to see what new ideas and technology have been developed and how easily they can be implemented in an environment. Today BriComp Computers, LLC uses SolarWinds NPM, NTA, and SAM in an ongoing effort to build the perfect monitoring solution for UC deployments. While that quest is still in progress, I was hopeful NPM 11.0 would help greatly with the task.

The new product has been designed to be app-centric, providing what I would consider more of a System Center Operations Manager (SCOM) view into the network, layered on top of the exceptional network foundation present in previous versions. I know what you are thinking – with NetFlow Traffic Analyzer (NTA) and Server & Application Monitor (SAM) I can already get nearly everything, if not everything, I need and be comparable to SCOM. However, with the new deep packet inspection (DPI) technology and the pre-canned application configurations, NPM 11.0 with the added Packet Analyzer (and do not forget NTA and SAM) is designed to make that information more complete and automatically categorized.

Quality of Experience (QoE) data is classified along three dimensions for all captured traffic – Category, Risk Level, and Productivity Rating. An out-of-the-box example is Lync Media, defined as Streaming Media, Possible Misuse, and All Business. The classifications are designed to give a quick view of what is on the network, providing immediate information at the wire level.
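
To make the three-part classification concrete, here is a minimal sketch of how one such record could be modeled. The class and field names are my own, chosen for illustration, and are not SolarWinds' schema; only the Lync Media values come from what the product shows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QoEClassification:
    """One application's QoE labels (hypothetical structure, not the NPM schema)."""
    application: str
    category: str
    risk_level: str
    productivity_rating: str


# Values as displayed out-of-the-box for Lync Media.
lync_media = QoEClassification(
    application="Lync Media",
    category="Streaming Media",
    risk_level="Possible Misuse",
    productivity_rating="All Business",
)

print(lync_media.application, "->", lync_media.category,
      "/", lync_media.risk_level, "/", lync_media.productivity_rating)
```

Every captured application carries all three labels at once, which is why a business tool like Lync Media can be "All Business" and "Possible Misuse" at the same time.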

In SolarWinds' words:

“With the addition of the new out-of-the-box DPI and analysis sensors, SolarWinds NPM provides a comprehensive view of network fault, performance, availability, traffic, and latency allowing network engineers to more effectively IDENTIFY, PRIORITIZE AND RESOLVE network issues before they impact application performance, end-users and the business by:

  • Continuously monitoring packets across the wire
  • Inspecting, identifying and classifying application traffic for over 1,200 applications such as Skype, YouTube and Microsoft Lync
  • Displaying network and application response time in easy-view charts and graphs
  • Complementing the power of flow-based technologies like NetFlow, sFlow® and J-Flow with DPI technology”

The explicit call-out of Microsoft Lync excited me and raised the bar on my expectations.

First Impressions

The initial install and configuration of NPM was fairly straightforward. The installer looked and felt like previous versions, so there were no surprises. The installation was completed in BRICOMPLABS in an environment configured with Lync 2013, Exchange 2010, SQL 2012, TMG, and a few Windows clients. The base OS for all servers other than TMG was Server 2012, fully patched and ready to go.

The application server NPM was installed onto was fully ‘bare’, with nothing other than the OS and OS patches installed. NPM has IIS and .NET 3.5 as prerequisites. However, if the prerequisites are not installed, the installer is kind enough to inform you and install them for you*.

Once the install and the initial discovery are complete, the home screen displays a tab for Quality of Experience. Clicking the tab brings you to the section along with a description of next steps. The QoE Applications and Sensors process appeared straightforward; however, every push installation of the sensors I attempted failed on every one of my servers.

Going a bit further into the architecture, there is an agent that gets installed on the selected servers (that’s right – not agentless) and a ‘sensor’, which is a fancy term for WinPcap and a few Visual C++ Redistributable files (if they are not already present). Both are installed (pushed) when you select the specific servers under QoE Packet Analysis Sensors. However, as mentioned, all of my installs were failing with the error “Agent deployment failed. Installer package has invalid signature.”

Deployment Error

I downloaded the agent and installed it manually (an available agent option, especially if you plan to mass deploy to your servers using a third-party system) to see what the local error was. The manual install ran into no issues – a little annoying, I must say. During the install I specified the polling server (BCL2010MON) and login (admin with no password, as that is the default), and it worked. Looking further at the installer MSI, I decided to verify the digital signature details and found an error stating the signature could not be verified.

Digital Signature Error

This was a bit puzzling, knowing that the package should have been signed with a public certificate, so I continued by inspecting the certificate. The certificate itself was issued by VeriSign (that’s a good thing) and the dates were valid (another good thing), but further investigation found that the intermediate and root certificates were not on the server, leaving the certificate untrusted. Those chain certificates are not among the standard certificates found on the 2012 servers, so I had to install them manually on each server I wanted monitored**.

Certificate Error
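
The reason a correctly issued, unexpired certificate can still be rejected is chain building: each issuer in the chain must be present locally, all the way up to a root. The sketch below models only that logic. It is a simplification I wrote for illustration, not how Windows or the NPM installer performs validation, and the certificate names are placeholders rather than the actual VeriSign chain.

```python
def chain_is_trusted(leaf_issuer, local_store):
    """Follow issuer links until a self-signed root is reached.

    local_store maps a certificate subject to its issuer and stands in for
    the machine's intermediate and trusted-root stores (simplified: any
    self-signed certificate in the store is treated as a trusted root).
    """
    subject = leaf_issuer
    seen = set()
    while subject not in seen:
        seen.add(subject)
        if subject not in local_store:
            return False  # missing intermediate or root: chain cannot be built
        issuer = local_store[subject]
        if issuer == subject:
            return True  # self-signed root found in the local store
        subject = issuer
    return False  # issuer loop that never reaches a root


# Placeholder names, not the real chain.
INTERMEDIATE = "Example Intermediate CA"
ROOT = "Example Root CA"

before_install = {}  # neither chain certificate present
after_install = {INTERMEDIATE: ROOT, ROOT: ROOT}

print(chain_is_trusted(INTERMEDIATE, before_install))
print(chain_is_trusted(INTERMEDIATE, after_install))
```

With an empty store the check fails even though nothing is wrong with the signing certificate itself, which matches what I saw until the intermediate and root were installed.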

However, once the certificates were installed, the agents deployed as expected. The install time was nothing major – maybe a few minutes total with multiple machines running simultaneously. Checking on the servers, we see the last four installed items are the agent and the ‘sensor’. Interestingly, I could not easily figure out how to install the ‘sensor’ manually when the agent had been installed manually. I also did not find a great way to uninstall the agent and sensor globally. Yes, I could (and did) write a script to uninstall the agents (attached below), but that should be done from the console IMHO.

Installed Applications

The initial observed impact on the NPM server and on the servers that had the agent was mixed. The NPM server's CPU usage was 50-80% greater than that of my 10.7 NPM server, which has NPM, NTA and SAM installed. The impact on servers that had agents and sensors deployed varied. On average, the agent added about 2-3% CPU impact when monitoring Lync (as reported by NPM), sometimes more, and I am not sure what caused the variance. Yet interestingly, the agent/sensor did not indicate any impact in NPM on the polling server – odd***.

Agent CPU Resources

The list of applications provided included expected ones such as Active Directory, SQL, Exchange, Lync, Citrix, SYSLOG, etc. and even ones that I was not expecting, such as Facebook and Evony. Not that these apps or protocols are necessarily being blocked, but their actual network impact could conceivably be better understood than when everything is lumped together as HTTP/HTTPS traffic. Speaking of HTTP, custom HTTP applications can be created by filtering on the URL; you could not, however, edit existing applications or even see what they were looking at.

Searching for Applications

*On a base 2012 R2 server, if .NET 3.5 is not installed, the installer will fail to add it, as it needs to be installed with Server Manager and the Windows DVD.

**2012 R2 Servers did not face this same issue.

***When testing in a semi-production environment, CPU utilization on the Lync servers spiked to nearly 12% - a huge impact on a UC environment.

The Good

The installation process and getting the base of the system running was great – it installed without a hitch onto the servers and added the prerequisite software for me flawlessly. The number of applications to choose from is impressive and should cover the majority of mainstream and even a large portion of what I would consider non-mainstream software. I would assume this list will only grow. Installing the software as a network agent (rather than a server agent as I did) should allow you to sniff all traffic - not just application traffic running on the server - which I am sure would be required for more accurate data.

The Bad

The installation snag with the agent was a big deal and could potentially be a HUGE deployment barrier for companies that manage/monitor access to their infrastructure. All Microsoft critical/urgent updates had been deployed to these servers, yet they did not have the root certificates required to validate the agent package. While not necessarily an NPM failure, the fact is the product does not install in a very simple environment without manual intervention.

Once the agents are installed, the removal process is just as bad. When I asked support about the uninstall process, they responded that it must be done from Agent Management under Settings. While this makes sense, and I had found this location on my own, the process still fails to uninstall anything and even warns of that fact. Also be warned: if an agent is already listed in Agent Management, I did not find an option to add the sensor functionality without removing the agent and then pushing it again via the sensor install.

Agent Uninstall Warning

The monitors I added for Lync were unfortunately hit or miss in the information gathered. The application monitors for Lync included a rollup traffic monitor simply named Lync, plus Lync Audio, Lync Control, Lync Media, Lync Share, and Lync Video. I also included Microsoft SQL to gather local SQL data. Unfortunately, the data the monitors gathered was both inaccurate and incomplete. The Lync Control application is defined as being all SIP messages, the underlying signaling protocol of Lync itself, yet my monitors never showed any traffic when deployed to the Lync servers – odd. A conference call also showed no data in Lync Audio, although it was logged in Lync Media. Desktop sharing showed up in Lync Media, but video calls did not get captured in Lync Video. To make things worse, once unique port ranges were applied to the Lync environment, all of the monitors went dead, capturing no information. Using NTA and defining the ports in my environment as well as the source server IPs, I am able to get much more accurate information. The QoE Applications in NPM 11.0 are not configurable and what they are watching is a mystery, so be warned.
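
The approach that did work for me, matching on source server and port range, can be sketched in a few lines. This is only an illustration of the idea, not NTA configuration: the port ranges and IP addresses below are made-up placeholders, and the function name is my own.

```python
import ipaddress

# Hypothetical values for illustration only; substitute the ranges and
# server addresses actually configured in your Lync environment.
PORT_RANGES = {
    "Lync Audio": range(50000, 50020),
    "Lync Video": range(50020, 50040),
    "Lync Share": range(50040, 50060),
}
LYNC_SERVERS = {
    ipaddress.ip_address("192.0.2.10"),
    ipaddress.ip_address("192.0.2.11"),
}


def classify_flow(src_ip, src_port):
    """Label a flow by source server and source port, or None if not Lync."""
    if ipaddress.ip_address(src_ip) not in LYNC_SERVERS:
        return None
    for label, ports in PORT_RANGES.items():
        if src_port in ports:
            return label
    return "Lync (other)"  # from a Lync server, outside the defined ranges


print(classify_flow("192.0.2.10", 50005))
print(classify_flow("192.0.2.11", 443))
print(classify_flow("198.51.100.7", 50005))
```

Because the match depends only on addresses and ports you define yourself, it keeps working after the port ranges are customized and does not care that the payload is encrypted.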

Cost is always a "gotcha" with any IT project. While actual costs always vary, the stated cost for NPM 11.0 is $2,675 USD, which includes one network packet analysis sensor and 10 server packet analysis sensors. Looking to monitor more? I am sure there is a SKU for that.

Finally, the resource impact. One of the great things about NPM has been its ability to remain agentless and still gather all the information it does. While agents are used everywhere (including SCOM), these agents seem unrefined in their current state.

Summary

In summary, the application works as well as 10.7 for the features found in 10.7. It is peculiar that the hardware impact of 11.0 was greater than that of 10.7 for the same installation, but I am sure a simple update addressing that is forthcoming. However, once the agents are deployed, they will consume additional server CPU cycles.

The product did not accurately capture the Lync data I was looking for, and when asked, support mentioned that encrypted traffic may be an issue – well, all Lync traffic is encrypted, so there is that. :) The Microsoft SQL monitor I added to the Lync server did not capture any information, so I am not sure what data I should be seeing there. My guess is that a bunch of tweaks and updates will be required before the application works as expected. NPM, NTA, and SAM working together, however, have been gathering the majority of the information you really need, all without localized agents. Bottom line: make sure you evaluate the product in your environment, ideally in a lab with hardware, software, and versions identical to production, so you can gauge how well the product works for you.

Uninstall Script

Save as a CMD file and execute locally to remove the agent where desired.