Datadog: Deepdive on a powerful Instrumentation Tool

Dog

Overview of Datadog

Following my recent post about application instrumentation I was interested to share a deeper dive on Datadog which I’ve now got quite a few years experience with. It’s been interesting to me to observe how the offering has expanded over this time and how our use of it crossed a tipping point from promising to extremely useful.

Datadog Logo

Some history

Datadog was founded in 2010 and the idea came out of the personal needs of the founders (Olivier Pomel and Alexis Lê-Quôc) working in the tech industry. In the early days Datadog focused on infrastructure monitoring, providing tools to track server performance, network traffic, and other key metrics.   Over time Datadog expanded its offerings to include log management, security monitoring, and application performance monitoring (APM) cability. I can’t say how much of this expansion of the product was internally built but certainly they focussed on expansion through acquisition and appear to have been extremely successful with that strategy. Salesforce: please take note on how to do product integration.

Product Offering

The product suite is now extremely expansive across:

  • Observability
  • Security
  • Software Delivery
  • Other emerging domains (like AI)

Datadog Product Offering Observability

Why this works

Some of what can be done in Datadog can also be done directly in the CSP (AWS, Azure etc) but the rationale for a platform based approach with a polished UX, and the ability to keep users completely away from the CSP is great for security and governance purposes.

Most importantly though the feature that works for our teams is that the service catalogue in Datadog allows you to connect services with service owners, and this solves the ‘getting the information to the right audience’ piece. Combined with the ability to push notifications via Slack to specific channels and tickets directly into those team’s backlogs means that a lot of friction is removed from the system.

Cloud Security

Having had practical experience with CrowdStrike - which I’m happy to believe is best of breed (as determined by chief arbiter Gartner - I have observed the challenges of translating the insights into actions and resolutions. Datadog may not be as strong on the detection side, but I believe their product offering to be considerably more effective.

In particular I very much appreciated the visual overview of the product overview which presents as a dynamic dashboard:

Datadog security

The visual is laid out from top to bottom following the typical software development lifecycle from code, to pipeline, running application, and finally instrumentation. At each step the appropriate interface points for security and risk control are presented.

The brilliance of this is it’s educating on what each of these controls are, and the relevant risks at each level at the current point in time. It’s also presumably an excellent way to upsell customers on more capabilities they could be directing towards Datadog. I went into some detail explaining these controls in a previous article but actually think this visual goes a long way to providing a clear overview.