Question Details

No question body available.

Tags

logging metrics monitoring tracing observability

Answers (1)

Accepted Answer
March 18, 2025 Score: 5 Rep: 66 Quality: High Completeness: 50%

Based on my experience, the answer is the usual one: it depends. Even given the specific scenario you described, it depends on the exact scope and on what you want to monitor and troubleshoot. Starting out by monitoring "everything" is quite demanding.

  • Do you want to know how frequently such events occur? Let's go with metrics.
  • Do you want to know the exact reason such events couldn't be processed? Traces enriched with custom tags.
  • Do you want to add domain information? Structured logs. Note that transport-level information about your data is available out of the box with traces; duplicating it in logs is expensive in cost, resources consumed, and maintenance.
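To make the first and third bullets concrete, here is a minimal stdlib-only sketch combining a counter metric ("how often?") with a structured JSON log ("what domain context?"). The event and field names (`order_processing_failed`, `order_id`, `reason`) are hypothetical placeholders for your own domain; in a real setup you would use a metrics client and, for the tracing bullet, something like OpenTelemetry's `span.set_attribute` to add custom tags.

```python
import json
import logging
from collections import Counter

# Metric: a per-reason failure counter (hypothetical event taxonomy).
failed_events = Counter()

def record_failure(reason: str, order_id: str) -> dict:
    # Bump the metric: answers "how frequently do such events occur?"
    failed_events[reason] += 1
    # Structured log: domain fields stay machine-parseable, not free text.
    log_record = {
        "event": "order_processing_failed",  # hypothetical event name
        "reason": reason,
        "order_id": order_id,
    }
    logging.getLogger("orders").warning(json.dumps(log_record))
    return log_record

record_failure("schema_mismatch", "order-42")
record_failure("schema_mismatch", "order-43")
print(failed_events["schema_mismatch"])  # → 2
```

The point is the separation of concerns: the counter is cheap and aggregable, while the structured log carries the per-event detail you only pay to store when you actually need it.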

Sampling is indeed an issue, but it is not a problem of the tool so much as of how much you invest in it. With Application Insights and Datadog you can simply pay more and you will not encounter this issue. Most of the time, though, it is better to reduce the amount of data stored and save only the telemetry you actually need. Still, selecting which data to keep can be hard depending on the system you are working on. An alternative is to run your own monitoring platform instead of relying on external products: Prometheus, Grafana, Tempo, Loki, or the Elastic stack (Elasticsearch, Kibana, Logstash). I would avoid custom solutions built from generic tools, or I would use them only if I don't plan to invest in or expand them. Somewhere you have to invest time, money, and resources.
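"Save only the telemetry you actually need" often starts as a simple head-based sampling rule: keep everything that signals a problem, and keep only a fraction of the routine traffic. A sketch under that assumption (the `record` shape and the 10% rate are illustrative, not from any particular product):

```python
import random

def should_keep(record: dict, sample_rate: float = 0.1) -> bool:
    """Head-based sampling sketch: always keep errors, sample the rest."""
    if record.get("level") == "error":
        return True  # never drop the telemetry you actually need
    return random.random() < sample_rate

random.seed(0)  # deterministic seed, only so the example is repeatable
kept = sum(should_keep({"level": "info"}) for _ in range(10_000))
print(kept)  # roughly 10% of 10,000 routine records survive
```

Vendor agents and OpenTelemetry SDKs offer configurable samplers that do essentially this (plus tail-based variants); the trade-off is always the same, storage cost versus the chance that the record you dropped is the one you needed.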

Once you define what you need to monitor in your flows, and how, everything else will follow. And, in my personal opinion, start small: just metrics, or just traces. Once people start using the monitoring platform, more requests will come. Like a product driven by customer requests, it is a loop of feature > user > feedback; don't expect it to be a time-boxed activity, it is a constant process.