Monitoring and Alarms

Overview

This doc describes the tools and infrastructure that power Firefly’s metric collection and alarming.

Metrics Ingestion

Firefly metric publishing pipeline

  1. During execution of a request, Firefly publishes metrics as logs to the /aws/lambda/MusicFirefly-prod-graphql CloudWatch log group. This is typically done with logger.metric in the Firefly code.

  2. A subscription filter on the /aws/lambda/MusicFirefly-prod-graphql log group forwards only logs with the log level METRIC to our Kinesis stream.

  3. The kinesisLogHandler Lambda ingests the records from the Kinesis stream and publishes them to the relevant Timestream DB table. Currently we have four tables: FireFly-Clients, FireFly-Platform, FireFly-Services, and FireFly-Resolvers.

  4. The Timestream DB is set up as a “Data source” in our AWS-Hosted Grafana environment, where we can perform queries against it.
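
Steps 2 and 3 above can be sketched roughly as follows. This is a minimal illustration, not the actual kinesisLogHandler code. It relies on the standard CloudWatch Logs subscription format (each Kinesis record is base64-encoded, gzip-compressed JSON with a logEvents list). The shape of the metric log line (JSON with level, table, name, and value fields) is an assumption made for this example, and the write to Timestream is left out.

```python
import base64
import gzip
import json
from collections import defaultdict
from typing import Dict, List

# Table names are taken from this doc.
TABLES = {"FireFly-Clients", "FireFly-Platform", "FireFly-Services", "FireFly-Resolvers"}


def decode_kinesis_record(record: dict) -> List[dict]:
    """Return the CloudWatch log events carried by one Kinesis record."""
    compressed = base64.b64decode(record["kinesis"]["data"])
    payload = json.loads(gzip.decompress(compressed))
    # CloudWatch Logs also sends CONTROL_MESSAGE records; skip those.
    if payload.get("messageType") != "DATA_MESSAGE":
        return []
    return payload.get("logEvents", [])


def group_metrics_by_table(event: dict) -> Dict[str, List[dict]]:
    """Group METRIC log lines by target table.

    ASSUMPTION: each metric log message is a JSON object with hypothetical
    fields "level", "table", "name" and "value". The real format may differ.
    """
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for record in event.get("Records", []):
        for log_event in decode_kinesis_record(record):
            try:
                metric = json.loads(log_event["message"])
            except (KeyError, json.JSONDecodeError):
                continue  # not a JSON log line
            if not isinstance(metric, dict) or metric.get("level") != "METRIC":
                continue
            table = metric.get("table")
            if table not in TABLES:
                continue  # unknown table, drop rather than guess
            grouped[table].append(
                {
                    "name": metric.get("name"),
                    "value": metric.get("value"),
                    "time_ms": log_event.get("timestamp"),
                }
            )
    return dict(grouped)
```

Grouping by table first keeps the handler side-effect free and easy to test. A real handler would then write each group to its Timestream table.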

Alarms/Cutting Tickets

Firefly alarms to SIM tickets pipeline

  1. A Firefly operator creates the Grafana Alerts.

  2. Incoming metrics trigger a Grafana Alert.

  3. Grafana publishes an SNS event on the Firefly Grafana alerts topic.

  4. The Grafana alerts queue listens to this topic and ingests the event.

  5. A Lambda ingests the events and uses the Tickety API to create SIM tickets.
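
Steps 4 and 5 above could look roughly like the sketch below. It is not the actual Lambda. It assumes the queue is SQS with raw message delivery disabled, so each message body is the SNS envelope with the alert in its Message field. It also assumes the alert is JSON with title, status, and message fields, similar to Grafana's webhook payload. The Tickety API is not modeled here: create_ticket is a hypothetical callable standing in for the real client, and the ticket fields are placeholders.

```python
import json
from typing import Callable, Optional


def alert_from_sqs_record(record: dict) -> dict:
    """Unwrap the SNS envelope carried in an SQS message body."""
    envelope = json.loads(record["body"])
    # ASSUMPTION: the alert itself is a JSON string in the "Message" field.
    return json.loads(envelope["Message"])


def to_ticket_request(alert: dict) -> Optional[dict]:
    """Map an alert to a ticket request, or None if no ticket is needed.

    ASSUMPTION: "status", "title" and "message" are hypothetical alert fields,
    and the returned dict is a placeholder, not the Tickety request schema.
    """
    if alert.get("status") != "firing":
        return None  # e.g. resolved notifications do not open tickets
    return {
        "title": alert.get("title", "Firefly Grafana alert"),
        "description": alert.get("message", ""),
    }


def handle(event: dict, create_ticket: Callable[[dict], None]) -> int:
    """Process a batch of queue messages and return the number of tickets cut."""
    created = 0
    for record in event.get("Records", []):
        request = to_ticket_request(alert_from_sqs_record(record))
        if request is not None:
            create_ticket(request)  # hypothetical stand-in for the Tickety client
            created += 1
    return created
```

Passing create_ticket in as a parameter keeps the ticketing client out of the parsing logic, so the handler can be exercised locally with a stub.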

FAQ

How are Grafana Alerts created?

Currently, all Grafana Alerts are created by hand by a Firefly operator. In the near future, we hope to have the proper tooling in place to create alarms programmatically.

Do you have a list of all Grafana Alerts defined somewhere?

Yes! All our targeted alarms and their creation statuses are defined here: Firefly Grafana Alerts