CloudWatch Insights Cookbook
Overview¶
This document collects particularly helpful CloudWatch Logs Insights queries, along with descriptions of what makes them useful.
Queries¶
Grafana Query Explorer Link: https://tiny.amazon.com/c1s8p1c1/ga10grafusweamazexpl
CloudWatch Query Explorer Link: https://tiny.amazon.com/6sp3n0i7/condsecua2zapicons
Requests by a customer ID¶
Log Group: /aws/lambda/MusicFirefly-prod-graphqlCore
fields
@timestamp,
payload.requestContext.authorizer.customerId as customerId,
@message |
filter @message like "GraphqlCore event" |
sort @timestamp desc |
limit 10
Request client details¶
Log Group: /aws/lambda/MusicFirefly-prod-graphqlCore
fields
@requestId as requestId,
`payload.headers.x-api-key` as clientApiKey,
payload.requestContext.authorizer.principalId as principalId,
payload.requestContext.authorizer.deviceType as deviceType,
payload.requestContext.authorizer.deviceFamily as deviceFamily,
payload.requestContext.authorizer.deviceId as deviceId,
@message |
filter @message like "GraphqlCore event" |
sort @timestamp desc |
limit 10
Notes:
I alias @requestId as requestId because, in the collapsed preview, Grafana shows the first key that doesn't start with an @. Without this alias, the preview would show the principalId.
I enclose payload.headers.x-api-key in backticks. When a key contains a hyphen, CloudWatch Logs Insights does not match the key unless it is quoted, and the field appears as null in your results.
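If queries are ever assembled in a script, the backtick rule from the notes above can be applied mechanically. The following is a minimal Python sketch, not part of our tooling: the helper name `quote_field` is hypothetical, and it assumes that only letters, digits, `@`, and `.` are safe in an unquoted field name, so anything else (such as a hyphen) triggers quoting.

```python
import re

# Assumption: names made only of letters, digits, "@" and "." are safe unquoted.
# Anything else (such as the hyphen in x-api-key) gets wrapped in backticks.
_SAFE_FIELD = re.compile(r"^[A-Za-z0-9@.]+$")


def quote_field(name: str) -> str:
    """Return the field name, backtick-quoted only if it needs quoting."""
    if len(name) >= 2 and name.startswith("`") and name.endswith("`"):
        return name  # already quoted, leave untouched
    if _SAFE_FIELD.match(name):
        return name
    return f"`{name}`"


print(quote_field("payload.headers.x-api-key"))  # `payload.headers.x-api-key`
print(quote_field("@requestId"))                 # @requestId
```

Quoting a name that did not strictly need it is harmless, which is why the sketch errs on the side of quoting.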
Logs for a single request ID¶
Log Group: /aws/lambda/MusicFirefly-prod-graphql
fields
@timestamp,
@message |
filter level != "METRIC"
and @requestId = "65a0c72b-4b66-47c3-a633-396084e8587e" |
sort @timestamp desc
Notes:
I filter by level != "METRIC". If you want to look up specific metrics, query our Timestream metrics store rather than CloudWatch Logs.
Getting auth errors for Amazon Music clients¶
Log Group: /aws/lambda/MusicFirefly-prod-graphql
fields @timestamp, @message,
payload.context.deviceFamily as DeviceFamily,
payload.context.deviceType as DeviceType,
payload.context.error as AuthError,
payload.principalId as Client
| sort @timestamp desc
| filter payload.principalId like "Unknown"
| filter level like "ALERT"
| filter message like "Returned Auth Policy"
| stats count(*) as Count by Client, DeviceFamily, DeviceType, AuthError
| limit 20
Graph of stratus timeout errors¶
Log Group: /aws/lambda/MusicFirefly-prod-graphql
fields @timestamp, @message
| sort @timestamp desc
| filter @message like "ERROR"
| filter @message like "stratus"
| stats count(payload.response.data.message) by bin(5min)
Getting latency stats¶
Our Grafana metrics are retained for only a couple of days. If you need metrics from further back, you can query CloudWatch like so:
fields @timestamp, @message, payload.metricValue
| sort @timestamp desc
| filter level = 'METRIC'
and payload.dimensions.RequestTrace = 'Auth'
and payload.metricName = 'Duration'
| stats pct(payload.metricValue, 99) as p99,
pct(payload.metricValue, 90) as p90,
pct(payload.metricValue, 50) as p50 by bin(5m)
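For longer look-back windows it can be convenient to run a query like the one above from a script instead of the console. The sketch below is one possible way to do that in Python, under stated assumptions: the function name `run_insights_query` and its parameters are hypothetical, and the client is passed in so the function can be tried without AWS access. The two client calls it uses, `start_query` and `get_query_results`, are the CloudWatch Logs API operations exposed by boto3.

```python
import time


def run_insights_query(logs_client, log_group, query, start_epoch, end_epoch,
                       poll_seconds=1.0, max_polls=60):
    """Start a Logs Insights query and poll until it finishes.

    logs_client is expected to behave like boto3.client("logs").
    start_epoch and end_epoch are Unix timestamps in seconds.
    Returns one dict per result row, mapping field name to value.
    """
    started = logs_client.start_query(
        logGroupName=log_group,
        startTime=start_epoch,
        endTime=end_epoch,
        queryString=query,
    )
    query_id = started["queryId"]

    for _ in range(max_polls):
        response = logs_client.get_query_results(queryId=query_id)
        status = response["status"]
        if status == "Complete":
            # Each row is a list of {"field": ..., "value": ...} cells.
            return [{cell["field"]: cell["value"] for cell in row}
                    for row in response["results"]]
        if status in ("Failed", "Cancelled", "Timeout"):
            raise RuntimeError(f"query {query_id} ended with status {status}")
        time.sleep(poll_seconds)  # still Scheduled or Running

    raise TimeoutError(f"query {query_id} did not complete after {max_polls} polls")
```

With real credentials the call would look like `run_insights_query(boto3.client("logs"), log_group, query, start, end)`, where the log group and time range are whatever you would have picked in the console.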
Getting error logs from a single service¶
Because most, if not all, HTTP requests are sent with Axios, we can filter on the Axios log format, including the URL being requested. Copy the URL from one request to the service you are interested in and put it in the filter line below:
fields @timestamp, @message, @logStream, @log
| filter @message like '[Axios][Error] POST https://mis-q1t-vui-na-p-tcp.iad.amazon.com'
| sort @timestamp desc
| limit 20
Getting TPS estimates for given auth types¶
Run the following query on both the graphql and graphqlCore log groups (to cover cross-region calls), then divide each count by the number of seconds in the query's time range. For example, for a 1-day range, divide the totals by 60 * 60 * 24 = 86400. Make sure to check each of the 3 regions.
fields @timestamp, @message
| filter message = 'GraphqlCore event'
| parse payload.headers.Authorization '* *' as authType, authKey
| display coalesce(authType, 'IAM') as authTypeFmt # if no auth is set, it MUST be IAM
| stats count(*) by authTypeFmt
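The division step described above is simple arithmetic, sketched here in Python. The helper name `tps_by_auth_type` is hypothetical, and the counts and the `Bearer` auth type in the example are made-up numbers for illustration, not real traffic figures.

```python
SECONDS_PER_DAY = 60 * 60 * 24  # 86400


def tps_by_auth_type(counts, range_seconds):
    """Convert per-auth-type request counts into average requests per second."""
    if range_seconds <= 0:
        raise ValueError("range_seconds must be positive")
    return {auth_type: count / range_seconds
            for auth_type, count in counts.items()}


# Hypothetical counts copied from the stats table for a 1-day range.
example_counts = {"Bearer": 8_640_000, "IAM": 432_000}
print(tps_by_auth_type(example_counts, SECONDS_PER_DAY))
# {'Bearer': 100.0, 'IAM': 5.0}
```

This gives an average over the whole range, so it will understate peak TPS; narrow the time range if you need a peak-hour estimate.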
Displaying count stats for a parsed field¶
In this case, we wanted to find any service APIs (the 'target') that are being called without the TransitiveAuth token attached, to confirm the token isn't being dropped from any important HTTP requests. First we extract the target field, then we filter by the Axios request format to keep only outgoing HTTP requests, then we narrow these down to requests without the TransitiveAuth token, and lastly we group by target to show a table with counts.
fields @timestamp, @message, @logStream, @log
| parse @message '"X-Amz-Target":"*",' as target
| filter @message like '[Axios][Request] POST'
| filter @message not like 'x-amzn-transitive-authentication-token'
| stats count(*) by target
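To make the parse, filter, and stats steps of the query above concrete, here is a rough Python emulation of the same pipeline over a list of log messages. It is an illustration only: the function name `count_targets_without_ta` and the sample messages are hypothetical, the `*` in the parse pattern is approximated with a non-greedy regex group, and messages with no target are skipped rather than grouped under an empty value.

```python
import re
from collections import Counter

# Approximates: parse @message '"X-Amz-Target":"*",' as target
TARGET_RE = re.compile(r'"X-Amz-Target":"(.*?)",')
TA_HEADER = "x-amzn-transitive-authentication-token"


def count_targets_without_ta(messages):
    """Count outgoing Axios requests per target that lack the TA token header."""
    counts = Counter()
    for message in messages:
        if "[Axios][Request] POST" not in message:
            continue  # filter @message like '[Axios][Request] POST'
        if TA_HEADER in message:
            continue  # filter @message not like '<TA header>'
        match = TARGET_RE.search(message)
        if match:
            counts[match.group(1)] += 1  # stats count(*) by target
    return counts


# Made-up sample messages, shaped loosely like the Axios request logs.
sample = [
    '[Axios][Request] POST https://example.invalid {"X-Amz-Target":"ServiceA.getThing","a":1}',
    '[Axios][Request] POST https://example.invalid {"X-Amz-Target":"ServiceA.getThing","b":2}',
    '[Axios][Request] POST https://example.invalid {"X-Amz-Target":"ServiceB.list","x-amzn-transitive-authentication-token":"t"}',
    '[Axios][Response] POST https://example.invalid {"X-Amz-Target":"ServiceB.list","c":3}',
    '[Axios][Request] POST https://example.invalid {"X-Amz-Target":"ServiceB.list","d":4}',
]
print(count_targets_without_ta(sample))
```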
[DMA] Transitive Auth Queries¶
Number of Service calls NOT using TransitiveAuth (grouped by API)¶
fields @timestamp, @message, @logStream, @log
| parse @message '"X-Amz-Target":"*",' as target
| filter @message like '[Axios][Request] POST'
| filter @message not like 'x-amzn-transitive-authentication-token'
| stats count(*) by target
Number of requests w/ TA Token passed in from client¶
fields @timestamp, @message, @logStream, @log
| filter @message like 'GraphqlCore event'
| parse @message '"x-amzn-transitive-authentication-token": "*",' as taToken
| stats count(*) by isPresent(taToken)
Number of MIS responses w/ TA Token (and without)¶
fields @timestamp, @message, @logStream, @log
| filter @message like '[Axios][Response] POST https://mis-q1t-vui-na-p-tcp.iad.amazon.com'
| filter @message not like 'musicRequestIdentityContext'
| filter @message not like 'profileIdentityDirectedId'
| parse @message '"transitiveAuthToken":"*"' as taToken
| stats count(*) by isPresent(taToken)
Note that requests whose token is not present in the above query will be using a placeholder token.
Number of MIS responses w/ TA token (for IAM callers specifically)¶
fields @timestamp, @message, @logStream, @log
| filter @message like '[Axios][Response] POST https://mis-q1t-vui-na-p-tcp.iad.amazon.com'
| filter @message not like 'musicRequestIdentityContext'
| filter @message not like 'profileIdentityDirectedId'
| filter @message not like 'customerId'
| parse @message '"transitiveAuthToken":"*"' as taToken
| stats count(*) by isPresent(taToken)
Number of requests w/ Placeholder token for each auth scenario¶
fields @timestamp, @message, @logStream, @log
| filter @message like 'Generated Placeholder TA token for auth scenario'
| parse @message 'Generated Placeholder TA token for auth scenario *"' as authScenario
| stats count(*) by authScenario
