GraphQL Load Testing with Artillery

This document outlines the requirements and steps for load testing in FireFly. It covers creating the MCM and using FireFly-LoadGenArtillery to conduct the load test.

Load Testing Requirement

To prepare for a load test, you need to:

  1. Add your test scenario to FireFly-LoadGenArtillery and verify that it is correct by running it at a low rate, for example 1 TPS.

  2. Evaluate the traffic projection and update the FireFly Heron TPS/SLA document with the estimated traffic for the Heron launch.

  3. Create an MCM using TM-121647: Music FireFly Load Test Template, following the reference MCM.

  4. Get sign-off on the MCM from:

    • music-devex-apis oncall

    • FireFly load test POC (@fkhalee or @sophap)

    • Panda team for token generation. Note: We request the token from Panda only once at the start of the load test. Since the token is valid for only one hour, please make sure the test completes within that time frame.

    • Related downstream services which will be invoked by your GraphQL query.

  5. Prepare a list of user credentials (email + password) to use in the load test.

  6. Execute the load test.

Load Testing Setup

FireFly provides a load testing tool, FireFly-LoadGenArtillery, which is built on the open-source tool Artillery.

FireFly-LoadGenArtillery

This package defines the Artillery configuration and setup for load testing FireFly queries and mutations.

Getting started

First, set up your machine using the following commands:

yarn auth
yarn install

Note: Make sure you are using Node.js 18 or later.

Package structure

config.yaml

This file contains the FireFly endpoint and TPS configuration, which you can update to suit your requirements.

Scenario

This directory contains the FireFly query scenarios you can use to execute a load test.

credentials.csv

Update this file with the credentials of the Amazon Music accounts to use for the test run.
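The exact layout of credentials.csv is not described here. As a minimal sketch, assuming a hypothetical header row of email,password followed by one account per row, a quick pre-flight check in Python can catch malformed rows and duplicate accounts before a test run:

```python
import csv
import io
import re

# Hypothetical layout: a header row "email,password", then one account per row.
EXPECTED_HEADER = ["email", "password"]
# Deliberately loose; only catches obvious mistakes such as a missing "@".
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_credentials(csv_text: str) -> list[str]:
    """Return the problems found in credentials.csv content (empty list if none)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows or [c.strip().lower() for c in rows[0]] != EXPECTED_HEADER:
        return ["missing or unexpected header row (expected: email,password)"]
    problems: list[str] = []
    seen: set[str] = set()
    for line_no, row in enumerate(rows[1:], start=2):
        if not row:  # csv.reader yields [] for blank lines
            continue
        if len(row) != 2:
            problems.append(f"line {line_no}: expected 2 columns, got {len(row)}")
            continue
        email, password = row[0].strip(), row[1]
        if not EMAIL_RE.match(email):
            problems.append(f"line {line_no}: invalid email {email!r}")
        if not password:
            problems.append(f"line {line_no}: empty password")
        if email.lower() in seen:
            problems.append(f"line {line_no}: duplicate account {email}")
        seen.add(email.lower())
    return problems
```

Adjust EXPECTED_HEADER if the real file uses different columns. The messages report line numbers and emails but never echo passwords.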

Test execution

Before running a load test, do the following:

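  1. Update credentials.csv with the user credentials to use for the load test.

  2. Update the TPS and test duration in the phases section of config.yml:

- duration: 1 # run for N seconds
  arrivalRate: 1 # new virtual users per second (equals TPS when each scenario sends one request)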

Note: For a complete list of options for your use case, please refer to the Artillery documentation.
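Because the Panda token is requested once and is valid for only one hour, it helps to check the planned phases before starting. The sketch below assumes each phase's duration is a plain number of seconds and treats a rampTo phase as a linear ramp; the 5-minute safety margin is an arbitrary, illustrative choice:

```python
# The Panda token is requested once per test and is valid for one hour.
TOKEN_LIFETIME_SECONDS = 60 * 60


def total_duration_seconds(phases: list[dict]) -> int:
    """Sum phase durations, assuming each 'duration' is a plain number of seconds."""
    return sum(int(phase["duration"]) for phase in phases)


def estimated_arrivals(phases: list[dict]) -> int:
    """Approximate virtual users created; a rampTo phase is treated as a linear ramp."""
    total = 0.0
    for phase in phases:
        start = phase.get("arrivalRate", 0)
        end = phase.get("rampTo", start)
        # The average rate of a linear ramp is the midpoint of its start and end rates.
        total += int(phase["duration"]) * (start + end) / 2
    return round(total)


def fits_token_window(phases: list[dict], margin_seconds: int = 300) -> bool:
    """True if the test plus a safety margin finishes before the token expires."""
    return total_duration_seconds(phases) + margin_seconds <= TOKEN_LIFETIME_SECONDS
```

estimated_arrivals counts virtual users, which matches the request count only when each scenario sends a single request; it is a rough figure for the traffic projection, not a guarantee.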

Executing a load test from your machine

  1. Navigate to the root of the project directory, for example: cd <your workspace>/firefly-loadgenartillery

  2. Run:

    region=<region such as us-east-1, us-west-2, or eu-west-1> scenario=src/scenario/<your scenario directory>/<your test scenario.yml> yarn setup
    

    For instance, region=eu-west-1 scenario=src/scenario/podcast/PodcastEpisodeDetailQuery.yaml yarn setup. This step retrieves an authentication token from the Panda service, merges the test scenario with config.yml, and publishes all required files to the /generated directory.

  3. Run the Artillery command:

artillery run generated/generatedScenario.yml \
--output <report.json> \
--variables <variables_for_query>

Option explanation

  • --output: Optional. Saves a summary of the test results (min, max, median, and percentiles such as p90) to the specified file. You can generate an HTML report with visualizations of rates, latencies, and so on from this file using artillery report report.json.

  • --variables: Required for queries that take variables. A JSON object containing the variables your request needs. For example, ArtistDetailQuery.yaml expects an artist ASIN in the id variable.

For the other Artillery CLI options, refer to the Artillery documentation.

Example

This example command runs the FireFly artist detail query, which expects an artist ASIN:

artillery run generated/generatedScenario.yml \
--output ArtistDetailQuery-report.json \
--variables '{ "id": "Replace_With_Artist_ASIN" }'

To also log the network calls and responses, set the DEBUG environment variable:

DEBUG=http,http:response artillery run generated/generatedScenario.yml \
--output ArtistDetailQuery-report.json \
--variables '{ "id": "Replace_With_Artist_ASIN" }'
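The JSON passed to --variables has to survive shell quoting. As a hypothetical helper (not part of FireFly-LoadGenArtillery), building the argument list in Python with json.dumps avoids hand-escaping, and shlex.join prints the equivalent shell command:

```python
import json
import shlex
from typing import Optional


def build_artillery_command(scenario: str, variables: dict, output: Optional[str] = None) -> list[str]:
    """Build the argv for `artillery run`, JSON-encoding the query variables."""
    argv = ["artillery", "run", scenario]
    if output:
        argv += ["--output", output]
    # json.dumps always emits valid JSON (double quotes, escaped special characters).
    argv += ["--variables", json.dumps(variables)]
    return argv


argv = build_artillery_command(
    "generated/generatedScenario.yml",
    {"id": "Replace_With_Artist_ASIN"},
    output="ArtistDetailQuery-report.json",
)
# shlex.join quotes each argument so the line can be pasted into a POSIX shell.
print(shlex.join(argv))
```

To run it directly instead of copying the printed command, pass the list to subprocess.run(argv, check=True) from the project root after yarn setup has produced the generated/ directory.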

Executing a load test using Fargate

Setup

This section describes Artillery’s support for running highly distributed serverless load tests on AWS Fargate.

To execute tests on AWS Fargate, the Artillery CLI uses the AWS SDK to create the resources needed to run your tests. To execute the test on Fargate, follow the steps below:

  1. Create an AWS account or use an existing one.

  2. Create the following IAM policy and attach it to your IAM user

  • Replace 123456789000 with the ID of the AWS account you will use.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CreateOrGetECSRole",
      "Effect": "Allow",
      "Action": ["iam:CreateRole", "iam:GetRole", "iam:AttachRolePolicy"],
      "Resource": "arn:aws:iam::123456789000:role/artilleryio-ecs-worker-role"
    },
    {
      "Sid": "CreateECSPolicy",
      "Effect": "Allow",
      "Action": ["iam:CreatePolicy"],
      "Resource": "arn:aws:iam::123456789000:policy/artilleryio-ecs-worker-policy"
    },
    {
      "Effect": "Allow",
      "Action": ["iam:CreateServiceLinkedRole"],
      "Resource": [
        "arn:aws:iam::*:role/aws-service-role/ecs.amazonaws.com/AWSServiceRoleForECS*"
      ],
      "Condition": {
        "StringLike": {
          "iam:AWSServiceName": "ecs.amazonaws.com"
        }
      }
    },
    {
      "Effect": "Allow",
      "Action": ["iam:PassRole"],
      "Resource": ["arn:aws:iam::123456789000:role/artilleryio-ecs-worker-role"]
    },
    {
      "Sid": "SQSPermissions",
      "Effect": "Allow",
      "Action": ["sqs:*"],
      "Resource": "arn:aws:sqs:*:123456789000:artilleryio*"
    },
    {
      "Sid": "SQSListQueues",
      "Effect": "Allow",
      "Action": ["sqs:ListQueues"],
      "Resource": "*"
    },
    {
      "Sid": "ECSPermissionsGeneral",
      "Effect": "Allow",
      "Action": [
        "ecs:ListClusters",
        "ecs:CreateCluster",
        "ecs:RegisterTaskDefinition",
        "ecs:DeregisterTaskDefinition"
      ],
      "Resource": "*"
    },
    {
      "Sid": "ECSPermissionsScopedToCluster",
      "Effect": "Allow",
      "Action": ["ecs:DescribeClusters", "ecs:ListContainerInstances"],
      "Resource": "arn:aws:ecs:*:123456789000:cluster/*"
    },
    {
      "Sid": "ECSPermissionsScopedWithCondition",
      "Effect": "Allow",
      "Action": [
        "ecs:SubmitTaskStateChange",
        "ecs:DescribeTasks",
        "ecs:ListTasks",
        "ecs:ListTaskDefinitions",
        "ecs:DescribeTaskDefinition",
        "ecs:StartTask",
        "ecs:StopTask",
        "ecs:RunTask"
      ],
      "Condition": {
        "ArnEquals": {
          "ecs:cluster": "arn:aws:ecs:*:123456789000:cluster/*"
        }
      },
      "Resource": "*"
    },
    {
      "Sid": "S3Permissions",
      "Effect": "Allow",
      "Action": [
        "s3:CreateBucket",
        "s3:DeleteObject",
        "s3:GetObject",
        "s3:GetObjectAcl",
        "s3:GetObjectTagging",
        "s3:GetObjectVersion",
        "s3:PutObject",
        "s3:PutObjectAcl",
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:GetBucketLogging",
        "s3:GetBucketPolicy",
        "s3:GetBucketTagging",
        "s3:PutBucketPolicy",
        "s3:PutBucketTagging",
        "s3:PutMetricsConfiguration",
        "s3:GetLifecycleConfiguration",
        "s3:PutLifecycleConfiguration"
      ],
      "Resource": [
        "arn:aws:s3:::artilleryio-test-data-*",
        "arn:aws:s3:::artilleryio-test-data-*/*"
      ]
    },
    {
      "Sid": "LogsPermissions",
      "Effect": "Allow",
      "Action": ["logs:PutRetentionPolicy"],
      "Resource": [
        "arn:aws:logs:*:123456789000:log-group:artilleryio-log-group/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": ["secretsmanager:GetSecretValue"],
      "Resource": ["arn:aws:secretsmanager:*:123456789000:secret:artilleryio/*"]
    },
    {
      "Effect": "Allow",
      "Action": [
        "ssm:PutParameter",
        "ssm:GetParameter",
        "ssm:GetParameters",
        "ssm:DeleteParameter",
        "ssm:DescribeParameters",
        "ssm:GetParametersByPath"
      ],
      "Resource": [
        "arn:aws:ssm:us-east-1:123456789000:parameter/artilleryio/*",
        "arn:aws:ssm:us-east-2:123456789000:parameter/artilleryio/*",
        "arn:aws:ssm:us-west-1:123456789000:parameter/artilleryio/*",
        "arn:aws:ssm:us-west-2:123456789000:parameter/artilleryio/*",
        "arn:aws:ssm:ca-central-1:123456789000:parameter/artilleryio/*",
        "arn:aws:ssm:eu-west-1:123456789000:parameter/artilleryio/*",
        "arn:aws:ssm:eu-west-2:123456789000:parameter/artilleryio/*",
        "arn:aws:ssm:eu-west-3:123456789000:parameter/artilleryio/*",
        "arn:aws:ssm:eu-central-1:123456789000:parameter/artilleryio/*",
        "arn:aws:ssm:eu-north-1:123456789000:parameter/artilleryio/*",
        "arn:aws:ssm:ap-south-1:123456789000:parameter/artilleryio/*",
        "arn:aws:ssm:ap-east-1:123456789000:parameter/artilleryio/*",
        "arn:aws:ssm:ap-northeast-1:123456789000:parameter/artilleryio/*",
        "arn:aws:ssm:ap-northeast-2:123456789000:parameter/artilleryio/*",
        "arn:aws:ssm:ap-southeast-1:123456789000:parameter/artilleryio/*",
        "arn:aws:ssm:ap-southeast-2:123456789000:parameter/artilleryio/*",
        "arn:aws:ssm:me-south-1:123456789000:parameter/artilleryio/*",
        "arn:aws:ssm:sa-east-1:123456789000:parameter/artilleryio/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeRouteTables",
        "ec2:DescribeVpcs",
        "ec2:DescribeSubnets"
      ],
      "Resource": ["*"]
    }
  ]
}
  3. In the IAM console, open this IAM user and go to Security credentials > Create access key. Note down the access key ID and secret access key.

  4. In a terminal on your dev laptop, run aws configure and enter the access key information from the previous step along with the other required settings.

  5. Execute the test.

Command

artillery run-fargate generated/generatedScenario.yml --region us-east-1

Note: Please refer to the official Artillery run-fargate documentation for the complete list of options.

How it works

Artillery will create a number of AWS resources behind the scenes to be able to execute your tests. All resources created by Artillery are serverless and created on-demand. There are no long-running infrastructure components involved.
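Both IAM policies in this guide (the Fargate policy above and the Lambda policy below) use the placeholder account ID 123456789000. A minimal sketch of filling it in with Python, assuming the policy JSON has been saved as a string; parsing with json.loads also flags invalid JSON such as // comments, which IAM does not accept:

```python
import json
import re

PLACEHOLDER_ACCOUNT_ID = "123456789000"


def render_policy(template: str, account_id: str) -> dict:
    """Substitute the real account ID into a policy template and parse the result."""
    # AWS account IDs are exactly 12 digits.
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError(f"expected a 12-digit AWS account ID, got {account_id!r}")
    rendered = template.replace(PLACEHOLDER_ACCOUNT_ID, account_id)
    # json.loads raises json.JSONDecodeError (a ValueError subclass) on invalid JSON,
    # including // comments, which IAM also rejects.
    return json.loads(rendered)
```

The returned dict can be serialized with json.dumps(policy, indent=2) and pasted into the JSON editor when creating the policy in the IAM console.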

Executing a load test using AWS Lambda

Setup

This section describes Artillery’s support for running highly distributed serverless load tests on AWS Lambda.

To execute tests on AWS Lambda, the Artillery CLI uses the AWS SDK to create the resources needed to run your tests. To execute the test on Lambda, follow the steps below:

  1. Create an AWS account or use an existing one.

  2. Create the following IAM policy and attach it to your IAM user

  • Replace 123456789000 with the ID of the AWS account you will use. The sqs:ListQueues statement uses "Resource": "*" because ListQueues cannot be scoped to individual resources.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CreateOrGetLambdaRole",
      "Effect": "Allow",
      "Action": [
        "iam:CreateRole",
        "iam:GetRole",
        "iam:PassRole",
        "iam:AttachRolePolicy"
      ],
      "Resource": "arn:aws:iam::123456789000:role/artilleryio-default-lambda-role-*"
    },
    {
      "Sid": "CreateLambdaPolicy",
      "Effect": "Allow",
      "Action": ["iam:CreatePolicy"],
      "Resource": "arn:aws:iam::123456789000:policy/artilleryio-lambda-policy"
    },
    {
      "Sid": "SQSPermissions",
      "Effect": "Allow",
      "Action": ["sqs:*"],
      "Resource": "arn:aws:sqs:*:123456789000:artilleryio*"
    },
    {
      "Sid": "SQSListQueues",
      "Effect": "Allow",
      "Action": ["sqs:ListQueues"],
      "Resource": "*"
    },
    {
      "Sid": "LambdaPermissions",
      "Effect": "Allow",
      "Action": [
        "lambda:InvokeFunction",
        "lambda:CreateFunction",
        "lambda:DeleteFunction",
        "lambda:GetFunctionConfiguration"
      ],
      "Resource": "arn:aws:lambda:*:123456789000:function:artilleryio-*"
    },
    {
      "Sid": "EcrPullImagePermissions",
      "Effect": "Allow",
      "Action": [
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage"
      ],
      "Resource": "*",
      "Condition": {
        "StringLike": {
          "aws:sourceArn": "arn:aws:lambda:*:123456789000:function:artilleryio-*"
        }
      }
    },
    {
      "Sid": "S3Permissions",
      "Effect": "Allow",
      "Action": [
        "s3:CreateBucket",
        "s3:DeleteObject",
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListBucket",
        "s3:GetLifecycleConfiguration",
        "s3:PutLifecycleConfiguration"
      ],
      "Resource": [
        "arn:aws:s3:::artilleryio-test-data-*",
        "arn:aws:s3:::artilleryio-test-data-*/*"
      ]
    }
  ]
}
  3. In the IAM console, open this IAM user and go to Security credentials > Create access key. Note down the access key ID and secret access key.

  4. In a terminal on your dev laptop, run aws configure and enter the access key information from the previous step along with the other required settings.

  5. Execute the test.

Command

artillery run-lambda generated/generatedScenario.yml --region us-east-1

Note: Please refer to official Artillery run-lambda documentation to view the complete options.

How it works

Artillery will create a number of AWS resources behind the scenes to be able to execute your tests. All resources created by Artillery are serverless and created on-demand. There are no long-running infrastructure components involved.

Limitations

  • AWS Lambda support is in preview. There are some limitations to what’s possible, and you may run into bugs. Please report any issues via GitHub issues on https://github.com/artilleryio/artillery/issues

  • Each AWS Lambda invocation is limited to 15 minutes of running time, so the entire load test currently cannot run for longer than 15 minutes.

  • Once an AWS Lambda function starts running, there is no way to stop it; neither the AWS SDK nor the AWS Console provides that ability. This means that once a load test starts, it will run to completion. Be mindful of this, and ramp up load on your applications gradually.

Test Termination

  • To back off a stress/load test, you can stop the Fargate tasks at any time from the AWS account. Tests running on Lambda cannot be stopped once started (see Limitations above).

    • In the AWS Console, navigate to the ECS cluster running the Fargate tasks.

    • Stop the tasks to terminate the test immediately if issues arise.
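The console steps above can also be scripted. The following sketch takes a boto3 ECS client (or any object with the same list_tasks and stop_task methods) and stops every RUNNING task in a cluster. The cluster name is not given in this document, so look it up in the ECS console; this applies to Fargate only, since Lambda invocations cannot be stopped:

```python
from typing import Any


def stop_running_tasks(ecs_client: Any, cluster: str, reason: str = "Load test aborted") -> list[str]:
    """Stop every RUNNING task in the given ECS cluster and return the stopped task ARNs."""
    task_arns: list[str] = []
    request: dict[str, Any] = {"cluster": cluster, "desiredStatus": "RUNNING"}
    # Collect all ARNs first so that stopping tasks does not disturb pagination.
    while True:
        page = ecs_client.list_tasks(**request)
        task_arns.extend(page.get("taskArns", []))
        next_token = page.get("nextToken")
        if not next_token:
            break
        request["nextToken"] = next_token
    for arn in task_arns:
        ecs_client.stop_task(cluster=cluster, task=arn, reason=reason)
    return task_arns
```

For example: stop_running_tasks(boto3.client("ecs", region_name="us-east-1"), "<cluster name from the ECS console>"). The caller needs ecs:ListTasks and ecs:StopTask, both of which appear in the Fargate policy above.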

Customer Account Pool For Load Test

Each experience owner may have different test parameters, and their test scenarios may need a pool of test accounts with varied attributes. In some cases, downstream services also have requirements for the minimum number of accounts needed to simulate production traffic.

Load test query scenarios may be account-agnostic (catalog queries) or apply heavy customization based on account parameters such as customer tier or locale (recommendations, browse home).

Experience owners can create a pool of test accounts based on their test requirements. There are several ways to create test accounts in bulk; load test POCs can evaluate the following test account tools to determine what works for their use case.

  1. Tipoca Service

  2. KILO

  3. Kamino_Prod

  4. Kamino UI

  5. Tractor

Each service/tool has its pros and cons. For example, Kamino UI is meant for creating permanent test accounts one at a time, whereas others may create temporary accounts in bulk. All of the services above invoke the Tipoca service under the hood and apply account decorators or subscriptions as needed.

Load test owners must determine the number of test accounts in their pool based on their query behavior and downstream service requirements. Use the tools above to create account pools and plug them into the load test tool based on the instructions above.

In principle, FireFly-LoadGenArtillery can handle a few hundred accounts, and up to 1,000 depending on your load test scenario and subject to approval from Panda/MIS/Stratus.

Because FireFly has cross-region routing enabled, a load test for a specific region must use accounts that belong to that region. For example, requests made with an NA account in EU are routed to NA.

Test Windows

Load tests must strictly follow these test windows:

  • FE: 10:30 AM to 3:00 PM PST

  • EU: 3:30 PM to 8:00 PM PST

  • NA: 9:30 PM to 2:00 AM PST

Customer traffic in prod is at its lowest during these windows. Service owners will not approve a load test during the peak traffic window because it can degrade prod traffic through throttling, brownouts, or total blackouts.
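The NA window crosses midnight, which makes a naive start <= now <= end comparison wrong. A minimal sketch of a window check, assuming the time of day has already been converted to Pacific time (the windows are listed in PST):

```python
from datetime import time

# (start, end) per region in Pacific time, from the list above. NA wraps past midnight.
TEST_WINDOWS = {
    "FE": (time(10, 30), time(15, 0)),
    "EU": (time(15, 30), time(20, 0)),
    "NA": (time(21, 30), time(2, 0)),
}


def in_test_window(region: str, now: time) -> bool:
    """True if `now`, a Pacific-time time of day, falls inside the region's window."""
    start, end = TEST_WINDOWS[region]
    if start <= end:
        return start <= now <= end
    # The window crosses midnight: allowed late in the evening or early next morning.
    return now >= start or now <= end
```

To obtain the current Pacific time, datetime.now(ZoneInfo("America/Los_Angeles")).time() works on Python 3.9+; note that this zone follows daylight saving time.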

Test Endpoints

To ensure that traffic is routed to the intended regional gateway, use these regional gateway endpoints:

  • us-east-1.gql.music.amazon.dev

  • eu-west-1.gql.music.amazon.dev

  • us-west-2.gql.music.amazon.dev
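The endpoints follow the pattern <region>.gql.music.amazon.dev for the three regions listed. A small sketch that derives the Artillery target from the same region value passed to yarn setup; the https scheme matches the target in the FAQ's config.yml example, and whether yarn setup already does this is not stated here:

```python
# Regions with a regional FireFly gateway endpoint, per the list above.
REGIONS_WITH_GATEWAY = ("us-east-1", "eu-west-1", "us-west-2")


def regional_endpoint(region: str) -> str:
    """Return the regional gateway URL for use as the Artillery target."""
    if region not in REGIONS_WITH_GATEWAY:
        raise ValueError(f"no regional gateway listed for {region!r}")
    return f"https://{region}.gql.music.amazon.dev"
```

Failing fast on an unlisted region avoids silently sending traffic to a gateway that was not part of the approved MCM.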

FAQ

What are the key differences and appropriate use cases for artillery run-lambda and artillery run-fargate in load testing?

  • Use artillery run-lambda for short tests (up to 15 minutes) with moderate load.

  • Use artillery run-fargate for long-running, resource-intensive, or complex tests.
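The 15-minute Lambda limit can be checked mechanically against the phases in config.yml. This sketch considers duration only (assuming durations are plain seconds); load level and scenario complexity still need judgment, and leaving some headroom below 15 minutes is prudent:

```python
# Per the Limitations section, a Lambda-based test cannot exceed 15 minutes.
LAMBDA_MAX_SECONDS = 15 * 60


def choose_platform(phases: list[dict]) -> str:
    """Suggest an Artillery subcommand from total phase duration alone."""
    total_seconds = sum(int(phase["duration"]) for phase in phases)
    return "run-lambda" if total_seconds <= LAMBDA_MAX_SECONDS else "run-fargate"
```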

How do I disable cache lookup in FireFly during load testing?

To disable cache in FireFly for load test requests, add the header cache-control: no-cache to the request headers in the config.yml file as shown below:

config:
  target: 'https://gql.music.amazon.dev'
  phases:
    - duration: 1
      arrivalRate: 1
  processor: "util.js"
  defaults:
    headers:
      content-type: 'application/json'
      x-api-key: amzn1.application.be92e0e8c3344e7ea5f211e7f107547c
      x-amzn-test-call: true # https://sage.amazon.dev/posts/1323527?t=7
      cache-control: no-cache
  payload:
    - path: authTokens.csv
      fields:
        - 'token'

The supported options for cache-control header are:

  • no-cache – avoids checking the cache, but will store the result of any subsequent upstream service requests

  • no-store – checks the cache, but does not modify it if an upstream service request is required

  • no-cache, no-store – does not check the cache, and does not modify it with any responses from upstream services
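The three options combine two independent switches: no-cache turns off the cache lookup, and no-store turns off writing upstream results to the cache. A sketch of that reading, assuming that omitting the header means normal read-and-write caching:

```python
from typing import Optional


def cache_behavior(cache_control: Optional[str]) -> dict:
    """Interpret a cache-control header value per the options listed above."""
    directives = {d.strip().lower() for d in (cache_control or "").split(",") if d.strip()}
    return {
        # no-cache skips the cache lookup.
        "reads_cache": "no-cache" not in directives,
        # no-store keeps upstream responses out of the cache.
        "writes_cache": "no-store" not in directives,
    }
```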

When should I execute the test on my local machine?

You can run the test on your local machine if the TPS is small, for example, fewer than a few hundred TPS (capacity varies by machine). Running large load tests on your local machine is not recommended because of networking and resource limits (CPU, memory, etc.).

When should I execute the test on Fargate?

If you need to run a larger load or if you encounter issues running the test on your local machine, you should execute the test on Fargate.