OP2¶
Executive Summary¶
FireFly, our GraphQL service, is Amazon Music’s primary general-purpose API aggregator, serving 1P, 2P, and 3P clients. It provides a consistent, unified data model across internal Music services, achieves fast round-trip resolution through caching, and lets clients fetch only the specific data they request, which reduces unnecessary data transfer. The 2025 roadmap focuses on enhancing developer experience, improving performance, and expanding service capabilities to meet evolving business needs. Our investments are allocated across building core capabilities and primitives (36.5%), improving service performance, stability & reliability (37.8%), developing operational excellence (14.5%), and adhering to non-negotiable compliance and legal requirements (11.2%).
2024 Recap¶
In 2024, FireFly contributed to key projects across Amazon Music by building core capabilities such as new entity support for Audiobooks, Fan Groups, Insights, Subscription, and Merch; supporting key functionality around live session chat and anonymous authentication; and improving overall stability and performance through cache optimizations, better error handling, pipeline improvements, and intelligent service timeout configurations. FireFly currently integrates with 60+ data providers, supports 450+ distinct fields for queries and mutations, and serves 35+ internal clients including DragonFly, Skyfire, and native module apps. With more than 110 CDK commits to infrastructure and more than 400 commits to core, FireFly completed 28+ schema updates and addressed 270+ external asks with a success rate of over 80% in 2024. At the same time, we improved FireFly’s operational efficiency and flexibility and upgraded its infrastructure by moving to ECS, improving ElastiCache cluster resiliency, reducing rollback time to 20 minutes or less, and speeding up deployments with a higher-compute GitLab runner fleet.
Developer Experience Q4 Survey feedback¶
In January 2025, CXI’s Developer Experience team conducted a comprehensive survey to assess our development ecosystem and gather feedback on specific Infrastructure products, including FireFly. The survey revealed a CSAT score of 4.06 (on a 7-point scale) for FireFly, with 20% of respondents (n=50) expressing slight or strong satisfaction. However, 60% of users reported requiring moderate to extensive support while using the service. Key challenges included insufficient documentation on concepts, tenets, guidelines, best practices, and engagement models (52%), prolonged away team code review timelines (52%), and difficulty launching short-term experimental CX without going through the full review, schema design, and implementation process (50%). Developers suggested improvements such as better observability with seamless end-to-end trace profiling (60%), unified documentation paired with a developer console and schema exploration tools (50%), and simplified schema experimentation with lower barriers (44%). These insights have significantly informed our 2025 roadmap, guiding our focus on addressing pain points and implementing the suggested improvements. For the complete list of identified issues and proposed areas of improvement, please refer to Appendix B.
Vision & Strategy¶
Our three-year vision is to establish FireFly as the cornerstone API aggregator in Amazon’s audio streaming ecosystem. In 2025, we will execute our strategy across the four key themes outlined in the executive summary. We will focus on significantly reducing query response times and improving uptime (to 99.99%) through infrastructure migration to ECS and advanced caching mechanisms. Our API ecosystem will expand to include real-time streaming capabilities and integration with new downstream services. We will drive increased developer adoption by enhancing documentation and tools and by simplifying schema experimentation. Operational excellence will be reinforced through proactive infrastructure upgrades and advanced security implementations. This approach addresses immediate performance and reliability concerns while laying the foundation for our long-term goal of positioning our GraphQL service as the universal data access layer and industry benchmark for large-scale audio streaming platforms.
Tenets (unless you know better ones)¶
The following tenets serve as guiding principles for FireFly, shaping our decision-making process and ensuring alignment with our overall vision and strategy as we evolve and improve our API platform:
Developer-First: Craft an intuitive, self-documenting GraphQL schema that accelerates product innovation for all customers.
Multi-Use Data Priority: Consolidate multi-feature/ use-case data while judiciously incorporating essential single-consumer fields.
Schema Stability: Embrace proven, long-term attributes to maintain a clean and reliable schema.
Ownership and Governance: The FireFly team stewards the schema, ensuring backward compatibility, proper versioning, and future-ready composability.
Data Integrity: Prioritize direct data access, replicating only when performance or reliability gains are substantial.
Enterprise-Grade: Uphold Tier-1 reliability and security standards with strict SLAs, comprehensive monitoring, and robust access controls.
Merit-Driven Adoption: Offer a compelling, paved-path solution while respecting customer choice in technology decisions.
2025 Investment Pillars and Initiatives¶
1. Core Capabilities (36.5% resource allocation)¶
As of 2024, FireFly has developed a data graph that covers essential Music, Podcast and Audiobook entities and their relationships. Our APIs provide access to Music and Podcasts catalog data, along with primitives reflecting users’ activities and taste signals such as like/follow status and listening histories. We also offer Playback, Search, and Recommendation APIs for accessing these primitives. As part of project Montana, we added support for Audiobook as an entity and currently support a number of use cases around audiobook discovery and recommendations.
In 2025, we aim to expand our core capabilities, enhancing the service’s versatility and power. This theme focuses on developing new primitives and extending existing ones to meet evolving business needs and technological advancements. A few of the key initiatives under this theme are:
1.1. Onboarding new ‘entities’ and expanding functionality for existing entities¶
FireFly will add support for ‘concerts’ as a new entity, enabling experience teams to build fandom-forward experiences such as following a concert or live event, concert recommendations, and ticket purchase. To support ‘Blackbolt’, an innovative AI-driven audio app, FireFly will add ‘collections’ as a new entity to power its CX. We will also expand the functionality supported for ‘fangroups’, a strong pillar of Amazon Music’s fandom strategy (2025 S-Team goal), by enabling follow, block profile, UGC ban status, attachments in fan group messages, and thumbnail/banner images. Further, to power year-round insights CX for customers, we will augment the ‘InsightsHub’ API as part of FireFly. Most of these capabilities will be developed using the FireFly away team contribution model (FAQ 5), with the core FF team supporting through schema reviews, feedback, and MR reviews to maintain the scalable graph architecture.
1.2. Subscriptions and real-time updates¶
We will implement GraphQL subscriptions to enable real-time data updates for our clients. This feature allows clients to subscribe to specific events or data changes, receiving instant updates without polling. Subscriptions will improve our service with real-time data synchronization, reduced network overhead, and improved user experience for live updates unlocking multiple use cases around Casting, collaborative playlists, live events updates and several Fandom initiatives.
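The push model behind subscriptions can be illustrated with a minimal in-memory publish/subscribe sketch. This is not FireFly's design: the `PubSub` class, the `PlaylistEvent` shape, and the topic naming are hypothetical, and a real GraphQL subscription would typically deliver events over a long-lived transport such as WebSockets rather than in-process callbacks.

```typescript
// Hypothetical event shape for a collaborative-playlist update.
interface PlaylistEvent {
  playlistId: string;
  trackId: string;
  action: "added" | "removed";
}

type Handler<T> = (event: T) => void;

// Minimal in-memory pub/sub: the server pushes each event to every
// subscriber of a topic, so clients do not need to poll for changes.
class PubSub<T> {
  private handlers = new Map<string, Set<Handler<T>>>();

  // Register a handler; the returned function cancels the subscription.
  subscribe(topic: string, handler: Handler<T>): () => void {
    const existing = this.handlers.get(topic);
    const set = existing ?? new Set<Handler<T>>();
    if (!existing) this.handlers.set(topic, set);
    set.add(handler);
    return () => {
      set.delete(handler);
    };
  }

  // Deliver an event to all current subscribers; returns how many received it.
  publish(topic: string, event: T): number {
    const set = this.handlers.get(topic);
    if (!set) return 0;
    set.forEach((handler) => handler(event));
    return set.size;
  }
}

// Usage: one client follows a playlist and receives updates as they happen.
const pubsub = new PubSub<PlaylistEvent>();
const received: PlaylistEvent[] = [];
const unsubscribe = pubsub.subscribe("playlist:42", (e) => {
  received.push(e);
});
pubsub.publish("playlist:42", { playlistId: "42", trackId: "t1", action: "added" });
unsubscribe(); // later events are no longer delivered to this client
```

The network-overhead benefit comes from the client sending one subscribe request instead of repeated polling queries; the server does work only when an event actually occurs.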
1.3. Customer benefits management¶
As Amazon Music has evolved, the tier-based logic for vending customer benefits has become increasingly complex, with multiple use cases in which benefits depend on customer context such as profile (project Montana), territory (project Geet), authentication status (project Casper), and device (project Canary). This complexity has also proliferated into FireFly, which has to maintain multiple hard-coded logic paths to vend benefit information to clients. Stratus, owned by the Maple Identity org, is evolving the benefits platform (proposal) to support these ever-evolving use cases, and we are collaborating closely with them to model it in FireFly in a way that both unlocks multiple use cases for our customers and simplifies our tech stack, resulting in better performance and lower cost. We are also working closely with the FMPM team to consolidate the ‘dynamic entitlement’ benefits currently handled by their MDEX (MusicDynamicExperience) service.
1.4. Expanding Audiobook support¶
In 2025, in pursuit of Amazon Music’s “All Audio” goal, we plan to expand the currently supported Audiobook APIs to include recommendations based on category, listening history, trends, new releases, and exclusives. We are also exploring integration of the ‘Playback’ APIs for audiobooks, including retrieving licenses, manifests, and playback assets; this is currently blocked on the Audible team sharing the relevant API details with FireFly (refer to FAQ 8). These additional capabilities would power multiple use cases across our 1P, 2P, and 3P clients.
1.5. LLM based FireFly Assistant¶
The lack of unified documentation, onboarding support, and schema exploration tools was reported as one of the biggest areas of improvement for FireFly in the Q4’24 DevEx survey. Developers reported spending a long time onboarding onto FireFly and struggling with the poor performance of their production GraphQL queries. To address this, in 2024 we developed FireFly Assistant, an LLM-based assistant (runner-up idea in the Q4’24 MCX hackathon) that serves both as an interactive tutor for developers learning GraphQL and as an assistant that helps experience developers optimize their queries for better performance. The tool analyzes queries in real time, providing suggestions and explanations tailored to the problem space. In 2025, we will expand this tool by integrating it into the developer’s IDE to provide contextual suggestions.
2. Performance, Stability and Reliability (37.5% resource allocation)¶
Our GraphQL API’s performance has been a significant concern for our customers. While much of the latency is attributable to downstream services and sequential request chaining, we are committed to addressing it from the platform side. We will take a programmatic approach with these downstream systems to optimize latency, alongside improvements to FireFly infrastructure. Our focus in 2025 is on five key initiatives:
2.1. Observability Improvements¶
We will develop a query-specific latency dashboard and capture complete server-side latency measurements in addition to query duration, providing better insight into the performance of individual queries. We will also add metrics and alarms for each integration, enabling a better understanding of performance bottlenecks and faster issue resolution.
2.2. GraphQL Core Infrastructure Migration¶
We will complete the migration of our GraphQL core infrastructure to Amazon ECS (Elastic Container Service) as part of project MetalFly - Phase 2 by Q2’25. This migration is expected to improve performance and avoid the issues seen with AWS Lambda related to cold starts and connections to the Redis cluster.
2.3. Request Handling and Resource Management Enhancements¶
We will implement batching support, introduce service- and API-specific timeouts, and optimize data loaders for improved performance. We will also integrate with BMC (Beyond Music Catalog) to fetch podcast- and audiobook-specific content. These optimizations aim to reduce overall response times and manage resource allocation more effectively.
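To make the batching idea concrete, here is a minimal sketch of the data-loader pattern: individual `load(key)` calls made in the same tick are collected and sent downstream as one batched request. The `TinyLoader` class and the fake track lookup are hypothetical and simplified; this is not FireFly's implementation, and it assumes the batch function returns values in the same order as the keys it receives.

```typescript
type BatchFn<K, V> = (keys: K[]) => Promise<V[]>;

interface Pending<K, V> {
  key: K;
  resolve: (value: V) => void;
  reject: (error: unknown) => void;
}

// Collects load() calls made in the same tick and resolves them
// with a single call to the batch function.
class TinyLoader<K, V> {
  private queue: Pending<K, V>[] = [];
  private cache = new Map<K, Promise<V>>();
  private batchFn: BatchFn<K, V>;

  constructor(batchFn: BatchFn<K, V>) {
    this.batchFn = batchFn;
  }

  load(key: K): Promise<V> {
    // Repeated keys share one promise, so they are fetched only once.
    const cached = this.cache.get(key);
    if (cached) return cached;
    const promise = new Promise<V>((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // The first key in a batch schedules a flush as a microtask,
      // which runs after the current synchronous code has queued its keys.
      if (this.queue.length === 1) {
        Promise.resolve().then(() => this.flush());
      }
    });
    this.cache.set(key, promise);
    return promise;
  }

  private async flush(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    try {
      const values = await this.batchFn(batch.map((item) => item.key));
      batch.forEach((item, i) => item.resolve(values[i]));
    } catch (err) {
      batch.forEach((item) => item.reject(err));
    }
  }
}

// Usage: three resolver-style lookups result in one downstream call.
const downstreamCalls: string[][] = [];
const trackLoader = new TinyLoader<string, string>(async (ids) => {
  downstreamCalls.push(ids); // stand-in for one batched request to a data provider
  return ids.map((id) => `track:${id}`);
});
const lookups = Promise.all([
  trackLoader.load("1"),
  trackLoader.load("2"),
  trackLoader.load("1"),
]);
```

The per-request cache is what turns N field-level lookups for the same entity into one fetch; a production loader would also need cache scoping per request and error handling per key.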
2.4. Caching Improvements¶
We will apply aggressive caching strategies along with “edge caching” capabilities to speed up data retrieval and significantly reduce response times for frequently requested data. We will also explore pre-caching the Catalog and rely on the Catalog event stream for cache invalidation, allowing for real-time updates to cached data. This will ensure that clients always receive the most up-to-date information without sacrificing performance, striking a balance between data freshness and query speed.
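The trade-off between freshness and speed described above can be sketched as a cache that combines a time-to-live with event-driven invalidation. The class name, the `CatalogEvent` shape, and the TTL value are hypothetical; the real Catalog event stream and cache layers (for example ElastiCache or an edge cache) are not modeled here.

```typescript
interface CacheEntry<V> {
  value: V;
  expiresAt: number; // epoch milliseconds
}

// Hypothetical shape of a catalog change notification.
interface CatalogEvent {
  entityId: string;
}

// Cache entries expire after a TTL, and are also evicted immediately
// when a catalog event reports that the underlying entity changed.
class EventInvalidatedCache<V> {
  private entries = new Map<string, CacheEntry<V>>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  get(key: string, now: number = Date.now()): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= now) {
      this.entries.delete(key); // TTL is the safety net if an event is missed
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V, now: number = Date.now()): void {
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }

  // Called once per event consumed from the change stream.
  onCatalogEvent(event: CatalogEvent): void {
    this.entries.delete(event.entityId);
  }
}

// Usage: a long TTL is acceptable because changes evict entries right away.
const albumCache = new EventInvalidatedCache<string>(60 * 60 * 1000);
albumCache.set("album:1", "cached metadata");
albumCache.onCatalogEvent({ entityId: "album:1" }); // next read falls through to the source
```

With event-driven eviction in place, the TTL can be set generously for hit rate, since it only bounds staleness in the case where an invalidation event is lost or delayed.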
2.5. Improved service predictability & reliability¶
We will work with various upstream teams to provide consistent pagination across all queries, introduce per-query limit support, and audit null responses. Together, these changes will improve data-handling efficiency and give clients more predictable control over the data they query.
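One common way to make pagination consistent is a shared cursor-based helper that also enforces a per-query limit. The sketch below is illustrative only: the `Page` shape, the `MAX_LIMIT` value, and the use of a plain offset as the cursor are assumptions, not FireFly's schema (real cursors are usually opaque tokens).

```typescript
interface Page<T> {
  items: T[];
  nextCursor: string | null; // null means there are no more results
}

// Hypothetical server-side cap on how many items one query may request.
const MAX_LIMIT = 50;

// Returns one page of results starting at the position encoded in `cursor`.
function paginate<T>(all: T[], limit: number, cursor: string | null): Page<T> {
  const start = cursor === null ? 0 : Number(cursor);
  if (!Number.isInteger(start) || start < 0) {
    throw new Error("invalid cursor");
  }
  // Clamp the requested limit into the range [1, MAX_LIMIT].
  const size = Math.min(Math.max(limit, 1), MAX_LIMIT);
  const items = all.slice(start, start + size);
  const end = start + items.length;
  return { items, nextCursor: end < all.length ? String(end) : null };
}

// Usage: walk a five-item list two items at a time.
const tracks = ["a", "b", "c", "d", "e"];
const firstPage = paginate(tracks, 2, null);
const secondPage = paginate(tracks, 2, firstPage.nextCursor);
```

Returning an explicit `nextCursor` of `null` at the end gives every client the same stop condition, regardless of which upstream service backs the list.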
3. Operational Excellence and Infrastructure Resilience (14.5% resource allocation)¶
In 2025, we’re prioritizing operational excellence to maintain our GraphQL service as a robust and cutting-edge solution. This theme focuses on enhancing infrastructure, improving reliability, and staying ahead of technological advancements. These investments aim to support our growing client base and maintain our competitive edge. Importantly, these efforts will reduce our long-term bandwidth allocation for Oncall and KTLO support, allowing us to focus more resources on innovation and new feature development. A few of the key initiatives under this theme are:
3.1. Tier-1 Service Reliability¶
We’re committed to strengthening FireFly’s Tier-1 status, targeting 99.99% uptime, up from our current 99.45% availability. This involves improving our monitoring and alerting, standing up fallback mechanisms, and reducing our time to recovery.
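To put the two availability figures in perspective, the short calculation below converts each into an annual downtime budget. It assumes availability is measured as a fraction of wall-clock time over a 365-day year; if the figure is request-based, the same ratio applies to failed requests instead of minutes.

```typescript
const MINUTES_PER_YEAR = 365 * 24 * 60; // 525,600 minutes in a 365-day year

// Minutes of downtime per year implied by an availability percentage.
function downtimeMinutesPerYear(availabilityPercent: number): number {
  return ((100 - availabilityPercent) / 100) * MINUTES_PER_YEAR;
}

const currentBudget = downtimeMinutesPerYear(99.45); // about 2,890.8 minutes
const targetBudget = downtimeMinutesPerYear(99.99); // about 52.6 minutes

console.log(`current: ${(currentBudget / 60).toFixed(1)} hours/year`);
console.log(`target: ${targetBudget.toFixed(1)} minutes/year`);
```

Under that assumption, moving from 99.45% to 99.99% shrinks the allowed downtime from roughly 48 hours to under an hour per year, a reduction of about 55x.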
3.2. Agentic Oncall support¶
We will implement an AI-powered assistant to enhance operational efficiency. This tool will streamline incident response by automatically collecting relevant logs, analyzing issues, and providing actionable insights to Oncall engineers, reducing mean time to resolution and improving overall service reliability.
4. Non-Negotiables (11.5% resource allocation)¶
4.1. Region Flex support¶
We will migrate FireFly and MESK infrastructure from DUB to ZAZ as part of the Music-wide Region Flex program. We are working closely with the central Region Flex team and exploring the use of Amazon’s IronHide tool, which supports custom workflows to transform code using deterministic algorithms and LLMs.
4.2. Security & Compliance¶
We will continue to address security and compliance risks through initiatives such as upgrading the generalized platform-level metrics package for Skyfire to be DMA (EU Digital Markets Act) compliant, updating the PBKDF2-based scheme used for encrypting and decrypting profileId, and adding RED data compliance for sensitive customer fields in FireFly APIs.
FAQs¶
1. What are the key Metrics & KPIs that we track for measuring FireFly’s success?¶
Our primary indicator of success for FireFly is the Customer Satisfaction Score (CSAT), measured quarterly on a scale of 1-7. This metric encapsulates the overall health and usefulness of our GraphQL service for our developers. The Q4’24 CSAT score stood at 4.06/7, against a target of 5.5 or higher. The score reflects developer sentiment on API performance, documentation quality, ease of integration, and overall experience. Along with this, we also track the key supporting metrics below:
Performance Metrics
Request Rate: Queries and mutations per minute
Latency: Median and Trimmed Mean (TM95) response times
Error Rates: HTTP and GraphQL errors (4xx,5xx)
Operational Metrics
Cache Hit Rate: Effectiveness of caching strategy
Resolver Performance: Identifying bottlenecks
Subscription Notification Rate: Frequency of real-time updates
User Experience Metrics
Operation Complexity: Depth and breadth of queries
Field Usage: Most and least used fields
Client Versions: Adoption rates across different clients
Business Impact Metrics
API Uptime: Service availability
Time to Market: Speed of new feature deployments
Developer Productivity: Reduction in API development time
We continuously monitor these metrics using our observability tools, allowing us to proactively address issues, optimize our service, and demonstrate FireFly’s value to stakeholders.
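As an illustration of why the latency metric pairs the median with a trimmed mean, the sketch below computes both from raw samples. It assumes TM95 follows the common CloudWatch-style convention of averaging after discarding the highest 5% of data points; the exact interpolation a metrics backend applies may differ, and the function names and sample values are hypothetical.

```typescript
// Mean of the lowest `percent`% of samples, i.e. with the slowest tail removed.
function trimmedMean(samples: number[], percent: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const keep = Math.max(1, Math.floor((sorted.length * percent) / 100));
  const kept = sorted.slice(0, keep);
  return kept.reduce((sum, v) => sum + v, 0) / kept.length;
}

// Middle value of the sorted samples (average of the two middle values if even).
function median(samples: number[]): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Hypothetical latencies in ms: 19 ordinary requests plus one 5-second outlier.
const latenciesMs: number[] = [];
for (let i = 1; i <= 19; i++) latenciesMs.push(i * 10);
latenciesMs.push(5000);

const plainMean = latenciesMs.reduce((s, v) => s + v, 0) / latenciesMs.length;
const tm95 = trimmedMean(latenciesMs, 95);
const p50 = median(latenciesMs);
```

For these samples the single outlier drags the plain mean to 345 ms, while TM95 (100 ms) and the median (105 ms) stay close to what a typical request experiences.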
2. What are some of the key quarter-wise product deliveries from FireFly in 2025?¶
Below are the key quarter-wise deliveries from FireFly: -
| Quarter | Core Capabilities | Enabling stakeholders’ deliveries |
|---|---|---|
| Q1’25 | Maestro prompt history support | Short Looping Video visualizer support in FireFly |
| | Media central integration | Collections entity support for project BlackBolt |
| | Identity support for Merlok devices (Fire Kids tablet) | Key functionality support for the ‘FanGroups’ entity (thumbnail, banner images, block profile, UGC ban status, etc.) |
| | [S&P] Improved Observability: e2e latency metrics breakdown | ‘Customer insights’ modeling to support Fandom CX |
| | [Operational Excellence] GRS (GothamRatingsService) deprecation support | GRS deprecation |
| | | Evergreen Polls |
| | | Artist ↔ Merch relationship support |
| Q2’25 | [Non Negotiable] RegionFlex Support - FireFly Migration | ‘Follow’ functionality support for Concerts entity |
| | [Non Negotiable] RegionFlex Support - MESK Migration (Infra) | ‘keymasters’ support in FireFly |
| | Account creation (project Casper, Quattrroo) | Related playlists & Stations resolvers |
| | Benefits management (Project Geet, GH) | ‘Nimbly’ deprecation support |
| | [S&P] MetalFly - Phase 2 | Anonymous access to Playlist (OGRE) |
| | [S&P] Consistent error handling and Grafana dashboard improvements | Top content queries |
| | [S&P] Reducing query complexity for Phoneix / 3P APIs | Stations Charts track ranking movement |
| Q3’25 | GraphQL Subscriptions | Custom artwork for playlists |
| | Audiobook playback capability | RED data compliance for cancellation flow |
| | FireFly platform latency optimizations including edge caching | |
| | Internal Infra caching improvements | |
| | [S&P] Adding metrics and corresponding alarms to each integration | |
| | Infra Upgrades (Node.js, JavaScript, Apollo GraphQL) | |
| Q4’25 | Consistent pagination support | |
| | ‘Per query rate limit’ support for various Music entities | |
| | Red-Anvil recertification | |
| | FireFly tier-1 service goal (99.99% availability) | |
Throughout 2025, FireFly will focus on delivering core capabilities, performance improvements, and key stakeholder initiatives. These deliveries will significantly strengthen FireFly’s functionality, performance, and integration capabilities, providing substantial value to both internal teams and external partners.
3. How will you ensure backward compatibility with these changes/ launches?¶
We preserve backward compatibility by making additive schema changes only. New fields are added as nullable with default values, while existing fields are never removed, only deprecated. This approach, combined with careful schema design and validation, allows the service to evolve without breaking older clients.
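The additive-only rule lends itself to an automated check in the review pipeline. The sketch below compares two versions of a single type and flags changes that would violate it; the simplified `FieldDef` model and the function name are hypothetical, and a real check would operate on the parsed schema rather than hand-written maps.

```typescript
// Simplified description of one field on a GraphQL object type.
interface FieldDef {
  type: string;
  nullable: boolean;
  deprecated?: boolean;
}

type TypeDef = Record<string, FieldDef>;

// Returns a list of violations of the additive-only policy.
function findPolicyViolations(oldType: TypeDef, newType: TypeDef): string[] {
  const problems: string[] = [];
  for (const name of Object.keys(oldType)) {
    const next: FieldDef | undefined = newType[name];
    if (!next) {
      // Existing fields may be deprecated but must not be removed.
      problems.push(`field removed: ${name}`);
    } else if (next.type !== oldType[name].type) {
      problems.push(`type changed: ${name}`);
    }
  }
  for (const name of Object.keys(newType)) {
    // Team convention from the text above: newly added fields are nullable.
    if (!(name in oldType) && !newType[name].nullable) {
      problems.push(`new field must be nullable: ${name}`);
    }
  }
  return problems;
}

// Usage: deprecating `duration` and adding nullable `durationMs` is allowed.
const trackV1: TypeDef = {
  title: { type: "String", nullable: false },
  duration: { type: "Int", nullable: true },
};
const trackV2: TypeDef = {
  title: { type: "String", nullable: false },
  duration: { type: "Int", nullable: true, deprecated: true },
  durationMs: { type: "Int", nullable: true },
};
const violations = findPolicyViolations(trackV1, trackV2); // empty list
```

Running such a check on every merge request would catch accidental removals before review, leaving reviewers to focus on schema design rather than compatibility bookkeeping.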
4. How does FireFly’s annual roadmap planning accommodate clients who follow quarterly planning cycles?¶
While we maintain an annual roadmap, we’ve designed our planning process to be both strategic and flexible. Our annual plan is based on long-term visibility of customer requirements (collected through the customer outreach mechanism held in January 2025) and organizational goals, allowing us to focus resources on core platform improvements that deliver significant value. However, we recognize the dynamic nature of our clients’ needs. We remain adaptable to priority shifts throughout the year, evaluating high-impact requests on a case-by-case basis and making informed trade-off decisions when necessary. This approach allows us to balance long-term strategic initiatives with the ability to respond to emerging client needs, ensuring FireFly continues to evolve in alignment with both overarching organizational goals and specific client priorities.
5. What is FireFly’s new away team engagement model?¶
FireFly employs an away team engagement model inspired by the established Amazon Stores away team model, with slight variations as documented here, that balances client self-service with maintaining graph quality. For minor changes to the graph (for example, adding new enums to existing attributes or renaming an attribute), we provide comprehensive documentation and tools. For complex integrations, a client developer joins our core team temporarily. This away team member attends regular standups, works closely with our experts, and completes GraphQL training to build their skills. They contribute to improving the overall graph while gaining deep insight into our framework. This approach ensures clients can efficiently meet their needs, maintains graph quality, and fosters a collaborative environment that benefits all users.
6. Why can’t all client requests be handled through the away team model without involvement from the core platform team?¶
While we strive to empower our clients, maintaining the quality, efficiency, and scalability of our GraphQL framework is paramount. All schema changes undergo a thorough review by our core team to ensure architectural integrity and avoid inefficiencies that could impact all users. Due to limited bandwidth (each schema change requires, on average, ~2 weeks of core team effort for schema design review, MR review, etc.), we prioritize a select number of client requests each quarter. However, we remain open to discussing high-priority requests with strong business justification. These cases can be evaluated individually, involving leadership when necessary. This approach allows us to balance client needs with the overall health and evolution of our GraphQL ecosystem, ensuring it continues to benefit all customers effectively.
7. How is FireFly supporting Project GreenHornet?¶
FireFly is supporting the GreenHornet initiative in two major ways. First, we are addressing feature gaps identified in the implementation, adding data so that clients can rely exclusively on FireFly for all experiences. Second, we are working closely with the GreenHornet client team to improve the performance and developer experience of the FireFly service, focusing on improvements to latency, error messaging, and capability/benefit vending.
8. What are some of the key initiatives which are ‘unfunded’ and BTL (Below the line) currently?¶
Due to resourcing constraints and limited bandwidth, we are unable to fund the key capabilities below, or the away team support needed by our clients.
| Unfunded FireFly core capabilities | Unfunded Away team support |
|---|---|
| Project Oracle (support P13N APIs in FireFly) | |
11. What are the key discussion topics / leadership asks?¶
Below are a few of the topics on which we need leadership guidance:
Funding Ask: Several critical initiatives (refer to FAQ 8 above), including core FireFly capabilities (Defer, official SDK, cache directives, etc.) and stakeholder support through away team engagements (for example, Maestro use cases, 3P and Quattro support, Podcast support, etc.), are currently ‘below the line’ on our roadmap due to capacity constraints. We are requesting funding for two additional headcount for the FireFly team to address these gaps. This investment would accelerate feature development, enhance stakeholder support, and improve our overall time-to-market. It would allow us to move previously unfunded initiatives into active development while maintaining progress on current priorities.
Audible API access: We aim to integrate Audiobook playback support into FireFly, a strategic initiative that would significantly benefit various customer experience use cases across our 1P, 2P and 3P clients. However, we’ve encountered challenges in obtaining access to the necessary Audiobook playback, recommendation and borrow specific APIs, despite reaching out to the Audible team on several occasions. Given the strategic importance of this capability and its potential to deliver substantial value to our stakeholders, we’re seeking leadership intervention to unblock this initiative.
OTel (OpenTelemetry) adoption: Our Q4’24 developer experience survey and project ‘Heron’ have highlighted stability and performance as critical pain points. While FireFly is investing in end-to-end latency observability for server-side operations, we believe significant benefits can be unlocked by extending the OpenTelemetry (OTel) implementation to the client side. This would provide holistic performance insights, enable proactive issue resolution, and ultimately improve customer experience. However, ownership of the client-side implementation remains unclear and the work is unfunded. We urge leadership to prioritize and fund full-stack OTel adoption (or another available option such as the Bugsnag Performance SDK), including client-side instrumentation, to achieve end-to-end observability. This investment will not only address current performance challenges but also future-proof our architecture, facilitating data-driven decision-making and more efficient problem resolution across our entire application stack. We are working closely with the Region Flex central team to understand the scope of end-to-end telemetry support they are proposing for AM.
Appendices¶
Appendix A: 2025 Roadmap (Link)¶
Theme |
Initiative |
Impact/ Why? |
Engagement Model |
Landing Quarter |
|---|---|---|---|---|
Non-Negotiables |
Non-Negotiable RegionFlex Cost Savings Initiative |
FireFly Owned |
II |
|
Non-Negotiables |
Non-Negotiable RegionFlex Cost Savings Initiative |
FireFly Owned |
II |
|
Non-Negotiables |
mitigate security risk |
FireFly Owned |
IV |
|
Non-Negotiables |
compliance |
Away Team |
III |
|
FireFly Core Capability |
Enables the Fandom S-Team Goal: https://kingpin.amazon.com/#/items/932770 |
Away Team |
II |
|
FireFly Core Capability |
Enables Agentive Audio Companion (Blackbolt) Top Goal https://kingpin.amazon.com/#/items/932777 |
Away Team |
I |
|
FireFly Core Capability |
Supporting ‘fangroups’ as a new entity in FF and multiple functionalities around it |
Enables the Fandom S-Team Goal: https://kingpin.amazon.com/#/items/932770 |
Away Team |
I |
FireFly Core Capability |
Enables the Fandom S-Team Goal: https://kingpin.amazon.com/#/items/932770 |
Away Team |
I |
|
FireFly Core Capability |
GreenHornet + Skyfire deprecation |
FireFly Owned |
II |
|
FireFly Core Capability |
Improved user experience through real-time updates, along with platform improvements around efficient communication, simplified data flows and reduced network traffic |
FireFly Owned |
III |
|
FireFly Core Capability |
Account creation support (Quattrro, Casper, Tesla, Discord, 3P etc.) |
Enables multiple use cases that need account creation in the user flow, such as Quattrro, GH, Tesla, Discord etc. |
FireFly Owned |
II |
FireFly Core Capability |
Subscription & Benefits management (Project Geet, FMPM Dynamic Entitlement) |
FireFly Owned |
II |
|
FireFly Core Capability |
S-Team Goal Montana support; 3P + Greenhornet |
FireFly Owned |
III |
|
FireFly Core Capability |
Enables encryption and decryption functionality for multiple use cases including GH, AMConnect etc. |
Away Team |
II |
|
FireFly Core Capability |
BITS support |
Away Team |
I |
|
FireFly Core Capability |
2024 Quality Upgrades goal completion https://jira.music.amazon.dev/browse/CP-705 |
Away Team |
I |
|
FireFly Core Capability |
Migrate Firefly to call BMC for podcast data - Bigscreen, Auto and Carplay |
Away Team |
II |
|
FireFly Core Capability |
Away Team |
III |
||
FireFly Core Capability |
Away Team |
II |
||
FireFly Core Capability |
FireFly Project Intake - Anonymous access to Playlist (OGRE), top content queries and Stations |
Away Team |
II |
|
FireFly Core Capability |
FireFly Schema Design Intake - [Charts track ranking movement] |
Away Team |
II |
|
FireFly Core Capability |
FireFly Project Intake - Evergreen Polls (Client app version and OS type) |
Away Team |
I |
|
FireFly S&P Improvements |
Latency improvement for all clients |
FireFly Owned |
II |
|
FireFly S&P Improvements |
Benefits all clients by improving the latency through progressive fetching of data |
FireFly Owned |
IV |
|
FireFly S&P Improvements |
Performance improvement at FF end while resolving various entities like tracks, artists, albums etc. |
FireFly Owned |
IV |
|
FireFly S&P Improvements |
FireFly Platform Latency Optimizations including edge caching |
Latency improvements through optimized queries, Metadata Hydration latencies improvement, onboarding BMC (Beyond Music Catalog) |
FireFly Owned |
III |
FireFly S&P Improvements |
Latency improvements through robust caching mechanisms |
FireFly Owned |
III |
|
FireFly S&P Improvements |
Instrumentation: Include complete service side latency apart from query duration |
Latency measurement to provide attribution insights |
FireFly Owned |
I |
FireFly S&P Improvements |
Better visibility and deep-dive capabilities around key platform metrics |
FireFly Owned |
II |
|
FireFly S&P Improvements |
Increased Observability: Adding metrics and corresponding alarms to each integration |
Provides visibility and root cause ability on platform health |
FireFly Owned |
III |
FireFly S&P Improvements |
FireFly Owned |
III |
||
FireFly Operational Excellence |
Enables better CX based on the specific error returned by downstream services |
FireFly Owned |
II |
|
FireFly Operational Excellence |
Tier-1 Service |
FireFly Owned |
IV |
|
FireFly Operational Excellence |
Security / Data Privacy |
FireFly Owned |
IV |
|
FireFly Operational Excellence |
FireFly Owned |
II |
||
FireFly Operational Excellence |
DevEx Infra Cost savings goal (reduce logging, configurations etc.) |
FireFly Owned |
III |
|
FireFly Operational Excellence |
Non-Negotiable RegionFlex Cost Savings Initiative |
Away Team |
II |
|
FireFly Operational Excellence |
Support GRS (GothamRatingsService) deprecation by migrating to ES3 |
Compliance |
FireFly Owned |
I |
FireFly Operational Excellence |
FireFly Owned |
III |
||
FireFly Operational Excellence |
Infra Upgrades |
Security |
FireFly Owned |
III |
Appendix B: FireFly - Pain points and Areas of Improvement (n=50)¶
| Pain Points | Areas of Improvement |
|---|---|
| Away Team Contributors - Lack of clear documentation around FireFly concepts, Tenets, Guidelines, Best Practices and Engagement model (52%) | Consumers - Enhanced observability and easy E2E trace profiling (60%) |
| Away Team Contributors - Code reviews take a long time (52%) | Consumers - Unified documentation, Dev Console and Schema exploration feature (50%) |
| Away Team Contributors - Unable to launch short-term, experimental CX without going through the entire process of reviews, schema design, and implementation (50%) | Away Team Contributors - Low barrier schema experimentation (44%) |
| Away Team Contributors - Merging my approved branch/ MR takes a long time (38%) | Consumers - Support for API specific Debuggers (34%) |
| Consumers - Telemetry and service level visibility (34%) | Consumers - Consistent error reporting and handling (22%) |
| Consumers - FireFly Platform stability (26%) | Consumers - GraphQL Realtime Subscriptions (20%) |
| Data Providers - Lack of controls to hold specific CXs accountable for service side issues (20%) | Consumers - Consistent Pagination support (18%) |
| Away Team Contributors - Flaky Integration tests make the pipeline unreliable and unstable (16%) | Consumers - FireFly SDK (16%) |
| Data Providers - Lack of controls to prevent service level abuse through fanout or out of control dial-ups (12%) | Consumers - Defer & Stream functionality (14%) |
| Data Providers - Individual service level rate limit protection (12%) | |
| Away Team Contributors - FireFly Image schema enhancement to include additional attributes (10%) | |
Appendix C: Roadmap Insights¶

Appendix D: Important Artifacts¶
Firefly Prioritization Framework and Engagement Model
DevEx Q4’24 Survey Results & Analysis
FireFly OP1 2025
Firefly roadmap 2025
Notes on Gen AI / Dev Productivity
Music Benefits Platform Evolution
Firefly Queries end of 2024
FireFly capabilities developed in 2024
Amazon Music’s Use of OpenTelemetry for DMA and Beyond
