Talks

How to be a Performance Badass
The talk centers on dos and don'ts for individual engineers and on how to be successful in performance work.
Performance at a Macro Level
Better-performing mobile apps keep more users engaged and help achieve business goals. Often when we talk about performance we look at the problem with a magnifying glass to find every little thing that might be contributing to the startup latency of an app. In this talk, we will cover how we approached making the UberEats app more performant, how we created a phased approach, and what we found. We will cover the entire stack, both mobile technologies and backend services, that helped us achieve our performance goals.
Data Engineering at the Speed of Your Disk
Today's best disks can read data at gigabytes per second; the best networks are even faster. We should aim for data engineering tasks (data filtering, parsing, validation) to achieve similarly high speeds. Bottleneck tasks such as JSON ingestion can be much faster than they currently are.
Slides
Performance Testing for Firebase Cloud Messaging Backend
Firebase Cloud Messaging (FCM), formerly known as Google Cloud Messaging, is a cross-platform messaging solution for sending notifications to client apps. Performance testing for the messaging backend is challenging in several respects, including networking and authentication. In this talk I will cover the challenges of, and best practices applied to, the FCM performance testing infrastructure, and how FCM developers use it for different testing purposes.
Understanding Kernel Scheduling Behavior with SchedViz
Kernel scheduling can be a significant source of latency problems: when a thread isn't running, it can't service requests or do anything else. SchedViz is a newly-open-sourced tool that provides fine-grained visibility into kernel scheduling behavior, and, increasingly, into other kernel phenomena as well. This talk will provide a brief walk-through of SchedViz, including how it works and what we used it for.
Solving Reliability Challenges with Blackbox
Blackbox is a mobile instrumentation framework designed to capture context leading up to an error site. In this talk, we discuss how Facebook is using Blackbox to tackle functional bugs and crashes in our apps.
Visual Completion measurement on Web
Visual Completion is a new approach to measuring user-centric metrics for RUM (Real User Monitoring) logging. It tracks the user-perceived visual performance of full page loads, in-app navigations, and user interactions. Unlike most traditional latency measurements, which capture only the start and end time of a task, Visual Completion considers the display timing of elements at the pixel-count level in a progressive web-app rendering architecture. In addition to tracking visual performance, we also measure TTI (Time to Interactive) to collect performance signals for app responsiveness.
Slides
Applying Statistics to Root-Cause Analysis
As systems get more complex, reasoning about performance gets more difficult. Telemetry data emitted by our services is noisy and usually unhelpful in stressful situations. Distributed Tracing, in particular, can provide rich, contextual data but root-cause analysis can still be convoluted. In this talk, I'll review a few statistics-based approaches we have applied to help quickly identify which properties of the system are correlated with performance issues. In order to support this type of aggregate trace analysis, we need data, but data isn't cheap. We want to gather only the relevant traces and bias towards traces that have abnormal behavior. I'll also talk about a few sampling approaches we use for analysis to minimize cost and overhead.
Slides
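As a rough illustration of the kind of statistics-based correlation the abstract above describes, here is a minimal sketch that ranks trace tags by how over-represented they are among slow traces. It is not the speaker's method: the trace layout (`latency_ms`, `tags`), the function name `rank_tags_by_lift`, and the choice of lift as the statistic are all assumptions made for the example.

```python
from collections import Counter


def rank_tags_by_lift(traces, latency_threshold_ms):
    """Rank (key, value) tags by lift: P(slow | tag) / P(slow).

    `traces` is a hypothetical list of dicts with "latency_ms" and "tags".
    A lift above 1.0 means the tag appears with slow traces more often
    than the overall slow rate would predict.
    """
    slow = [t for t in traces if t["latency_ms"] > latency_threshold_ms]
    if not slow:
        return []

    # Count how often each tag appears overall and among slow traces.
    all_counts = Counter(tag for t in traces for tag in t["tags"].items())
    slow_counts = Counter(tag for t in slow for tag in t["tags"].items())

    base_rate = len(slow) / len(traces)  # P(slow)
    ranked = []
    for tag, n_all in all_counts.items():
        slow_rate = slow_counts.get(tag, 0) / n_all  # P(slow | tag)
        ranked.append((tag, slow_rate / base_rate))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Made-up example data: every slow trace carries version=v2.
traces = [
    {"latency_ms": 40, "tags": {"region": "us-east", "version": "v1"}},
    {"latency_ms": 45, "tags": {"region": "us-west", "version": "v1"}},
    {"latency_ms": 900, "tags": {"region": "us-east", "version": "v2"}},
    {"latency_ms": 950, "tags": {"region": "us-west", "version": "v2"}},
]
ranking = rank_tags_by_lift(traces, latency_threshold_ms=500)
```

In this toy data set the `version=v2` tag ranks first, while the region tags sit at the base rate. A real analysis would also need enough samples per tag to trust the ratio, which is where the sampling concerns in the abstract come in.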
Uplevelling Understanding with Transient Analysis
Performance on mobile devices is often heavily dependent on the efficient use of shared resources like network bandwidth and RAM, and on orchestration between the disparate components that rely on them. It is very challenging to understand the (often surprising) conditions that arise “in the wild,” their prevalence across your user population, and therefore how client code should adapt to perform best under various transient conditions. Transient Analysis is a methodology and toolset we’ve built to enable this type of understanding by modeling expected domain-specific behaviors, processing telemetry to characterize adherence to or divergence from these expectations at scale, and linking this analysis to actionable insights and visualizations of actual examples of problematic behavior. This session will walk through the development lifecycle of such an analysis and demo the tooling that enables it.
The Sociotechnical Path to High-Performing Teams (Begins With Observability)
"Observability" is everywhere these days, but what does it actually mean? Is it just a new marketing term for the same old monitoring we've always done? Are there three pillars, or no pillars? It's enough to make anyone cranky and cynical about the motives of those involved. I'll give a brief history of observability and control systems theory, make a pitch for the precise technical definition of observability, and explain how it differs from monitoring and other telemetry -- and why it has recently suddenly become so shudderingly relevant to us all. I will discuss the second-order technical implications and effects of the definition I espouse, and describe the characteristics of tools we must build to understand the systems of tomorrow. We are far behind where we should be as a profession when it comes to how much of our effort is wasted on crap that doesn't move the business forward, and this is in large part because our ability to understand our systems is so wretched -- and we don't even know it. Let's fix that.
Using BPF for lightweight Android profiling
BPF gives you the power to understand application performance in ways that were not possible before. It is the newest tool the Mobile Profilers team is using to understand application performance and detect regressions in Consumption Metrics on Android devices. In this talk we will discuss the powers of BPF and how we are using it for lightweight, dynamic profiling.
Slides
FlameCommander: Netflix’s cloud profiler
Even under constant load, the behavior of a system is affected by variance, perturbations, single-threaded execution and other time-based issues, and is never completely uniform, making the analysis of these small variations a needle-in-a-haystack problem. FlameScope solved this problem by combining a subsecond-offset heatmap, for navigating a profile and visualizing these perturbations, with a flame graph for code-path analysis. This talk focuses on how FlameScope, the open-source profile visualization tool, evolved into FlameCommander, a full-fledged cloud profiling solution used by thousands of engineers at Netflix.
Slides
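To make the subsecond-offset heatmap mentioned above concrete, here is a minimal sketch of the binning step only. It is an illustration under assumptions, not FlameScope's actual code: the function name `subsecond_offset_heatmap`, the `rows` parameter, and the input format (sample timestamps as float seconds) are all hypothetical.

```python
import math


def subsecond_offset_heatmap(timestamps, rows=50):
    """Bucket sample timestamps (float seconds) into a 2D count grid.

    grid[column][row]: column is the whole second since the first sample,
    row is the offset within that second, split into `rows` equal slices.
    """
    if not timestamps:
        return []

    start = math.floor(min(timestamps))
    columns = math.floor(max(timestamps)) - start + 1
    grid = [[0] * rows for _ in range(columns)]

    for t in timestamps:
        column = math.floor(t) - start
        fraction = t - math.floor(t)  # offset within the second, in [0, 1)
        row = min(int(fraction * rows), rows - 1)
        grid[column][row] += 1
    return grid


# Made-up samples: three land in second 10, one in second 11.
grid = subsecond_offset_heatmap([10.25, 10.25, 10.75, 11.5], rows=4)
```

Each cell's count would then be mapped to a color, so that a burst or gap lasting a fraction of a second shows up as a visible patch rather than being averaged away over the whole profile.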
Faster Data-Center Apps with BOLT
Code-layout optimizations are paramount for optimal performance of large data-center applications. In this talk, I will cover multiple approaches to improving the code layout of an application, introduce BOLT, an open-source binary optimization tool, and walk through the challenges of deploying it at Facebook scale. Lastly, I will share the plans for seamless integration of the binary optimization technology into the server application space.