Measuring the Subjective:
The Performance Dashboard

http://estelle.github.io/SpeedPerception

Estelle Weyl

Open Web Evangelist / Front End Engineer

Twitter: @webdevtips, @estellevw, @standardista
Blog: www.standardista.com

  • HTML5 and CSS3 for the Real World
  • Animations and Transitions with CSS
  • Mobile HTML5
  • Web Performance Daybook
  • CSS: The Definitive Guide
  • Flexible boxes in CSS

Performance Overview

  1. What is Web Performance
  2. Objective Metrics
  3. Perceived Performance

What is Fast?

Chart comparing "Hello World" and "Lord of the Rings": Number of Visitors by Time to Load (0 s to 20 s)

What is Load Time?

A made-up histogram in a bell-curve shape with a long tail
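A long tail is why a single average says little about load time. A minimal sketch in Python, using made-up samples (not data from the talk), compares the median, the mean, and the 95th percentile:

```python
import statistics


def summarize_load_times(samples_ms):
    """Return (median, mean, p95) for a list of page-load times in milliseconds."""
    # quantiles(n=20) returns 19 cut points: the 5th, 10th, ..., 95th percentiles
    p95 = statistics.quantiles(samples_ms, n=20)[-1]
    return statistics.median(samples_ms), statistics.mean(samples_ms), p95


# Hypothetical samples: most loads cluster around 2-4 s, a few land in the long tail.
samples = [1800, 2100, 2200, 2300, 2400, 2500, 2600, 2700, 2900, 3100,
           3300, 3600, 4000, 4800, 6500, 9000, 12000, 15000, 18000, 20000]

median, mean, p95 = summarize_load_times(samples)
print(f"median={median:.0f} ms  mean={mean:.0f} ms  p95={p95:.0f} ms")
```

With these samples the mean (about 6 s) is nearly double the median (about 3.2 s) and the 95th percentile sits near 20 s, one reason a single "load time" number can mislead.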

Which feels faster?

Performance

  • Time to Load
  • Time until usable
  • Jitter
  • Responsiveness
  • Smoothness

RAIL

Four phases of interaction: end-user’s perception

  1. Response to Input
  2. Animation & Scrolling
  3. Idle
  4. Page Load
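RAIL pairs each phase with a time budget. The numbers below are the budgets commonly cited in the 2015 RAIL guidance (an assumption here, since the slide does not list them; check current guidance before relying on them), and the measurements are hypothetical. A minimal sketch:

```python
# Per-phase budgets as commonly cited in the 2015 RAIL guidance (assumed, not from the slide).
RAIL_BUDGETS_MS = {
    "response": 100,   # react to user input within 100 ms
    "animation": 16,   # produce each frame within ~16 ms (60 frames per second)
    "idle": 50,        # split background work into chunks of at most 50 ms
    "load": 1000,      # deliver the page within 1,000 ms
}


def within_budget(phase: str, duration_ms: float) -> bool:
    """Return True if a measured duration fits the RAIL budget for that phase."""
    return duration_ms <= RAIL_BUDGETS_MS[phase]


# Hypothetical measurements for one page.
measurements = [("response", 80), ("animation", 22), ("idle", 45), ("load", 3000)]
for phase, ms in measurements:
    verdict = "ok" if within_budget(phase, ms) else "over budget"
    print(f"{phase:<10} {ms:>5} ms  {verdict}")
```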

Video: How Users Perceive the Speed of The Web (2015): Paul Irish / Google

Web Performance

Download: # of resources (images, fonts, HTML, scripts, and CSS) loaded
Parse: file size of the above resources
Execute: parsing & painting
Perceived Performance: users' perception of the speed of the load and reaction time

Objective Metrics: 

Navigation Timing API

Navigation Timing API metrics
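The Navigation Timing API exposes millisecond timestamps for each stage of a page load (in the browser, `performance.timing` in Level 1, superseded by `PerformanceNavigationTiming` in Level 2), and objective metrics are differences between them. A minimal Python sketch, assuming the timestamps were exported as a dictionary keyed by the Level 1 attribute names; the values are hypothetical:

```python
def derive_metrics(t: dict) -> dict:
    """Turn Navigation Timing (Level 1) timestamps into durations in milliseconds."""
    start = t["navigationStart"]
    return {
        "dns_ms": t["domainLookupEnd"] - t["domainLookupStart"],
        "connect_ms": t["connectEnd"] - t["connectStart"],
        "ttfb_ms": t["responseStart"] - start,  # time to first byte
        "dom_content_loaded_ms": t["domContentLoadedEventEnd"] - start,
        "load_ms": t["loadEventEnd"] - start,
    }


# Hypothetical timestamps (epoch ms), built as offsets from navigationStart.
base = 1_470_000_000_000
offsets = {
    "navigationStart": 0, "domainLookupStart": 5, "domainLookupEnd": 45,
    "connectStart": 45, "connectEnd": 120, "responseStart": 310,
    "domContentLoadedEventEnd": 1850, "loadEventEnd": 3700,
}
timing = {name: base + ms for name, ms in offsets.items()}
metrics = derive_metrics(timing)
print(metrics)
```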

Objective v. Subjective

Load Time v. Visually Complete

Load Times: 3,729ms v. 3,768ms

Visually Complete: 16s v. 8.7s

Staples
Wolferman's
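The gap between the objective and subjective views can be checked with simple arithmetic on the slide's own numbers. A quick Python check:

```python
load_times_ms = (3729, 3768)       # Load Times from the slide
visually_complete_s = (16.0, 8.7)  # Visually Complete from the slide

# Relative difference between the two load times, in percent
load_diff_pct = abs(load_times_ms[1] - load_times_ms[0]) / min(load_times_ms) * 100
# How many times longer the slower page takes to become visually complete
vc_ratio = max(visually_complete_s) / min(visually_complete_s)

print(f"load-time difference: {load_diff_pct:.1f}%")
print(f"visually-complete ratio: {vc_ratio:.2f}x")
```

The load times differ by about 1%, while one page takes roughly 1.8 times as long as the other to become visually complete.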

What is good User Experience?

Active v. Passive Waiting

  1. 3.0s
  2. 4.0s
  3. 5.0s
  4. 6.0s
  5. 7.0s
  6. 8.0s

  • 0%

  • 56%

  • 56%

  • 67%

  • 67%

  • 100%

  • 0%

  • 56%

  • 66%

  • 66%

  • 100%

Questions

  • How does visual jitter impact perceived performance?
  • Are sites free of visual jitter like modals and overlays viewed as more performant?
  • Is it possible to automatically predict the presence of jitter to help choose a better set of metrics?
  • Does a long DOM Content Loaded impact perceived performance?
  • Can metrics be improved?

User Experience > Developer Experience

Graph comparing Navigation Timing API metrics for Wolferman's versus Staples

Visual Metrics

  1. SpeedIndex (SI)
  2. Perceptual Speed Index (PSI)

Speed Index

A metric for above-the-fold visual Quality of Experience (QoE)

  • Created by Patrick Meenan (Google)
  • Used in WebPageTest

Speed Index

Aggregate function of how quickly the above-the-fold content reaches visual completion:

  • Speed Index graph for both sites
  • Speed Index graph for Staples: 4,462
  • Speed Index graph for Wolferman's: 5,902

$$SI = \int_0^{t_{end}} \left(1 - \frac{VC(t)}{100}\right) dt$$, where VC(t) is the percent visually complete at time t

Huh? Speed Index Explained

a demonstration to help explain how speed index is calculated
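Speed Index integrates how far the page is from visually complete over time, so lower is better and early visual progress is rewarded. A minimal sketch, assuming visual completeness (VC) has already been measured for each captured frame and is held constant between frames; the filmstrips are hypothetical:

```python
def speed_index(samples):
    """Speed Index from (time_ms, vc_percent) samples sorted by time.

    VC is treated as a step function: the value at one frame holds until the
    next frame. The last sample should be the point where VC reaches 100.
    """
    area = 0
    for (t0, vc0), (t1, _) in zip(samples, samples[1:]):
        area += (t1 - t0) * (100 - vc0)  # area above the VC curve for this interval
    return area / 100                     # convert percent back to a fraction


# Hypothetical filmstrips: both pages are visually complete at 3,000 ms.
early_render = [(0, 0), (1000, 80), (2000, 90), (3000, 100)]
late_render = [(0, 0), (1000, 10), (2000, 20), (3000, 100)]

print(speed_index(early_render))  # 1300.0
print(speed_index(late_render))   # 2700.0
```

Both pages finish at the same moment, yet the one that paints most of its content early scores less than half the Speed Index of the other.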

Measurement of visual progress in Speed Index

  • Frame-by-frame VC progress is computed from pixel-histogram comparisons

  • Pixel-wise similarity (mean histogram difference a.k.a. MHD) doesn’t capture visual perception!
    • Perception of Shape / Color / Object similarity

Pixel-wise similarity doesn’t capture shape similarity

Black/White = 50/50; MHD (Mean Histogram Difference) = 0

  • Six boxes, each 50% white and 50% black
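The 50/50 boxes can be reproduced in a few lines. This sketch uses a simplified histogram difference (mean absolute difference between normalized two-bin histograms, an assumption rather than WebPageTest's exact computation) and shows that it is zero for images whose shapes differ completely, even though half their pixels differ:

```python
from collections import Counter

BLACK, WHITE = 0, 255
W, H = 4, 4  # tiny 4x4 grayscale "screenshots", stored row by row


def histogram(pixels):
    """Fraction of pixels that are black and white."""
    counts = Counter(pixels)
    return [counts[BLACK] / len(pixels), counts[WHITE] / len(pixels)]


def mean_histogram_difference(a, b):
    """Mean absolute difference between the two normalized histograms."""
    ha, hb = histogram(a), histogram(b)
    return sum(abs(x - y) for x, y in zip(ha, hb)) / len(ha)


def pixel_mismatch(a, b):
    """Fraction of pixel positions whose values differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


left_right = [BLACK if x < W // 2 else WHITE for y in range(H) for x in range(W)]
top_bottom = [BLACK if y < H // 2 else WHITE for y in range(H) for x in range(W)]
checker = [BLACK if (x + y) % 2 == 0 else WHITE for y in range(H) for x in range(W)]

for name, img in [("top_bottom", top_bottom), ("checker", checker)]:
    print(name, mean_histogram_difference(left_right, img), pixel_mismatch(left_right, img))
```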

Pixel-wise similarity doesn’t capture color similarity

Proposal for a perceptually oriented visual QoE metric

  • Update: Frame-by-frame VC progress computation using SSIM

Perceptual Speed Index

Frame-by-frame VC progress computation using SSIM

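$$PSI = \int_0^{t_{end}} \left(1 - \frac{VC_{SSIM}(t)}{100}\right) dt$$, where VC_SSIM(t) is the percent visually complete at time t, computed frame by frame using SSIM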
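SSIM compares luminance, contrast, and structure rather than pixel counts. A simplified sketch follows: it computes SSIM over the whole image as a single window (standard SSIM averages many small local windows), uses SSIM against the final frame as the visual-completeness score (an assumption about how PSI scores frames), and integrates it over time the same way Speed Index does. Frames are hypothetical 4x4 grayscale images:

```python
W, H = 4, 4
WHITE = 255


def global_ssim(x, y, L=255):
    """Single-window SSIM between two equal-length grayscale pixel lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2  # stabilizing constants (K1=0.01, K2=0.03)
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


def perceptual_speed_index(frames):
    """PSI-style score from (time_ms, pixels) frames; the last frame is the final state."""
    final = frames[-1][1]
    total = 0.0
    for (t0, px), (t1, _) in zip(frames, frames[1:]):
        vc = min(1.0, max(0.0, global_ssim(px, final)))  # clamp similarity to [0, 1]
        total += (t1 - t0) * (1 - vc)
    return total


blank = [WHITE] * (W * H)
final = [0 if x < W // 2 else WHITE for y in range(H) for x in range(W)]
# Same black/white histogram as the final layout but a different shape,
# standing in for content that jumps around before settling.
shuffled = [0 if (x + y) % 2 == 0 else WHITE for y in range(H) for x in range(W)]

frames = [(0, blank), (1000, shuffled), (2000, final), (3000, final)]
print(round(global_ssim(shuffled, final), 4))
print(round(perceptual_speed_index(frames)))
```

A histogram-only comparison would score the shuffled frame as identical to the final one, while SSIM rates it near zero, so the time spent on that frame counts against the page.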

Without Jitter

With Jitter

PSI v. SI

  • SI and PSI: linearly correlated
  • Visual jitter / layout thrashing? PSI > SI
    • PSI appears higher when visual jitter exists (pop-up ads / large layout changes / etc.)
  • SSIM-based visual progress measurements match human perception more closely than MHD
  • SSIM / MHD swap doesn’t affect websites without visual jitter

Staples

Visual progression of the Staples page load, showing SI and PSI

Wolferman's

Visual progression of the Wolferman's page load, showing SI and PSI

PSI v. SI

Speed Index

  • Primarily focused on progress of above-fold loading
  • Does not account for layout stability

Perceptual Speed Index

  • A perceptually oriented metric to measure above-fold visual QoE
  • Designed to account for visual jitter (layout stability)
  • Complementary to SI

Resources

SpeedPerception

  1. SpeedPerception Study Overview
  2. Hypothesis
  3. Results
  4. Learnings
  5. Phase 2

SpeedPerception

“SpeedPerception is a large-scale web performance crowdsourcing study focused on the perceived loading performance of above-the-fold content.”

Premise: Perceived performance is relative.

SpeedPerception Challenge

Study Hypotheses

Hypothesis 1: Visual metrics will perform better than non-visual/network metrics


Hypothesis 2: No single metric can explain human choices with 90%+ accuracy


Hypothesis 3: Users will not wait until “Visual Complete” to make their choice (despite the explicit instruction to wait until the video turns grey)

Study Metrics

  • 5,444 sessions, of which 51% were complete and valid
  • 77,482 votes, of which 75% were valid
  • Graph showing that each of the 160 pairs was tested between 230 and 330 times

Feedback

Perception of speed and UX strongly impacted by popups / overlays

Histogram of comments made, showing that pop-ups were commonly mentioned

Hypothesis 3

Hypothesis 3: Users will not wait until “Visual Complete” to make their choice (despite the explicit instruction to wait until the video turns grey)

Hypothesis 1

Hypothesis 1: Visual metrics will perform better than non-visual/network metrics

Not True

Questions to Consider

  • Does presence of visual jitter / interstitials interfere with metric performance?
    • Can metrics be improved?
  • Will there be different trends for sites that are free of visual jitter like modals and overlays?
  • Is it possible to automatically predict the presence of jitter to help choose a better set of metrics?

Graph showing that most people had enough information to choose which page loads faster somewhere between PSI, load time, and visually complete

Hypothesis 1: Visual metrics will perform better than non-visual/network metrics

True

Hypothesis 2

Hypothesis 2: No single metric can explain human choices with 90%+ accuracy

True

Hypothesis 2: No single metric can explain human choices with 90%+ accuracy

Still True
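Hypothesis 2 can be framed by treating each metric as a predictor: for every pair of videos, predict that the page with the lower metric value looks faster, then count how often that matches the crowd's choice. A minimal sketch with hypothetical pairs (not study data):

```python
def metric_accuracy(pairs, metric):
    """Share of pairs where 'lower metric value' matches the page people picked as faster."""
    correct = 0
    for pair in pairs:
        # Lower is faster; ties are assigned to "b" in this sketch.
        predicted = "a" if pair["a"][metric] < pair["b"][metric] else "b"
        if predicted == pair["winner"]:
            correct += 1
    return correct / len(pairs)


# Hypothetical pairs: metric values per page plus the page most voters picked.
pairs = [
    {"a": {"load_ms": 3000, "speed_index": 2100},
     "b": {"load_ms": 3100, "speed_index": 3500}, "winner": "a"},
    {"a": {"load_ms": 5200, "speed_index": 2600},
     "b": {"load_ms": 4100, "speed_index": 3900}, "winner": "a"},
    {"a": {"load_ms": 2500, "speed_index": 1900},
     "b": {"load_ms": 6000, "speed_index": 1500}, "winner": "b"},
    {"a": {"load_ms": 4400, "speed_index": 4100},
     "b": {"load_ms": 4300, "speed_index": 2800}, "winner": "a"},
]

for metric in ("load_ms", "speed_index"):
    acc = metric_accuracy(pairs, metric)
    verdict = "meets" if acc >= 0.9 else "falls below"
    print(f"{metric}: {acc:.0%} ({verdict} the 90% bar)")
```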

Conclusions & Thoughts

  • There appears to be no single unicorn metric, but is there a combined synthetic metric (joint ML model) that would do a better job?
  • People looked at only two videos and made the call. Is there additional information we can extract from the videos that would improve our models?

Credits & Links: SpeedPerception

Phase 1 crowdsourced 07/28/2016 to 09/30/2016.

SpeedPerception Phase 2

  • How do visual jitter & interstitials impact perceived performance?
    • Do they interfere with metric performance?
    • Can metrics be improved?
    • Are sites free of visual jitter like modals and overlays viewed as more performant?
    • Is it possible to automatically predict the presence of jitter to help choose a better set of metrics?
  • Does a long DOM Content Loaded impact perceived performance?

User Experience > Developer Experience

Same graph comparing Navigation Timing API metrics for Staples and Wolferman's

Thank you
