Overview

Media elements are a common source of layout shift, wasted data, and accessibility failures on production sites. Correct use of <picture>, srcset, sizes, loading, and <track> prevents most of these issues before any JavaScript runs. This page covers the rules for images, video, and audio.

Use <picture> for format switching; srcset for resolution switching

<picture> wraps <source> elements that each declare a format or media condition, plus a fallback <img>. The browser uses the first <source> whose type it supports and whose media condition, if any, matches. If none qualifies, it uses the <img>.

<picture>
  <source srcset="/hero.avif" type="image/avif" />
  <source srcset="/hero.webp" type="image/webp" />
  <img src="/hero.jpg" alt="A Tokyo subway map." width="800" height="450" />
</picture>

Use <picture> when you want to serve AVIF to browsers that support it and JPEG to those that do not. Use a single <img> with srcset when you only need different resolutions of the same format.
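The media condition mentioned above can also drive art direction, that is, serving a different crop on narrow screens. The following is a minimal sketch, assuming a hypothetical square crop at /hero-crop.avif and /hero-crop.jpg and a 720px breakpoint; neither the file names nor the breakpoint come from this page.

```html
<picture>
  <!-- Narrow viewports: a tighter, square crop (hypothetical files). -->
  <source media="(max-width: 720px)" srcset="/hero-crop.avif" type="image/avif" width="400" height="400" />
  <source media="(max-width: 720px)" srcset="/hero-crop.jpg" width="400" height="400" />
  <!-- Wider viewports: the full image, AVIF first. -->
  <source srcset="/hero.avif" type="image/avif" />
  <!-- Fallback and the element that carries alt, width, and height. -->
  <img src="/hero.jpg" alt="A Tokyo subway map." width="800" height="450" />
</picture>
```

Sources are evaluated top to bottom, so the media-restricted entries must come before the unrestricted ones.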

srcset and sizes tell the browser which file to download

srcset lists candidate files with their intrinsic widths. sizes tells the browser how wide the image will render at each viewport width, so it can pick the best candidate before the stylesheet loads.

<img
  src="/hero-800.jpg"
  srcset="/hero-400.jpg 400w, /hero-800.jpg 800w, /hero-1200.jpg 1200w"
  sizes="(max-width: 720px) 100vw, 800px"
  width="800"
  height="450"
  alt="A Tokyo subway map."
/>

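Without sizes, the browser assumes the image fills 100vw and picks a candidate for the full viewport width. On a 1440 px desktop viewport at 1x pixel density, where the image actually renders in an 800 px column, that means the 1200 px file is fetched when the 800 px file would do.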
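To make the selection arithmetic concrete, here is a sketch for a card image, assuming a layout with one column up to 720px and three equal columns above that. The /card-*.jpg files and the layout are hypothetical, and the picks in the comments describe typical browser behavior; the specification leaves the final choice to the browser.

```html
<img
  src="/card-800.jpg"
  srcset="/card-400.jpg 400w, /card-800.jpg 800w, /card-1200.jpg 1200w"
  sizes="(max-width: 720px) 100vw, 33vw"
  width="800"
  height="450"
  alt="..."
/>
<!--
  Pixels needed = slot width from sizes x device pixel ratio (DPR).
  The browser typically takes the smallest candidate that covers it.

  375px viewport,  DPR 2: 375 x 2   = 750   -> 800w
  1440px viewport, DPR 1: 475.2 x 1 = 475.2 -> 800w  (33vw of 1440px)
  1440px viewport, DPR 2: 475.2 x 2 = 950.4 -> 1200w

  With sizes removed, the slot is assumed to be 100vw:
  1440px viewport, DPR 1: 1440 x 1  = 1440  -> 1200w (nothing is wide enough)
-->
```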

Always set width and height to prevent layout shift

Set width and height on every <img> and <video>. The browser uses them to compute the aspect ratio before the bytes arrive, reserving the correct space. Omitting them causes cumulative layout shift (CLS) when the media loads. See image-seo for CLS thresholds.

The numeric values should match the image’s intrinsic width and height in pixels. CSS can resize the element; the reserved space scales with it.
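As an illustration of how CSS resizing interacts with the attributes, the sketch below assumes a generic responsive rule; the selector and the max-width value are assumptions, not rules from this page.

```html
<style>
  /* Let media shrink with its container. height: auto keeps the
     aspect ratio that the width and height attributes provide. */
  img,
  video {
    max-width: 100%;
    height: auto;
  }
</style>

<!-- In a 400px-wide container the browser reserves 400 x 225
     (the 800:450 ratio) before the file arrives. -->
<img src="/hero.jpg" alt="A Tokyo subway map." width="800" height="450" />
```

Setting a CSS width without height: auto would instead stretch the image to the attribute height.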

Use loading="lazy" on below-the-fold images; eager on LCP candidates

<img src="/hero.jpg" loading="eager" fetchpriority="high" alt="..." />
<img src="/card.jpg" loading="lazy" alt="..." />

loading="lazy" defers the fetch until the image is near the viewport. Use loading="eager" (the default) plus fetchpriority="high" on the Largest Contentful Paint (LCP) image. Do not lazy-load anything visible on first paint; it delays LCP and tanks Core Web Vitals. See html-links-rel for <link rel="preload"> as an alternative early-fetch signal.
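For reference, a preload hint for a responsive LCP image might look like the sketch below. It reuses the hero file names from the srcset example and assumes the hint goes in the document head; preload tends to help most when the image is not discoverable in the initial HTML, for example a CSS background.

```html
<head>
  <!-- imagesrcset and imagesizes mirror the srcset and sizes
       of the <img> so the same candidate is preloaded. -->
  <link
    rel="preload"
    as="image"
    href="/hero-800.jpg"
    imagesrcset="/hero-400.jpg 400w, /hero-800.jpg 800w, /hero-1200.jpg 1200w"
    imagesizes="(max-width: 720px) 100vw, 800px"
    fetchpriority="high"
  />
</head>
```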

Video needs a poster, captions, and no autoplay with sound

<video
  src="/demo.mp4"
  poster="/demo-poster.jpg"
  width="800"
  height="450"
  controls
  preload="none"
>
  <track kind="captions" src="/demo.vtt" srclang="en" label="English" />
  Your browser does not support the video element.
</video>

  • poster: shown before the video loads. Without it, the player area stays blank (typically black) until a frame is decoded, which is most noticeable with preload="none". Layout shift is a separate concern, handled by width and height.
  • controls: required for keyboard access. Never replace native controls with JavaScript-only UI.
  • preload="none": avoids fetching the video on page load when the user may never watch it.
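  • <track kind="captions">: provides captions for users who are deaf or hard of hearing. WCAG 1.2.2 requires captions for prerecorded audio in synchronized media, which includes any video with a soundtrack. See accessibility for the conformance level.
  • Do not set autoplay with audio; sound that starts on its own and runs longer than three seconds fails WCAG 1.4.2 unless the user can pause, stop, or mute it, and it annoys users. autoplay muted on a background video is acceptable.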
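The acceptable background case could be sketched as follows, assuming a purely decorative clip with no meaningful audio; the file names and dimensions are hypothetical. Looping motion that lasts longer than five seconds generally also needs a way to pause it (WCAG 2.2.2), which this sketch leaves out.

```html
<!-- muted is what allows autoplay in most browsers;
     playsinline keeps mobile browsers from going fullscreen. -->
<video
  src="/background.mp4"
  poster="/background-poster.jpg"
  width="1280"
  height="720"
  autoplay
  muted
  loop
  playsinline
></video>
```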

Audio follows the same rules as video

<audio controls preload="none">
  <source src="/podcast.mp3" type="audio/mpeg" />
  <source src="/podcast.ogg" type="audio/ogg" />
  <track kind="captions" src="/podcast.vtt" srclang="en" label="English" />
</audio>

Provide a transcript linked near the player as well; captions on an audio-only element are not always surfaced by players. WCAG 1.2.1 requires a transcript for prerecorded audio-only content.
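One way to place the transcript link next to the player is sketched below. The /podcast-transcript.html URL, the caption text, and the <figure> wrapper are assumptions for illustration.

```html
<figure>
  <figcaption>Episode audio</figcaption>
  <audio controls preload="none">
    <source src="/podcast.mp3" type="audio/mpeg" />
    <source src="/podcast.ogg" type="audio/ogg" />
  </audio>
  <!-- The transcript is a plain link, so it works even if the player does not. -->
  <p><a href="/podcast-transcript.html">Read the transcript</a></p>
</figure>
```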

Common pitfalls

  • Missing alt on <img> inside <picture>. The alt belongs on the <img>, not on <source>.
  • srcset without sizes on images that are not full viewport width.
  • <video autoplay> without muted, which browsers block and users hate.
  • No poster on video, causing black-frame flash.
  • Omitting <track> on any <video> with meaningful audio.
  • Using <picture> for resolution switching when a single <img srcset> would do. Reserve <picture> for format or art-direction changes.

For Next.js image optimization that generates srcset and sizes automatically, see nextjs-image-optimization.