Electronic Zoology - field notes from the garage
Audio • ESP32

How to play audio from ESP32 with MAX98357A

Amp: MAX98357A (3.2W mono class-D, I2S)
Board: ESP32 Dev Board (38-pin) or ESP32-C3
Speaker: 4-8Ω speaker
Interface: I2S (digital)
MAX98357A I2S Series - Step 2 of 4 - Playing a clip
✓ Confirmed Working
These guides are ordered by complexity but each one stands alone - if you believe and have courage, stand tall and boldly continue. If this is your first time, start at Step 1 - How to wire the MAX98357A I2S amp with ESP32 →

What You Need

Parts

Software

Install on Arch Linux:

Copy and paste into terminal

sudo pacman -S espeak-ng ffmpeg python

How It Works

The MAX98357A is a digital I2S amplifier. It receives audio as a digital signal directly from the ESP32's I2S peripheral, converts it to analogue internally, and drives a speaker - all in one chip. No external DAC, and no coupling or decoupling capacitors to add. Just three signal wires and power.

I2S (Inter-IC Sound) is a three-wire digital audio bus used between chips. It carries a bit clock (BCLK) that ticks once per data bit, a word select line (LRCLK) whose level tells the receiver whether the current sample belongs to the left or right channel, and a serial data line (DIN). LRCLK completes one cycle per stereo frame, so it runs at the sample rate. The ESP32 generates all three signals in hardware at whatever sample rate and bit depth you configure. The MAX98357A sits at the other end, clocks the data in, and amplifies it.
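To put numbers on those clocks: LRCLK runs at the sample rate, and BCLK has to clock out every bit of every channel in each frame. Here's a minimal Python sketch of that arithmetic, assuming 16-bit slots for both channels (the I2S_BITS_PER_SAMPLE_16BIT stereo setup the sketches below use). The helper name i2s_clocks is just for illustration.

```python
def i2s_clocks(sample_rate_hz: int, bits_per_sample: int = 16, channels: int = 2):
    """Return (lrclk_hz, bclk_hz) for a standard I2S stream.

    LRCLK completes one cycle per frame (one sample per channel),
    so it runs at the sample rate. BCLK ticks once per bit of
    every channel in that frame.
    """
    lrclk = sample_rate_hz
    bclk = sample_rate_hz * bits_per_sample * channels
    return lrclk, bclk


for rate in (8000, 22050, 44100):
    lrclk, bclk = i2s_clocks(rate)
    print(f"{rate:>5} Hz: LRCLK = {lrclk} Hz, BCLK = {bclk / 1e6:.4f} MHz")
```

Even at 44100 Hz, BCLK stays under 1.5 MHz, which is why ordinary jumper wires are fine for a short I2S run.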

The audio data lives in the sketch as a C array of 16-bit signed integers stored in flash memory. The sketch copies a chunk of that array into a small stereo buffer, scales it for volume, and hands it to i2s_write(), which queues it in DMA buffers. From there the I2S hardware clocks the samples out to the amp without further CPU work. No SD card, no filesystem, no streaming complexity for short clips.
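The DMA settings in the sketches (dma_buf_count = 8, dma_buf_len = 64) decide how much audio sits queued ahead of the speaker. Here's a rough Python model. It assumes dma_buf_len counts frames and that one frame is a 16-bit left sample plus a 16-bit right sample (4 bytes). The helper name dma_queue is hypothetical.

```python
def dma_queue(dma_buf_count: int, dma_buf_len: int, sample_rate_hz: int,
              bytes_per_frame: int = 4):
    """Frames, bytes and milliseconds of audio the DMA ring can hold.

    Assumes dma_buf_len is measured in frames, and a frame is one
    16-bit left sample plus one 16-bit right sample (4 bytes).
    """
    frames = dma_buf_count * dma_buf_len
    return frames, frames * bytes_per_frame, 1000.0 * frames / sample_rate_hz


frames, nbytes, ms = dma_queue(8, 64, 22050)
print(f"{frames} frames, {nbytes} bytes, {ms:.1f} ms of audio queued")
```

That figure is also roughly how long the tail of the clip keeps playing after the last i2s_write() call returns.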

Why I2S and not the DAC? The original ESP32 has two 8-bit DAC pins (the ESP32-C3 has none), but I2S is the better choice for audio: the signal stays digital all the way to the amp, which means no DC offset, no coupling cap, no noise pickup on the signal wire, and 16-bit resolution instead of 8-bit. The MAX98357A handles the digital-to-analogue conversion internally.

The SD pin on the MAX98357A controls shutdown and channel selection. Left floating on the breakout, the amp is enabled and outputs a mono mix of the left and right I2S channels, (L+R)/2. That suits these sketches, which send the same mono sample on both channels. Leaving the GAIN pin floating sets 9dB of gain, which is enough for most speakers at a reasonable listening volume.

Wiring

MAX98357A Pin | ESP32 Dev Board | ESP32-C3 | Notes
VIN | 5V | 5V | 5V recommended - 3.3V works at lower volume
GND | GND | GND |
LRC | GPIO14 | GPIO5 | I2S word select (LRCLK)
BCLK | GPIO27 | GPIO4 | I2S bit clock
DIN | GPIO13 | GPIO6 | I2S data out
SD | Not connected | Not connected | Leave floating - amp enabled, mono mix of L+R
GAIN | Not connected | Not connected | Leave floating - 9dB gain

Connect your speaker to the two output terminals on the MAX98357A module. Polarity only affects phase in a mono setup, so either orientation will produce sound. Don't connect either speaker terminal to GND - the outputs are bridge-tied. Use a speaker rated 1W or above: the MAX98357A delivers up to 3.2W into 4Ω at 5V (about 1.8W into 8Ω), and a 0.5W speaker will be overdriven and distort badly at high volume.

Why not GPIO25 and GPIO26? Those are the ESP32's built-in DAC pins. They work for I2S, but leaving them free keeps your options open if you later want analogue output from the DAC on the same board. GPIO27, GPIO14, and GPIO13 are not strapping pins, so they don't affect how the board boots. GPIO13 and GPIO14 double as JTAG pins, which only matters if you debug over JTAG.
No capacitors needed. The signal path is fully digital right up to the amp chip. There is no DC offset to block and the module handles its own power supply filtering. Three signal wires and power is all you need.
Using an ESP32-C3? The pin numbers and board selection are different. See ESP32-C3 differences →

Preparing an Audio Clip

Option 1 - Generate a voice clip with espeak-ng

Copy and paste into terminal

espeak-ng -w /tmp/myclip.wav "your text here"

Option 2 - Extract from an existing audio or video file

Use ffmpeg to cut a section from any WAV, MP3, or video file:

Copy and paste into terminal

ffmpeg -i /path/to/source.wav -ss 00:00:05 -to 00:00:08 /tmp/myclip.wav

What the flags mean:

  • -i /path/to/source.wav - your input file. /path/to/source.wav is a placeholder - replace it with the actual location of your file on disk (e.g. ~/Downloads/mysound.mp3 or /home/j/Videos/clip.mp4). Not sure of the path? Drag the file into a terminal window and the full path will be pasted in automatically.
  • -ss 00:00:05 - start cutting at 5 seconds
  • -to 00:00:08 - stop cutting at 8 seconds (3-second clip)
  • /tmp/myclip.wav - output path. /tmp is a temporary folder Linux provides - files survive until reboot
Finding your timestamps: Open the file in your video or audio editor of choice, note where your clip starts and ends, and use those values for -ss and -to.
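Remember that -to is an end position, not a length. Here's a small Python helper that turns the two timestamps into the clip length and the number of samples it will produce at a given -ar rate. The names to_seconds and clip_length are hypothetical and not part of ffmpeg.

```python
def to_seconds(ts: str) -> float:
    """Parse an HH:MM:SS (or HH:MM:SS.mmm) timestamp into seconds."""
    hours, minutes, seconds = ts.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def clip_length(ss: str, to: str, sample_rate_hz: int = 22050):
    """Return (seconds, samples) for a cut made with -ss ss -to to."""
    seconds = to_seconds(to) - to_seconds(ss)
    return seconds, round(seconds * sample_rate_hz)


print(clip_length("00:00:05", "00:00:08"))  # (3.0, 66150)
```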

Examples:

If the file is in your home folder:

Copy and paste into terminal

ffmpeg -i ~/mysound.wav -ss 00:00:05 -to 00:00:08 /tmp/myclip.wav

If the file is in Downloads:

Copy and paste into terminal

ffmpeg -i ~/Downloads/mysound.mp3 -ss 00:00:05 -to 00:00:08 /tmp/myclip.wav

If the file is a video - ffmpeg extracts the audio automatically:

Copy and paste into terminal

ffmpeg -i ~/Videos/myvideo.mp4 -ss 00:00:05 -to 00:00:08 /tmp/myclip.wav

If the filename has spaces - wrap it in quotes:

Copy and paste into terminal

ffmpeg -i "/home/j/My Audio File.wav" -ss 00:00:05 -to 00:00:08 /tmp/myclip.wav

Convert to I2S format

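The sketch expects 16-bit signed mono PCM at a fixed sample rate. The MAX98357A itself accepts 16, 24, or 32-bit I2S data, but 16-bit is what the sketch sends. Convert your clip to that format: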

Copy and paste into terminal

ffmpeg -y -i /tmp/myclip.wav -ar 22050 -ac 1 -acodec pcm_s16le /tmp/clip_i2s.wav

What the flags mean:

  • -y - overwrite the output file without prompting if it already exists
  • -ar 22050 - sample rate in Hz - this value must match SAMPLE_RATE in the sketch
  • -ac 1 - mono
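  • -acodec pcm_s16le - 16-bit signed PCM, little-endian. 16-bit matches the sketch's int16_t array and I2S_BITS_PER_SAMPLE_16BIT setting; little-endian matches how the ESP32 stores int16_t in memory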
What sample rate should I use? Run your own tests - what sounds acceptable depends on your speaker and use case. 22050 is a reasonable starting point. 8000 works well on many speakers and produces much smaller files. Whatever value you use here must match SAMPLE_RATE in the sketch, or the clip will play at the wrong speed.
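File size scales linearly with sample rate, because 16-bit mono is 2 bytes per sample. Here's a quick Python estimate for a 3-second clip (the length of the ffmpeg examples above) at a few candidate rates. clip_bytes is an illustrative helper, not part of the guide's script.

```python
BYTES_PER_SAMPLE = 2  # pcm_s16le mono: one int16_t per sample


def clip_bytes(sample_rate_hz: int, seconds: float) -> int:
    """Bytes the audio_data[] array will occupy in flash."""
    return round(sample_rate_hz * seconds) * BYTES_PER_SAMPLE


for rate in (8000, 16000, 22050, 44100):
    print(f"{rate:>5} Hz: {clip_bytes(rate, 3) / 1024:.1f} KiB")
```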

WAV to C Header

Save the script below as wav_to_i2s_header.py, then execute it to convert your audio into a header file for the sketch. How to save a script →

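import sys, wave, struct

if len(sys.argv) != 3:
    sys.exit('Usage: python3 wav_to_i2s_header.py input.wav output.h')

input_file  = sys.argv[1]  # e.g. /tmp/clip_i2s.wav
output_file = sys.argv[2]  # e.g. clip.h

# wave module reads and strips the WAV header automatically -
# readframes() returns raw PCM samples only
with wave.open(input_file, 'rb') as f:
    # the sketch assumes 16-bit mono - refuse anything else
    if f.getsampwidth() != 2 or f.getnchannels() != 1:
        sys.exit('Expected 16-bit mono PCM - rerun ffmpeg with -ac 1 -acodec pcm_s16le')
    sample_rate = f.getframerate()
    raw = f.readframes(f.getnframes())

# unpack as signed 16-bit little-endian integers (pcm_s16le)
samples = struct.unpack('<' + str(len(raw)//2) + 'h', raw)

lines = [
    '#pragma once',
    '#include <stdint.h>',
    '#include <stddef.h>',
    '// sample rate: ' + str(sample_rate) + ' Hz - must match SAMPLE_RATE in the sketch',
    'const int16_t audio_data[] = {',
]
chunks = [str(s) for s in samples]
for i in range(0, len(chunks), 16):
    lines.append('  ' + ', '.join(chunks[i:i+16]) + ',')
lines.append('};')
lines.append('// size in bytes - divide by sizeof(int16_t) for the sample count')
lines.append('const size_t audio_len = sizeof(audio_data);')

with open(output_file, 'w') as f:
    f.write('\n'.join(lines) + '\n')

print('Done: ' + str(len(samples)) + ' samples at ' + str(sample_rate) + ' Hz -> ' + output_file)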

Copy and paste into terminal

python3 wav_to_i2s_header.py /tmp/clip_i2s.wav clip.h

Place clip.h in your sketch folder - the same directory as your .ino file.
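Before compiling, it can be worth checking that clip.h holds the sample count the script printed. Here's a rough Python check. It assumes the exact layout wav_to_i2s_header.py writes, with a single audio_data[] initializer between { and };, so it isn't a general C parser. The function name count_header_samples is hypothetical.

```python
import re


def count_header_samples(header_text: str) -> int:
    """Count the integers inside the audio_data[] initializer."""
    match = re.search(r"audio_data\[\]\s*=\s*\{(.*?)\};", header_text, re.S)
    if match is None:
        raise ValueError("audio_data[] initializer not found")
    return len(re.findall(r"-?\d+", match.group(1)))


demo = """const int16_t audio_data[] = {
  0, 120, -340,
};"""
print(count_header_samples(demo))  # 3
```

On a real header, pass the file contents, e.g. count_header_samples(open("clip.h").read()).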

Sketch

Step 1 - Tone test (confirm wiring first)

Flash this before dealing with audio files. It generates a 440Hz tone via I2S - no clip.h needed. If you hear a tone, wiring and I2S are confirmed working.

/*
 * We stand on the shoulders of giants when we build
 * with knowledge gained from others' efforts.
 * That doesn't make us giants. Be humble.
 * Create with care. Open source is the way.
 *
 * MAX98357A I2S - Tone Test
 * --------------------------
 * Generates a 440Hz sine wave via I2S to confirm
 * wiring and amp are working. No audio file needed.
 *
 * Board:   ESP32 Dev Board (38-pin) or ESP32-C3
 * Amp:     MAX98357A
 *
 * Wiring (ESP32 Dev Board):
 *   VIN  -> 5V      GND  -> GND
 *   BCLK -> GPIO27  LRC  -> GPIO14
 *   DIN  -> GPIO13  SD   -> not connected
 *
 * Wiring (ESP32-C3):
 *   VIN  -> 5V      GND  -> GND
 *   BCLK -> GPIO4   LRC  -> GPIO5
 *   DIN  -> GPIO6   SD   -> not connected
 *
 * Open source - MIT Licence
 * Electronic Zoology - field notes from the garage
 * https://electroniczoology.com/guides/how-to-play-audio-esp32-max98357a
 */

#include <driver/i2s.h>
#include <math.h>

// ESP32 Dev Board (38-pin)
#define I2S_BCLK   27
#define I2S_LRCLK  14
#define I2S_DOUT   13
// ESP32-C3 - comment out the three lines above and uncomment these:
//#define I2S_BCLK   4
//#define I2S_LRCLK  5
//#define I2S_DOUT   6

#define SAMPLE_RATE 44100
#define FREQUENCY   440
#define AMPLITUDE   20000
#define BUF_SIZE    256

int16_t buf[BUF_SIZE];

void setup() {
  i2s_config_t cfg = {
    .mode                 = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_TX),
    .sample_rate          = SAMPLE_RATE,
    .bits_per_sample      = I2S_BITS_PER_SAMPLE_16BIT,
    .channel_format       = I2S_CHANNEL_FMT_RIGHT_LEFT,
    .communication_format = I2S_COMM_FORMAT_STAND_I2S,
    .intr_alloc_flags     = ESP_INTR_FLAG_LEVEL1,
    .dma_buf_count        = 8,
    .dma_buf_len          = 64,
    .use_apll             = false,
    .tx_desc_auto_clear   = true,
  };

  i2s_pin_config_t pins = {
    .mck_io_num   = I2S_PIN_NO_CHANGE, // no MCLK needed; left at 0 it would be driven onto GPIO0
    .bck_io_num   = I2S_BCLK,
    .ws_io_num    = I2S_LRCLK,
    .data_out_num = I2S_DOUT,
    .data_in_num  = I2S_PIN_NO_CHANGE,
  };

  i2s_driver_install(I2S_NUM_0, &cfg, 0, NULL);
  i2s_set_pin(I2S_NUM_0, &pins);
}

void loop() {
  static uint32_t phase = 0;
  size_t written;

  for (int i = 0; i < BUF_SIZE; i += 2) {
    int16_t sample = (int16_t)(AMPLITUDE * sinf(2.0f * M_PI * FREQUENCY * phase / SAMPLE_RATE));
    buf[i]     = sample;
    buf[i + 1] = sample;
    phase++;
    if (phase >= SAMPLE_RATE) phase = 0;
  }

  i2s_write(I2S_NUM_0, buf, sizeof(buf), &written, portMAX_DELAY);
}
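Why doesn't resetting phase at SAMPLE_RATE cause a click? FREQUENCY is a whole number of hertz, so the sine completes exactly 440 cycles every SAMPLE_RATE samples, and the sample after the wrap matches the one an unwrapped counter would give. Here's a Python model of the sketch's sample formula for checking that off the board. The constants are copied from the sketch, and tone_sample is a hypothetical name. It uses Python doubles where the ESP32 uses sinf, so individual values can differ by one step.

```python
import math

SAMPLE_RATE = 44100
FREQUENCY = 440
AMPLITUDE = 20000


def tone_sample(phase: int) -> int:
    """Same formula as the tone sketch, truncated toward zero like the (int16_t) cast."""
    return int(AMPLITUDE * math.sin(2.0 * math.pi * FREQUENCY * phase / SAMPLE_RATE))


# Compare the first samples after the wrap with where an unwrapped counter would be
print([tone_sample(k) for k in range(4)])
print([tone_sample(SAMPLE_RATE + k) for k in range(4)])
```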

Step 2 - Play an audio clip

Once the tone test confirms everything is working, swap to this sketch to play your own audio. Place clip.h in the same folder as the .ino file before compiling. The clip plays on startup and loops with a one-second pause between plays.

/*
 * We stand on the shoulders of giants when we build
 * with knowledge gained from others' efforts.
 * That doesn't make us giants. Be humble.
 * Create with care. Open source is the way.
 *
 * MAX98357A I2S Amplifier - ESP32 Dev Board
 * ------------------------------------------
 * Plays a 16-bit PCM audio clip stored in flash
 * via I2S to a MAX98357A amplifier and speaker.
 *
 * Board:   ESP32 Dev Board (38-pin) or ESP32-C3
 * Amp:     MAX98357A
 *
 * Wiring (ESP32 Dev Board):
 *   VIN  -> 5V      GND  -> GND
 *   BCLK -> GPIO27  LRC  -> GPIO14
 *   DIN  -> GPIO13  SD   -> not connected
 *
 * Wiring (ESP32-C3):
 *   VIN  -> 5V      GND  -> GND
 *   BCLK -> GPIO4   LRC  -> GPIO5
 *   DIN  -> GPIO6   SD   -> not connected
 *
 * Open source - MIT Licence
 * Electronic Zoology - field notes from the garage
 * https://electroniczoology.com/guides/how-to-play-audio-esp32-max98357a
 */

#include <driver/i2s.h>
#include "clip.h"

// ESP32 Dev Board (38-pin)
#define I2S_BCLK    27
#define I2S_LRCLK   14
#define I2S_DOUT    13
// ESP32-C3 - comment out the three lines above and uncomment these:
//#define I2S_BCLK   4
//#define I2S_LRCLK  5
//#define I2S_DOUT   6

#define SAMPLE_RATE 22050  // must match -ar value used in ffmpeg conversion
#define VOLUME      0.8f   // 0.0 = silent, 1.0 = max

#define BUF_SAMPLES 256
int16_t i2s_buf[BUF_SAMPLES * 2]; // stereo buffer

void setup() {
  i2s_config_t cfg = {
    .mode                 = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_TX),
    .sample_rate          = SAMPLE_RATE,
    .bits_per_sample      = I2S_BITS_PER_SAMPLE_16BIT,
    .channel_format       = I2S_CHANNEL_FMT_RIGHT_LEFT,
    .communication_format = I2S_COMM_FORMAT_STAND_I2S,
    .intr_alloc_flags     = ESP_INTR_FLAG_LEVEL1,
    .dma_buf_count        = 8,
    .dma_buf_len          = 64,
    .use_apll             = false,
    .tx_desc_auto_clear   = true,
  };

  i2s_pin_config_t pins = {
    .mck_io_num   = I2S_PIN_NO_CHANGE, // no MCLK needed; left at 0 it would be driven onto GPIO0
    .bck_io_num   = I2S_BCLK,
    .ws_io_num    = I2S_LRCLK,
    .data_out_num = I2S_DOUT,
    .data_in_num  = I2S_PIN_NO_CHANGE,
  };

  i2s_driver_install(I2S_NUM_0, &cfg, 0, NULL);
  i2s_set_pin(I2S_NUM_0, &pins);
}

void loop() {
  size_t total = audio_len / sizeof(int16_t);
  size_t written;

  for (size_t pos = 0; pos < total; pos += BUF_SAMPLES) {
    size_t count = min((size_t)BUF_SAMPLES, total - pos);
    for (size_t i = 0; i < count; i++) {
      int16_t sample = (int16_t)(audio_data[pos + i] * VOLUME);
      i2s_buf[i * 2]     = sample; // left
      i2s_buf[i * 2 + 1] = sample; // right
    }
    i2s_write(I2S_NUM_0, i2s_buf, count * 2 * sizeof(int16_t), &written, portMAX_DELAY);
  }
  delay(1000);
}
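To test the buffer logic without a board, here's a Python model of what loop() does to each chunk: scale by VOLUME, truncate toward zero like the (int16_t) cast, and copy each mono sample into the left and right slots. It mirrors the sketch's logic but is only an illustration. The generator name stereo_chunks is hypothetical, and the float rounding can differ from the ESP32's 0.8f by one step in rare cases.

```python
BUF_SAMPLES = 256
VOLUME = 0.8


def stereo_chunks(mono, buf_samples=BUF_SAMPLES, volume=VOLUME):
    """Yield interleaved [L, R, L, R, ...] lists, one per i2s_write() call."""
    for pos in range(0, len(mono), buf_samples):
        chunk = mono[pos:pos + buf_samples]  # the last chunk may be shorter
        out = []
        for s in chunk:
            scaled = int(s * volume)  # int() truncates toward zero, like the C cast
            out.extend((scaled, scaled))  # left, right
        yield out


mono = list(range(-300, 300))  # 600 fake samples
chunks = list(stereo_chunks(mono))
print([len(c) for c in chunks])  # [512, 512, 176]
```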
Why not PROGMEM? On ESP32, const arrays are stored in flash automatically by the compiler - no PROGMEM keyword needed. PROGMEM is an AVR legacy keyword; on ESP32 it does nothing. The array lives in flash either way, and you can read it directly without pgm_read_word(). A 2-second clip at 44100 Hz 16-bit mono is ~176KB - flash has room for that, but copying it into RAM would eat a large share of the ESP32's internal SRAM.
Tuning the sample rate. Lower rates mean smaller files and less flash used. 8000 Hz works well on many speakers. Run your own tests - what sounds acceptable depends on your speaker and use case. Whatever you set here must match the -ar value in your ffmpeg conversion command.
Volume control: Adjust VOLUME between 0.0 (silent) and 1.0 (full). Samples are scaled in the buffer before being sent to I2S. For hardware gain, set the GAIN pin: 100kΩ to GND for 15dB (loudest), directly to GND for 12dB, floating for 9dB, directly to VIN for 6dB, or 100kΩ to VIN for 3dB.
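To compare the software VOLUME multiplier with the hardware GAIN options on the same scale, convert both to decibels. Here's a small Python sketch. The GAIN values follow the MAX98357A datasheet gain table, the dictionary keys are just descriptive labels, and volume_db is a hypothetical helper.

```python
import math

# MAX98357A GAIN pin wiring -> amplifier gain in dB (datasheet gain table)
GAIN_DB = {
    "100k to GND": 15,
    "GND": 12,
    "floating": 9,
    "VIN": 6,
    "100k to VIN": 3,
}


def volume_db(volume: float) -> float:
    """Attenuation of the sketch's VOLUME multiplier in dB (0.0 is silent)."""
    return 20 * math.log10(volume) if volume > 0 else float("-inf")


net = GAIN_DB["floating"] + volume_db(0.8)
print(f"VOLUME 0.8 = {volume_db(0.8):.1f} dB, net with floating GAIN = {net:.1f} dB")
```

In other words, VOLUME 0.8 only trims about 2dB, which is less than one GAIN step.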
Prefer analogue audio? The How to play audio from ESP32 with PAM8403 guide covers a DAC-based alternative - no I2S required.

Troubleshooting

No sound at all

  • Check all three signal wires - BCLK, LRC, and DIN must all be connected and on the correct pins
  • Confirm VIN has power
  • Confirm SD pin is not tied to GND - that shuts the amp down

Very quiet sound

  • Power the module from 5V - 3.3V significantly reduces output power
  • GAIN pin left floating = 9dB. Tie directly to GND for 12dB, or through a 100kΩ resistor to GND for 15dB (loudest). Tying it directly to VIN lowers the gain to 6dB

Distorted, wrong speed, or wrong pitch

  • The sketch SAMPLE_RATE must match the WAV file - recheck your ffmpeg -ar value
  • Confirm the WAV is 16-bit signed PCM (pcm_s16le) - other formats produce noise or silence
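When the rates don't match, the clip plays faster or slower by the ratio of the two, and the pitch shifts by that same ratio. Here's a Python sketch of the arithmetic. The helper name mismatch is hypothetical.

```python
import math


def mismatch(file_rate_hz: int, sketch_rate_hz: int):
    """Return (speed_ratio, semitone_shift) when a clip converted at
    file_rate_hz is played with SAMPLE_RATE = sketch_rate_hz."""
    ratio = sketch_rate_hz / file_rate_hz
    return ratio, 12 * math.log2(ratio)


speed, semitones = mismatch(22050, 44100)
print(f"{speed:.2f}x speed, {semitones:+.1f} semitones")  # 2.00x speed, +12.0 semitones
```

A chipmunk voice an octave up usually means the sketch rate is double the file rate. A slow, low voice means the reverse.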

Compile error on i2s_config_t

  • Arduino-ESP32 v3.x replaced the I2S driver. These sketches use the legacy driver/i2s.h API from v2.x, which v3.x still includes, so they should compile there with deprecation warnings
  • If it fails to compile, check your Arduino-ESP32 board package version under Tools → Boards Manager

Clicks or pops between plays

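  • Clicks usually come from the clip starting or ending on a non-zero sample, not from the length of the pause, so a longer delay(1000) won't remove them. Trim the clip so it starts and ends on silence, or add a short fade in and out with ffmpeg's afade filter when converting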
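One way to remove a click baked into the clip is a short linear fade at each end, so playback starts and stops at zero. Below is a hypothetical helper you could call on the samples inside wav_to_i2s_header.py before the header is written. It isn't part of the original script, and ffmpeg's afade filter can do the same job at conversion time.

```python
def fade_edges(samples, sample_rate_hz=22050, fade_ms=10):
    """Return a copy of samples with linear fade-in and fade-out ramps."""
    out = list(samples)
    n = min(len(out) // 2, int(sample_rate_hz * fade_ms / 1000))
    for i in range(n):
        gain = i / n  # 0.0 at the very edge, rising toward 1.0 inward
        out[i] = int(out[i] * gain)
        out[-1 - i] = int(out[-1 - i] * gain)
    return out


faded = fade_edges([10000] * 22050)
print(faded[0], faded[-1], faded[11025])  # 0 0 10000
```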

Clip too large - sketch won't fit in flash

  • Trim the clip shorter with ffmpeg - reduce the -to timestamp
  • Drop the sample rate further: -ar 11025 halves the size of a 22050 Hz clip, and 8000 is often acceptable for speech. Change SAMPLE_RATE in the sketch to match
  • For longer audio, stream from an SD card instead of embedding the clip in flash
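To work out how much audio will fit, divide the flash you can spare by the bytes per second. Here's a Python sketch. flash_budget_bytes is an assumption you supply: usable space depends on your partition scheme and the size of the rest of the sketch, so check the compiler's size report. max_seconds is a hypothetical helper.

```python
def max_seconds(flash_budget_bytes: int, sample_rate_hz: int) -> float:
    """Longest 16-bit mono clip (2 bytes per sample) that fits in the budget."""
    return flash_budget_bytes / (sample_rate_hz * 2)


budget = 1_000_000  # example only: ~1 MB left for audio - check your own build output
for rate in (8000, 16000, 22050):
    print(f"{rate:>5} Hz: {max_seconds(budget, rate):.1f} s")
```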
Add a display: How to wire the GC9A01 round display with ESP32 → - show track info or a waveform on a 240x240 round TFT.
Audio data lives in flash: How ESP32 flash memory works → - understand your storage budget before embedding audio clips.
Send audio triggers wirelessly: How to use ESP-NOW to connect two ESP32s without a router → - trigger playback from another board with no router.