How to Play Audio from an ESP32 with a MAX98357A
What You Need
Parts
- ESP32 Dev Board (38-pin) or ESP32-C3
- MAX98357A I2S amplifier module
- 4-8Ω speaker
- Jumper wires
- USB-C cable for flashing
Software
- Arduino IDE with ESP32 board support
- espeak-ng - generate voice clips
- ffmpeg - convert audio files
- Python 3 - WAV to C header
Install on Arch Linux:
sudo pacman -S espeak-ng ffmpeg python
How It Works
The MAX98357A is a digital I2S amplifier. It receives audio as a digital signal directly from the ESP32's I2S peripheral, converts it to analogue internally, and drives a speaker - all in one chip. No external DAC and no output coupling capacitors are needed, and breakout modules typically carry the supply decoupling on board. Just three signal wires and power.
I2S (Inter-IC Sound) is a three-wire digital audio bus used between chips. It carries a bit clock (BCLK), a word select line that pulses once per sample to indicate left or right channel (LRCLK), and a serial data line (DIN). The ESP32 generates all three in hardware at whatever sample rate and bit depth you configure. The MAX98357A sits at the other end, clocks the data in, and amplifies it.
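The relationship between the configured sample rate and the two clocks can be worked out by hand. The sketch below is a rough illustration, assuming the slot width equals the sample width (16 bits) and two channel slots per frame; the function names are made up for this example and are not part of any ESP32 API.

```python
def i2s_bit_clock_hz(sample_rate_hz: int, bits_per_sample: int = 16, channels: int = 2) -> int:
    """BCLK: one clock per bit, for every channel slot, for every sample frame."""
    return sample_rate_hz * bits_per_sample * channels


def i2s_word_select_hz(sample_rate_hz: int) -> int:
    """LRCLK completes one left/right cycle per sample frame."""
    return sample_rate_hz


for rate in (22050, 44100):
    print(f"{rate} Hz audio: LRCLK = {i2s_word_select_hz(rate)} Hz, "
          f"BCLK = {i2s_bit_clock_hz(rate)} Hz")
```

Under these assumptions a 22050 Hz clip needs a 705,600 Hz bit clock, which is useful to know if you probe BCLK with a scope or logic analyser.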
The audio data lives in the sketch as a C array of 16-bit signed integers stored in flash memory. The sketch feeds that array to the I2S peripheral over DMA - the CPU hands off the data and the hardware takes care of the rest. No SD card, no filesystem, no streaming complexity for short clips.
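To get a feel for how much audio the DMA hand-off keeps in flight, here is a small estimate using the values from the sketches in this guide (dma_buf_count = 8, dma_buf_len = 64, 16-bit stereo). It assumes dma_buf_len is counted in frames (one left plus one right sample), which is how the legacy driver is generally documented; treat the result as approximate.

```python
def dma_total_frames(buf_count: int, buf_len_frames: int) -> int:
    """Total stereo frames the DMA buffers can hold at once."""
    return buf_count * buf_len_frames


def dma_total_bytes(buf_count: int, buf_len_frames: int,
                    channels: int = 2, bytes_per_sample: int = 2) -> int:
    """RAM taken by the DMA buffers themselves."""
    return dma_total_frames(buf_count, buf_len_frames) * channels * bytes_per_sample


def dma_latency_ms(buf_count: int, buf_len_frames: int, sample_rate_hz: int) -> float:
    """How long the queued audio lasts if the CPU stops feeding data."""
    return 1000.0 * dma_total_frames(buf_count, buf_len_frames) / sample_rate_hz


print(dma_total_bytes(8, 64), "bytes of DMA buffer")
print(round(dma_latency_ms(8, 64, 22050), 1), "ms buffered at 22050 Hz")
```

With these settings the buffers hold 512 frames, roughly 23 ms of audio at 22050 Hz, so the CPU has that long between refills before the output runs dry.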
The SD pin on the MAX98357A controls shutdown and channel selection. Left floating, the module is enabled and outputs a mono mix of the left and right I2S channels - convenient here, since the sketches write the same mono sample to both channels. The GAIN pin left floating sets 9dB of gain, which is enough for most speakers at a reasonable listening volume.
Wiring
| MAX98357A Pin | ESP32 Dev Board | ESP32-C3 | Notes |
|---|---|---|---|
| VIN | 5V | 5V | 5V recommended - 3.3V works at lower volume |
| GND | GND | GND | |
| LRC | GPIO14 | GPIO5 | I2S word select (LRCLK) |
| BCLK | GPIO27 | GPIO4 | I2S bit clock |
| DIN | GPIO13 | GPIO6 | I2S data out |
| SD | Not connected | Not connected | Leave floating - amp enabled, mono mix of L+R |
| GAIN | Not connected | Not connected | Leave floating - 9dB gain |
Connect your speaker to the two output terminals on the MAX98357A module. Polarity only affects phase in a mono setup - either orientation will produce sound. Use a speaker rated 1W or above - the MAX98357A delivers up to 3.2W into 4Ω at 5V, and a 0.5W speaker will distort badly at full gain.
Preparing an Audio Clip
Option 1 - Generate a voice clip with espeak-ng
espeak-ng -w /tmp/myclip.wav "your text here"
Option 2 - Extract from an existing audio or video file
Use ffmpeg to cut a section from any WAV, MP3, or video file:
ffmpeg -i /path/to/source.wav -ss 00:00:05 -to 00:00:08 /tmp/myclip.wav
What the flags mean:
- -i /path/to/source.wav - your input file. /path/to/source.wav is a placeholder - replace it with the actual location of your file on disk (e.g. ~/Downloads/mysound.mp3 or /home/j/Videos/clip.mp4). Not sure of the path? Drag the file into a terminal window and the full path will be pasted in automatically.
- -ss 00:00:05 - start cutting at 5 seconds
- -to 00:00:08 - stop cutting at 8 seconds (3-second clip)
- /tmp/myclip.wav - output path. /tmp is a temporary folder Linux provides - files survive until reboot
To cut a different section, change the timestamps given to -ss and -to.
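If you want to check how long a cut will be before running ffmpeg, the arithmetic on the two timestamps is simple. This is a minimal sketch that only handles the HH:MM:SS form used in this guide; the helper names are invented for illustration.

```python
def timestamp_to_seconds(timestamp: str) -> float:
    """Convert an HH:MM:SS (optionally HH:MM:SS.mmm) timestamp to seconds."""
    hours, minutes, seconds = timestamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def clip_duration_seconds(start: str, end: str) -> float:
    """Length of the section between the -ss and -to timestamps."""
    duration = timestamp_to_seconds(end) - timestamp_to_seconds(start)
    if duration <= 0:
        raise ValueError("-to must be later than -ss")
    return duration


print(clip_duration_seconds("00:00:05", "00:00:08"), "seconds")
```

Shorter clips matter later on, because every second of audio ends up as data in the ESP32's flash.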
Examples:
If the file is in your home folder:
ffmpeg -i ~/mysound.wav -ss 00:00:05 -to 00:00:08 /tmp/myclip.wav
If the file is in Downloads:
ffmpeg -i ~/Downloads/mysound.mp3 -ss 00:00:05 -to 00:00:08 /tmp/myclip.wav
If the file is a video - ffmpeg extracts the audio automatically:
ffmpeg -i ~/Videos/myvideo.mp4 -ss 00:00:05 -to 00:00:08 /tmp/myclip.wav
If the filename has spaces - wrap it in quotes:
ffmpeg -i "/home/j/My Audio File.wav" -ss 00:00:05 -to 00:00:08 /tmp/myclip.wav
Convert to I2S format
The sketch sends 16-bit signed PCM to the MAX98357A, so convert your clip to that format:
ffmpeg -y -i /tmp/myclip.wav -ar 22050 -ac 1 -acodec pcm_s16le /tmp/clip_i2s.wav
What the flags mean:
- -y - overwrite the output file without prompting if it already exists
- -ar 22050 - sample rate in Hz - this value must match SAMPLE_RATE in the sketch
- -ac 1 - mono
- -acodec pcm_s16le - 16-bit signed PCM, little-endian - the format I2S and the MAX98357A expect
The -ar value must match SAMPLE_RATE in the sketch, or the clip will play at the wrong speed.
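Before generating the header it can help to confirm that the converted file really is mono, 16-bit, and at the expected rate. The following is a small sketch using Python's standard wave module; check_wav is a hypothetical helper name, and 22050 is simply the rate used in this guide.

```python
import wave


def check_wav(path: str, expected_rate_hz: int = 22050) -> list:
    """Return a list of problems found; an empty list means the file looks right."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected mono, found {wav.getnchannels()} channels")
        if wav.getsampwidth() != 2:
            problems.append(f"expected 16-bit samples, found {wav.getsampwidth() * 8}-bit")
        if wav.getframerate() != expected_rate_hz:
            problems.append(f"expected {expected_rate_hz} Hz, found {wav.getframerate()} Hz")
    return problems
```

Calling check_wav("/tmp/clip_i2s.wav") and printing the result gives a quick answer on whether the ffmpeg step did what you intended.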
WAV to C Header
Save the script below as wav_to_i2s_header.py, then execute it to convert your audio into a header file for the sketch.
import sys, wave, struct

input_file = sys.argv[1]   # e.g. /tmp/clip_i2s.wav
output_file = sys.argv[2]  # e.g. clip.h

# wave module reads and strips the WAV header automatically -
# readframes() returns raw PCM samples only
with wave.open(input_file, 'rb') as f:
    raw = f.readframes(f.getnframes())

# unpack as signed 16-bit little-endian integers (pcm_s16le)
samples = struct.unpack('<' + str(len(raw)//2) + 'h', raw)

lines = [
    '#pragma once',
    'const int16_t audio_data[] = {',
]
chunks = [str(s) for s in samples]
for i in range(0, len(chunks), 16):
    lines.append(' ' + ', '.join(chunks[i:i+16]) + ',')
lines.append('};')
lines.append('const size_t audio_len = sizeof(audio_data);')

with open(output_file, 'w') as f:
    f.write('\n'.join(lines) + '\n')

print('Done: ' + str(len(samples)) + ' samples -> ' + output_file)
python3 wav_to_i2s_header.py /tmp/clip_i2s.wav clip.h
Place clip.h in your sketch folder - the same directory as your .ino file.
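To see what the generated header looks like without converting a real file, the same formatting logic can be run on a handful of made-up sample values. This is an illustrative sketch only: format_header is a hypothetical helper, and the sample values are arbitrary.

```python
def format_header(samples, per_line: int = 16) -> str:
    """Build the text of a clip.h-style header from a sequence of 16-bit samples."""
    lines = ["#pragma once", "const int16_t audio_data[] = {"]
    for start in range(0, len(samples), per_line):
        row = ", ".join(str(s) for s in samples[start:start + per_line])
        lines.append("  " + row + ",")
    lines.append("};")
    # sizeof() gives bytes, so the sketch divides by sizeof(int16_t) to get samples
    lines.append("const size_t audio_len = sizeof(audio_data);")
    return "\n".join(lines) + "\n"


print(format_header([0, 1200, -1200, 32767, -32768], per_line=4))
```

Note that audio_len holds a byte count rather than a sample count, which is why the playback sketch divides it by sizeof(int16_t).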
Sketch
Step 1 - Tone test (confirm wiring first)
Flash this before dealing with audio files. It generates a 440Hz tone via I2S - no clip.h needed. If you hear a tone, wiring and I2S are confirmed working.
/*
* We stand on the shoulders of giants when we build
* with knowledge gained from others' efforts.
* That doesn't make us giants. Be humble.
* Create with care. Open source is the way.
*
* MAX98357A I2S - Tone Test
* --------------------------
* Generates a 440Hz sine wave via I2S to confirm
* wiring and amp are working. No audio file needed.
*
* Board: ESP32 Dev Board (38-pin) or ESP32-C3
* Amp: MAX98357A
*
* Wiring (ESP32 Dev Board):
* VIN -> 5V GND -> GND
* BCLK -> GPIO27 LRC -> GPIO14
* DIN -> GPIO13 SD -> not connected
*
* Wiring (ESP32-C3):
* VIN -> 5V GND -> GND
* BCLK -> GPIO4 LRC -> GPIO5
* DIN -> GPIO6 SD -> not connected
*
* Open source - MIT Licence
* Electronic Zoology - field notes from the garage
* https://electroniczoology.com/guides/how-to-play-audio-esp32-max98357a
*/
#include <driver/i2s.h>
#include <math.h>

// ESP32 Dev Board (38-pin)
#define I2S_BCLK 27
#define I2S_LRCLK 14
#define I2S_DOUT 13

// ESP32-C3 - comment out the three lines above and uncomment these:
//#define I2S_BCLK 4
//#define I2S_LRCLK 5
//#define I2S_DOUT 6

#define SAMPLE_RATE 44100
#define FREQUENCY 440
#define AMPLITUDE 20000
#define BUF_SIZE 256

int16_t buf[BUF_SIZE];

void setup() {
  i2s_config_t cfg = {
    .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_TX),
    .sample_rate = SAMPLE_RATE,
    .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
    .channel_format = I2S_CHANNEL_FMT_RIGHT_LEFT,
    .communication_format = I2S_COMM_FORMAT_STAND_I2S,
    .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
    .dma_buf_count = 8,
    .dma_buf_len = 64,
    .use_apll = false,
    .tx_desc_auto_clear = true,
  };
  i2s_pin_config_t pins = {
    .bck_io_num = I2S_BCLK,
    .ws_io_num = I2S_LRCLK,
    .data_out_num = I2S_DOUT,
    .data_in_num = I2S_PIN_NO_CHANGE,
  };
  i2s_driver_install(I2S_NUM_0, &cfg, 0, NULL);
  i2s_set_pin(I2S_NUM_0, &pins);
}

void loop() {
  static uint32_t phase = 0;
  size_t written;
  for (int i = 0; i < BUF_SIZE; i += 2) {
    int16_t sample = (int16_t)(AMPLITUDE * sinf(2.0f * M_PI * FREQUENCY * phase / SAMPLE_RATE));
    buf[i] = sample;      // left
    buf[i + 1] = sample;  // right
    phase++;
    if (phase >= SAMPLE_RATE) phase = 0;
  }
  i2s_write(I2S_NUM_0, buf, sizeof(buf), &written, portMAX_DELAY);
}
Step 2 - Play an audio clip
Once the tone test confirms everything is working, swap to this sketch to play your own audio. Place clip.h in the same folder as the .ino file before compiling. The clip plays on startup and loops with a one-second pause between plays.
/*
* We stand on the shoulders of giants when we build
* with knowledge gained from others' efforts.
* That doesn't make us giants. Be humble.
* Create with care. Open source is the way.
*
* MAX98357A I2S Amplifier - ESP32 Dev Board
* ------------------------------------------
* Plays a 16-bit PCM audio clip stored in flash
* via I2S to a MAX98357A amplifier and speaker.
*
* Board: ESP32 Dev Board (38-pin) or ESP32-C3
* Amp: MAX98357A
*
* Wiring (ESP32 Dev Board):
* VIN -> 5V GND -> GND
* BCLK -> GPIO27 LRC -> GPIO14
* DIN -> GPIO13 SD -> not connected
*
* Wiring (ESP32-C3):
* VIN -> 5V GND -> GND
* BCLK -> GPIO4 LRC -> GPIO5
* DIN -> GPIO6 SD -> not connected
*
* Open source - MIT Licence
* Electronic Zoology - field notes from the garage
* https://electroniczoology.com/guides/how-to-play-audio-esp32-max98357a
*/
#include <driver/i2s.h>
#include "clip.h"

// ESP32 Dev Board (38-pin)
#define I2S_BCLK 27
#define I2S_LRCLK 14
#define I2S_DOUT 13

// ESP32-C3 - comment out the three lines above and uncomment these:
//#define I2S_BCLK 4
//#define I2S_LRCLK 5
//#define I2S_DOUT 6

#define SAMPLE_RATE 22050  // must match -ar value used in ffmpeg conversion
#define VOLUME 0.8f        // 0.0 = silent, 1.0 = max
#define BUF_SAMPLES 256

int16_t i2s_buf[BUF_SAMPLES * 2];  // stereo buffer

void setup() {
  i2s_config_t cfg = {
    .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_TX),
    .sample_rate = SAMPLE_RATE,
    .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
    .channel_format = I2S_CHANNEL_FMT_RIGHT_LEFT,
    .communication_format = I2S_COMM_FORMAT_STAND_I2S,
    .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
    .dma_buf_count = 8,
    .dma_buf_len = 64,
    .use_apll = false,
    .tx_desc_auto_clear = true,
  };
  i2s_pin_config_t pins = {
    .bck_io_num = I2S_BCLK,
    .ws_io_num = I2S_LRCLK,
    .data_out_num = I2S_DOUT,
    .data_in_num = I2S_PIN_NO_CHANGE,
  };
  i2s_driver_install(I2S_NUM_0, &cfg, 0, NULL);
  i2s_set_pin(I2S_NUM_0, &pins);
}

void loop() {
  size_t total = audio_len / sizeof(int16_t);
  size_t written;
  for (size_t pos = 0; pos < total; pos += BUF_SAMPLES) {
    size_t count = min((size_t)BUF_SAMPLES, total - pos);
    for (size_t i = 0; i < count; i++) {
      int16_t sample = (int16_t)(audio_data[pos + i] * VOLUME);
      i2s_buf[i * 2] = sample;      // left
      i2s_buf[i * 2 + 1] = sample;  // right
    }
    i2s_write(I2S_NUM_0, i2s_buf, count * 2 * sizeof(int16_t), &written, portMAX_DELAY);
  }
  delay(1000);
}
const arrays are stored in flash automatically by the compiler - no PROGMEM keyword needed. PROGMEM is an AVR legacy keyword; on ESP32 it does nothing. The array lives in flash either way, and you can read it directly without pgm_read_word(). A 2-second clip at 44100 Hz 16-bit is ~176KB - flash handles it, RAM does not.
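SAMPLE_RATE must match the -ar value in your ffmpeg conversion command.
Set VOLUME between 0.0 (silent) and 1.0 (full). Samples are scaled in the buffer before being sent to I2S. For hardware volume, use the GAIN pin: leave it floating for 9dB, tie it directly to GND for 12dB, or tie it directly to VIN for 6dB. Per the MAX98357A datasheet, the loudest setting (15dB) needs a 100kΩ resistor from GAIN to GND.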
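The software volume step can be modelled off-device to see what it does to individual samples. This sketch mirrors the sketch's cast, which truncates toward zero, and also expresses a VOLUME factor in decibels; it ignores the small rounding differences of 32-bit floats on the ESP32, and the function names are illustrative only.

```python
import math


def scale_sample(sample: int, volume: float) -> int:
    """Model of (int16_t)(sample * VOLUME): multiply, then truncate toward zero."""
    if not 0.0 <= volume <= 1.0:
        raise ValueError("volume must be between 0.0 and 1.0")
    return int(sample * volume)


def volume_to_db(volume: float) -> float:
    """Attenuation in dB for a linear volume factor (volume must be above 0)."""
    return 20.0 * math.log10(volume)


print(scale_sample(32767, 0.5), scale_sample(-3, 0.5))
print(round(volume_to_db(0.8), 2), "dB at VOLUME 0.8")
```

Because the factor never exceeds 1.0, the scaled value always fits back into 16 bits, so no clipping check is needed in the sketch.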
Troubleshooting
No sound at all
- Check all three signal wires - BCLK, LRC, and DIN must all be connected and on the correct pins
- Confirm VIN has power
- Confirm SD pin is not tied to GND - that shuts the amp down
Very quiet sound
- Power the module from 5V - 3.3V significantly reduces output power
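- GAIN pin left floating = 9dB. Per the MAX98357A datasheet, tying GAIN directly to GND raises it to 12dB and a 100kΩ resistor from GAIN to GND gives 15dB. Tying GAIN directly to VIN lowers it to 6dB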
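Decibel figures are easier to compare once converted to plain ratios. The sketch below applies the standard dB formulas to the gain steps listed in the MAX98357A datasheet (3, 6, 9, 12 and 15dB); it says nothing about how loud a particular speaker will be, only how the settings relate to each other.

```python
def db_to_voltage_ratio(gain_db: float) -> float:
    """Voltage (amplitude) multiplier for a gain in dB."""
    return 10.0 ** (gain_db / 20.0)


def db_to_power_ratio(gain_db: float) -> float:
    """Power multiplier for a gain in dB."""
    return 10.0 ** (gain_db / 10.0)


for gain_db in (3, 6, 9, 12, 15):
    print(f"{gain_db:>2} dB: x{db_to_voltage_ratio(gain_db):.2f} voltage, "
          f"x{db_to_power_ratio(gain_db):.1f} power")
```

Each 6dB step roughly doubles the output voltage, so moving from the floating default to the top setting is a substantial jump.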
Distorted, wrong speed, or wrong pitch
- The sketch SAMPLE_RATE must match the WAV file - recheck your ffmpeg -ar value
- Confirm the WAV is 16-bit signed PCM (pcm_s16le) - other formats produce noise or silence
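A rate mismatch has a predictable effect, which can help with diagnosis by ear. This sketch computes the speed change, pitch shift, and resulting duration when a file recorded at one rate is played at another; the function names are hypothetical.

```python
import math


def playback_speed(file_rate_hz: int, sketch_rate_hz: int) -> float:
    """Speed factor: above 1.0 plays fast and high, below 1.0 slow and low."""
    return sketch_rate_hz / file_rate_hz


def pitch_shift_semitones(file_rate_hz: int, sketch_rate_hz: int) -> float:
    """Pitch change in semitones (12 semitones is one octave)."""
    return 12.0 * math.log2(playback_speed(file_rate_hz, sketch_rate_hz))


def played_duration_s(original_s: float, file_rate_hz: int, sketch_rate_hz: int) -> float:
    """How long the clip actually lasts when played at the sketch's rate."""
    return original_s / playback_speed(file_rate_hz, sketch_rate_hz)


print(playback_speed(22050, 44100), pitch_shift_semitones(22050, 44100),
      played_duration_s(3.0, 22050, 44100))
```

A 22050 Hz file played with SAMPLE_RATE set to 44100 runs at double speed, one octave up, and finishes in half the time.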
Compile error on i2s_config_t
- Arduino-ESP32 v3 changed the I2S API - this sketch uses the v2 legacy driver which still compiles but may show deprecation warnings
- If it fails to compile, check your Arduino-ESP32 board package version under Tools → Boards Manager
Clicks or pops between plays
- Increase the delay(1000) between plays to give the amp more time to settle
Clip too large - sketch won't fit in flash
- Trim the clip shorter with ffmpeg - reduce the -to timestamp
- Drop the sample rate: -ar 22050 halves the file size compared with 44100 Hz, with minimal quality loss for speech
- For longer audio, lower the sample rate or use an SD card
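To judge whether a clip will fit before compiling, the array size follows directly from duration and sample rate, since each mono sample is two bytes. In this sketch the 1,000,000-byte budget is a made-up figure for illustration; the real space available depends on your board and partition scheme.

```python
BYTES_PER_SAMPLE = 2  # 16-bit mono, as produced by the ffmpeg conversion step


def clip_bytes(duration_s: float, sample_rate_hz: int) -> int:
    """Flash used by the audio_data array for a clip of the given length."""
    return int(round(duration_s * sample_rate_hz)) * BYTES_PER_SAMPLE


def max_clip_seconds(flash_budget_bytes: int, sample_rate_hz: int) -> float:
    """Longest clip that fits in a given number of bytes."""
    return flash_budget_bytes / (sample_rate_hz * BYTES_PER_SAMPLE)


print(clip_bytes(2, 44100), "bytes for 2 s at 44100 Hz")
print(clip_bytes(2, 22050), "bytes for 2 s at 22050 Hz")
# Hypothetical budget - check your own board's partition table
print(round(max_clip_seconds(1_000_000, 22050), 1), "s fit in 1,000,000 bytes at 22050 Hz")
```

The first figure matches the ~176KB quoted earlier in this guide for a 2-second clip at 44100 Hz.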