Store data in flash, not RAM
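PROGMEM is a compiler attribute that asks the toolchain to place a variable in flash memory instead of RAM. (On the ESP32 Arduino core, const data is already placed in flash, and PROGMEM is kept mainly for compatibility with AVR sketches.) The ESP32 has only about 320KB of RAM available for data - a 1-second audio clip at 8kHz with 8-bit mono samples is already 8KB, so a few dozen seconds of audio would fill it.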
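Flash is much larger (4MB on a typical ESP32 module) and holds the program and read-only data rather than runtime variables, so storing audio data there leaves RAM free for the stack, heap and buffers.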
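
As a rough illustration, a clip can be declared as a const array marked PROGMEM and read back one sample at a time. This is a minimal sketch, not a definitive implementation: the names kBeepClip, clipBytes and readSample are made up for this example, the eight sample values are placeholders rather than real audio, and the #else branch is an assumed host fallback so the same code also compiles outside the Arduino environment.

```cpp
#include <stddef.h>
#include <stdint.h>

#ifdef ARDUINO
#include <Arduino.h>  // provides PROGMEM and pgm_read_byte on Arduino cores
#else
// Host fallback (assumption for illustration): no separate flash address space.
#define PROGMEM
#define pgm_read_byte(addr) (*(const uint8_t *)(addr))
#endif

// Hypothetical 8-sample clip of unsigned 8-bit PCM; placeholder values only.
// A real clip would normally be generated from a WAV file.
const uint8_t kBeepClip[] PROGMEM = {128, 170, 212, 255, 212, 170, 128, 86};
const size_t kBeepClipLen = sizeof(kBeepClip);

// Bytes needed for an uncompressed PCM clip.
constexpr uint32_t clipBytes(uint32_t sampleRateHz, uint32_t bytesPerSample,
                             uint32_t seconds) {
  return sampleRateHz * bytesPerSample * seconds;
}

// One second of 8 kHz, 8-bit mono audio is 8000 bytes (roughly 8KB).
static_assert(clipBytes(8000, 1, 1) == 8000, "unexpected clip size");

// Read one sample from the const table.
uint8_t readSample(size_t i) {
  return pgm_read_byte(&kBeepClip[i]);
}

#ifdef ARDUINO
void setup() {
  Serial.begin(115200);
  // Print every sample; the table itself is never copied into a RAM buffer.
  for (size_t i = 0; i < kBeepClipLen; ++i) {
    Serial.println(readSample(i));
  }
}

void loop() {}
#endif
```

Reading through pgm_read_byte keeps the code portable to AVR boards, where flash cannot be read through an ordinary pointer. On the ESP32 a plain array access should work as well.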