Expressive range of text-to-audio models (ESC-50 plots)

Supplemental material for the paper:

This is an expanded version of Figure 6 from the paper, covering all 50 labels from the ESC-50 dataset of environmental audio. Each label has three plots, one each for loudness, pitch, and timbre. Every plot shows the original (hand-curated) ESC-50 samples for that label alongside 100 samples generated by each of three audio models using the prompt "Sound of [label]". The loudness, pitch, and timbre features are high-dimensional and are reduced to 2D for plotting using PCA. The axes are therefore not directly interpretable, but the distance between points indicates how similar or dissimilar the audio is according to the plotted feature, and clusters group samples that are alike in that feature. See the section "Expressive Range With ESC-50 Labels" in the paper for the methodology.
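To illustrate the PCA step described above, here is a minimal sketch of projecting high-dimensional feature vectors to 2D for one label and one feature type. It is not the paper's code. The feature dimension (128), the random stand-in data, the source names `model_a`/`model_b`/`model_c`, and the choice to fit one projection jointly over all sources are assumptions made for illustration. The count of 40 original clips per label follows the ESC-50 dataset layout.

```python
import numpy as np

def pca_2d(features: np.ndarray) -> np.ndarray:
    """Project rows of `features` (n_samples x n_dims) onto the top two principal components."""
    centered = features - features.mean(axis=0)
    # SVD of the centered data: the rows of vt are the principal directions,
    # ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

# Hypothetical stand-in data: 40 original clips plus 100 generated clips
# from each of three models, each described by an assumed 128-dim feature vector.
rng = np.random.default_rng(0)
sources = {"ESC-50": 40, "model_a": 100, "model_b": 100, "model_c": 100}
features = np.vstack(
    [rng.normal(loc=i, size=(n, 128)) for i, n in enumerate(sources.values())]
)

# Assumption: one projection is fitted on all sources together,
# so that every point in a plot shares the same 2D coordinate system.
points = pca_2d(features)

# Split the projected points back out per source, e.g. for a scatter plot.
offsets = np.cumsum([0] + list(sources.values()))
by_source = {
    name: points[offsets[i]:offsets[i + 1]] for i, name in enumerate(sources)
}
```

Each entry of `by_source` is an array of 2D coordinates that could be drawn as one colored group in a scatter plot. Because the projected axes are linear combinations of the original feature dimensions, only the relative positions of points are meaningful.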

Note: We realize it would be nice if these plots were interactive, so that you could click a point and listen to the corresponding sample. All of the audio files are available in the Zenodo archive accompanying the paper, but we have not yet put together a web viewer/listener for them.

airplane

Loudness:

Pitch:

Timbre:

breathing

Loudness:

Pitch:

Timbre:

brushing_teeth

Loudness:

Pitch:

Timbre:

can_opening

Loudness:

Pitch:

Timbre:

car_horn

Loudness:

Pitch:

Timbre:

cat

Loudness:

Pitch:

Timbre:

chainsaw

Loudness:

Pitch:

Timbre:

chirping_birds

Loudness:

Pitch:

Timbre:

church_bells

Loudness:

Pitch:

Timbre:

clapping

Loudness:

Pitch:

Timbre:

clock_alarm

Loudness:

Pitch:

Timbre:

clock_tick

Loudness:

Pitch:

Timbre:

coughing

Loudness:

Pitch:

Timbre:

cow

Loudness:

Pitch:

Timbre:

crackling_fire

Loudness:

Pitch:

Timbre:

crickets

Loudness:

Pitch:

Timbre:

crow

Loudness:

Pitch:

Timbre:

crying_baby

Loudness:

Pitch:

Timbre:

dog

Loudness:

Pitch:

Timbre:

door,_wood_creaks

Loudness:

Pitch:

Timbre:

door_knock

Loudness:

Pitch:

Timbre:

drinking,_sipping

Loudness:

Pitch:

Timbre:

engine

Loudness:

Pitch:

Timbre:

fireworks

Loudness:

Pitch:

Timbre:

footsteps

Loudness:

Pitch:

Timbre:

frog

Loudness:

Pitch:

Timbre:

glass_breaking

Loudness:

Pitch:

Timbre:

hand_saw

Loudness:

Pitch:

Timbre:

helicopter

Loudness:

Pitch:

Timbre:

hen

Loudness:

Pitch:

Timbre:

insects_(flying)

Loudness:

Pitch:

Timbre:

keyboard_typing

Loudness:

Pitch:

Timbre:

laughing

Loudness:

Pitch:

Timbre:

mouse_click

Loudness:

Pitch:

Timbre:

pig

Loudness:

Pitch:

Timbre:

pouring_water

Loudness:

Pitch:

Timbre:

rain

Loudness:

Pitch:

Timbre:

rooster

Loudness:

Pitch:

Timbre:

sea_waves

Loudness:

Pitch:

Timbre:

sheep

Loudness:

Pitch:

Timbre:

siren

Loudness:

Pitch:

Timbre:

sneezing

Loudness:

Pitch:

Timbre:

snoring

Loudness:

Pitch:

Timbre:

thunderstorm

Loudness:

Pitch:

Timbre:

toilet_flush

Loudness:

Pitch:

Timbre:

train

Loudness:

Pitch:

Timbre:

vacuum_cleaner

Loudness:

Pitch:

Timbre:

washing_machine

Loudness:

Pitch:

Timbre:

water_drops

Loudness:

Pitch:

Timbre:

wind

Loudness:

Pitch:

Timbre: