With the rise of AI-generated audio, watermarking has become widely used for detecting misuse and protecting intellectual property. However, adversaries may try to remove these watermarks, making it critical to evaluate how well watermarking schemes withstand removal attacks. Existing attacks are often impractical: they either noticeably degrade perceptual quality or require access to the watermarking scheme. We propose DiffErase, a black-box watermark removal attack that assumes no knowledge of the target watermarking scheme while maintaining perceptual quality. DiffErase perturbs watermarked audio to an intermediate diffusion noise level and regenerates it using a pretrained denoising model, effectively suppressing watermark signals. Theoretical analysis and extensive experiments demonstrate that inaudible audio watermarks are highly vulnerable: across multiple audio domains, DiffErase consistently removes watermarks while preserving perceptual quality. These findings highlight the need for future audio watermarking designs to consider diffusion-based threats.
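
The noise-then-regenerate step described above can be illustrated with a minimal sketch. It assumes a standard DDPM-style linear noise schedule and ancestral sampler, which the text does not specify. All function names, the schedule values, and the choice of `t_star` are hypothetical. The analytic `toy_gaussian_eps_model` is only a stand-in so the example runs; it is not the pretrained denoising model, and the input array stands in for whatever representation that model operates on.

```python
import numpy as np


def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumed, commonly used default)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def noise_to_level(x0, t, alpha_bars, rng):
    """Forward diffusion: jump directly from x0 to noise level t."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps


def regenerate(x_t, t_start, eps_model, schedule, rng):
    """Reverse diffusion from t_start down to 0 using a noise predictor."""
    betas, alphas, alpha_bars = schedule
    x = x_t
    for t in range(t_start, -1, -1):
        eps_hat = eps_model(x, t)
        # Posterior mean of the DDPM reverse step.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
        else:
            x = mean  # no noise is added on the final step
    return x


def diffusion_purify(x_wm, t_star, eps_model, schedule, seed=0):
    """Perturb the watermarked input to level t_star, then denoise it back."""
    rng = np.random.default_rng(seed)
    x_t = noise_to_level(x_wm, t_star, schedule[2], rng)
    return regenerate(x_t, t_star, eps_model, schedule, rng)


def toy_gaussian_eps_model(alpha_bars, data_std=1.0):
    """Hypothetical stand-in denoiser: the exact noise predictor when the
    data are i.i.d. Gaussian with standard deviation data_std."""
    def eps_model(x, t):
        a_bar = alpha_bars[t]
        return np.sqrt(1.0 - a_bar) * x / (a_bar * data_std**2 + (1.0 - a_bar))
    return eps_model


if __name__ == "__main__":
    schedule = make_schedule()
    x_wm = np.random.default_rng(1).standard_normal(16000)  # placeholder "audio"
    eps_model = toy_gaussian_eps_model(schedule[2])
    x_clean = diffusion_purify(x_wm, t_star=200, eps_model=eps_model, schedule=schedule)
    print(x_clean.shape)
```

In this sketch, `t_star` sets the trade-off the abstract refers to: a larger value injects more noise, which should suppress more of an embedded signal but leaves less of the original content for the denoiser to preserve.
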
The audio samples are randomly selected from each domain (speech, music, environment). Each sample corresponds to one watermarking method (AudioSeal, WavMark, TimbreWM, Perth, or SilentCipher).
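
| | AudioSeal | WavMark | TimbreWM | Perth | SilentCipher |
|---|---|---|---|---|---|
| Original Audio | [audio] | [audio] | [audio] | [audio] | [audio] |
| Watermarked Audio | [audio] | [audio] | [audio] | [audio] | [audio] |
| DiffErase-mel | [audio] | [audio] | [audio] | [audio] | [audio] |
| DiffErase-latent | [audio] | [audio] | [audio] | [audio] | [audio] |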
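
| | AudioSeal | WavMark | TimbreWM | Perth | SilentCipher |
|---|---|---|---|---|---|
| Original Audio | [audio] | [audio] | [audio] | [audio] | [audio] |
| Watermarked Audio | [audio] | [audio] | [audio] | [audio] | [audio] |
| DiffErase-mel | [audio] | [audio] | [audio] | [audio] | [audio] |
| DiffErase-latent | [audio] | [audio] | [audio] | [audio] | [audio] |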
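
| | AudioSeal | WavMark | TimbreWM | Perth | SilentCipher |
|---|---|---|---|---|---|
| Original Audio | [audio] | [audio] | [audio] | [audio] | [audio] |
| Watermarked Audio | [audio] | [audio] | [audio] | [audio] | [audio] |
| DiffErase-mel | [audio] | [audio] | [audio] | [audio] | [audio] |
| DiffErase-latent | [audio] | [audio] | [audio] | [audio] | [audio] |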