tl;dr
- A Whisper model converts the audio to text
- The transcribed text is passed to a subprocess call without being sanitized
- It is difficult to get an exact command injection payload transcribed by speaking it manually
- We need to invert the neural network to generate the audio file that transcribes to our payload
- Implement gradient-descent-based inversion to find the input for a target output
- Generate the audio file, send it, and get the flag!
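
The inversion idea from the list above can be illustrated with a toy sketch. This is not the actual exploit and does not touch Whisper: the "model" here is a hypothetical fixed random linear layer, and all names (`make_toy_model`, `invert`, `predict`) are made up for illustration. The point is only the technique: keep the weights frozen and run gradient descent on the input until the model emits the target output. A real attack would presumably do the same against the audio samples, with a loss over the token sequence that spells the payload.

```python
import math
import random


def make_toy_model(n_in, n_out, seed=0):
    """Hypothetical stand-in for the speech model: one fixed random linear layer."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def logits(weights, x):
    """Forward pass: one score per output class."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, x):
    """Index of the class the toy model outputs for input x."""
    z = logits(weights, x)
    return max(range(len(z)), key=z.__getitem__)


def invert(weights, target, n_in, steps=1000, lr=0.05):
    """Gradient descent on the INPUT (weights stay frozen) to minimise
    the cross-entropy between the model output and `target`."""
    x = [0.0] * n_in
    n_out = len(weights)
    for _ in range(steps):
        p = softmax(logits(weights, x))
        # d(loss)/d(logit_k) = p_k - 1[k == target]
        dz = [pk - (1.0 if k == target else 0.0) for k, pk in enumerate(p)]
        # Chain rule through the linear layer: d(loss)/dx_i = sum_k dz_k * W[k][i]
        grad = [sum(dz[k] * weights[k][i] for k in range(n_out)) for i in range(n_in)]
        x = [xi - lr * g for xi, g in zip(x, grad)]
    return x


if __name__ == "__main__":
    weights = make_toy_model(n_in=16, n_out=5)
    crafted = invert(weights, target=3, n_in=16)
    print("model output for crafted input:", predict(weights, crafted))
```

With a real speech model the gradient would come from an autodiff framework instead of being derived by hand, and the optimized input would have to be written out as a valid audio file before sending.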