I accidentally forgot to convert some NumPy arrays to bytes objects when using PyAudio, but to my surprise it still played audio, even if it sounded a bit off. I wrote a little test script (see below) for playing 1 second of a 440 Hz tone, and it seems like writing a NumPy array directly to a PyAudio Stream cuts that tone short.
Can anyone explain why this happens? I thought a NumPy array was a contiguous sequence of bytes with some header information about its dtype and strides, so I would’ve predicted that PyAudio played the full second of the tone after some garbled audio from the header, not cut the tone off.
    # script segment
    import pyaudio
    import numpy as np

    RATE = 48000
    p = pyaudio.PyAudio()
    stream = p.open(format=pyaudio.paFloat32, channels=1, rate=RATE, output=True)

    TONE = 440
    SECONDS = 1
    t = np.arange(0, 2*np.pi*TONE*SECONDS, 2*np.pi*TONE/RATE)
    sina = np.sin(t).astype(np.float32)
    sinb = sina.tobytes()

    # console commands segment
    stream.write(sinb)  # bytes object plays 1 second of 440Hz tone
    stream.write(sina)  # still plays 440Hz tone, but noticeably shorter than 1 second
The problem is more subtle than you describe. Your first call is passing a bytes object of length 192,000. The second call is passing an array of float32 values with length 48,000. pyaudio handles both of them, and passes the buffer to portaudio to be played.
However, when you opened the pyaudio stream, you told it you were sending paFloat32 data, which has 4 bytes per sample. The pyaudio write handler takes the length of the object you gave it, and divides by the number of channels times the sample size to determine how many audio samples there are. In your second call, the length of the array is 48,000, which it divides by 4, and thereby tells portaudio “there are 12,000 samples here”. At 48,000 samples per second, that is only a quarter of a second of audio.
So everyone understood the format, but there was confusion about the size. If you change the second call to
then no one has to guess, and it works perfectly fine.
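To make the size arithmetic concrete, here is a minimal sketch of the calculation described above. It is not pyaudio's actual source: `inferred_frames` is a hypothetical helper that only mirrors the "length divided by channels times sample size" rule, and the lengths are the ones from the question.

```python
# Sketch of the frame-count arithmetic (an illustration, not pyaudio's real code).
FLOAT32_WIDTH = 4  # bytes per paFloat32 sample


def inferred_frames(buffer_len, channels=1, sample_width=FLOAT32_WIDTH):
    """Hypothetical helper: frames implied by len(buffer) when no count is given."""
    return buffer_len // (channels * sample_width)


RATE = 48000
n_samples = RATE * 1  # 1 second of mono audio

bytes_len = n_samples * FLOAT32_WIDTH  # len(sinb): one entry per byte
array_len = n_samples                  # len(sina): one entry per float32 value

frames_from_bytes = inferred_frames(bytes_len)
frames_from_array = inferred_frames(array_len)

print(frames_from_bytes, frames_from_bytes / RATE)  # 48000 frames, 1.0 second
print(frames_from_array, frames_from_array / RATE)  # 12000 frames, 0.25 second
```

The length only means "bytes" for the bytes object, so that is the only case where the division comes out right. Assuming the usual `Stream.write(frames, num_frames=None, ...)` signature, either passing `sina.tobytes()` or passing the count explicitly with `num_frames=len(sina)` should remove the guesswork.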