Python, Linux, and audio

archivator

This is a repost of a thread in another forum that was left unanswered. I believe the problem described below lies either in the library I'm using, or in the methods I'm applying.

Any tips are greatly appreciated.

My newest project is giving me serious trouble. Basically, the application is supposed to receive streaming audio from a socket and play it. In the meantime, it should capture audio and send it through that same socket.

I seem to be lacking the foundations here.

First off, the audio is 8 kHz, 16-bit. That makes it 16 KB/sec, but at any given time I'm getting around 15.7-15.9 KB/sec from the socket. Naturally, this means that there would be latency. Only problem is, I've got no idea how to buffer the audio. My approach (see below) is no better than the alternative, i.e., just dumping the data from the socket to the playback thread.
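To put numbers on the figures above, here is a small back-of-the-envelope sketch in Python 3. It assumes mono audio (the playback code sets one channel), 48-byte packets as described in the post, and that "KB" means 1000 bytes; the function name is made up for illustration.

```python
SAMPLE_RATE_HZ = 8000      # from the post
BYTES_PER_FRAME = 2        # 16-bit samples, one channel (assumed mono)
PACKET_BYTES = 48          # packet size reported in the post

nominal_rate = SAMPLE_RATE_HZ * BYTES_PER_FRAME            # bytes per second
frames_per_packet = PACKET_BYTES // BYTES_PER_FRAME        # frames in one packet
packet_ms = 1000.0 * frames_per_packet / SAMPLE_RATE_HZ    # audio time per packet


def shortfall_ms_per_second(observed_bytes_per_sec):
    """Milliseconds of audio missing for every second of wall-clock time."""
    missing_frames = (nominal_rate - observed_bytes_per_sec) / BYTES_PER_FRAME
    return 1000.0 * missing_frames / SAMPLE_RATE_HZ


print(nominal_rate, frames_per_packet, packet_ms)
for observed in (15700, 15900):    # the observed range, taking 1 KB = 1000 bytes
    print(observed, shortfall_ms_per_second(observed))
```

Under these assumptions each packet carries 3 ms of audio, and the stream falls short by roughly 6 to 19 ms of audio per second, so any fixed cushion of buffered packets is eventually used up unless the gap is filled some other way.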

Also, pyalsaaudio is a mess. There, I said it. I would gladly dump it in favor of another library but nothing else has nonblocking writes and reads (I'm looking at you, PortAudio!). The period size that the manual talks so much about has absolutely no effect in non-blocking mode and I have no idea what value I should give it.

Here's what I have at the moment. The queues are deque objects used for communication between the main thread and the playback and capture ones. Mind you, I'm not using the capture thread at the moment as I'd like to get playback working first.

    count = 0
    send_buff = r''
    buff = r''
    while 1:
        read, write, emptyy = select.select([client], 
                                            [],
                                            [])
        for socket in read:           
            buff = ''.join((buff,socket.recv(48))) # data always comes in 48 B packets; 
            #also, this is claimed to be faster than buff += socket.recv(48)

            count += 1
            if count % 15 == 0:
                if len(send_buff) > 0:
                    playback_queue.append(send_buff)
                send_buff = '%s' % buff # the send buffer is always 15 packets behind buff 
                buff = r''

        for socket in write:
            # not select'd
            while len(capture_queue) == 0:
                    sleep(0.00001)
            socket.send(capture_queue.pop())

The playback thread is also obvious:

class PlaybackThread(Thread):
    def __init__(self, queue):
        Thread.__init__(self, name="PlaybackThread")
        self.queue = queue
        
        self.stream = alsa.PCM(alsa.PCM_PLAYBACK, alsa.PCM_NONBLOCK)
        self.stream.setrate(8000)
        self.stream.setchannels(1)
        self.stream.setformat(alsa.PCM_FORMAT_S16_LE)
        self.stream.setperiodsize(24) # == 48 bytes
    
    def run(self):
        stream = self.stream # premature optimization FTW!
        queue = self.queue # premature optimization FTW!

        while True:
            while len(queue) == 0: # the thread starts before receiving data
                sleep(0.0000001)
            
            buff = queue.pop()
            if stream.write(buff) == 0:
                print "overrun, %i b not written" % len(buff)

I guess what I'm asking is, how do I get rid of the constant pops I'm hearing (which I assume are signs of buffer underruns)? What am I doing wrong?

Thanks for your time. :)

Nelle

if you get constant buffer underruns, then there is no solution ...

you could try to move the buffering to the consumer :

self.buffer_packets = 50

[...]

    def run(self):
        stream = self.stream # premature optimization FTW!
        queue = self.queue # premature optimization FTW!

        while True:
            while len(queue) < self.buffer_packets: # wait until enough packets are queued
                sleep(0.0000001)
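
The consumer-side buffering suggested above could look roughly like the following Python 3 sketch. The class and method names are hypothetical, and it leaves out ALSA and threading so the logic stays visible. One detail worth checking in the original code: deque.pop() removes from the same end that append() adds to, so popleft() is used here to keep packets in arrival order.

```python
from collections import deque


class PrebufferedQueue:
    """FIFO of audio packets that withholds data until `prebuffer` packets are queued."""

    def __init__(self, prebuffer):
        self.prebuffer = prebuffer
        self.packets = deque()
        self.primed = False   # True once the initial cushion has been collected

    def push(self, packet):
        self.packets.append(packet)
        if len(self.packets) >= self.prebuffer:
            self.primed = True

    def pull(self):
        """Return the oldest packet, or None while the buffer is filling or empty."""
        if not self.primed:
            return None
        if not self.packets:
            # Ran dry, i.e. an underrun: go back to filling before playing again.
            self.primed = False
            return None
        return self.packets.popleft()


q = PrebufferedQueue(prebuffer=3)
q.push(b"a")
q.push(b"b")
print(q.pull())                      # None: still filling
q.push(b"c")
print(q.pull(), q.pull(), q.pull())  # b'a' b'b' b'c'
print(q.pull())                      # None: underrun, refill before resuming
```

Re-priming after an underrun trades one longer gap for fewer small ones; whether that sounds better than constant pops is something only listening can settle.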

archivator

@Nelle said:

if you get constant buffer underruns, then there is no solution ..

That's the same conclusion that my amateurish experience in audio development led me to.

Due to the essence of the socket (Bluetooth) and the type of data transferred (mainly voice), I think I can get away with silence-padding. The only problem is that I have no idea how to detect a buffer underrun.

mark = time.time()
end = mark+1
while time.time() <= end:
    < receive data and increment a counter >
if counter < <required number of packets>:
   < add padding >
< play buffer >

Two problems arise from this approach. First, there's the issue of accuracy. Python's time() returns a float and I doubt its reliability in such precise matters. Second, the above code requires a rather large latency. A solution to the first problem would be to increase the margin between `mark' and `end'. That would, however, aggravate the second problem.

Considering the fact that I'm only missing around 50 frames/sec, I don't see why I can't get away with only 6ms of silence.. If only I knew how to achieve it..

Further help is (as always) appreciated.

P.S.: I tried sticking the buffer in the thread. Didn't help.
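
The silence-padding idea above can be sketched without timing a whole one-second window: treat an empty queue at the moment a chunk is due as the underrun, and substitute silence for just that chunk. This is a minimal Python 3 sketch under assumptions; `write` is a stand-in for whatever hands data to the sound card, and the function names are hypothetical. In a real player the sound card's own clock would normally pace playback, and the sleep here merely stands in for that.

```python
import time
from collections import deque

CHUNK_BYTES = 48                      # one packet, per the thread
SILENCE = b"\x00" * CHUNK_BYTES       # zero samples are silence for signed 16-bit PCM


def next_chunk(queue):
    """Return (chunk, padded): the oldest queued packet, or silence on underrun."""
    try:
        return queue.popleft(), False
    except IndexError:                # empty deque: nothing arrived in time
        return SILENCE, True


def play_for(queue, write, chunks, period=0.003):
    """Hand one chunk to `write` every `period` seconds; return how many were padded."""
    padded_count = 0
    deadline = time.monotonic()       # monotonic clock: unaffected by wall-clock changes
    for _ in range(chunks):
        chunk, padded = next_chunk(queue)
        padded_count += padded        # True counts as 1
        write(chunk)
        deadline += period            # absolute deadlines avoid accumulating drift
        time.sleep(max(0.0, deadline - time.monotonic()))
    return padded_count
```

Because padding happens one 3 ms chunk at a time, the added latency is bounded by the chunk period rather than by a one-second measurement window.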

morbiuswilters

@archivator said:

First, there's the issue of accuracy. Python's time() returns a float and I doubt its reliability in such precise matters. Second, the above code requires a rather large latency. A solution to the first problem would be to increase the margin between `mark' and `end'. That would, however, aggravate the second problem.

Doesn't Python have support for alarms and signals?

archivator

@morbiuswilters said:

Doesn't Python have support for alarms and signals?

Well, it does but I don't see how it's any good in this situation. As far as I can see, alarm() has a precision of 1 second and that's hardly useful in this scenario. It would eventually lead to 1 sec. delay between receiving the data and actually sending it to the audio system.

Unless I'm missing something obvious, that is. Care to elaborate on your idea?
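
For what it's worth, signal.alarm() only takes whole seconds, but the signal module also offers setitimer(), which accepts fractional seconds on Unix. The following Python 3 sketch shows the mechanics only; the helper name is made up, it has to run in the main thread, and whether signal delivery is precise enough for 3 ms audio packets is not something this example demonstrates.

```python
import signal
import time


def collect_ticks(count, interval, timeout=2.0):
    """Arm a repeating interval timer and return timestamps of the first `count` ticks."""
    ticks = []

    def on_tick(signum, frame):
        ticks.append(time.monotonic())

    signal.signal(signal.SIGALRM, on_tick)
    # First expiry after `interval` seconds, then repeat every `interval` seconds.
    signal.setitimer(signal.ITIMER_REAL, interval, interval)
    try:
        give_up = time.monotonic() + timeout
        while len(ticks) < count and time.monotonic() < give_up:
            time.sleep(interval)
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)   # disarm; the handler stays installed
    return ticks[:count]


ticks = collect_ticks(count=5, interval=0.01)
print([round(b - a, 3) for a, b in zip(ticks, ticks[1:])])
```

The printed gaps give a rough idea of how much jitter to expect from timer signals on a given machine.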

archivator

@morbiuswilters said:

Filed under: Toy language.

You know what? You're right. It *is* a fucking toy. The type you smash to pieces and have fun with the remains while your older brother watches you pitifully.

Though I hate to blame the language when the fault is in the libraries, I feel that a high level language is mostly distinguished for its libraries. Especially if it is so widely spread on the Linux desktop (thanks, Red Hat, you really didn't have to). On the other hand, I think the fault is partly in Python's lack of native threads, so screw apologies. Threads or no threads, though, the libraries still sucked the performance out of my application.

alsaaudio, I condemn thee!

I've spent around 30 man hours trying to figure out what the fuck's wrong with this thing. I went nuts in the meantime. Ultimately, I said hell to it all and dug into C++.

Guess what. It worked the first fucking time. Took me less than 3 hours to get a fully working playback-and-capturing app. No multi-threading whatsoever. Screw you, Python!

Also, I noticed something peculiar in the process of rewriting.

You know Linux, right? That thing that's the hype on mobile devices and everyone's so happy about? You also know that most mobile devices nowadays have Bluetooth, right? And Linux's running on those devices, right? And Linux has support for Bluetooth, right?

Well, guess what, *no one wrote a single line of documentation for the Bluetooth stack*. It's all in the source code.

I mean, sure, it's a pretty obvious and convenient implementation - just adds an additional type of socket and everything should be clear to someone with experience in networking. But a line or two of official docs wouldn't have hurt. Or at least a document with the common pitfalls. Like a FAQ. Is that too much to ask? I spent 50 minutes trying to figure out a cryptic voice setting. Thanks, Linux, that was really helpful.

So, yeah, this project was a big disappointment. Nevertheless, I have one hell of a useful application ready to go. That counts for something.

verte

@archivator said:

        for socket in read:
            buff = ''.join((buff,socket.recv(48))) # data always comes in 48 B packets; 
            #also, this is claimed to be faster than buff += socket.recv(48)

            count += 1
            if count % 15 == 0:
                if len(send_buff) > 0:
                    playback_queue.append(send_buff)
                send_buff = '%s' % buff # the send buffer is always 15 packets behind buff 
                buff = r''


It's interesting that you've noticed that str.join is supposed to be faster than +=. It is, but not the way you're using it. Both ways involve copying the entire buffer each time you add to it. This should be obvious to anyone with C experience, because strcat behaves the same way. The correct thing to do is to build a list of strings, and join them when you need a result.

None of this matters, however, since you may as well simply wait for all the packets to arrive.

Along the same lines, why are you polling a socket if you're waiting to read from it before you can do anything anyway? I assume you left out a heap of code, including whatever you do to capture_queue. The sad part is, it shouldn't take many more lines to make the proxy complete.

I assume also that you realise that [, is a syntax error, and the forum software is simply on crack. Also, while 1: is ugly. Use while True:

(This is O(len(buff)) rather than O(len(buff) ** 2))

        for s in read:
            buff = s.recv(720) # don't loop needlessly

            if len(send_buff) > 0:
                playback_queue.append(send_buff)
            send_buff = buff # the send buffer is always 15 packets behind buff 
            buff = '' # no need for raw strings, you have nothing to escape.
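
The list-and-join approach described above can be sketched as follows, in Python 3 with bytes instead of Python 2 strings. One caveat when asking for 720 bytes in a single call: recv(n) returns at most n bytes and may return fewer, so a loop can still be needed to collect a full block. The function name is hypothetical.

```python
import socket


def recv_exact(sock, nbytes):
    """Read until `nbytes` have arrived; recv() may return less than requested."""
    chunks = []
    received = 0
    while received < nbytes:
        chunk = sock.recv(nbytes - received)
        if not chunk:                      # peer closed the connection
            break
        chunks.append(chunk)               # cheap: no copying of earlier data
        received += len(chunk)
    return b"".join(chunks)                # one copy at the end, not one per packet


# Usage with a local socket pair standing in for the real connection.
left, right = socket.socketpair()
left.sendall(b"\x00" * 720)                # 15 packets of 48 bytes
block = recv_exact(right, 720)
print(len(block))
left.close()
right.close()
```

Appending to a list and joining once keeps the total copying proportional to the block size, whatever the number of pieces the data arrives in.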

Apologies for code mode, my first post :)