Wondering where the disconnect is occurring...


  • Notification Spam Recipient

    So we're using a service that (for our purposes) is trying to be Google Now, but conversational. We use it primarily for text-to-speech (TTS) and speech-to-text (STT) conversion.

    This is provided via HTTPS and mostly works.

    The problem is in their TTS engine: it's slow. Like, for every ten seconds of speech audio, it takes 1.5 seconds to generate. This is especially apparent when generating 40 seconds of speech (Yeah, :doing_it_wrong: but that's not important right now). I believe this might be a single-instance-queued type of deal, as evidenced by the fact that multiple simultaneous requests take increasingly long to generate.
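
    To put rough numbers on the single-queue theory, here is a minimal Python sketch. It assumes strictly one-at-a-time FIFO processing at the 1.5 s per 10 s rate quoted above; the function and parameter names are made up for illustration and are not part of the service's API.

```python
def fifo_completion_times(n_requests, speech_seconds, gen_per_speech_second=0.15):
    """Completion time (in seconds) of each request if the engine
    renders them strictly one after another (assumed, not confirmed)."""
    service_time = speech_seconds * gen_per_speech_second
    # Request i has to wait for every request ahead of it, plus its own render.
    return [service_time * (i + 1) for i in range(n_requests)]


# Ten simultaneous requests for 10 seconds of speech each.
times = fifo_completion_times(10, 10.0)
print(f"first: {times[0]:.2f}s, last: {times[-1]:.2f}s")
```

    Under those assumptions the first clip comes back after 1.5 s and the tenth after about 15 s, which is the "increasingly longer" pattern a single queued instance would produce.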

    Is this unreasonable? For example, I instantiated 50 bots and asked them all what the time was. On our side, all the threads worked fine: each opened a connection to the service to speak the answer (about 1 second's worth of speech) and then waited. The first four came back and started playing the audio, then another ten, then the rest slowly came slogging through until all 50 had either played or timed out (I had extended the HttpClient's timeout period to one minute).

    In other words, if it so happened that there were 50 bots on the server attempting to answer a time query, some users might end up waiting in excess of a minute to hear the response!!!
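
    For anyone wanting to reproduce that kind of test, a rough Python sketch of the harness follows. `fake_tts` is a hypothetical stand-in that models a single-instance engine with a lock; it is not the vendor's API, and the 10 ms service time is an arbitrary placeholder. Swap in the real HTTPS call to measure the actual service.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Assumption: the engine handles one request at a time.
_engine_lock = threading.Lock()


def fake_tts(text, service_time=0.01):
    """Hypothetical stand-in for the remote TTS request."""
    with _engine_lock:
        time.sleep(service_time)  # pretend to render the audio
    return b"audio"


def measure_latencies(n_bots, synthesize):
    """Fire n_bots requests at once; return each request's latency,
    measured from the shared start time, in ascending order."""
    start = time.monotonic()

    def one_request(i):
        synthesize(f"bot {i}: what time is it?")
        return time.monotonic() - start

    with ThreadPoolExecutor(max_workers=n_bots) as pool:
        return sorted(pool.map(one_request, range(n_bots)))


latencies = measure_latencies(50, fake_tts)
median = latencies[len(latencies) // 2]
print(f"fastest {latencies[0]:.2f}s, median {median:.2f}s, slowest {latencies[-1]:.2f}s")
```

    If the slowest latency against the real service grows roughly in step with the number of bots, that would be consistent with requests being serialized on their end.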

    I know it's not our side, because I set up a timed playback of a pre-rendered WAV file from the system, served from our own servers internally, and that works fine (if a bit noisy, as the few ms of delay between the individual download threads is... fun).

    Thoughts? These guys want several thousand per month for this level of service (and are asking for more, reaching into the 5-digit range now) and I'm wondering if I should draft a proposal to just cut it out from the project and switch to one of the crappy OSS ones readily available...



  • https://developer.chrome.com/extensions/examples/extensions/ttsdemo/ttsdemo.html

    Why do you need an external TTS API? Surely there are a bunch of free libraries you could use...

    STT is a bit harder, but you didn't complain about that at all.


  • Discourse touched me in a no-no place

    @Tsaukpaetra said in Wondering where the disconnect is occurring...:

    switch to one of the crappy OSS ones readily available...

    Hosting your own copy of the OSS ones rather than using someone else's hosting of what is probably the same code? Hmm…


  • Notification Spam Recipient

    @ben_lubar said in Wondering where the disconnect is occurring...:

    Why do you need an external TTS API? Surely there are a bunch of free libraries you could use...

    Because Unreal Engine is unreal sometimes.



  • @Tsaukpaetra If you're on Xbox One, you get speech-to-text "for free", but I think it can only do pre-arranged phrases. (It's used in Ryse: Son of Rome. Not sure if any other games use it.)

    The Kinect stuff, of course, has full Cortana-esque speech-to-text, but it requires a server ping which may or may not work depending on what kind of game it is.


  • Notification Spam Recipient

    @blakeyrat said in Wondering where the disconnect is occurring...:

    If you're on Xbox One, you get speech-to-text "for free",

    We're on Unreal Engine on Windows because Vive. This apparently means C++, which means normal Windows speech recognition (not Cortana), which has barely had more than a facelift since Windows Vista.

    Doing that would also complicate things, as each player could theoretically need to be a ventriloquist for a bot, and complicator's gloves would evolve into complicator's suits...



  • @Tsaukpaetra said in Wondering where the disconnect is occurring...:

    Is this unreasonable?

    It's clearly extremely unreasonable, unless you never actually plan to support more than 10 simultaneous users.

    These guys want several thousand per month for this level of service (and are asking for more, reaching into the 5-digit range now)

    The appropriate thing here is to call them and ask "WTF are you morons doing". Maybe they can get it fixed. Probably not.

    The "speech recognition+AI assistant" market has boomed recently so I'd expect there to be a few alternatives at least.