Looking for a loose text/pattern matching algorithm



  • I'm writing a MUD engine in PHP/MySQL as a side project, just for fun. I had no prior experience with either system so I wanted to learn, and as a bonus it's pretty freaking easy to host PHP websites.

    I want to support limited chat with NPCs. When an admin configures an NPC instance, he'll be able to supply a table of chat triggers and responses. If a player talks to the NPC and it matches (or nearly matches, this is the part I need help with) a known trigger, then the response is issued. Mostly this will be used for questy-type things, and probably a fair bit of trolling too.

    What should I be using to allow approximate text matches when finding a trigger? I'd like to have a system that can deal with variations in sentence structure, grammar, and misspellings. The alternative is putting the same response in with a bunch of variations in the trigger text.

    I know there are algorithms for this, but I've never researched the topic before and just don't know where to start. And in the interest of having some potentially-usable discourse here other than all the Discourse-bashing, I'm asking you guys instead of Google.

    EDIT: After reading through this, part of me really wants to add in an option that will link an NPC up with Cleverbot. Has nothing to do with my question but I'm sure someone here appreciates the thought.


  • @mott555 said:

    What should I be using to allow approximate text matches when finding a trigger? I'd like to have a system that can deal with variations in sentence structure, grammar, and misspellings. The alternative is putting the same response in with a bunch of variations in the trigger text.

    You can do a super simple response system by breaking a sentence into token per word, excluding common words (using ginormugantuan lists from the internet), and then parsing what remains however you want.

    After that point it gets trickier with each level of comparison.
    You can allow a small maximum Levenshtein distance per word, alongside exact matching, to tolerate misspellings, tenses, conjugations, and other valid grammatical variations on root words.

    Past that... I'm no expert in this so I'd be forced to resort to googling answers too.
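
A minimal Python sketch of the tokenize, filter, and compare steps described above. The tiny stop-word list and the fixed maximum distance of 1 are illustrative assumptions, not something from this thread, and the names `STOP_WORDS`, `tokenize`, `levenshtein`, and `words_match` are hypothetical. PHP ships a built-in `levenshtein()` function, so in the actual engine only the tokenizing and filtering would likely need to be written by hand.

```python
import re

# Assumed stop-word list; a real one would be far longer.
STOP_WORDS = {"a", "an", "the", "i", "me", "my", "is", "to", "of", "you", "please"}

def tokenize(sentence):
    """Lowercase the sentence, split it into words, and drop stop words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]

def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete ca
                           cur[j - 1] + 1,       # insert cb
                           prev[j - 1] + cost))  # substitute or keep
        prev = cur
    return prev[-1]

def words_match(typed, expected, max_distance=1):
    """True if typed is within max_distance edits of expected."""
    return levenshtein(typed, expected) <= max_distance
```

With these definitions, `tokenize("Where is the glockenspiel?")` leaves `["where", "glockenspiel"]`, and `words_match("glokenspiel", "glockenspiel")` is true because the two differ by a single missing letter.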



  • @darkmatter said:

    You can do a super simple response system by breaking a sentence into token per word, excluding common words (using ginormugantuan lists from the internet), and then parsing what remains however you want.

    This sounds like a good start, and with some tweaking (perhaps combining with some kind of keyword system) might be able to handle similar phrases like "I need a key" and "Where is the key" and "Give me the key".
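
One way the keyword idea might look, as a rough Python sketch rather than a finished design. It assumes each trigger is stored as a set of required keywords plus a response, and that the most specific trigger (the one requiring the most keywords) should win. The `TRIGGERS` table, its responses, and the function names are all made up for illustration.

```python
import re

# Hypothetical trigger table: required keywords paired with a response.
TRIGGERS = [
    ({"key"}, "The key? Ask the blacksmith."),
    ({"cellar", "key"}, "The cellar key is under the doormat."),
]

def words_in(sentence):
    """Return the set of lowercase words in the sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))

def respond(sentence, triggers=TRIGGERS):
    """Return the response of the most specific trigger whose keywords all appear."""
    said = words_in(sentence)
    best_response, best_size = None, 0
    for required, response in triggers:
        # A trigger fires only if every required keyword was said.
        if required <= said and len(required) > best_size:
            best_response, best_size = response, len(required)
    return best_response
```

Because only the keywords matter, "I need a key", "Where is the key" and "Give me the key" all get the same response. Exact word matching means "keys" would not fire the trigger; handling plurals would need stemming or a fuzzy comparison on top.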

    Looking at the Levenshtein distance thing, now I'm just inclined to say that if a user can't spell "key" correctly then they don't deserve the key.



  • Are you writing both the front-end and back-end in PHP?



    PHP for all server-side code and for emitting HTML, MySQL for persistence, and of course HTML/CSS with a dash of JS and jQuery for the client-side stuff. No AJAX-y stuff; any action requires a full postback. Pretty much just a fairly simple old-school web site.

    The World Called Hollow is a pretty strong inspiration for this work. Some friends and I played it quite a bit back in the early 2000s, but they tended to delete inactive accounts after only a week or so, which was super-frustrating, so I quit.



  • Maybe I'm confused.

    I get that MySQL saves the state of the world, but what's going to be running, say, timed events? Like in a combat situation, the skeleton attacks every 3 seconds, what process is going to handle that? Or are you going to just wait until the next PHP hit and run all the queued-up events then?

    For the record, the best MUD ever made was The Eternal Struggle, sadly defunct.



  • @blakeyrat said:

    Or are you going to just wait until the next PHP hit and run all the queued-up events then?

    Basically this. There isn't much that requires "real-time" processing, and combat will be entirely turn-based. Anything needing a timer remembers the last time it was updated, so on page load things can catch up with a little bit of math.

    It's not a classic MUD architecture like those from the '90s, which you connected to with telnet and which had all kinds of processes running in the background.
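
The "catch up with a little bit of math" idea could be sketched roughly like this in Python. Health regeneration is an assumed example (the thread does not say which values use timers), timestamps are assumed to be whole seconds, and `catch_up` and its parameters are hypothetical names.

```python
def catch_up(value, max_value, last_updated, now, interval, per_tick):
    """Apply every whole tick that elapsed between last_updated and now.

    Returns the new value and the timestamp to store as the new last_updated.
    """
    # Guard against a clock that moved backwards: never apply negative ticks.
    ticks = max(0, (now - last_updated) // interval)
    new_value = min(max_value, value + ticks * per_tick)
    # Advance the timestamp by whole ticks only, so leftover seconds carry over.
    return new_value, last_updated + ticks * interval
```

For example, with 1 HP every 30 seconds, a player at 10 HP last updated at time 1000 who loads a page at time 1100 gets 3 ticks: the result is 13 HP and a stored timestamp of 1090, so the remaining 10 seconds count toward the next tick.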


  • @mott555 said:

    Looking at the Levenshtein distance thing, now I'm just inclined to say that if a user can't spell "key" correctly then they don't deserve the key.

    You probably wouldn't want to run Levenshtein on a 3-letter word.
    But if they're being asked to find a glockenspiel, I'd suggest something like Levenshtein or a phonetic algorithm.
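
As one possible phonetic approach, here is a small Python sketch of American Soundex with a guard that falls back to exact matching for short words. Soundex is swapped in here as an example of "a phonetic algorithm"; the post does not name a specific one. The 5-letter cutoff and the names `soundex` and `phonetic_match` are assumptions. PHP has built-in `soundex()` and `metaphone()` functions that would probably be used instead of hand-rolled code.

```python
# Soundex digit for each consonant; vowels and y map to nothing.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)

def soundex(word):
    """Return the 4-character American Soundex code for word."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate repeated codes
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeat after it counts again
    return (result + "000")[:4]

def phonetic_match(typed, expected, min_length=5):
    """Exact match for short words, Soundex comparison for longer ones."""
    if len(expected) < min_length:
        return typed.lower() == expected.lower()
    return soundex(typed) == soundex(expected)
```

Under this sketch "glokenspeil" matches "glockenspiel" (both code to G425), while "kay" does not match "key" because the short-word guard demands exact spelling.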


  • @darkmatter said:

    glockenspiel

    Also, in this scenario I'd accept pretty much anything starting with gl and containing more than 5 letters, like GLSLDJFWPEIOFSNDFPA



  • @darkmatter said:

    @darkmatter said:
    glockenspiel

    Also, in this scenario I'd accept pretty much anything starting with gl and containing more than 5 letters, like GLSLDJFWPEIOFSNDFPA

    GLSLDJFWPEIOFSNDFPA not found; have a bassoon instead.



  • The Wordnik API may be able to help.
    http://developer.wordnik.com/docs.html#!/word/getWord_get_1

    For completeness, here's the blog post I heard about it from: http://tinysubversions.com/2013/09/how-to-make-a-twitter-bot/

    And here's a more high-level discussion relating to the thing you're trying to do.

    That's pretty much all I know about that might help, unless you're interested in POS (part-of-speech) tagging or named entity detection (which I don't know much about either, just wanted to put those terms down in case you wanted something more to google).


  • @HardwareGeek said:

    GLSLDJFWPEIOFSNDFPA not found; have a bassoon instead.

    Shove bassoon up NPC ass.



  • @darkmatter said:

    Shove bassoon up NPC ass.

    Feed NPC beans.



  • Fuck writing parsers yourself, you end up with something as consistent as the shite we're fighting here.

    Also, fuck PHP. Because - well, reasons.

    Use a language that does what you want to do at its core: one that's designed for it, and written and refined by people far smarter than you.


  • @tufty said:

    Fuck writing parsers yourself, you end up with something as consistent as the shite we're fighting here.

    @mott555 said:

    I'm writing a MUD engine in PHP/MySQL as a side project, just for fun.

    If Jeff had just come up with Dicsourse "just for fun", then I don't think any of us would have a problem with it.

    Sometimes it's more fun to play around on your own (provided you're not intending to release it as v1.0 to the public at large with possibly paid installations).


  • Also, when Buddy is the only one to like something, hilarity ensues:



  • @HardwareGeek said:

    Feed NPC beans.

    Feed NPC to baboon before the music starts.



  • @Buddy's avatar actually isn't a white square:


  • Never mind, it's just because dicsourse is broken-linking me for no real reason. Nothing interesting at all.

