Looking for a loose text/pattern matching algorithm



  • I'm writing a MUD engine in PHP/MySQL as a side project, just for fun. I had no prior experience with either system so I wanted to learn, and as a bonus it's pretty freaking easy to host PHP websites.

    I want to support limited chat with NPCs. When an admin configures an NPC instance, he'll be able to supply a table of chat triggers and responses. If a player talks to the NPC and it matches (or nearly matches, this is the part I need help with) a known trigger, then the response is issued. Mostly this will be used for questy-type things, and probably a fair bit of trolling too.

    What should I be using to allow approximate text matches when finding a trigger? I'd like to have a system that can deal with variations in sentence structure, grammar, and misspellings. The alternative is putting the same response in with a bunch of variations in the trigger text.

    I know there are algorithms for this but I've never researched the topic before, just don't know where to start. And in the interest of having some potentially-usable discourse here other than all the Discourse-bashing, I'm asking you guys instead of Google.

    EDIT: After reading through this, part of me really wants to add in an option that will link an NPC up with Cleverbot. Has nothing to do with my question but I'm sure someone here appreciates the thought.


  • :belt_onion:

    @mott555 said:

    What should I be using to allow approximate text matches when finding a trigger? I'd like to have a system that can deal with variations in sentence structure, grammar, and misspellings. The alternative is putting the same response in with a bunch of variations in the trigger text.

    You can do a super simple response system by breaking a sentence into one token per word, excluding common words (using ginormugantuan lists from the internet), and then parsing what remains however you want.

    After that point it gets trickier with each extra level of comparison.
    You can allow a maximum Levenshtein distance per word, alongside exact matching, to tolerate misspellings, tenses, conjugations, and other valid grammatical variations on root words.

    Past that... I'm no expert in this so I'd be forced to resort to googling answers too.
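The tokenize, filter, and compare approach described above could be sketched roughly like this. It is only an illustration: the stop-word list is a tiny placeholder, and the names `tokenize`, `words_match`, `matches_trigger`, and the default `max_distance` of 2 are all assumptions, not anything from the thread. A fixed distance threshold is also crude for very short words.

```python
import re

# Tiny illustrative stop-word list (assumption); real lists are far longer.
STOP_WORDS = {"a", "an", "the", "i", "me", "is", "to", "you", "do", "of", "please"}


def tokenize(sentence):
    """Lowercase the sentence, split it into words, and drop stop words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def levenshtein(a, b):
    """Edit distance between two strings (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def words_match(typed, expected, max_distance=2):
    """Exact match, or within the allowed number of edits."""
    return typed == expected or levenshtein(typed, expected) <= max_distance


def matches_trigger(player_text, trigger_text, max_distance=2):
    """True if every significant trigger word has a close match in the player's text."""
    trigger_words = tokenize(trigger_text)
    if not trigger_words:
        return False  # a trigger made only of stop words should match nothing
    player_words = tokenize(player_text)
    return all(
        any(words_match(p, t, max_distance) for p in player_words)
        for t in trigger_words
    )
```

For instance, `matches_trigger("wher is the glokenspiel", "Where is the glockenspiel?")` is true because each significant word is one edit away from its counterpart.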



  • @darkmatter said:

    You can do a super simple response system by breaking a sentence into one token per word, excluding common words (using ginormugantuan lists from the internet), and then parsing what remains however you want.

    This sounds like a good start, and with some tweaking (perhaps combining with some kind of keyword system) might be able to handle similar phrases like "I need a key" and "Where is the key" and "Give me the key".

    Looking at the Levenshtein distance thing, now I'm just inclined to say that if a user can't spell "key" correctly then they don't deserve the key.
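One way the keyword idea might look: each trigger is a set of required keywords, and a response fires when all of them appear in what the player said. The table contents, the `find_response` name, and the "most keywords wins" tie-break are assumptions made for this sketch.

```python
import re

# Hypothetical trigger table: (required keywords, response).
TRIGGERS = [
    ({"key"}, "The key? I lost it in the cellar."),
    ({"key", "cellar"}, "The cellar key hangs behind the bar."),
    ({"quest"}, "Rats. It's always rats."),
]


def words_in(text):
    """Set of lowercase words in the text."""
    return set(re.findall(r"[a-z']+", text.lower()))


def find_response(player_text, triggers=TRIGGERS):
    """Return the response whose keywords all appear, preferring the most specific trigger."""
    spoken = words_in(player_text)
    candidates = [(len(keywords), response)
                  for keywords, response in triggers
                  if keywords <= spoken]  # subset test: every keyword was said
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0])[1]
```

With this table, "I need a key", "Where is the key", and "Give me the key" all land on the same response without the admin listing each phrasing.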



  • Are you writing both the front-end and back-end in PHP?



  • PHP for all server-side code and for emitting HTML, MySQL for persistence, and of course HTML/CSS with a dash of JS and jQuery for the client-side stuff. No AJAX-y stuff; any action requires a full postback. Pretty much just a fairly simple old-school web site.

    The World Called Hollow is a pretty strong inspiration for this work. Some friends and I played it quite a bit back in the early 2000s, but they tended to delete inactive accounts after only a week or so, which was super-frustrating, and I quit.



  • Maybe I'm confused.

    I get that MySQL saves the state of the world, but what's going to be running, say, timed events? Like in a combat situation, the skeleton attacks every 3 seconds, what process is going to handle that? Or are you going to just wait until the next PHP hit and run all the queued-up events then?

    For the record, the best MUD ever made was The Eternal Struggle, sadly defunct.



  • @blakeyrat said:

    Or are you going to just wait until the next PHP hit and run all the queued-up events then?

    Basically this. There isn't much that requires "real-time" processing, and combat will be entirely turn-based. Anything needing a timer remembers the last time it was updated, so on page load things can catch up with a little bit of math.

    It's not a classic MUD architecture like those in the 90s that you connected to with telnet and that had all kinds of processes running in the background.
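The "remember the last update and catch up on page load" idea could look something like the sketch below, using hit-point regeneration as the example. The function name, the one-point-per-tick rule, and the 30-second default are assumptions for illustration only.

```python
def catch_up_regen(current_hp, max_hp, last_update, now, seconds_per_point=30):
    """Apply the regen owed since last_update; return (new_hp, new_last_update).

    Timestamps are integer seconds taken from the server clock.
    """
    elapsed = max(0, now - last_update)   # guard against clock skew
    ticks = elapsed // seconds_per_point  # whole regen ticks owed
    new_hp = min(max_hp, current_hp + ticks)
    # Advance the timestamp by whole ticks only, so a partial tick is not lost.
    new_last_update = last_update + ticks * seconds_per_point
    return new_hp, new_last_update
```

On each request the server would load the stored values, call this with its own current time (never a client-supplied one), and write the results back.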


  • :belt_onion:

    @mott555 said:

    Looking at the Levenshtein distance thing, now I'm just inclined to say that if a user can't spell "key" correctly then they don't deserve the key.

    You probably wouldn't want to run Levenshtein on a 3-letter word.
    But if they're being asked to find a glockenspiel, I suggest some sort of method like Levenshtein or a phonetic algorithm.
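As a rough illustration of the phonetic option, here is a sketch of American Soundex with a length gate so that short words like "key" still need an exact match. The gate of 4 letters and the `fuzzy_word_match` name are assumptions; Soundex is only one of several phonetic algorithms and is swapped in here as an example.

```python
# Consonant groups used by American Soundex.
SOUNDEX_CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        SOUNDEX_CODES[ch] = digit


def soundex(word):
    """Four-character Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        if digit:
            if digit != prev:      # collapse runs of the same code
                code += digit
            prev = digit
        elif ch not in "hw":       # vowels break a run; h and w do not
            prev = ""
    return (code + "000")[:4]


def fuzzy_word_match(typed, expected, min_fuzzy_length=4):
    """Exact match for short words, Soundex match for longer ones."""
    typed, expected = typed.lower(), expected.lower()
    if typed == expected:
        return True
    if len(expected) <= min_fuzzy_length:
        return False               # too short to fuzz safely
    return soundex(typed) == soundex(expected)
```

Soundex only looks at the first letter and the next few consonant sounds, so it is forgiving about vowels and word endings; combining it with an edit-distance check would tighten it up.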


  • :belt_onion:

    @darkmatter said:

    glockenspiel

    Also, in this scenario I'd accept pretty much anything starting with gl and containing more than 5 letters, like GLSLDJFWPEIOFSNDFPA



  • @darkmatter said:

    >darkmatter said:
    glockenspiel

    Also, in this scenario I'd accept pretty much anything starting with gl and containing more than 5 letters, like GLSLDJFWPEIOFSNDFPA

    GLSLDJFWPEIOFSNDFPA not found; have a bassoon instead.



  • Wordnik API may be able to help.

    For completeness, here's the blog post I heard about it from: http://tinysubversions.com/2013/09/how-to-make-a-twitter-bot/

    And here's a more high-level discussion relating to the thing you're trying to do.

    That's pretty much all I know of that might help, unless you're interested in POS (part-of-speech) tagging or named entity detection (which I don't know much about either; I just wanted to put those terms down in case you wanted something more to google).


  • :belt_onion:

    @HardwareGeek said:

    GLSLDJFWPEIOFSNDFPA not found; have a bassoon instead.

    Shove bassoon up NPC ass.



  • @darkmatter said:

    Shove bassoon up NPC ass.

    Feed NPC beans.



  • Fuck writing parsers yourself, you end up with something as consistent as the shite we're fighting here.

    Also, fuck PHP. Because - well, reasons.

    Use a language that does what you want to do at its core. That's designed to do what you want to do, and written / refined by people far smarter than you.


  • :belt_onion:

    @tufty said:

    Fuck writing parsers yourself, you end up with something as consistent as the shite we're fighting here.

    @mott555 said:

    I'm writing a MUD engine in PHP/MySQL as a side project, just for fun.

    If Jeff had just come up with Dicsourse "just for fun", then I don't think any of us would have a problem with it.

    Sometimes it's more fun to play around on your own (provided you're not intending to release it as v1.0 to the public at large with possibly paid installations).


  • :belt_onion:

    Also, when Buddy is the only one to like something, hilarity ensues:

     



  • :belt_onion:

    @HardwareGeek said:

    Feed NPC beans.

    Feed NPC to baboon before the music starts.



  • @Buddy's avatar actually isn't a white square:


  • :belt_onion:

    nevermind it's just because dicsourse is broken-linking me for no real reason. nothing interesting at all.



  • Fun is fun, sure, but ‘simple nlp algorithm’ is the next level up from that xml vs regex meme.


  • 🚽 Regular

    @darkmatter said:

    Also, in this scenario I'd accept pretty much anything starting with gl and containing more than 5 letters, like GLSLDJFWPEIOFSNDFPA
    glDrawElementsInstancedBaseVertexBaseInstance


  • 🚽 Regular


  • BINNED

    > Put gem in mouth.

    > Equip purple dildo.



  • @Zecc said:

    Oo, a comic:

    [Image: FunComic.png]

    Except you never get to the final 3 images. It's just the first 2 in an endless loop.



  • EDIT, because Discourse ate my first quote
    @tufty said:

    Fuck writing parsers yourself, you end up with something as consistent as the shite we're fighting here.

    Also, fuck PHP. Because - well, reasons.

    Use a language that does what you want to do at its core. That's designed to do what you want to do, and written / refined by people far smarter than you.

    Too late, I already have a PHP codebase built up, though inform looks somewhat interesting if it's not a steaming pile of WTFs and buzzwords.

    @darkmatter said:

    If Jeff had just come up with Dicsourse "just for fun", then I don't think any of us would have a problem with it.

    Sometimes it's more fun to play around on your own (provided you're not intending to release it as v1.0 to the public at large with possibly paid installations).

    My primary purpose with this project is to learn PHP and MySQL. I have lots of web developer experience but it was all with ASP.NET and/or Silverlight. It could have been a basic forum engine instead but there's zero chance of me getting that deployed. At least a MUD I could self-deploy and I know a few people who'd check it out.

    Beyond that, I hope to deploy it to my gaming community and see what happens. There's no profit to be had in writing a MUD engine so I'll release it under a BSD license, and that mostly just so anyone curious could look at it. I highly doubt it will get enough publicity for other people to actually deploy it.

    I have two other guys I'm working with, for testing and sanity checks on usability. One guy knows D&D well, and the other guy has played like 143286s2134akd!4r798 other MUDs in his life. They aren't developers, but they ARE users, and I'm not Jeff enough to fight them if they both think I'm wrong about something.



  • @mott555 said:

    Basically this. There isn't much that requires "real-time" processing, and combat will be entirely turn-based. Anything needing a timer remembers the last time it was updated, so on page load things can catch up with a little bit of math.

    I'm going to make a Blakey-prediction and predict that if you keep this MUD running for longer than, say, a year or so, you're going to end up writing some kind of back-end process to do bookkeeping. You are treading well-trod ground.

    EDIT: actually, now that browsers have WebSockets, you might be better off going the other way 'round. Write a traditional MUD back-end, have the front-end talk to it via WebSockets, and include some sort of tagging for multimedia elements.



  • Best MUD ever made was 'Legendz', best still existing mud is 'Dark and shattered lands'



  • @Matches said:

    Best MUD ever made was 'Legendz',

    Lies. The Eternal Struggle wins for the following reason: Blakeyrat played The Eternal Struggle.



  • Are you purposely restricting things like ajax? (I can't be bothered to scroll back up) - Ajax would be fine for timer type stuff in most cases.


  • BINNED

    @blakeyrat said:

    Lies. The Eternal Struggle wins for the following reason: The Blakeyrat played The Eternal Struggle.

    FTFY



  • Legendz supported multiclassing, rebirth, had a deity system, remort system, and supernatural system.

    All of which (except multiclassing/rebirth) were restricted and RP-enforced (you could get one, but you had to prove you could RP for it).



  • @Matches said:

    Are you purposely restricting things like ajax? (I can't be bothered to scroll back up) - Ajax would be fine for timer type stuff in most cases.

    Erm...AJAX runs on the client, I can't have timer-based features like NPC/item respawns or stat regen run via client code...that's just asking for hacking and abuse.


  • BINNED

    *ducks* Joke! Joke! JOKE!



  • AJAX runs on the client, but it can initiate a POST request to your server - 'HEY! I need new data, my data is stale! Send me the codez'

    Or more accurately, long polling data that the server would push out via a queue/bus/whatever.



  • @Matches said:

    AJAX runs on the client, but it can initiate a POST request to your server - 'HEY! I need new data, my data is stale! Send me the codez'

    Or more accurately, long polling data that the server would push out via a queue/bus/whatever.

    I thought about doing that when I first started, but I've done AJAX-heavy JavaScript apps in the past and let's just say there's a very good reason Discourse often feels slow or buggy. We encountered so many intermittent issues caused by the browser not handling AJAX calls in a timely manner, or the browser not realizing an AJAX call somehow got lost. It's very difficult, if not impossible (as of 2014) to do things right.

    As of now, doing all the work on the server with full-page postbacks is far easier, more reliable, and actually faster and more responsive.



  • @Matches said:

    Legendz supported multiclassing, rebirth, had a deity system, remort system, and supernatural system.

    All of which (except multiclassing/rebirth) were restricted and RP-enforced (you could get one, but you had to prove you could RP for it).

    The Eternal Struggle had a completely class-less system with additional racial and RP skill trees (for example, some races had unique skills, and if your character were turned into a vampire, you could get new skills). I leveled a character to the cap. His class? He was a lawyer.

    It had a RP-based system for earning experience and leveling up, at first admin-run but later entirely player-run. (Each player had a quota of Role Play Points to award to other players.) Once the level cap was reached, you could use the extra RPP to "buy" customized items for your character. Sadly that was all manual, never got the time to automate it.

    It had a disguise and recognition system, so you didn't automatically and magically know a person or creature's name like you do in bad RP MUDs. You could "recognize" anything, PC, NPC, animal, whatever. If a player donned a mask or "disguise"-type item, their shortdesc would change and you would no longer recognize them. If you recognized that disguise, say, "creepy hooded man" and then the disguise was worn by another player, you would recognize that other player as "creepy hooded man".

    It had an RP-based combat system, in which each combat action would consist of one or more emotes/say/etc and then a command to tell the MUD to use some combat skill on the other player. The effects of the skill would be applied, and the other player would be expected to emote/say/etc a response.

    It had completely separated in-character and out-of-character environments for chat. You could be talking with a person on an OOC channel and have no idea it was the same person who you were RPing with IC.

    ES was the best.



  • @blakeyrat said:

    The Eternal Struggle had a completely class-less system with additional racial and RP skill trees (for example, some races had unique skills, and if your character were turned into a vampire, you could get new skills). I leveled a character to the cap. His class? He was a lawyer.

    I'm doing something like this. You can do anything you want regardless of class, but some classes will level up certain skills faster than others. If you really want to take our nearly-pure-melee class and only do necromancy, you could...just going to take you a VERY long time to get proficient.

    @blakeyrat said:

    It had a disguise and recognition system, so you didn't automatically and magically know a person or creature's name like you do in bad RP MUDs. You could "recognize" anything, PC, NPC, animal, whatever. If a player donned a mask or "disguise"-type item, their shortdesc would change and you would no longer recognize them. If you recognized that disguise, say, "creepy hooded man" and then the disguise was worn by another player, you would recognize that other player as "creepy hooded man".

    Very interesting. We've already plotted out a basic "stealth" system giving you the chance to move around unnoticed, hide from players, and have a higher chance for critical strike if you attack someone who doesn't "know" you're there, but disguises would add a whole new element to that.



  • Well the recognition system (which was actually implemented before we came up with disguises) is the important thing. It always annoys me in games, especially RP games, where you see a person and you instantly know their name before you've even spoken to them.



  • How does one "recognize" another player then? I'm intrigued but I certainly don't want to put in a magical command that suddenly makes you know everything about a player character. I also don't want to make things too confusing if there's a clan war going on.



  • @mott555 said:

    How does one "recognize" another player then?

    A dark elf with piercing eyes enters the room

    LOOK Piercing

    This dark elf's eyes are piercing. Hopefully his description is a little bit more interesting than this in practice.

    SAY Hello, who are you?

    You say, "Hello, who are you?"

    A dark elf with piercing eyes says, "My name is Bob Dole."

    RECOGNIZE Piercing "Bob Dole"

    You recognize "a dark elf with piercing eyes" as "Bob Dole"

    Bob Dole says, "Republicans rock!"

    You don't "suddenly know everything about a player character". The game engine will tell you his sex and race. The character's description will give you a detailed physical description of the character. Anything else is information they volunteer.

    The biggest challenge as you can tell from this is getting the player to make a shortdesc with enough useful keywords so people can easily interact with him. Admins had to fix quite a few bad shortdescs.

    EDIT: There was also a "recoglist" command which lists people/things you recognized, and of course you could "forget" a person and no longer recognize them. I argued that a thing should be automatically forgotten if you didn't see him in like 3 months, but that never ended up happening.

    EDIT: I should also mention that ES was really, really, HEAVILY, RP-based. It didn't have things like "clan wars". We considered it closer to collaborative novel-writing than to an action game.
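The recognition and disguise behaviour described above might be modelled roughly as follows. This is a guess at a data structure, not how The Eternal Struggle did it: the `Character` class, its method names, and the choice to key recognition on whatever shortdesc is currently visible are all assumptions. Keying on the visible shortdesc is what makes a recognized disguise carry over to whoever wears it next, but it would also conflate two characters with identical shortdescs.

```python
class Character:
    def __init__(self, true_name, shortdesc):
        self.true_name = true_name  # never shown to other players
        self.shortdesc = shortdesc
        self.disguise = None        # shortdesc of a worn disguise item, if any
        self.recognized = {}        # visible shortdesc -> name this viewer assigned

    def visible_shortdesc(self):
        """What other characters currently see."""
        return self.disguise or self.shortdesc

    def recognize(self, target, name):
        """RECOGNIZE <target> "<name>": label whatever the target looks like right now."""
        self.recognized[target.visible_shortdesc()] = name

    def forget(self, name):
        """Drop every description recognized under this name."""
        self.recognized = {d: n for d, n in self.recognized.items() if n != name}

    def name_for(self, target):
        """Name to print for the target from this viewer's point of view."""
        desc = target.visible_shortdesc()
        return self.recognized.get(desc, desc)

    def recoglist(self):
        """Names this viewer currently recognizes."""
        return sorted(self.recognized.values())
```

Every line of output that mentions another character would go through `name_for`, so an unrecognized elf stays "a dark elf with piercing eyes" until the viewer chooses to label him.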



  • @blakeyrat said:

    EDIT: I should also mention that ES was really, really, HEAVILY, RP-based. It didn't have things like "clan wars". We considered it closer to collaborative novel-writing than to an action game.

    Right now mine's pretty opposite. It's geared to be more of a hack-and-slash text combat system, but I'm hoping the clans system helps encourage some RP to explain why clans may be at war with each other, or how they recruit, and such. I'm also hoping clans might work with the admins on world design so in a way they become part of the lore and might even have their own kingdoms.

    I want to add more RP-like elements but my MUD experience is sadly limited to either hack-and-slash MUDs, or MUDs that didn't really work, so I don't have much experience to draw from. TRWTF is probably me writing a MUD....perhaps I should open up my forum to the Lounge here in case any TDWTFers want to help or make suggestions, but it's under my gamertag name and so far I've avoided linking my gamertag to my real name (Mott is my real life last name).



  • @blakeyrat said:

    You recognize "a dark elf with piercing eyes" as "Bob Dole"
    Bob Dole says, "Republicans rock!"

    I can't imagine a dark elf saying that. Only white elves say that.



  • @blakeyrat said:

    He was a lawyer.

    Why does this not surprise me in the least?


  • Discourse touched me in a no-no place

    @mott555 said:

    I thought about doing that when I first started, but I've done AJAX-heavy JavaScript apps in the past and let's just say there's a very good reason Discourse often feels slow or buggy. We encountered so many intermittent issues caused by the browser not handling AJAX calls in a timely manner, or the browser not realizing an AJAX call somehow got lost. It's very difficult, if not impossible (as of 2014) to do things right.

    As of now, doing all the work on the server with full-page postbacks is far easier, more reliable, and actually faster and more responsive.

    Definitely have a look at websockets; they're quite a lot more responsive than AJAX.


  • BINNED

    @dkf said:

    Definitely have a look at websockets; they're quite a lot more responsive than AJAX.

    And it's a helluva lot easier to write readable code with them than with polling. At least for me.



  • Just not in PHP.


  • Discourse touched me in a no-no place

    So… perhaps PHP needs updating to support them?



  • @dkf said:

    So… perhaps PHP needs updating to support them?

    I don't think you understand how PHP works and why, generally, it wouldn't be possible without running PHP as the front end webserver.


  • Discourse touched me in a no-no place

    @Arantor said:

    I don't think you understand how PHP works and why, generally, it wouldn't be possible without running PHP as the front end webserver.

    You might be surprised what I understand. I think this would be a valuable (but non-trivial!) addition to what PHP can do. Or you (collective “PHP community” you here; not picking on you specifically) can stick with your current limitations and watch the world pass you by. Stagnation is always an option.

