Re: Reversing a several stage string transformation [SOLVED]
Thank you guys for your help. I've updated the file (link is still valid) with what I ended up doing.
@008 said:
Since it isn't possible to guarantee that the original ciphertext is retrievable ("you only have a sound recording. What now?")
The input text is supposed to be a subset of linguistic interlinear gloss in a specific order (at least if the output is to make sense for what it's for), with English content words. It's trivial to apply the cipher to this input and reverse that. The hard part is getting the input back after it's been turned into a string of phonemes (meant to be spoken by a human, or by a computer program that can speak from raw IPA/X-SAMPA input). The phonemes get turned into a written form (such as the Latin or Cyrillic alphabets) as the final output that most people would experience. As for the dictionary file route, such files are big and incomplete.
As for assuming things about the input beforehand, since a (potentially silent) "h" comes from a (non-silent in all cases unless you're British) "r" for example, it would make the output difficult to parse by a human at best, complete gibberish at worst.
I have the power to modify the phonetic stages, but there's an (albeit small) corpus of text using the old system (pretty much tests of the system), and changing the stages would render it invalid.
Since it isn't possible to guarantee that the original ciphertext is retrievable ("you only have a sound recording. What now?"), I guess I'll have to rewrite the transformation to be lossless. Now the question is how to map each input letter to a sound without disturbing the current output too much. Some inputs are obvious (k=>k), some are not (c=>???). And then there's the issue of mapping five vowel letters to 9 sounds.
Under the current system, yes, information gets lost: several combinations of input values produce identical output when shoved through the system. A lot of inputs (the second and beyond of a group of duplicated consonants, "h" when not before a vowel letter, etc.) simply produce "". In theory, an input could have any number of "h"s tacked onto the end without changing the output. I tried getting a little bit of this information back as syllable stress (determined by adding up letters in the cipher string according to an arbitrary value, then applying stress to the syllable val%(number of vowel sounds in output+1), or nowhere if that result is out of range).
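For anyone following along, here's a rough sketch of that stress calculation, not the actual code from the jar. The letter weights (a=1 through z=26, ø=27) are made up for illustration, since the real values are arbitrary, and the class and method names are hypothetical. I'm also assuming syllables are counted from 0, so a result equal to the number of vowel sounds is the "out of range" case that means no stress.

```java
import java.util.OptionalInt;

class StressSketch {
    // Hypothetical weights: a=1 ... z=26, o-slash (U+00F8) = 27.
    // The real per-letter values are arbitrary and not given in the post.
    static int letterValue(char c) {
        if (c >= 'a' && c <= 'z') return c - 'a' + 1;
        if (c == '\u00f8') return 27;
        return 0; // anything outside the 27-letter input set counts as 0
    }

    // Adds up the letter values of the cipher string.
    static int weightedSum(String cipherText) {
        int sum = 0;
        for (int i = 0; i < cipherText.length(); i++) {
            sum += letterValue(cipherText.charAt(i));
        }
        return sum;
    }

    // Returns the 0-based index of the stressed syllable, or empty when
    // val % (vowelSoundCount + 1) lands outside the valid syllable range.
    static OptionalInt stressedSyllable(String cipherText, int vowelSoundCount) {
        int val = weightedSum(cipherText) % (vowelSoundCount + 1);
        return val < vowelSoundCount ? OptionalInt.of(val) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(stressedSyllable("kab", 1)); // OptionalInt[0]
        System.out.println(stressedSyllable("kac", 1)); // OptionalInt.empty
    }
}
```

Note that plenty of different cipher strings share the same sum, so at best this works like a weak checksum. It narrows down the candidates but can't recover the dropped letters by itself.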
@008 said:
So the proposed solution is to basically throw out everything after the cipher stage and rewrite it so that it loses no information.
However, I have a limited output set (limited to what phonemes my voice can readily produce on demand):
a ɛ e ɪ i y u o ɒ ʔ b ʃ d ɸ g h ʒ k l m n p r s t β w x j z = 30
This is my input set:
a b c d e f g h i j k l m n o p q r s t u v w x y z ø = 27
Are you suggesting I treat two or more input tokens as their own token or something?
I forgot one, and the edit button disappeared.
So the proposed solution is to basically throw out everything after the cipher stage and rewrite it so that it loses no information.
However, I have a limited output set (limited to what phonemes my voice can readily produce on demand):
a ɛ e ɪ i y u o ɒ ʔ b ʃ d ɸ g h ʒ k l m n p r s t β w x j = 29
This is my input set:
a b c d e f g h i j k l m n o p q r s t u v w x y z ø = 27
Are you suggesting I treat two or more input tokens as their own token or something?
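To make the counting argument concrete: with 27 input letters and at least 27 output phonemes, a plain one-to-one table is already enough to lose nothing, without multi-letter tokens. Below is a minimal sketch of that idea only. The identity pairs (k=>k and friends) match symbols, not necessarily sounds, and the five leftover pairs (c=>ʃ, f=>ɸ, q=>ʔ, v=>β, ø=>ɛ) are placeholder picks of mine, not a proposal. The class and method names are hypothetical, and the phonemes are written as Unicode escapes so the file compiles under any source encoding.

```java
import java.util.HashMap;
import java.util.Map;

class LosslessMapSketch {
    // The 27 input letters; U+00F8 is o-slash.
    static final String INPUT_SET = "abcdefghijklmnopqrstuvwxyz\u00f8";
    // The 30 output phonemes from the list above, in the same order.
    static final String OUTPUT_SET =
        "a\u025be\u026aiyuo\u0252\u0294b\u0283d\u0278gh\u0292klmnprst\u03b2wxjz";
    // Placeholder assignments for inputs with no same-symbol phoneme:
    // c=>esh, f=>phi, q=>glottal stop, v=>beta, o-slash=>open e.
    static final String[] LEFTOVER_PAIRS =
        {"c\u0283", "f\u0278", "q\u0294", "v\u03b2", "\u00f8\u025b"};

    static final Map<Character, Character> ENCODE = new HashMap<>();
    static final Map<Character, Character> DECODE = new HashMap<>();

    static {
        // Inputs whose symbol also exists in the output set map to themselves.
        for (char c : INPUT_SET.toCharArray()) {
            if (OUTPUT_SET.indexOf(c) >= 0) put(c, c);
        }
        for (String pair : LEFTOVER_PAIRS) put(pair.charAt(0), pair.charAt(1));
    }

    // Refuses any pair that would break the one-to-one property.
    static void put(char in, char out) {
        if (ENCODE.containsKey(in) || DECODE.containsKey(out)) {
            throw new IllegalStateException("mapping is not one-to-one at: " + in);
        }
        ENCODE.put(in, out);
        DECODE.put(out, in);
    }

    static String translate(String s, Map<Character, Character> table) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            Character mapped = table.get(c);
            if (mapped == null) throw new IllegalArgumentException("unmapped symbol: " + c);
            sb.append(mapped.charValue());
        }
        return sb.toString();
    }

    static String encode(String cipherText) { return translate(cipherText, ENCODE); }
    static String decode(String phonemes) { return translate(phonemes, DECODE); }

    public static void main(String[] args) {
        String original = "foxc\u00f8";
        System.out.println(decode(encode(original)).equals(original)); // true
    }
}
```

The catch is the one already raised in this thread: a table like this is reversible, but nothing in it guarantees the result is pronounceable or stays close to the current output. Those constraints would have to be layered on top, which is where context-dependent or multi-letter rules might come back in.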
@blakeyrat said:
Is there anybody on this forum who isn't a furry?
@blakeyrat said:
I really don't know what you're asking of us...
If by "change the regexes so they're reversible" you mean make it so they don't lose information, that's kind of difficult; the output data of the algorithm is supposed to be pronounceable by a human voice and not deviate too far away from the cipher string. Even making (ciphertext) "x" become a vowel sound in some places is pushing it. (Those who have played Star Fox Adventures and looked at the code will know immediately what the cipher is supposed to be.)
@blakeyrat said:
Is the cipher reversible?
Are the regexes reversible?
Is that process reversible?
OK, so I have a program that takes a string and runs it through a cipher. It then takes the output of the cipher and converts it to a phonetic notation via a series of regular expressions. Then it uses some more regexes to 'clean up' the phonetic string into something pronounceable. Finally, it turns the phonetic notation into a human-readable string.
Now, the hard part: How do I reverse this process? I've thought about it long and hard, and have no idea how I'm going to do this.
Oh and it's in Java.
The library to do this (code is inside the jar-within-a-jar): https://sites.google.com/site/starfox008/DesktopXlat.zip?attredirects=0&d=1
This is a request for an action plan; don't think I'm just going "do my job and then plz send me teh codez".
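To show the shape of the problem without digging through the jar, here's a minimal stand-in for the pipeline. None of it is the real code: ROT13 substitutes for the actual cipher, the two regex stages are simplified guesses at typical cleanup rules (collapse doubled consonants, drop an "h" that isn't before a vowel letter), and every name is hypothetical. The point is that the whole chain can only be reversed if each stage maps distinct inputs to distinct outputs, and grouping inputs by output is a cheap way to find the stages where that fails.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

class PipelineSketch {
    // Stand-in cipher (ROT13 on a-z). NOT the real cipher; it is only here
    // because it is a simple substitution that is trivially reversible.
    static String cipher(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(c >= 'a' && c <= 'z' ? (char) ('a' + (c - 'a' + 13) % 26) : c);
        }
        return sb.toString();
    }

    // Lossy stage: a run of the same consonant collapses to a single one.
    static String collapseDoubledConsonants(String s) {
        return s.replaceAll("([b-df-hj-np-tv-z])\\1+", "$1");
    }

    // Lossy stage: an "h" that is not followed by a vowel letter disappears.
    static String dropSilentH(String s) {
        return s.replaceAll("h(?![aeiou])", "");
    }

    static final List<UnaryOperator<String>> STAGES = List.<UnaryOperator<String>>of(
        PipelineSketch::cipher,
        PipelineSketch::collapseDoubledConsonants,
        PipelineSketch::dropSilentH);

    // Runs the input through every stage in order.
    static String run(String input) {
        String s = input;
        for (UnaryOperator<String> stage : STAGES) s = stage.apply(s);
        return s;
    }

    // Groups inputs by final output; any group larger than one is a collision,
    // i.e. a spot where the pipeline cannot be reversed unambiguously.
    static Map<String, List<String>> groupByOutput(List<String> inputs) {
        Map<String, List<String>> groups = new HashMap<>();
        for (String in : inputs) {
            groups.computeIfAbsent(run(in), k -> new ArrayList<>()).add(in);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> samples = List.of("kno", "knoo", "ka", "kau", "kauuu");
        groupByOutput(samples).forEach((out, ins) -> {
            if (ins.size() > 1) System.out.println(out + " <= " + ins);
        });
    }
}
```

With these stand-in rules, "kno" and "knoo" end up identical, and so do "ka", "kau" and "kauuu". A reverse pass then has two options: change the lossy stages so they stop colliding, or return every candidate input for a given output and pick between them with outside knowledge.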
@belgariontheking said:
@morbiuswilters said:
@too_many_usernames said:
If there were no such things as patents, people would probably still spend money on these things, but they would spend less money.
A lot less money.
Indeed. I would begin selling iphomes and vi@gra at a significantly reduced cost over their legit counterparts. You may laugh, but companies in non-America (and probably some in America) are dedicated to this very thing. Who hasn't bought a fake gucci or chanel handbag?
Thankfully, I haven't. (Of course, I've never bought or considered buying a handbag in the first place.)
Bonus WTF: One of the suggestions is the word that just got modified and marked wrong.
@DOA said:
Out of curiosity... if you're trying to process hundreds of thousands of search queries a second, is Java the language you want? Not that I've run any benchmarks, but I was under the impression that speed wasn't its top selling point.
Java's speed isn't its main selling point. However, its speed is reasonable when run as a server app: most of the bottleneck is searching the filesystem for class files and unzipping jars at application startup, and since server apps stay up as long as possible, that cost is effectively eliminated.