Re: Reversing a several stage string transformation [SOLVED]
Thank you guys for your help. I've updated the file (link is still valid) with what I ended up doing.
@008 said:
Since it isn't possible to guarantee that the original ciphertext is retrievable ("you only have a sound recording. What now?")
The input text is supposed to be a subset of linguistic interlinear gloss in a specific order (at least if the output is to make sense for what it's for), with English content words. It's trivial to apply the cipher to this input and reverse that. The hard part is getting the input back after it's been turned into a string of phonemes (meant to be spoken by a human, or by a computer program that can speak from raw IPA/X-SAMPA input). The phonemes then get turned into a written form (such as the Latin or Cyrillic alphabet) as the final output that most people would experience. As for the dictionary file route, such files are big and incomplete.
As for assuming things about the input beforehand: a (potentially silent) "h" comes from an "r" (non-silent in all cases unless you're British), for example, so doing that would make the output difficult for a human to parse at best and complete gibberish at worst.
I have the power to modify the phonetic stages, but there's an (albeit small) corpus of text using the old system (pretty much tests of the system), and those texts would be rendered invalid.
Since it isn't possible to guarantee that the original ciphertext is retrievable ("you only have a sound recording. What now?"), I guess I'll have to rewrite the transformation to be lossless. Now the question is what to map each input letter to without disturbing the current output too much. Some inputs are obvious (k=>k), some are not (c=>???). And then there's the issue of mapping five vowel letters to nine vowel sounds.
Under the current system, yes, information gets lost; several combinations of input values produce identical output when shoved through the system. A lot of inputs (the second and beyond of a group of duplicated consonants, "h" when not before a vowel letter, etc.) simply produce "". In theory, an input could have any number of "h"s tacked onto the end of it without changing the output. I tried getting a little bit of this information back as syllable stress (determined by adding up the letters in the cipher string according to an arbitrary value per letter, then applying stress to syllable val % (number of vowel sounds in output + 1), or nowhere if that result is out of range).
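To make the stress rule above concrete, here is a rough Python sketch of it. The letter weights (a=1 through z=26, ø=27), the 0-based syllable numbering, and the idea that each vowel sound is one syllable are all assumptions for illustration; the actual values are arbitrary and not given here.

```python
import string

# Hypothetical weights: a=1 ... z=26, ø=27. The real per-letter values
# are arbitrary and are not stated in this thread.
WEIGHTS = {ch: i for i, ch in enumerate(string.ascii_lowercase + "ø", start=1)}


def stressed_syllable(cipher, vowel_count):
    """Return the 0-based index of the stressed syllable, or None for no stress.

    Assumes one syllable per vowel sound in the output.
    """
    val = sum(WEIGHTS.get(ch, 0) for ch in cipher)
    idx = val % (vowel_count + 1)
    # idx is in 0..vowel_count; the top value has no syllable to land on,
    # which is the "nowhere if out of range" case.
    return idx if idx < vowel_count else None


if __name__ == "__main__":
    for word in ("a", "b", "ab", "abc"):
        print(word, stressed_syllable(word, 2))
```

Since the result can only take (number of vowel sounds + 1) different values, this recovers very little of what the earlier stages throw away; many different cipher strings share the same stress position.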
@008 said:
So the proposed solution is to basically throw out everything after the cipher stage and rewrite it so that it loses no information.
However, I have a limited output set (limited to what phonemes my voice can readily produce on demand):
a ɛ e ɪ i y u o ɒ ʔ b ʃ d ɸ g h ʒ k l m n p r s t β w x j z = 30
This is my input set:
a b c d e f g h i j k l m n o p q r s t u v w x y z ø = 27
Are you suggesting I treat two or more input tokens as their own token or something?
I forgot one, and the edit button disappeared.
So the proposed solution is to basically throw out everything after the cipher stage and rewrite it so that it loses no information.
However, I have a limited output set (limited to what phonemes my voice can readily produce on demand):
a ɛ e ɪ i y u o ɒ ʔ b ʃ d ɸ g h ʒ k l m n p r s t β w x j = 29
This is my input set:
a b c d e f g h i j k l m n o p q r s t u v w x y z ø = 27
Are you suggesting I treat two or more input tokens as their own token or something?
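Going purely by the counts, 27 input letters fit into 29 phonemes one-to-one, so a lossless table is possible without combining tokens. A minimal Python sketch of that idea follows; the pairing used (i-th letter to i-th phoneme) is only a placeholder to show the round trip, not a proposed mapping, and it makes no attempt to stay close to the cipher string or to be pleasant to pronounce.

```python
INPUT = list("abcdefghijklmnopqrstuvwxyzø")
OUTPUT = "a ɛ e ɪ i y u o ɒ ʔ b ʃ d ɸ g h ʒ k l m n p r s t β w x j".split()

# Placeholder pairing (hypothetical): i-th input letter -> i-th phoneme.
# zip stops after the 27 inputs, leaving two phonemes unused.
ENCODE = dict(zip(INPUT, OUTPUT))
DECODE = {phoneme: letter for letter, phoneme in ENCODE.items()}


def encode(text):
    """Map each input letter to exactly one phoneme."""
    return [ENCODE[ch] for ch in text]


def decode(phonemes):
    """Invert encode(); works because no two letters share a phoneme."""
    return "".join(DECODE[p] for p in phonemes)


if __name__ == "__main__":
    word = "krystal"
    assert decode(encode(word)) == word
    print(encode(word))
```

The round trip only holds while the phonemes stay separate tokens. Once they are written out in Latin or Cyrillic letters, that last step would need to be one-to-one as well.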
@blakeyrat said:
Is there anybody on this forum who isn't a furry?
@blakeyrat said:
I really don't know what you're asking of us...
If by "change the regexes so they're reversible" you mean make it so they don't lose information, that's kind of difficult; the output of the algorithm is supposed to be pronounceable by a human voice and not deviate too far from the cipher string. Even making (ciphertext) "x" become a vowel sound in some places is pushing it. (Those who have played Star Fox Adventures and looked at the code will know immediately what the cipher is supposed to be.)
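For anyone wondering why the current regexes can't simply be run backwards, here is a small Python sketch. The two substitutions are hypothetical stand-ins for rules of the kind used in the phonetic stages (collapsing doubled consonants, dropping "h" when no vowel letter follows), not the actual patterns, and treating only a, e, i, o, u as vowel letters is an assumption.

```python
import re
from collections import defaultdict

VOWELS = "aeiou"                    # assumed set of vowel letters
CONSONANT = r"[b-df-hj-np-tv-z]"    # every other ASCII lowercase letter


def lossy(text):
    """Apply two hypothetical deletion rules in sequence."""
    # Collapse a run of the same consonant down to one letter.
    text = re.sub(rf"({CONSONANT})\1+", r"\1", text)
    # Drop "h" unless a vowel letter follows it.
    text = re.sub(rf"h(?![{VOWELS}])", "", text)
    return text


def collisions(words):
    """Group inputs by output; any group larger than one cannot be inverted."""
    groups = defaultdict(list)
    for word in words:
        groups[lossy(word)].append(word)
    return {out: ins for out, ins in groups.items() if len(ins) > 1}


if __name__ == "__main__":
    print(collisions(["kat", "katt", "kath", "kathhh", "hat"]))
```

Every input in a collision group comes out identical, so any "reverse" step could only guess which one was meant.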
@blakeyrat said:
Is the cipher reversible?
Are the regexes reversible?
Is that process reversible?