Combining two overlapping regular expressions
-
I have a string containing three consecutive characters at some point. I want to find out if the first two characters match
(?:p[tkfcsxlmnr]|t[pkfcsxlmnr]|k[ptfcslmnr]|f[ptkcsxlmnr]|c[ptkflmnr]|s[ptkfxlmnr]|x[ptfslmnr]|b[dgvjzlmnr]|d[bgvjzlmnr]|g[bdvjzlmnr]|v[bdgjzlmnr]|j[bdgvlmnr]|z[bdgvlmnr]|l[ptkfcsxbdgvjzmnr]|m[ptkfcsxbdgvjlnr]|n[ptkfcsxbdgvjzlmr]|r[ptkfcsxbdgvjzlmn])
and the last two characters match
(?:[bcfgkmpsvx][lr]|[cs][fkmnpt]|d[jrz]|[jz][bdgmv]|t[crs])
Is there some regular expression syntax for that, or do I have to write the combined expression manually?
-
^(firstbit).*(secondbit)$
Where firstbit and secondbit are your two regexes.
-
The two regular expressions each match two characters, and the middle one should be the same character. I want to take AB and BC and turn it into ABC.
-
Hmm… I'm not sure the regex approach is the right one then. TBH, I think you're better off finding a different way to do the checks.
-
I'm trying to find a string that matches a pattern in the middle of a longer string. Isn't that what regex is for?
-
I think you need to enumerate the rules in human-speak, combining as many cases as possible (for example, l, m, n, and r are valid for the middle character after any other consonant, while p is valid if the first character was t, k, f, c, s, x, l, m, n, or r). Then use back-references to verify them in your regex.
-
I'm trying to find a string that matches a pattern in the middle of a longer string. Isn't that what regex is for?
Hmm…
Tell you what: I'll leave you in the capable hands of @Yamikuronue; sounds like she'll be able to help you better than I can ;)
-
The first one is a consonant pair following these rules:
UNVOICED  VOICED
p         b
t         d
k         g
f         v
c         j
s         z
x         -
- It is forbidden for both consonants to be the same, as this would violate the rule against double consonants.
- It is forbidden for one consonant to be voiced and the other unvoiced. The consonants “l”, “m”, “n”, and “r” are exempt from this restriction. As a result, “bf” is forbidden, and so is “sd”, but both “fl” and “vl”, and both “ls” and “lz”, are permitted.
- It is forbidden for both consonants to be drawn from the set “c”, “j”, “s”, “z”.
- The specific pairs “cx”, “kx”, “xc”, “xk”, and “mz” are forbidden.
The second is one of these:
pl pr fl fr bl br vl vr cp cf ct ck cm cn cl cr jb jv jd jg jm sp sf st sk sm sn sl sr zb zv zd zg zm tc tr ts kl kr dj dr dz gl gr ml mr xl xr
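For anyone who wants to double-check the two regexes from the question against these rules, here is a minimal Python sketch. The names (is_valid_pair, VALID_PAIRS, INITIAL_PAIRS, matched_pairs) are made up for illustration. It assumes the only letters involved are the 17 consonants in the table plus l, m, n, r.

```python
import re
from itertools import product

UNVOICED = "ptkfcsx"
VOICED = "bdgvjz"
SONORANT = "lmnr"  # exempt from the voicing rule
CONSONANTS = UNVOICED + VOICED + SONORANT

def is_valid_pair(a, b):
    """Apply the four rules quoted above to the ordered pair (a, b)."""
    if a == b:  # rule 1: no doubled consonant
        return False
    if a not in SONORANT and b not in SONORANT:
        if (a in VOICED) != (b in VOICED):  # rule 2: voicing must agree
            return False
    if a in "cjsz" and b in "cjsz":  # rule 3
        return False
    return a + b not in {"cx", "kx", "xc", "xk", "mz"}  # rule 4

VALID_PAIRS = {a + b for a, b in product(CONSONANTS, repeat=2)
               if is_valid_pair(a, b)}

INITIAL_PAIRS = set(
    "pl pr fl fr bl br vl vr cp cf ct ck cm cn cl cr jb jv jd jg jm "
    "sp sf st sk sm sn sl sr zb zv zd zg zm tc tr ts kl kr dj dr dz "
    "gl gr ml mr xl xr".split()
)

# The two patterns exactly as posted in the question.
FIRST = (
    r"(?:p[tkfcsxlmnr]|t[pkfcsxlmnr]|k[ptfcslmnr]|f[ptkcsxlmnr]|c[ptkflmnr]"
    r"|s[ptkfxlmnr]|x[ptfslmnr]|b[dgvjzlmnr]|d[bgvjzlmnr]|g[bdvjzlmnr]"
    r"|v[bdgjzlmnr]|j[bdgvlmnr]|z[bdgvlmnr]|l[ptkfcsxbdgvjzmnr]"
    r"|m[ptkfcsxbdgvjlnr]|n[ptkfcsxbdgvjzlmr]|r[ptkfcsxbdgvjzlmn])"
)
SECOND = r"(?:[bcfgkmpsvx][lr]|[cs][fkmnpt]|d[jrz]|[jz][bdgmv]|t[crs])"

def matched_pairs(pattern):
    """Every two-letter lowercase string the pattern matches in full."""
    rx = re.compile(pattern)
    letters = "abcdefghijklmnopqrstuvwxyz"
    return {a + b for a, b in product(letters, repeat=2)
            if rx.fullmatch(a + b)}

print(matched_pairs(FIRST) == VALID_PAIRS)
print(matched_pairs(SECOND) == INITIAL_PAIRS)
```

If both comparisons print True, the hand-written character classes agree with the rules and the list, so either form can serve as the source of truth.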
-
([ptkfcsx])(?!\1)[ptkfcsx]
matches two non-identical characters.
(?:([ptkfcsx])(?!\1)){2}
I think covers half of rules 1 and 2 (the unvoiced half), but I'd have to test it.
(?:([ptkfcsx])(?!\1)){2}|(?:([bdgvjz])(?!\2)){2}
I think would get you halfway there.
(?!cx|kx|xc|xk|mz)
probably also has to be in there somehow... I need more food, but that should help, I hope?
-
You don't want {2}. You want to use lookarounds to peek at the second character but not consume it until the second half of the regex, which will be after the bit I was constructing (which will be a bunch of ors in a big paren)
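A minimal sketch of that lookaround idea in Python, using the two patterns from the question unchanged. The lookahead (?=FIRST) checks characters 1 and 2 without consuming them, a single "." then consumes character 1, and SECOND consumes characters 2 and 3, so the middle character is tested by both halves. COMBINED and find_clusters are illustrative names only. This needs an engine with lookahead support: Python's re has it, RE2-based engines such as Go's regexp do not.

```python
import re

# The two patterns exactly as posted in the question.
FIRST = (
    r"(?:p[tkfcsxlmnr]|t[pkfcsxlmnr]|k[ptfcslmnr]|f[ptkcsxlmnr]|c[ptkflmnr]"
    r"|s[ptkfxlmnr]|x[ptfslmnr]|b[dgvjzlmnr]|d[bgvjzlmnr]|g[bdvjzlmnr]"
    r"|v[bdgjzlmnr]|j[bdgvlmnr]|z[bdgvlmnr]|l[ptkfcsxbdgvjzmnr]"
    r"|m[ptkfcsxbdgvjlnr]|n[ptkfcsxbdgvjzlmr]|r[ptkfcsxbdgvjzlmn])"
)
SECOND = r"(?:[bcfgkmpsvx][lr]|[cs][fkmnpt]|d[jrz]|[jz][bdgmv]|t[crs])"

# (?=FIRST) peeks at chars 1-2, "." consumes char 1, SECOND consumes chars 2-3.
COMBINED = re.compile(r"(?=" + FIRST + r")." + SECOND)

def find_clusters(text):
    """Return every non-overlapping three-character match in text."""
    return [m.group(0) for m in COMBINED.finditer(text)]

print(find_clusters("aabdjaa"))  # "bd" passes FIRST, "dj" passes SECOND
print(find_clusters("kxl"))      # "kx" is a forbidden first pair
```

The nice part is that neither sub-pattern has to be rewritten: AB and BC become ABC purely through how they are glued together.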
-
...dammit, I think I've been nerd sniped!
-
While you were writing that, I was writing this: http://play.golang.org/p/O9NDlgu0BY
Which gives me this output:
(?:bdj|bdr|bdz|bgl|bgr|bjb|bjd|bjg|bjm|bjv|bml|bmr|bvl|bvr|bzb|bzd|bzg|bzm|bzv|cfl|cfr|ckl|ckr|cml|cmr|cpl|cpr|ctc|ctr|cts|dbl|dbr|dgl|dgr|djb|djd|djg|djm|djv|dml|dmr|dvl|dvr|dzb|dzd|dzg|dzm|dzv|fcf|fck|fcl|fcm|fcn|fcp|fcr|fct|fkl|fkr|fml|fmr|fpl|fpr|fsf|fsk|fsl|fsm|fsn|fsp|fsr|fst|ftc|ftr|fts|fxl|fxr|gbl|gbr|gdj|gdr|gdz|gjb|gjd|gjg|gjm|gjv|gml|gmr|gvl|gvr|gzb|gzd|gzg|gzm|gzv|jbl|jbr|jdj|jdr|jdz|jgl|jgr|jml|jmr|jvl|jvr|kcf|kck|kcl|kcm|kcn|kcp|kcr|kct|kfl|kfr|kml|kmr|kpl|kpr|ksf|ksk|ksl|ksm|ksn|ksp|ksr|kst|ktc|ktr|kts|lbl|lbr|lcf|lck|lcl|lcm|lcn|lcp|lcr|lct|ldj|ldr|ldz|lfl|lfr|lgl|lgr|ljb|ljd|ljg|ljm|ljv|lkl|lkr|lml|lmr|lpl|lpr|lsf|lsk|lsl|lsm|lsn|lsp|lsr|lst|ltc|ltr|lts|lvl|lvr|lxl|lxr|lzb|lzd|lzg|lzm|lzv|mbl|mbr|mcf|mck|mcl|mcm|mcn|mcp|mcr|mct|mdj|mdr|mdz|mfl|mfr|mgl|mgr|mjb|mjd|mjg|mjm|mjv|mkl|mkr|mpl|mpr|msf|msk|msl|msm|msn|msp|msr|mst|mtc|mtr|mts|mvl|mvr|mxl|mxr|nbl|nbr|ncf|nck|ncl|ncm|ncn|ncp|ncr|nct|ndj|ndr|ndz|nfl|nfr|ngl|ngr|njb|njd|njg|njm|njv|nkl|nkr|nml|nmr|npl|npr|nsf|nsk|nsl|nsm|nsn|nsp|nsr|nst|ntc|ntr|nts|nvl|nvr|nxl|nxr|nzb|nzd|nzg|nzm|nzv|pcf|pck|pcl|pcm|pcn|pcp|pcr|pct|pfl|pfr|pkl|pkr|pml|pmr|psf|psk|psl|psm|psn|psp|psr|pst|ptc|ptr|pts|pxl|pxr|rbl|rbr|rcf|rck|rcl|rcm|rcn|rcp|rcr|rct|rdj|rdr|rdz|rfl|rfr|rgl|rgr|rjb|rjd|rjg|rjm|rjv|rkl|rkr|rml|rmr|rpl|rpr|rsf|rsk|rsl|rsm|rsn|rsp|rsr|rst|rtc|rtr|rts|rvl|rvr|rxl|rxr|rzb|rzd|rzg|rzm|rzv|sfl|sfr|skl|skr|sml|smr|spl|spr|stc|str|sts|sxl|sxr|tcf|tck|tcl|tcm|tcn|tcp|tcr|tct|tfl|tfr|tkl|tkr|tml|tmr|tpl|tpr|tsf|tsk|tsl|tsm|tsn|tsp|tsr|tst|txl|txr|vbl|vbr|vdj|vdr|vdz|vgl|vgr|vjb|vjd|vjg|vjm|vjv|vml|vmr|vzb|vzd|vzg|vzm|vzv|xfl|xfr|xml|xmr|xpl|xpr|xsf|xsk|xsl|xsm|xsn|xsp|xsr|xst|xtc|xtr|xts|zbl|zbr|zdj|zdr|zdz|zgl|zgr|zml|zmr|zvl|zvr)
So I guess I can just manually golf-ify it.
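The playground program itself is not reproduced in the thread, so the following is only a guess at the same brute-force idea, sketched in Python: try every three-letter combination, keep those whose first two letters match the first pattern and whose last two letters match the second, and join the survivors into one alternation. valid_triples and as_alternation are hypothetical names.

```python
import re
from itertools import product

# The two patterns exactly as posted in the question.
FIRST = (
    r"(?:p[tkfcsxlmnr]|t[pkfcsxlmnr]|k[ptfcslmnr]|f[ptkcsxlmnr]|c[ptkflmnr]"
    r"|s[ptkfxlmnr]|x[ptfslmnr]|b[dgvjzlmnr]|d[bgvjzlmnr]|g[bdvjzlmnr]"
    r"|v[bdgjzlmnr]|j[bdgvlmnr]|z[bdgvlmnr]|l[ptkfcsxbdgvjzmnr]"
    r"|m[ptkfcsxbdgvjlnr]|n[ptkfcsxbdgvjzlmr]|r[ptkfcsxbdgvjzlmn])"
)
SECOND = r"(?:[bcfgkmpsvx][lr]|[cs][fkmnpt]|d[jrz]|[jz][bdgmv]|t[crs])"

LETTERS = "abcdefghijklmnopqrstuvwxyz"
first_rx = re.compile(FIRST)
second_rx = re.compile(SECOND)

def valid_triples():
    """All three-letter strings abc where ab matches FIRST and bc matches SECOND."""
    triples = []
    # product() over a sorted alphabet yields candidates in alphabetical order.
    for a, b, c in product(LETTERS, repeat=3):
        if first_rx.fullmatch(a + b) and second_rx.fullmatch(b + c):
            triples.append(a + b + c)
    return triples

def as_alternation(words):
    """Join literal words into a single non-capturing alternation."""
    return "(?:" + "|".join(words) + ")"

pattern = as_alternation(valid_triples())
print(pattern[:40])
```

With only 26^3 candidates this runs instantly, and the result uses nothing beyond plain alternation, so it should work in engines without lookahead too.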
-
You're overthinking it, guys.
(?:bdj|bdr|bdz|bgl|bgr|bjb|bjd|bjg|bjm|bjv|bml|bmr|bvl|bvr|bzb|bzd|bzg|bzm|bzv|cfl|cfr|ckl|ckr|cml|cmr|cpl|cpr|ctc|ctr|cts|dbl|dbr|dgl|dgr|djb|djd|djg|djm|djv|dml|dmr|dvl|dvr|dzb|dzd|dzg|dzm|dzv|fcf|fck|fcl|fcm|fcn|fcp|fcr|fct|fkl|fkr|fml|fmr|fpl|fpr|fsf|fsk|fsl|fsm|fsn|fsp|fsr|fst|ftc|ftr|fts|fxl|fxr|gbl|gbr|gdj|gdr|gdz|gjb|gjd|gjg|gjm|gjv|gml|gmr|gvl|gvr|gzb|gzd|gzg|gzm|gzv|jbl|jbr|jdj|jdr|jdz|jgl|jgr|jml|jmr|jvl|jvr|kcf|kck|kcl|kcm|kcn|kcp|kcr|kct|kfl|kfr|kml|kmr|kpl|kpr|ksf|ksk|ksl|ksm|ksn|ksp|ksr|kst|ktc|ktr|kts|lbl|lbr|lcf|lck|lcl|lcm|lcn|lcp|lcr|lct|ldj|ldr|ldz|lfl|lfr|lgl|lgr|ljb|ljd|ljg|ljm|ljv|lkl|lkr|lml|lmr|lpl|lpr|lsf|lsk|lsl|lsm|lsn|lsp|lsr|lst|ltc|ltr|lts|lvl|lvr|lxl|lxr|lzb|lzd|lzg|lzm|lzv|mbl|mbr|mcf|mck|mcl|mcm|mcn|mcp|mcr|mct|mdj|mdr|mdz|mfl|mfr|mgl|mgr|mjb|mjd|mjg|mjm|mjv|mkl|mkr|mpl|mpr|msf|msk|msl|msm|msn|msp|msr|mst|mtc|mtr|mts|mvl|mvr|mxl|mxr|nbl|nbr|ncf|nck|ncl|ncm|ncn|ncp|ncr|nct|ndj|ndr|ndz|nfl|nfr|ngl|ngr|njb|njd|njg|njm|njv|nkl|nkr|nml|nmr|npl|npr|nsf|nsk|nsl|nsm|nsn|nsp|nsr|nst|ntc|ntr|nts|nvl|nvr|nxl|nxr|nzb|nzd|nzg|nzm|nzv|pcf|pck|pcl|pcm|pcn|pcp|pcr|pct|pfl|pfr|pkl|pkr|pml|pmr|psf|psk|psl|psm|psn|psp|psr|pst|ptc|ptr|pts|pxl|pxr|rbl|rbr|rcf|rck|rcl|rcm|rcn|rcp|rcr|rct|rdj|rdr|rdz|rfl|rfr|rgl|rgr|rjb|rjd|rjg|rjm|rjv|rkl|rkr|rml|rmr|rpl|rpr|rsf|rsk|rsl|rsm|rsn|rsp|rsr|rst|rtc|rtr|rts|rvl|rvr|rxl|rxr|rzb|rzd|rzg|rzm|rzv|sfl|sfr|skl|skr|sml|smr|spl|spr|stc|str|sts|sxl|sxr|tcf|tck|tcl|tcm|tcn|tcp|tcr|tct|tfl|tfr|tkl|tkr|tml|tmr|tpl|tpr|tsf|tsk|tsl|tsm|tsn|tsp|tsr|tst|txl|txr|vbl|vbr|vdj|vdr|vdz|vgl|vgr|vjb|vjd|vjg|vjm|vjv|vml|vmr|vzb|vzd|vzg|vzm|vzv|xfl|xfr|xml|xmr|xpl|xpr|xsf|xsk|xsl|xsm|xsn|xsp|xsr|xst|xtc|xtr|xts|zbl|zbr|zdj|zdr|zdz|zgl|zgr|zml|zmr|zvl|zvr)
-
Lol, beaten you by 10 secs.
-
(?:bd(?:j|r|z)|bg(?:l|r)|bj(?:b|d|g|m|v)|bm(?:l|r)|bv(?:l|r)|bz(?:b|d|g|m|v)|cf(?:l|r)|ck(?:l|r)|cm(?:l|r)|cp(?:l|r)|ct(?:c|r|s)|db(?:l|r)|dg(?:l|r)|dj(?:b|d|g|m|v)|dm(?:l|r)|dv(?:l|r)|dz(?:b|d|g|m|v)|fc(?:f|k|l|m|n|p|r|t)|fk(?:l|r)|fm(?:l|r)|fp(?:l|r)|fs(?:f|k|l|m|n|p|r|t)|ft(?:c|r|s)|fx(?:l|r)|gb(?:l|r)|gd(?:j|r|z)|gj(?:b|d|g|m|v)|gm(?:l|r)|gv(?:l|r)|gz(?:b|d|g|m|v)|jb(?:l|r)|jd(?:j|r|z)|jg(?:l|r)|jm(?:l|r)|jv(?:l|r)|kc(?:f|k|l|m|n|p|r|t)|kf(?:l|r)|km(?:l|r)|kp(?:l|r)|ks(?:f|k|l|m|n|p|r|t)|kt(?:c|r|s)|lb(?:l|r)|lc(?:f|k|l|m|n|p|r|t)|ld(?:j|r|z)|lf(?:l|r)|lg(?:l|r)|lj(?:b|d|g|m|v)|lk(?:l|r)|lm(?:l|r)|lp(?:l|r)|ls(?:f|k|l|m|n|p|r|t)|lt(?:c|r|s)|lv(?:l|r)|lx(?:l|r)|lz(?:b|d|g|m|v)|mb(?:l|r)|mc(?:f|k|l|m|n|p|r|t)|md(?:j|r|z)|mf(?:l|r)|mg(?:l|r)|mj(?:b|d|g|m|v)|mk(?:l|r)|mp(?:l|r)|ms(?:f|k|l|m|n|p|r|t)|mt(?:c|r|s)|mv(?:l|r)|mx(?:l|r)|nb(?:l|r)|nc(?:f|k|l|m|n|p|r|t)|nd(?:j|r|z)|nf(?:l|r)|ng(?:l|r)|nj(?:b|d|g|m|v)|nk(?:l|r)|nm(?:l|r)|np(?:l|r)|ns(?:f|k|l|m|n|p|r|t)|nt(?:c|r|s)|nv(?:l|r)|nx(?:l|r)|nz(?:b|d|g|m|v)|pc(?:f|k|l|m|n|p|r|t)|pf(?:l|r)|pk(?:l|r)|pm(?:l|r)|ps(?:f|k|l|m|n|p|r|t)|pt(?:c|r|s)|px(?:l|r)|rb(?:l|r)|rc(?:f|k|l|m|n|p|r|t)|rd(?:j|r|z)|rf(?:l|r)|rg(?:l|r)|rj(?:b|d|g|m|v)|rk(?:l|r)|rm(?:l|r)|rp(?:l|r)|rs(?:f|k|l|m|n|p|r|t)|rt(?:c|r|s)|rv(?:l|r)|rx(?:l|r)|rz(?:b|d|g|m|v)|sf(?:l|r)|sk(?:l|r)|sm(?:l|r)|sp(?:l|r)|st(?:c|r|s)|sx(?:l|r)|tc(?:f|k|l|m|n|p|r|t)|tf(?:l|r)|tk(?:l|r)|tm(?:l|r)|tp(?:l|r)|ts(?:f|k|l|m|n|p|r|t)|tx(?:l|r)|vb(?:l|r)|vd(?:j|r|z)|vg(?:l|r)|vj(?:b|d|g|m|v)|vm(?:l|r)|vz(?:b|d|g|m|v)|xf(?:l|r)|xm(?:l|r)|xp(?:l|r)|xs(?:f|k|l|m|n|p|r|t)|xt(?:c|r|s)|zb(?:l|r)|zd(?:j|r|z)|zg(?:l|r)|zm(?:l|r)|zv(?:l|r))
Definitely not worth it.
-
You have (?:l|r) in there a lot. You could probably combine all the things that come before it into one set and then have [lr] after the set.
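A rough Python sketch of that kind of factoring, assuming the input is a list of three-letter literals like the ones above: collect the final letters seen after each two-letter prefix, then merge all prefixes that share the same set of final letters in front of one character class. factor and char_class are hypothetical helpers, and the sample is just a handful of entries taken from the generated list.

```python
import re
from collections import defaultdict

def char_class(chars):
    """Render a set of single characters as 'x' or '[xyz]'."""
    chars = sorted(chars)
    return chars[0] if len(chars) == 1 else "[" + "".join(chars) + "]"

def factor(words):
    """Group three-letter words by prefix, then merge prefixes with equal suffix sets."""
    finals = defaultdict(set)  # two-letter prefix -> final letters seen after it
    for word in words:
        finals[word[:2]].add(word[2])

    prefixes = defaultdict(list)  # set of final letters -> prefixes sharing it
    for prefix, last in finals.items():
        prefixes[frozenset(last)].append(prefix)

    parts = [
        "(?:" + "|".join(sorted(group)) + ")" + char_class(last)
        for last, group in sorted(prefixes.items(), key=lambda kv: sorted(kv[0]))
    ]
    return "(?:" + "|".join(parts) + ")"

sample = ["bgl", "bgr", "bml", "bmr", "bdj", "bdr", "bdz", "gdj", "gdr",
          "gdz", "sxl", "sxr", "vgl", "vgr", "nkl", "nkr"]
compact = factor(sample)
print(compact)

# Sanity check: the compact form accepts exactly the words it was built from.
assert all(re.fullmatch(compact, w) for w in sample)
```

Whether the shorter pattern is easier to read than the flat list is a matter of taste. The flat alternation is at least trivially checkable by eye.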
-
sx(?:l|r)
vg(?:l|r)
nk(?:l|r)
That can definitely be combined further
...how did selecting those bits to quote reply make this happen:
It used to look like:
-
You have (?:l|r) in there a lot. You could probably combine all the things that come before it into one set and then have [lr] after the set.
That can definitely be combined further
Meh. Not worth it. I would keep it simple and easily readable.
...how did selecting those bits to quote reply make this happen:
That's a browser bug, not Discourse this time.
-
So I guess I can just manually golf-ify it.
You could try feeding it to Regexp::Optimizer or something...
-
It wrapped the output in (?^:(?^:(?^:…))) but it's great otherwise.
-
...how did selecting those bits to quote reply make this happen:
Inserting <span>s.
-
1. It is forbidden for both consonants to be the same, as this would violate the rule against double consonants.
2. It is forbidden for one consonant to be voiced and the other unvoiced. The consonants “l”, “m”, “n”, and “r” are exempt from this restriction. As a result, “bf” is forbidden, and so is “sd”, but both “fl” and “vl”, and both “ls” and “lz”, are permitted.
3. It is forbidden for both consonants to be drawn from the set “c”, “j”, “s”, “z”.
4. The specific pairs “cx”, “kx”, “xc”, “xk”, and “mz” are forbidden.
Weird password requirements you have there...
-