Combining two overlapping regular expressions



  • I have a string containing three consecutive characters at some point. I want to find out if the first two characters match (?:p[tkfcsxlmnr]|t[pkfcsxlmnr]|k[ptfcslmnr]|f[ptkcsxlmnr]|c[ptkflmnr]|s[ptkfxlmnr]|x[ptfslmnr]|b[dgvjzlmnr]|d[bgvjzlmnr]|g[bdvjzlmnr]|v[bdgjzlmnr]|j[bdgvlmnr]|z[bdgvlmnr]|l[ptkfcsxbdgvjzmnr]|m[ptkfcsxbdgvjlnr]|n[ptkfcsxbdgvjzlmr]|r[ptkfcsxbdgvjzlmn]) and the last two characters match (?:[bcfgkmpsvx][lr]|[cs][fkmnpt]|d[jrz]|[jz][bdgmv]|t[crs]). Is there some regular expression syntax for that or do I have to manually write the expression?


  • FoxDev

    ^(firstbit).*(secondbit)$

    Where firstbit and secondbit are your two regexes.



  • The two regular expressions each match two characters, and the middle one should be the same character. I want to take AB and BC and turn it into ABC.


  • FoxDev

    Hmm… I'm not sure the regex approach is the right one then. TBH, I think you're better off finding a different way to do the checks.



  • I'm trying to find a string that matches a pattern in the middle of a longer string. Isn't that what regex is for?


  • I survived the hour long Uno hand

    I think you need to enumerate the rules in human-speak, combining as many cases as possible (for example, lmnr are always valid for the middle character, while p is valid if the first character was t, k, f, c, s, x, l, m, n, or r). Then use back-references to verify them in your regex.


  • FoxDev

    @ben_lubar said:

    I'm trying to find a string that matches a pattern in the middle of a longer string. Isn't that what regex is for?

    Hmm…
    Tell you what: I'll leave you in the capable hands of @Yamikuronue; sounds like she'll be able to help you better than I can ;)



  • The first one is a consonant pair following these rules:

    UNVOICED    VOICED
       p          b
       t          d
       k          g
       f          v
       c          j
       s          z
       x          -
    
    1. It is forbidden for both consonants to be the same, as this would violate the rule against double consonants.
    2. It is forbidden for one consonant to be voiced and the other unvoiced. The consonants “l”, “m”, “n”, and “r” are exempt from this restriction. As a result, “bf” is forbidden, and so is “sd”, but both “fl” and “vl”, and both “ls” and “lz”, are permitted.
    3. It is forbidden for both consonants to be drawn from the set “c”, “j”, “s”, “z”.
    4. The specific pairs “cx”, “kx”, “xc”, “xk”, and “mz” are forbidden.

    The second is one of these:

    pl pr                       fl fr
    bl br                       vl vr
    
    cp cf      ct ck cm cn      cl cr
    jb jv      jd jg jm
    sp sf      st sk sm sn      sl sr
    zb zv      zd zg zm
    
    tc tr      ts               kl kr
    dj dr      dz               gl gr
    
    ml mr                       xl xr
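
As a sanity check on the rules and table above, here is a minimal Python sketch that builds the first pair set from rules 1 to 4 and compares it, along with the second table, to the two regexes from the opening post. The names `valid_first_pair`, `RULE3_SET` and `SECOND_TABLE` are made up for this illustration and do not come from the thread.

```python
import re
from itertools import product

UNVOICED = set("ptkfcsx")
VOICED = set("bdgvjz")
EXEMPT = set("lmnr")                         # exempt from rule 2
RULE3_SET = set("cjsz")                      # rule 3
FORBIDDEN = {"cx", "kx", "xc", "xk", "mz"}   # rule 4
CONSONANTS = sorted(UNVOICED | VOICED | EXEMPT)

def valid_first_pair(a, b):
    """Apply rules 1-4 to the consonant pair a+b (hypothetical helper)."""
    if a == b:                                            # rule 1
        return False
    if (a in UNVOICED and b in VOICED) or (a in VOICED and b in UNVOICED):
        return False                                      # rule 2
    if a in RULE3_SET and b in RULE3_SET:                 # rule 3
        return False
    return a + b not in FORBIDDEN                         # rule 4

# The two regexes from the opening post, copied verbatim.
FIRST = re.compile(
    r"(?:p[tkfcsxlmnr]|t[pkfcsxlmnr]|k[ptfcslmnr]|f[ptkcsxlmnr]|c[ptkflmnr]"
    r"|s[ptkfxlmnr]|x[ptfslmnr]|b[dgvjzlmnr]|d[bgvjzlmnr]|g[bdvjzlmnr]"
    r"|v[bdgjzlmnr]|j[bdgvlmnr]|z[bdgvlmnr]|l[ptkfcsxbdgvjzmnr]"
    r"|m[ptkfcsxbdgvjlnr]|n[ptkfcsxbdgvjzlmr]|r[ptkfcsxbdgvjzlmn])"
)
SECOND = re.compile(r"(?:[bcfgkmpsvx][lr]|[cs][fkmnpt]|d[jrz]|[jz][bdgmv]|t[crs])")

# The second table, flattened into a list of pairs.
SECOND_TABLE = (
    "pl pr fl fr bl br vl vr cp cf ct ck cm cn cl cr jb jv jd jg jm "
    "sp sf st sk sm sn sl sr zb zv zd zg zm tc tr ts kl kr dj dr dz gl gr "
    "ml mr xl xr"
).split()

all_pairs = [a + b for a, b in product(CONSONANTS, repeat=2)]
pairs_from_rules = {p for p in all_pairs if valid_first_pair(p[0], p[1])}
pairs_from_regex = {p for p in all_pairs if FIRST.fullmatch(p)}
second_from_regex = {p for p in all_pairs if SECOND.fullmatch(p)}

print(pairs_from_rules == pairs_from_regex)
print(set(SECOND_TABLE) == second_from_regex)
```

If both comparisons print `True`, the regexes and the written rules describe the same sets, which makes it safer to refactor either form.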

  • I survived the hour long Uno hand

    ([ptkfcsx])(?!\1)[ptkfcsx] is two non-identical characters
    (([ptkfcsx])(?!\1)){2} I think covers half of rules 1 and 2 (the unvoiced half), but I'd have to test it.
    (([ptkfcsx])(?!\1)){2}|(([bdgvjz])(?!\1)){2} I think would get you halfway there

    (?!cx|kx|xc|xk|mz) probably also has to be in there somehow...

    I need more food but that should help I hope?


  • I survived the hour long Uno hand

    You don't want {2}. You want to use lookarounds to peek at the second character but not consume it until the second half of the regex, which will be after the bit I was constructing (which will be a bunch of ors in a big paren)
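
The lookaround idea described here can be sketched in Python roughly as follows. This is only an illustration, assuming a backtracking engine with lookahead support; `TRIPLE` and `find_triple` are hypothetical names. The lookahead checks characters 1 and 2 against the first regex without consuming them, a single `.` then consumes character 1, and the second regex is matched against characters 2 and 3.

```python
import re

# The two regexes from the opening post, copied verbatim.
FIRST = (
    r"(?:p[tkfcsxlmnr]|t[pkfcsxlmnr]|k[ptfcslmnr]|f[ptkcsxlmnr]|c[ptkflmnr]"
    r"|s[ptkfxlmnr]|x[ptfslmnr]|b[dgvjzlmnr]|d[bgvjzlmnr]|g[bdvjzlmnr]"
    r"|v[bdgjzlmnr]|j[bdgvlmnr]|z[bdgvlmnr]|l[ptkfcsxbdgvjzmnr]"
    r"|m[ptkfcsxbdgvjlnr]|n[ptkfcsxbdgvjzlmr]|r[ptkfcsxbdgvjzlmn])"
)
SECOND = r"(?:[bcfgkmpsvx][lr]|[cs][fkmnpt]|d[jrz]|[jz][bdgmv]|t[crs])"

# (?=FIRST) peeks at AB, "." consumes A, SECOND then consumes BC.
TRIPLE = re.compile(rf"(?={FIRST}).{SECOND}")

def find_triple(text):
    """Return the first ABC substring where AB and BC both match, else None."""
    m = TRIPLE.search(text)
    return m.group(0) if m else None

print(find_triple("astro"))
print(find_triple("abda"))
```

One caveat: RE2-style engines, including Go's `regexp` package, do not support lookaheads, so this only works in engines such as Python, Perl, PCRE or JavaScript.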


  • I survived the hour long Uno hand

    ...dammit, I think I've been nerd sniped!



  • While you were writing that, I was writing this: http://play.golang.org/p/O9NDlgu0BY

    Which gives me this output: (?:bdj|bdr|bdz|bgl|bgr|bjb|bjd|bjg|bjm|bjv|bml|bmr|bvl|bvr|bzb|bzd|bzg|bzm|bzv|cfl|cfr|ckl|ckr|cml|cmr|cpl|cpr|ctc|ctr|cts|dbl|dbr|dgl|dgr|djb|djd|djg|djm|djv|dml|dmr|dvl|dvr|dzb|dzd|dzg|dzm|dzv|fcf|fck|fcl|fcm|fcn|fcp|fcr|fct|fkl|fkr|fml|fmr|fpl|fpr|fsf|fsk|fsl|fsm|fsn|fsp|fsr|fst|ftc|ftr|fts|fxl|fxr|gbl|gbr|gdj|gdr|gdz|gjb|gjd|gjg|gjm|gjv|gml|gmr|gvl|gvr|gzb|gzd|gzg|gzm|gzv|jbl|jbr|jdj|jdr|jdz|jgl|jgr|jml|jmr|jvl|jvr|kcf|kck|kcl|kcm|kcn|kcp|kcr|kct|kfl|kfr|kml|kmr|kpl|kpr|ksf|ksk|ksl|ksm|ksn|ksp|ksr|kst|ktc|ktr|kts|lbl|lbr|lcf|lck|lcl|lcm|lcn|lcp|lcr|lct|ldj|ldr|ldz|lfl|lfr|lgl|lgr|ljb|ljd|ljg|ljm|ljv|lkl|lkr|lml|lmr|lpl|lpr|lsf|lsk|lsl|lsm|lsn|lsp|lsr|lst|ltc|ltr|lts|lvl|lvr|lxl|lxr|lzb|lzd|lzg|lzm|lzv|mbl|mbr|mcf|mck|mcl|mcm|mcn|mcp|mcr|mct|mdj|mdr|mdz|mfl|mfr|mgl|mgr|mjb|mjd|mjg|mjm|mjv|mkl|mkr|mpl|mpr|msf|msk|msl|msm|msn|msp|msr|mst|mtc|mtr|mts|mvl|mvr|mxl|mxr|nbl|nbr|ncf|nck|ncl|ncm|ncn|ncp|ncr|nct|ndj|ndr|ndz|nfl|nfr|ngl|ngr|njb|njd|njg|njm|njv|nkl|nkr|nml|nmr|npl|npr|nsf|nsk|nsl|nsm|nsn|nsp|nsr|nst|ntc|ntr|nts|nvl|nvr|nxl|nxr|nzb|nzd|nzg|nzm|nzv|pcf|pck|pcl|pcm|pcn|pcp|pcr|pct|pfl|pfr|pkl|pkr|pml|pmr|psf|psk|psl|psm|psn|psp|psr|pst|ptc|ptr|pts|pxl|pxr|rbl|rbr|rcf|rck|rcl|rcm|rcn|rcp|rcr|rct|rdj|rdr|rdz|rfl|rfr|rgl|rgr|rjb|rjd|rjg|rjm|rjv|rkl|rkr|rml|rmr|rpl|rpr|rsf|rsk|rsl|rsm|rsn|rsp|rsr|rst|rtc|rtr|rts|rvl|rvr|rxl|rxr|rzb|rzd|rzg|rzm|rzv|sfl|sfr|skl|skr|sml|smr|spl|spr|stc|str|sts|sxl|sxr|tcf|tck|tcl|tcm|tcn|tcp|tcr|tct|tfl|tfr|tkl|tkr|tml|tmr|tpl|tpr|tsf|tsk|tsl|tsm|tsn|tsp|tsr|tst|txl|txr|vbl|vbr|vdj|vdr|vdz|vgl|vgr|vjb|vjd|vjg|vjm|vjv|vml|vmr|vzb|vzd|vzg|vzm|vzv|xfl|xfr|xml|xmr|xpl|xpr|xsf|xsk|xsl|xsm|xsn|xsp|xsr|xst|xtc|xtr|xts|zbl|zbr|zdj|zdr|zdz|zgl|zgr|zml|zmr|zvl|zvr)

    So I guess I can just manually golf-ify it.
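
The linked playground program is not reproduced in the thread, so the following is not that code, just a rough Python sketch of the same brute-force idea: try every three-consonant string, keep the ones whose first two letters match the first regex and whose last two match the second, and join them with `|`. `LETTERS` and `overlapping_triples` are names invented for this sketch.

```python
import re
from itertools import product

# The two regexes from the opening post, copied verbatim.
FIRST = re.compile(
    r"(?:p[tkfcsxlmnr]|t[pkfcsxlmnr]|k[ptfcslmnr]|f[ptkcsxlmnr]|c[ptkflmnr]"
    r"|s[ptkfxlmnr]|x[ptfslmnr]|b[dgvjzlmnr]|d[bgvjzlmnr]|g[bdvjzlmnr]"
    r"|v[bdgjzlmnr]|j[bdgvlmnr]|z[bdgvlmnr]|l[ptkfcsxbdgvjzmnr]"
    r"|m[ptkfcsxbdgvjlnr]|n[ptkfcsxbdgvjzlmr]|r[ptkfcsxbdgvjzlmn])"
)
SECOND = re.compile(r"(?:[bcfgkmpsvx][lr]|[cs][fkmnpt]|d[jrz]|[jz][bdgmv]|t[crs])")

LETTERS = "bcdfgjklmnprstvxz"  # the 17 consonants used by the rules

def overlapping_triples():
    """All ABC where AB matches FIRST and BC matches SECOND, sorted."""
    return sorted(
        a + b + c
        for a, b, c in product(LETTERS, repeat=3)
        if FIRST.fullmatch(a + b) and SECOND.fullmatch(b + c)
    )

triples = overlapping_triples()
pattern = "(?:" + "|".join(triples) + ")"
print(pattern[:40] + "...")
```

Because the result is a plain alternation with no lookarounds or back-references, it should also be usable in engines that lack those features.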



  • You're overthinking it, guys.

    (?:bdj|bdr|bdz|bgl|bgr|bjb|bjd|bjg|bjm|bjv|bml|bmr|bvl|bvr|bzb|bzd|bzg|bzm|bzv|cfl|cfr|ckl|ckr|cml|cmr|cpl|cpr|ctc|ctr|cts|dbl|dbr|dgl|dgr|djb|djd|djg|djm|djv|dml|dmr|dvl|dvr|dzb|dzd|dzg|dzm|dzv|fcf|fck|fcl|fcm|fcn|fcp|fcr|fct|fkl|fkr|fml|fmr|fpl|fpr|fsf|fsk|fsl|fsm|fsn|fsp|fsr|fst|ftc|ftr|fts|fxl|fxr|gbl|gbr|gdj|gdr|gdz|gjb|gjd|gjg|gjm|gjv|gml|gmr|gvl|gvr|gzb|gzd|gzg|gzm|gzv|jbl|jbr|jdj|jdr|jdz|jgl|jgr|jml|jmr|jvl|jvr|kcf|kck|kcl|kcm|kcn|kcp|kcr|kct|kfl|kfr|kml|kmr|kpl|kpr|ksf|ksk|ksl|ksm|ksn|ksp|ksr|kst|ktc|ktr|kts|lbl|lbr|lcf|lck|lcl|lcm|lcn|lcp|lcr|lct|ldj|ldr|ldz|lfl|lfr|lgl|lgr|ljb|ljd|ljg|ljm|ljv|lkl|lkr|lml|lmr|lpl|lpr|lsf|lsk|lsl|lsm|lsn|lsp|lsr|lst|ltc|ltr|lts|lvl|lvr|lxl|lxr|lzb|lzd|lzg|lzm|lzv|mbl|mbr|mcf|mck|mcl|mcm|mcn|mcp|mcr|mct|mdj|mdr|mdz|mfl|mfr|mgl|mgr|mjb|mjd|mjg|mjm|mjv|mkl|mkr|mpl|mpr|msf|msk|msl|msm|msn|msp|msr|mst|mtc|mtr|mts|mvl|mvr|mxl|mxr|nbl|nbr|ncf|nck|ncl|ncm|ncn|ncp|ncr|nct|ndj|ndr|ndz|nfl|nfr|ngl|ngr|njb|njd|njg|njm|njv|nkl|nkr|nml|nmr|npl|npr|nsf|nsk|nsl|nsm|nsn|nsp|nsr|nst|ntc|ntr|nts|nvl|nvr|nxl|nxr|nzb|nzd|nzg|nzm|nzv|pcf|pck|pcl|pcm|pcn|pcp|pcr|pct|pfl|pfr|pkl|pkr|pml|pmr|psf|psk|psl|psm|psn|psp|psr|pst|ptc|ptr|pts|pxl|pxr|rbl|rbr|rcf|rck|rcl|rcm|rcn|rcp|rcr|rct|rdj|rdr|rdz|rfl|rfr|rgl|rgr|rjb|rjd|rjg|rjm|rjv|rkl|rkr|rml|rmr|rpl|rpr|rsf|rsk|rsl|rsm|rsn|rsp|rsr|rst|rtc|rtr|rts|rvl|rvr|rxl|rxr|rzb|rzd|rzg|rzm|rzv|sfl|sfr|skl|skr|sml|smr|spl|spr|stc|str|sts|sxl|sxr|tcf|tck|tcl|tcm|tcn|tcp|tcr|tct|tfl|tfr|tkl|tkr|tml|tmr|tpl|tpr|tsf|tsk|tsl|tsm|tsn|tsp|tsr|tst|txl|txr|vbl|vbr|vdj|vdr|vdz|vgl|vgr|vjb|vjd|vjg|vjm|vjv|vml|vmr|vzb|vzd|vzg|vzm|vzv|xfl|xfr|xml|xmr|xpl|xpr|xsf|xsk|xsl|xsm|xsn|xsp|xsr|xst|xtc|xtr|xts|zbl|zbr|zdj|zdr|zdz|zgl|zgr|zml|zmr|zvl|zvr)
    


  • Lol, beaten you by 10 secs.

    http://repl.it/fou/3



  • http://repl.it/fou/4

    (?:bd(?:j|r|z)|bg(?:l|r)|bj(?:b|d|g|m|v)|bm(?:l|r)|bv(?:l|r)|bz(?:b|d|g|m|v)|cf(?:l|r)|ck(?:l|r)|cm(?:l|r)|cp(?:l|r)|ct(?:c|r|s)|db(?:l|r)|dg(?:l|r)|dj(?:b|d|g|m|v)|dm(?:l|r)|dv(?:l|r)|dz(?:b|d|g|m|v)|fc(?:f|k|l|m|n|p|r|t)|fk(?:l|r)|fm(?:l|r)|fp(?:l|r)|fs(?:f|k|l|m|n|p|r|t)|ft(?:c|r|s)|fx(?:l|r)|gb(?:l|r)|gd(?:j|r|z)|gj(?:b|d|g|m|v)|gm(?:l|r)|gv(?:l|r)|gz(?:b|d|g|m|v)|jb(?:l|r)|jd(?:j|r|z)|jg(?:l|r)|jm(?:l|r)|jv(?:l|r)|kc(?:f|k|l|m|n|p|r|t)|kf(?:l|r)|km(?:l|r)|kp(?:l|r)|ks(?:f|k|l|m|n|p|r|t)|kt(?:c|r|s)|lb(?:l|r)|lc(?:f|k|l|m|n|p|r|t)|ld(?:j|r|z)|lf(?:l|r)|lg(?:l|r)|lj(?:b|d|g|m|v)|lk(?:l|r)|lm(?:l|r)|lp(?:l|r)|ls(?:f|k|l|m|n|p|r|t)|lt(?:c|r|s)|lv(?:l|r)|lx(?:l|r)|lz(?:b|d|g|m|v)|mb(?:l|r)|mc(?:f|k|l|m|n|p|r|t)|md(?:j|r|z)|mf(?:l|r)|mg(?:l|r)|mj(?:b|d|g|m|v)|mk(?:l|r)|mp(?:l|r)|ms(?:f|k|l|m|n|p|r|t)|mt(?:c|r|s)|mv(?:l|r)|mx(?:l|r)|nb(?:l|r)|nc(?:f|k|l|m|n|p|r|t)|nd(?:j|r|z)|nf(?:l|r)|ng(?:l|r)|nj(?:b|d|g|m|v)|nk(?:l|r)|nm(?:l|r)|np(?:l|r)|ns(?:f|k|l|m|n|p|r|t)|nt(?:c|r|s)|nv(?:l|r)|nx(?:l|r)|nz(?:b|d|g|m|v)|pc(?:f|k|l|m|n|p|r|t)|pf(?:l|r)|pk(?:l|r)|pm(?:l|r)|ps(?:f|k|l|m|n|p|r|t)|pt(?:c|r|s)|px(?:l|r)|rb(?:l|r)|rc(?:f|k|l|m|n|p|r|t)|rd(?:j|r|z)|rf(?:l|r)|rg(?:l|r)|rj(?:b|d|g|m|v)|rk(?:l|r)|rm(?:l|r)|rp(?:l|r)|rs(?:f|k|l|m|n|p|r|t)|rt(?:c|r|s)|rv(?:l|r)|rx(?:l|r)|rz(?:b|d|g|m|v)|sf(?:l|r)|sk(?:l|r)|sm(?:l|r)|sp(?:l|r)|st(?:c|r|s)|sx(?:l|r)|tc(?:f|k|l|m|n|p|r|t)|tf(?:l|r)|tk(?:l|r)|tm(?:l|r)|tp(?:l|r)|ts(?:f|k|l|m|n|p|r|t)|tx(?:l|r)|vb(?:l|r)|vd(?:j|r|z)|vg(?:l|r)|vj(?:b|d|g|m|v)|vm(?:l|r)|vz(?:b|d|g|m|v)|xf(?:l|r)|xm(?:l|r)|xp(?:l|r)|xs(?:f|k|l|m|n|p|r|t)|xt(?:c|r|s)|zb(?:l|r)|zd(?:j|r|z)|zg(?:l|r)|zm(?:l|r)|zv(?:l|r))
    

    Definitely not worth it.



  • You have (?:l|r) in there a lot. You could probably combine all the things that come before it into one set and then have [lr] after the set.
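
That suggestion could be automated along these lines. It is a sketch only: `factor_by_third_letter` is a hypothetical helper, and the sample input is a handful of triples taken from the generated list above rather than the full set. It groups the two-letter prefixes by the set of third letters they allow and emits one character class per group.

```python
import re
from collections import defaultdict

def factor_by_third_letter(triples):
    """Group two-letter prefixes that share the same set of third letters."""
    tails = defaultdict(set)                  # "bd" -> {"j", "r", "z"}
    for t in triples:
        tails[t[:2]].add(t[2])
    groups = defaultdict(list)                # "jrz" -> ["bd", "gd"]
    for prefix, letters in tails.items():
        groups["".join(sorted(letters))].append(prefix)
    parts = []
    for letters, prefixes in sorted(groups.items()):
        parts.append("(?:" + "|".join(sorted(prefixes)) + ")[" + letters + "]")
    return "(?:" + "|".join(parts) + ")"

sample = ["bdj", "bdr", "bdz", "bgl", "bgr", "bml", "bmr", "gdj", "gdr", "gdz"]
factored = factor_by_third_letter(sample)
print(factored)  # (?:(?:bd|gd)[jrz]|(?:bg|bm)[lr])
```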


  • I survived the hour long Uno hand

    @cartman82 said:

    sx(?:l|r)

    @cartman82 said:

    vg(?:l|r)

    @cartman82 said:

    nk(?:l|r)

    That can definitely be combined further

    ...how did selecting those bits to quote reply make this happen:

    It used to look like:



  • @ben_lubar said:

    You have (?:l|r) in there a lot. You could probably combine all the things that come before it into one set and then have [lr] after the set.

    @Yamikuronue said:

    That can definitely be combined further

    Meh. Not worth it. I would keep it simple and easily readable.

    @Yamikuronue said:

    ...how did selecting those bits to quote reply make this happen:

    That's a browser bug, not Discourse this time.



  • @ben_lubar said:

    So I guess I can just manually golf-ify it.

    You could try feeding it to Regexp::Optimizer or something...
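
For readers without Perl handy, the general idea behind this kind of tool can be approximated with a prefix trie. The Python sketch below is not Regexp::Optimizer's actual algorithm, just a generic illustration that assumes all words are plain letters of equal length (true for the triples here); `build_trie` and `trie_to_regex` are invented names.

```python
import re

def build_trie(words):
    """Nested dicts keyed by character; leaves are empty dicts."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
    return root

def trie_to_regex(node):
    """Emit a regex for the trie, merging shared prefixes."""
    if not node:
        return ""
    alts = [ch + trie_to_regex(child) for ch, child in sorted(node.items())]
    if len(alts) == 1:
        return alts[0]
    if all(len(a) == 1 for a in alts):
        return "[" + "".join(alts) + "]"      # single letters become a class
    return "(?:" + "|".join(alts) + ")"

words = ["bdj", "bdr", "bdz", "bgl", "bgr", "cfl", "cfr"]
optimized = trie_to_regex(build_trie(words))
print(optimized)  # (?:b(?:d[jrz]|g[lr])|cf[lr])
```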



  • It wrapped the output in (?^:(?^:(?^:))) but it's great otherwise.



  • @Yamikuronue said:

    ...how did selecting those bits to quote reply make this happen:

    Inserting <span>s.


  • Discourse touched me in a no-no place

    @ben_lubar said:

    1. It is forbidden for both consonants to be the same, as this would violate the rule against double consonants.
    2. It is forbidden for one consonant to be voiced and the other unvoiced. The consonants “l”, “m”, “n”, and “r” are exempt from this restriction. As a result, “bf” is forbidden, and so is “sd”, but both “fl” and “vl”, and both “ls” and “lz”, are permitted.
    3. It is forbidden for both consonants to be drawn from the set “c”, “j”, “s”, “z”.
    4. The specific pairs “cx”, “kx”, “xc”, “xk”, and “mz” are forbidden.

    Weird password requirements you have there...


  • I survived the hour long Uno hand

    @riking said:

    Inserting <span>s.

    Aha! So it IS Discourse!

