Filter everything that doesn't match regex


  • Trolleybus Mechanic

    Maybe I'm having a Monday Brain Fart, but I can't remember if this is even possible.

    In .Net (and, if it'll do it, JavaScript), let's say I have a regex:

    \([0-9]*\)

    So match bracket, any number of digits, bracket.

    Can I filter out anything that DOESN'T match? So if I put in the string

    (999a)

    It would return to me:

    (999)

    ???



  • Heh. I was having this same problem last night but in PCRE.

    EDIT: I brainfarted. What kind of input are you expecting that you want to filter out? Is it never the same kind in the same place?


  • Trolleybus Mechanic

    @rc4 said:

    What kind of input are you expecting that you want to filter out? Is it never the same kind in the same place?

    International phone number formatting. We currently have this janky thing:

    ^((\+){0,1}(\(){0,1}(\d|\s|[(-.)]){0,30}|(\+){0,1}(\(){0,1}(\d|\s|[(-.)]){0,30}(e|E){0,1}(x|X){0,1}(t|T){0,1}(\s|[.])*\d{1,5})$ 
    

    Translation:
    @Lorne_Kates said:

    START OF STRING
    Zero or One (Optional) character +
    Zero or One (Optional) character (
    Up to 30 of: digit, space, dash, dot, open round bracket, close bracket (any combination)
    OR
    Optional +
    Optional (
    Up to 30 of: digit, space, dash, dot, open round bracket, close bracket (any combination)
    Nothing OR single letter "e" (case insensitive)
    Nothing OR single letter "x" (case insensitive)
    Nothing OR single letter "t" (case insensitive)
    Zero or more: space, dot
    1 to 5 of: digits
    END OF STRING

    But some international users are still finding ways of fucking that up.
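
    For anyone who wants to poke at it, here is a rough JavaScript sketch of that validator, put together from the pieces in the translation. The names `body`, `ext`, `legacyValidator` and `isAccepted` are made up for illustration, and the sample numbers are invented test input, not the numbers users actually had trouble with (we never got those).

```javascript
// Pieces of the existing validator, named after the translation above.
// All identifiers here are illustrative, not from the production code.
const body = String.raw`(\+){0,1}(\(){0,1}(\d|\s|[(-.)]){0,30}`;
const ext = String.raw`(e|E){0,1}(x|X){0,1}(t|T){0,1}(\s|[.])*\d{1,5}`;
const legacyValidator = new RegExp(`^(${body}|${body}${ext})$`);

function isAccepted(input) {
  return legacyValidator.test(input);
}

console.log(isAccepted("+44 20 7946 0958"));    // true
console.log(isAccepted("07700 900123 ext 45")); // true
console.log(isAccepted("020 7946 0958 #123"));  // false: '#' is not in any allowed set
console.log(isAccepted("1*2,3"));               // true: [(-.)] is a range from '(' to '.', so '*' and ',' slip in
```

    The last two lines are the interesting ones: a `#` extension marker is rejected outright, while the `(-.` range quietly admits a few characters the translation never mentions.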

    We've decided we'd rather accept any input, filter out anything that doesn't match. So use the regex as a filter rather than a validator. Right now my only solution is:

    Allowed: ([0-9extEXT. ()+#-]*)*

    (any number of those groupings)

    Then in the server side:

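    ' allowedPattern holds the "Allowed" regex above
    Dim goodPhone As String = String.Empty

    ' Regex.Matches takes the input first, then the pattern
    For Each m As Match In Regex.Matches(userInput, allowedPattern)
        goodPhone &= m.Value
    Next

    If goodPhone.Trim().Length = 0 Then

       Error("Ah shit son you gone done and fucked up")

    End If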
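
    The same idea sketched in JavaScript, in case the filtering also has to happen client-side. Assumptions: the hyphen is written last in the character class so it is read as a literal hyphen rather than a range, and `ALLOWED` and `filterByMatches` are hypothetical names.

```javascript
// Runs of allowed characters; the trailing '-' is a literal hyphen.
const ALLOWED = /[0-9extEXT.()+# -]+/g;

// Keep every run that matches and glue the runs back together.
function filterByMatches(userInput) {
  const runs = userInput.match(ALLOWED) || []; // match() returns null when nothing matches
  return runs.join("");
}

console.log(filterByMatches("(999a)"));             // "(999)"
console.log(filterByMatches("+44 20 7946 0958!!")); // "+44 20 7946 0958"
console.log(filterByMatches("???"));                // ""
```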
    
    

    Edit: fucking Discourse and quotes AAAAAAAAAAAA



  • @Lorne_Kates said:

    Can I filter out anything that DOESN'T match?

    So, you want to filter out any characters before the bracket, any non-numbers inside the bracket, and any characters after the bracket?

    I don't think you can do that with just a regex-replace. Can't you iterate over the string and drop the characters manually, setting up an "in-brackets" flag?
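
    A minimal sketch of that flag-based loop in JavaScript, assuming the target shape is the `\([0-9]*\)` pattern from the first post. `keepBracketedDigits` is a made-up name, and what to do with an unclosed bracket is a guess (here it is simply left unclosed).

```javascript
// Walk the string once, keeping only "(", digits inside brackets, and ")".
function keepBracketedDigits(input) {
  let inBrackets = false;
  let out = "";
  for (const ch of input) {
    if (!inBrackets) {
      // Outside brackets: only an opening bracket is kept.
      if (ch === "(") {
        inBrackets = true;
        out += ch;
      }
    } else if (ch === ")") {
      inBrackets = false;
      out += ch;
    } else if (ch >= "0" && ch <= "9") {
      // Inside brackets: digits are kept, everything else is dropped.
      out += ch;
    }
  }
  return out;
}

console.log(/\([0-9]*\)/.exec("(999a)"));     // null: the plain regex just fails to match
console.log(keepBracketedDigits("(999a)"));   // "(999)"
console.log(keepBracketedDigits("x(12b3)y")); // "(123)"
```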


  • Trolleybus Mechanic

    @Maciejasjmj said:

    Can't you iterate over the string and drop the characters manually, setting up an "in-brackets" flag?

    Yes, but the idea was to use a regex, since the regex is controlled by the customer.



  • Can you give me a few test cases that should/should not match? I work better when I can smoke test it along the way.



  • Can't you just use the regex to pull out what you want and ignore the rest? Or is that what you're asking how to do? I haven't actually tried.


  • Trolleybus Mechanic

    @rc4 said:

    Can you give me a few test cases that should/should not match? I work better when I can smoke test it along the way.

    Nope, because end users are idiots. All we know is "it don't work sometimes". :|

    @LB_ said:

    Can't you just use the regex to pull out what you want and ignore the rest? Or is that what you're asking how to do? I haven't actually tried.

    That's what I'm trying, but the regex as is doesn't seem to do that.




  • Grade A Premium Asshole

    @Lorne_Kates said:

    Yes, but the idea was to use a regex, since the regex is controlled by the customer.

    Well, you are going to have problems then, unless those customers are programmers? Can you tell, in vague terms if you must, what domain these customers are in?


  • Trolleybus Mechanic

    @Polygeekery said:

    Well, you are going to have problems then, unless those customers are programmers? Can you tell, in vague terms if you must, what domain these customers are in?

    Ecomm taking international orders. They had a UK user complain the checkout page wouldn't accept their phone number. I tried several UK numbers, couldn't repro. The shopper had already fucked off to another site, so we don't even know what number he was trying.

    My approach is then "fuck it, let them type whatever their little monkey-ham fingers can slam into Teh Webz-- we'll strip out obvious junk. If you get an order that has an invalid phone number, well, the user's username is their email. Contact them"


  • Grade A Premium Asshole

    @Lorne_Kates said:

    Ecomm taking international orders. They had a UK user complain the checkout page wouldn't accept their phone number. I tried several UK numbers, couldn't repro. The shopper had already fucked off to another site, so we don't even know what number he was trying.

    Ahhhhh, yeah, you're going to have a hard time.

    @Lorne_Kates said:

    My approach is then "fuck it, let them type whatever their little monkey-ham fingers can slam into Teh Webz-- we'll strip out obvious junk. If you get an order that has an invalid phone number, well, the user's username is their email. Contact them"

    But that seems like as good an option as one could come up with for the problem.



  • So if I understand what you want to do, you want to take a string like this: 1-999p555-4565 and remove the errant p as it should really be a dash and return the new string as the phone number?

    If I haven't misunderstood your question, there is no elegant way to achieve this within just regex. The primary problem is that it's the exact opposite of what regex is designed to do: you want to capture a string while leaving some part in the middle of it uncaptured.

    I think the easiest thing is something like this:
    Doing a replace for all NOT [0-9extEXT.()+# -] with an empty string and piping that string into your original regex is the only thing I can think of.
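
    Something like this, as a rough JavaScript sketch. `NOT_ALLOWED` and `stripDisallowed` are names I made up, and the hyphen sits at the end of the class so it is treated as a literal character.

```javascript
// Everything NOT in the allowed set; '-' is last so it is a literal hyphen.
const NOT_ALLOWED = /[^0-9extEXT.()+# -]/g;

// Delete each disallowed character, leaving the rest in place.
function stripDisallowed(input) {
  return input.replace(NOT_ALLOWED, "");
}

console.log(stripDisallowed("1-999p555-4565")); // "1-999555-4565"
```

    Note that the errant p is simply dropped, not turned into a dash, so the cleaned string may still need to go through the stricter regex afterwards.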


  • Trolleybus Mechanic

    @Dragoon said:

    exact opposite of what regex is designed to do

    If people didn't do the exact opposite of what tools were meant to do, we wouldn't have tdwtf. =P

    @Dragoon said:

    Doing a replace for all NOT [0-9extEXT.()+# -] with an empty string and piping that string into your original regex is the only thing I can think of.

    Mostly what I did. I ended up writing the spec as:

    Use this regex

    ([0-9extEXT.()+# -]+)+
    

    So "any number of groupings of any number of allowable characters in a row".

    Use that as the validator. As long as someone enters at least ONE good character, it will accept it.

    Then do this (pseudo code)

    finalPhone = String.Empty
    matches = Regex.Matches(InputTextbox.Text, regex)
    For Each match In matches
       finalPhone &= match.Value
    Next
    

    Something in the spec, too, about setting InputTextbox.Text = finalPhone, so the user knows what we filtered down to.

    I bet there's an easier and more elegant way of doing it, but it'd take me longer to think of it, and I'm sure the overall performance savings will be nil vs. nil+0.00001



  • @Lorne_Kates said:

    Use this regex

    ([0-9extEXT.()+# -]+)+

    Would that be called the "permissive" regex?

    #0T.5ex...XXX....)--(.)-(.)--(

    This has about a .00000000 percent chance of enforcing a valid phone number.
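
    To make that concrete, a quick JavaScript sketch. The hyphen is moved to the end of the class so the pattern parses, `filterPermissive` and `digitCount` are made-up helper names, and counting digits is only one possible extra sanity check, not something from the spec.

```javascript
// The permissive character set, with '-' last so it is a literal hyphen.
const PERMISSIVE = /[0-9extEXT.()+# -]+/g;

// Concatenate every run of permitted characters.
function filterPermissive(input) {
  return (input.match(PERMISSIVE) || []).join("");
}

// Hypothetical extra check: how many digits are actually in there?
function digitCount(input) {
  return (input.match(/[0-9]/g) || []).length;
}

const junk = "#0T.5ex...XXX....)--(.)-(.)--(";
console.log(filterPermissive(junk) === junk); // true: nothing gets filtered out
console.log(digitCount(junk));                // 2
```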



  • libphonenumber


  • BINNED

    @CoyneTheDup said:

    #0T.5ex...XXX....)--(.)-(.)--(

    This has about a .00000000 percent chance of enforcing a valid phone number.

    You're trying to validate boobs with regex? Of course it will fail!


    Filed under: No, I'm not fixing Discourse's parsing fuckups



  • @Onyx said:

    validate boobs with regex

    Those are four-eyes...wide-open four-eyes. :rolleyes:

    :rofl:


  • ♿ (Parody)

    @CoyneTheDup said:

    Those are four-eyes...wide-open four-eyes. :rolleyes:

    When he blinks, his pupils turn white. It's...kind of creepy.



  • I just implemented something like what it sounds like you're doing as:

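        // Requires: using System.Text.RegularExpressions;
        public static string FilterPhoneNumber(string input)
        {
            // \D matches any non-digit character; replace each with nothing
            return Regex.Replace(input, @"\D", "");
        }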
    

    That strips all non-digits out of the string.
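
    Since the original question also asked about JavaScript, a rough counterpart might look like this (the function name just mirrors the C# one; the sample numbers are invented):

```javascript
// JavaScript counterpart of the Regex.Replace call: drop every non-digit.
function filterPhoneNumber(input) {
  return input.replace(/\D/g, "");
}

console.log(filterPhoneNumber("+44 (0)20 7946 0958"));        // "4402079460958"
console.log(filterPhoneNumber("+44 (0)20 7946 0958 ext 12")); // "440207946095812"
```

    Worth knowing before going this route: the leading + disappears and an extension gets glued onto the end of the number. Also, in JavaScript `\d` only covers 0-9, whereas as far as I know .NET's `\d` matches other Unicode digits too unless `RegexOptions.ECMAScript` is set.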

