Regexp Help!


Pete
Hi guys

I hate regexp - they always give me a headache no matter how many times I try and learn them.

What I want to do is capture the various parts of this:

{showif field=value}some text

over many lines potentially{/showif}

So from that I need the values of field and value and preferably the operator in between them.

I don't need a templating engine (before anyone suggests one :)) as it is literally just the above I want to parse in some text fields in ProcessWire before a page loads in the admin.


I am not as hopeless as I thought. This:

{showif::field=value}
some text
{/showif}

Can be captured with this:

preg_match_all("/\{showif::(.*?)\}(.*?)\{\/showif\}/is", $input, $matches);

My next trick will be matching if the {showif} tags are optionally surrounded by paragraph tags, but I have a workaround for that so no real need to fix that right now.
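For anyone following along, here is a minimal sketch of how the captured condition could then be split into field, operator and value. The sample input, the field name `colour` and the `$blocks` array are made up for illustration, and only `=` and `!=` are assumed as operators.

```php
<?php
// Hypothetical sample input; the field and value are invented for this sketch.
$input = "{showif::colour=red}\nsome text\nover many lines\n{/showif}";

// PREG_SET_ORDER groups the captures per match rather than per group.
preg_match_all('/\{showif::(.*?)\}(.*?)\{\/showif\}/is', $input, $matches, PREG_SET_ORDER);

$blocks = array();
foreach ($matches as $match) {
    // Split the condition into field, operator and value.
    // "!=" is listed before "=" so the longer operator is tried first.
    if (preg_match('/^\s*([\w.]+)\s*(!=|=)\s*(.*?)\s*$/s', $match[1], $parts)) {
        $blocks[] = array(
            'field' => $parts[1],
            'op'    => $parts[2],
            'value' => $parts[3],
            'body'  => trim($match[2]),
        );
    }
}

print_r($blocks);
```

Doing the split in a second step keeps the outer pattern simple; whether that is preferable to one larger pattern is a matter of taste.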


@Pete

Untested and a little verbose, but you could try this...

$m = array();
// Expand as needed depending on your expected operators.
// Put longer operators first (e.g. '<=' before '<') so the alternation tries them first.
$operators = array('=', '!=');
$operators = array_map('preg_quote', $operators); // preg_quote them all
$operators = implode('|', $operators);            // join as alternates

if (preg_match("#\{showif (?P<field>[\w\.]+)(?P<op>$operators)(?P<value>[^}]*)\}(?P<body>([^{]*))\{\/showif\}#i", $selector, $m)) {
    $field = $m['field'];
    $op    = $m['op'];
    $value = $m['value'];
    $body  = $m['body'];

    // You could just access $m directly - but I've made it explicit

} else {
    // No match stuff
}

I might have misunderstood the literals in your example though. If you want to capture possibly multiple occurrences in a string then you could use preg_replace_callback or preg_match_all - but I don't know the ins and outs of your use-case so I've left it at a single match.

The actual regex may need a little tweak and can be simplified if you don't need to match anything in the text up to the {/showif}.
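If multiple occurrences do need handling, something along these lines might work. It is only a sketch: `renderShowif()` and the `$data` array are hypothetical names, only `=` and `!=` are assumed as operators, and the comparison is a plain string match.

```php
<?php
// Hypothetical helper: $data maps field names to the values to test against.
function renderShowif($text, array $data)
{
    $operators = array('!=', '=');  // longer operators first
    $ops = implode('|', array_map('preg_quote', $operators));

    $pattern = "#\{showif (?P<field>[\w.]+)(?P<op>$ops)(?P<value>[^}]*)\}(?P<body>[^{]*)\{/showif\}#i";

    // The callback runs once per {showif} block and returns its replacement.
    return preg_replace_callback($pattern, function ($m) use ($data) {
        $actual = isset($data[$m['field']]) ? (string) $data[$m['field']] : '';
        $equal  = ($actual === $m['value']);
        $show   = ($m['op'] === '=') ? $equal : !$equal;
        return $show ? $m['body'] : '';
    }, $text);
}

$text = 'Hello {showif type=business}VAT applies.{/showif} Bye {showif type!=business}No VAT.{/showif}';
echo renderShowif($text, array('type' => 'business'));
```

As with the single-match version, the body cannot contain a `{` character, so nested tags would need a different approach.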


Techy note...

preg_match_all("/\{showif::(.*?)\}(.*?)\{\/showif\}/is", $input, $matches);

If you know the delimiter for any part of an expression then it's faster to tell the regexp parser explicitly when to end a run than have it try to work it out. So this bit...

\{showif::(.*?)\}

...can be sped up by telling it to match until a '}' char...

\{showif::([^}]*)\}

...and you don't need to worry about embedded line-breaks or greediness then either as that part will stop matching when it hits the first '}' with no backtracks.

Edited to add: Actually the speed-up shouldn't be a lot in this case as you are using an ungreedy match - but it would potentially be big if you had greedy matches.
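A quick way to convince yourself that the two forms capture the same thing is to run both against a sample. This is just an illustrative check with made-up input, not a benchmark.

```php
<?php
// Hypothetical sample input for comparing the two patterns.
$input = "{showif::colour=red}\nsome text\n{/showif}";

$lazy    = '/\{showif::(.*?)\}(.*?)\{\/showif\}/is';    // lazy dot
$negated = '/\{showif::([^}]*)\}(.*?)\{\/showif\}/is';  // negated character class

preg_match($lazy, $input, $a);
preg_match($negated, $input, $b);

// Both patterns should yield identical match arrays for this input.
var_dump($a === $b);
```

The negated class also matches across line breaks on its own, so that part of the pattern no longer relies on the `s` modifier.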


Thanks Steve - much appreciated.

Is there an easy way to see if the whole thing has accidentally been surrounded by <p> tags and match that as well? This is all, hilariously, in a CKEditor field (what could possibly go wrong :D) and it would look a little neater if the {showif} is on its own line.

As for what this is all about - it produces something akin to an MS Word mail-merge, but with inline fields that then optionally show other sections of text that in turn might show their own fields. It's a horrendously long and complex form, but I've realised that the built-in "showif" JavaScript in the PW admin is very useful for this and just works once you look at the requirements for a div tag surrounding an optional piece of content.

Unfortunately it's nothing I could share as useful to others as it's so tied into this one project, but it would be nice to build a module around this idea in future - it's a relatively nice interactive document builder where the idea was to remove any margin for error over the old version which was a straight Word doc with highlighted text you deleted or added to where applicable. In some ways, now I'm past page 10 of that document, I wish I'd suggested leaving it how it was ;)


If you don't use it already, try http://regexr.com/ for testing your regexes. I learned a lot about regex with that tool. It only supports JS regex (obviously), so not all regex features that PHP can handle are supported, but still.

Once a regex works, I save it on regexr and put the URL in a comment within my code, so I can revisit and/or change it later. The great thing is that it saves not only the regex but also the string/text you were testing it with.


@Pete

HTML tags can be a pain. You could try this...

"#(?P<popen>\<p\>)?\{showif (?P<field>[\w\.]+)(?P<op>$operators)(?P<value>[^}]*)\}(?P<body>([^{]*))\{\/showif\}(?P<pclose>\<\/p\>)?#i"

...but I can't recall off the top of my head if $m['popen'] and $m['pclose'] will be unset, null or just an empty string if they do not match.

You may also want to allow for possible linebreaks between the open and the {showif} and between the {/showif} and the close. Finally, will there be stuff like classes on the p tag? If so then you need to use something more like...

(?P<popen>\<p[^>]*\>)?

...to try and match the open tag
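Putting those pieces together, a rough sketch could look like the following. `wrappedInParagraph()` is a hypothetical helper, only `=` and `!=` are assumed as operators, and the open-tag part is tightened slightly to `<p(?:\s[^>]*)?>` so that it does not also match tags such as `<pre>`. Using empty() sidesteps the question of whether an unmatched group is unset, null or an empty string.

```php
<?php
$ops = implode('|', array_map('preg_quote', array('!=', '=')));

// Optional <p ...> before and </p> after, each allowing whitespace or line breaks.
$pattern = "#(?P<popen><p(?:\s[^>]*)?>\s*)?\{showif (?P<field>[\w.]+)(?P<op>$ops)(?P<value>[^}]*)\}"
         . "(?P<body>[^{]*)\{/showif\}(?P<pclose>\s*</p>)?#i";

// Hypothetical helper: true if the block is wrapped in a paragraph,
// false if it is not, null if there is no {showif} block at all.
function wrappedInParagraph($pattern, $text)
{
    if (!preg_match($pattern, $text, $m)) {
        return null;
    }
    // empty() copes with the group being absent, null or an empty string.
    return !empty($m['popen']) && !empty($m['pclose']);
}

var_dump(wrappedInParagraph($pattern, "<p class=\"x\">\n{showif type=a}body{/showif}\n</p>"));
var_dump(wrappedInParagraph($pattern, '{showif type=a}body{/showif}'));
```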

Ok, got to go. Will look in on this again this evening.


A friend showed me this https://regex101.com


Just found another great tool with a very different approach!

Look at this example of a regex to find emails:

^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$ 

http://ysono.github.io/pegrex/#47%2C%5E(%5Ba-z0-9_%5C.-%5D%2B)%40(%5B%5Cda-z%5C.-%5D%2B)%5C.(%5Ba-z%5C.%5D%7B2%2C6%7D)%24

To be honest I don't understand the tool at the bottom, but I love the graphical mockup of the regex :)


That's really nice, especially to get through more complex ones you didn't write on your own.

But to be the nitpicker: this regex isn't correct. The part before the @ is in fact technically case sensitive, and additionally [a-z] does not include language-specific Unicode chars. So HansMüller@gmx.de would not be found / validated. The Unicode chars weren't valid before 2012, which is why there are still lots of old regex examples out there, but now one should account for them. I'd think especially older people, wanting to get to know this internet, will happily use those in their email addresses if the email provider allows those chars. I've also seen a help thread (I think somewhere on GitHub) where an employee could not register, because his company worked with case-sensitive emails, so the lowercase one didn't get to his inbox.
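To make the point concrete, here is a small comparison. The second pattern is only a hypothetical, looser alternative using Unicode properties; it is a sketch and not a full validator for the email RFCs.

```php
<?php
// The pattern from the example above, wrapped in delimiters.
$old = '/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/';

// Hypothetical alternative: \p{L} and \p{N} match letters and digits in any
// script and any case; the u modifier switches the pattern to UTF-8 mode.
$unicode = '/^([\p{L}\p{N}_.-]+)@([\p{L}\p{N}.-]+)\.(\p{L}{2,})$/u';

$address = "HansM\u{00FC}ller@gmx.de"; // HansMüller@gmx.de

var_dump(preg_match($old, $address));     // rejected: uppercase H and ü
var_dump(preg_match($unicode, $address)); // accepted
```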

