michelangelo
Posted January 8, 2021

I am doing some very simple string parsing on a URL taken from a Textarea input field. It all works fine if I define the link as a string, e.g. $str = 'some url', but if the value is taken from the CMS it doesn't work...

    $str = $link->textarea_short; // this doesn't work
    // Expected output: '<p>https://www.mixcloud.com/some-radio/</p>'
    // $str = '<p>https://www.mixcloud.com/some-radio/</p>'; // this works
    preg_match('/<p>https*:\/\/www\.mixcloud\.com(.*)<\/p>/', $str, $matches, PREG_OFFSET_CAPTURE);
    $str_url = $matches[1][0];
    $str_url = str_replace('/', '%2F', $str_url);
    echo($str_url);

This is the algorithm that I am working with:
1. Getting a URL
2. Parsing it to extract what I need
3. Replacing some characters

In both cases, if I echo the values I get the right result, but preg_match() doesn't match the value from the CMS. What am I doing wrong?

MoritzLost
Posted January 8, 2021

Have you checked whether $link->textarea_short actually contains the HTML you assume? If the structure is slightly different, the regular expression will no longer match. Besides that, there are some issues with your expression:

- https*: If you want to match either http or https, use a question mark (https?). The asterisk will also match httpssssssss...
- (.*): This will match any number of any characters, including the paragraph end tag (</p>). Depending on your content, this will yield unexpected results: if your textarea includes multiple paragraphs, the regular expression will match everything from the start of the link to the last closing paragraph tag. You can rectify that problem using the U flag (PCRE_UNGREEDY): quantifiers are greedy by default, and this flag turns them ungreedy. But it may still cause catastrophic backtracking. A better solution would be to use [^<]*, which matches any characters except the less-than sign (<), so it can't "skip" the closing tag. It will still capture any additional content between the link and the closing tag, though.

What are you trying to do in the first place? It looks like some sort of URL encoding, but why match the <p> tags as well? It would probably be easier to use preg_replace_callback with urlencode to find any links and encode them ...

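The two pattern changes suggested above can be sketched as follows. This is only an illustration: the two-paragraph input is a made-up value, not the actual content of the textarea, and the adjusted pattern uses `~` as delimiter simply to avoid escaping the slashes.

```php
<?php
// Hypothetical input with two paragraphs, to show where the greedy version overshoots.
$html = '<p>https://www.mixcloud.com/some-radio/</p><p>Another paragraph</p>';

// Original pattern: the greedy (.*) runs up to the LAST </p> in the string.
preg_match('/<p>https*:\/\/www\.mixcloud\.com(.*)<\/p>/', $html, $greedy);

// Adjusted pattern: "https?" accepts only http or https,
// and [^<]* cannot run past the first closing tag.
preg_match('~<p>https?://www\.mixcloud\.com([^<]*)</p>~', $html, $strict);

echo $greedy[1], PHP_EOL; // /some-radio/</p><p>Another paragraph
echo $strict[1], PHP_EOL; // /some-radio/
```

With a single paragraph both patterns capture the same path, so the difference only shows up once the field contains more than one closing tag.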
michelangelo (Author)
Posted January 8, 2021

Thanks @MoritzLost! I will fix the issues with my regex now. $link->textarea_short contains what I expect, and it's a string.

What I am trying to do is have the client enter a URL from Mixcloud, e.g. https://www.mixcloud.com/toddyflores/matinee-2015-formula-1-grand-prix-mixtape-by-toddy-flores/, and then render the appropriate Mixcloud iframe player. The iframe uses this structure:

    <iframe width="100%" height="60" src="https://www.mixcloud.com/widget/iframe/?hide_cover=1&mini=1&feed=%2Ftoddyflores%2Fmatinee-2015-formula-1-grand-prix-mixtape-by-toddy-flores%2F" frameborder="0" ></iframe>

So I want to replace everything in the feed parameter with the parsed channel and track. That's why this URL:

    https://www.mixcloud.com/toddyflores/matinee-2015-formula-1-grand-prix-mixtape-by-toddy-flores/

becomes this expression:

    ...feed=%2Ftoddyflores%2Fmatinee-2015-formula-1-grand-prix-mixtape-by-toddy-flores%2F...

Why would preg_match() refuse to work with my variable $str?

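One possible way to do the URL-to-feed conversion described above without a hand-written regex is sketched below. The helper name `mixcloudWidgetSrc` is hypothetical, and the sketch assumes the field holds a single URL, optionally wrapped in markup such as `<p>` tags. It uses `parse_url` to isolate the path and `rawurlencode` for the encoding step in place of `str_replace`.

```php
<?php
// Hypothetical helper, not from the thread: builds the widget src from a pasted Mixcloud URL.
function mixcloudWidgetSrc(string $input): ?string
{
    // Drop any markup the textarea wrapped around the URL, then trim whitespace.
    $url = trim(strip_tags($input));

    $host = parse_url($url, PHP_URL_HOST);
    $path = parse_url($url, PHP_URL_PATH);
    if ($host !== 'www.mixcloud.com' || !is_string($path) || $path === '') {
        return null; // not a URL this sketch knows how to handle
    }

    // rawurlencode() turns "/" into "%2F" and leaves letters, digits and "-" alone.
    return 'https://www.mixcloud.com/widget/iframe/?hide_cover=1&mini=1&feed='
        . rawurlencode($path);
}

$src = mixcloudWidgetSrc(
    '<p>https://www.mixcloud.com/toddyflores/matinee-2015-formula-1-grand-prix-mixtape-by-toddy-flores/</p>'
);

// Escape the value when writing it into the HTML attribute.
echo '<iframe width="100%" height="60" src="' . htmlspecialchars($src) . '" frameborder="0"></iframe>', PHP_EOL;
```

Returning null for anything that is not a www.mixcloud.com URL lets the template skip the player instead of rendering a broken iframe. Whether other hosts or URL shapes should be accepted is a design decision this sketch leaves open.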
MoritzLost
Posted January 8, 2021

If the textarea contains exactly the same string as your test string, preg_match will not behave differently. There's probably some additional content in there, maybe some whitespace or even a hidden character from pasting or something like that. I'd try dumping both your test string and the value from the database next to each other and checking them for differences. Besides that, the most likely cause is the issues with your regex explained above.

I'd also recommend using urlencode or rawurlencode instead of str_replace to encode the URL part. What kind of output are you getting with the value from the database? No match at all, or does it match something it shouldn't?

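The side-by-side comparison suggested above could look roughly like this. The stored value here is invented for illustration (it contains a non-breaking space right after the opening tag), since the thread never shows the real database content, and `firstDifference` is a hypothetical helper.

```php
<?php
// Hypothetical helper: byte offset of the first difference, or null if the strings are identical.
function firstDifference(string $a, string $b): ?int
{
    $max = min(strlen($a), strlen($b));
    for ($i = 0; $i < $max; $i++) {
        if ($a[$i] !== $b[$i]) {
            return $i;
        }
    }
    return strlen($a) === strlen($b) ? null : $max;
}

$expected = '<p>https://www.mixcloud.com/some-radio/</p>';
// Made-up stand-in for $link->textarea_short: looks identical when echoed in a browser,
// but carries a non-breaking space (U+00A0) after the <p>.
$stored = "<p>\u{00A0}https://www.mixcloud.com/some-radio/</p>";

var_dump($expected === $stored);              // strict comparison of the two strings
var_dump(strlen($expected), strlen($stored)); // byte lengths side by side
var_dump(firstDifference($expected, $stored));

// A hex dump around the reported offset makes invisible bytes visible.
echo bin2hex(substr($stored, 0, 8)), PHP_EOL;

$pattern = '/<p>https*:\/\/www\.mixcloud\.com(.*)<\/p>/';
var_dump(preg_match($pattern, $expected), preg_match($pattern, $stored));
```

Comparing byte lengths and hex dumps is useful because echoing both values to the page can render them identically even when they differ.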
michelangelo (Author)
Posted January 8, 2021

Thanks @MoritzLost! It works now. Something was happening with the regex AND with my input. It's all good now!