Robin S
Posted June 17, 2020

For a brief moment I was contemplating parsing HTML with regex, but then StackOverflow taught me this: https://stackoverflow.com/a/1732454/1036672

My sides are now hurting.

Jan Romero
Posted June 17, 2020

That’s a classic, but I’ve been scraping some stuff lately, and I’m not ashamed to say it looks like this and works fine:

    Regex playlistRegex = new Regex(@"playlist = (\[.*?\]);", RegexOptions.Singleline);
    Regex titelRegex = new Regex("player-archive-date.*?>(.*?)</div>.*?<span>(.*?)</span>", RegexOptions.Singleline);
    Regex mp3Regex = new Regex(@"stream_url\s*?=\s*?'(.*?\.mp3)';");
    Regex datumRegex = new Regex("datum=([0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9])");
    //…
    var i = line.IndexOf("/player/xxx/?xxx=xxx");
    if (i == -1) continue;
    c++;
    line = line.Substring(i);
    line = line.Substring(0, line.IndexOf("\");'"));
    line = line.Replace("&amp;", "&");
    var datum = datumRegex.Match(line).Groups[1].Value;

*cough*

Of course, I can make solid assumptions about my input here. The limited problem of the Stack Overflow OP seems somewhat suitable for regex, too, although I’m unsure what they’re trying to accomplish. Find all non-self-closing opening tags?
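
For illustration, the index-and-substring steps in the snippet above can be sketched as a small standalone program. This is only a sketch under stated assumptions: the sample input line, its `play(...)` wrapper, and the names `ScrapeSketch` and `TryExtractDatum` are hypothetical and do not come from the post, and the `&amp;` replacement assumes the scraped markup HTML-escapes its ampersands. Unlike the snippet, it also checks that the closing `");'` marker was found before cutting the string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ScrapeSketch
{
    // Same date pattern as datumRegex in the post, written with quantifiers.
    private static readonly Regex DatumRegex =
        new Regex("datum=([0-9]{4}-[0-9]{2}-[0-9]{2})");

    // Tries to pull the date out of a player URL embedded in one line of markup.
    // Returns false when the line has no player URL, no closing marker, or no date.
    public static bool TryExtractDatum(string line, out string datum)
    {
        datum = "";

        // Placeholder URL prefix, copied as-is from the post.
        int start = line.IndexOf("/player/xxx/?xxx=xxx", StringComparison.Ordinal);
        if (start == -1) return false;
        line = line.Substring(start);

        // The URL is assumed to end where the characters ");' begin.
        int end = line.IndexOf("\");'", StringComparison.Ordinal);
        if (end == -1) return false;

        // Undo HTML escaping of the query-string separator (an assumption about the input).
        line = line.Substring(0, end).Replace("&amp;", "&");

        Match match = DatumRegex.Match(line);
        if (!match.Success) return false;

        datum = match.Groups[1].Value;
        return true;
    }

    public static void Main()
    {
        // Hypothetical input line, invented for this example.
        string sample =
            "<a onclick='play(\"/player/xxx/?xxx=xxx&amp;datum=2020-06-17\");'>Play</a>";

        string datum;
        if (TryExtractDatum(sample, out datum))
        {
            Console.WriteLine(datum); // prints "2020-06-17"
        }
    }
}
```

The Try-pattern is a design choice for the sketch: it lets the caller skip lines that do not match, which mirrors the `continue` in the original loop.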