
Parsing HTML with regex


Robin S

That’s a classic, but I’ve been scraping some stuff lately, and I’m not ashamed to say my code looks like this and works fine:

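using System.Text.RegularExpressions;

// RegexOptions.Singleline lets . match newlines, so these patterns can span lines of markup.
Regex playlistRegex = new Regex(@"playlist = (\[.*?\]);", RegexOptions.Singleline);
Regex titelRegex = new Regex("player-archive-date.*?>(.*?)</div>.*?<span>(.*?)</span>", RegexOptions.Singleline);
Regex mp3Regex = new Regex(@"stream_url\s*?=\s*?'(.*?\.mp3)';");
Regex datumRegex = new Regex("datum=([0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9])");

//…

// Skip lines without a player link; c counts the lines that have one.
var i = line.IndexOf("/player/xxx/?xxx=xxx");
if (i == -1)
    continue;

c++;

// Keep only the link: from its start up to the closing "); sequence.
line = line.Substring(i);
line = line.Substring(0, line.IndexOf("\");'"));
// Decode the HTML-escaped ampersands, then pull out the date after "datum=".
line = line.Replace("&amp;", "&");
var datum = datumRegex.Match(line).Groups[1].Value;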

*cough*
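The lazy .*? in those patterns is what keeps them from running past the first closing delimiter. Here is a small C# sketch that runs the playlist pattern from above next to a greedy .* variant; the one-line input is made up for illustration, not taken from the site I scraped:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample: two bracketed arrays followed by ';' on one line.
const string sample = "var playlist = [1, 2]; var other = [3];";

// Lazy quantifier, as in the playlist regex: stops at the first "];".
var lazy = new Regex(@"playlist = (\[.*?\]);", RegexOptions.Singleline);
// Greedy variant for comparison: backtracks only as far as the last "];".
var greedy = new Regex(@"playlist = (\[.*\]);", RegexOptions.Singleline);

string lazyCapture = lazy.Match(sample).Groups[1].Value;
string greedyCapture = greedy.Match(sample).Groups[1].Value;

Console.WriteLine(lazyCapture);   // [1, 2]
Console.WriteLine(greedyCapture); // [1, 2]; var other = [3]
```

With the greedy version the capture swallows everything up to the last array on the line, which is why every pattern in the snippet uses the lazy form.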

Of course, I can make solid assumptions about my input here.
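To show what those assumptions buy (and cost), here is a C# sketch feeding the mp3Regex pattern from the snippet two hypothetical versions of the same assignment: one in the single-quoted shape the pattern expects, and one with double quotes. The URLs are placeholders:

```csharp
using System;
using System.Text.RegularExpressions;

// The mp3 pattern from the snippet: expects single quotes and a trailing ';'.
var mp3Regex = new Regex(@"stream_url\s*?=\s*?'(.*?\.mp3)';");

// Hypothetical inputs: the assumed shape, then the same line with double quotes.
string[] inputs =
{
    "stream_url = 'http://example.com/show.mp3';",
    "stream_url = \"http://example.com/show.mp3\";",
};

var results = new string[inputs.Length];
for (int k = 0; k < inputs.Length; k++)
{
    Match m = mp3Regex.Match(inputs[k]);
    // Success is false as soon as the markup drifts from the assumed shape.
    results[k] = m.Success ? m.Groups[1].Value : "(no match)";
    Console.WriteLine(results[k]);
}
```

The first input yields the URL; the second silently produces no match. That is fine when you control or know the input, and it is exactly what breaks on arbitrary HTML.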

The narrower problem posed by the Stack Overflow OP also seems reasonably suited to regex, although I’m unsure what they’re trying to accomplish. Find all opening tags that aren’t self-closing?
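If that reading is right, a single pattern gets you most of the way. This is a rough C# sketch, not a full HTML tokenizer: it assumes tag names are letters, digits and hyphens, and that attribute values are either quoted or contain no '/'. It ignores comments, CDATA and the contents of script/style elements, and it would misread an unquoted value ending in '/' (such as href=x/) as a self-closing tag. The sample fragment is made up:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Opening tag: a name, then attribute text (quoted values may contain '>'),
// then a '>' that is NOT preceded by '/', which rules out <br/> and <img ... />.
// Closing tags (</div>) and comments (<!-- -->) fail because '<' must be followed by a letter.
var openTag = new Regex(
    @"<([A-Za-z][A-Za-z0-9-]*)(?=[\s/>])(?:[^>""']|""[^""]*""|'[^']*')*(?<!/)>");

// Hypothetical fragment mixing normal, self-closing and closing tags.
const string html = "<p class=\"a\">text<br/><img src=\"x.png\" /><div id='d'></div><hr>";

string[] names = openTag.Matches(html)
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToArray();

Console.WriteLine(string.Join(", ", names)); // p, div, hr
```

The negative lookbehind (?<!/) does the real work here; the quoted-string alternatives just keep a '>' inside an attribute value from ending the tag early.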
