Finding HTML Stuff with Regular Expressions

Following up on my post [cref 69], here’s what I used it for:

<tr\b[^>]*>([\s\S](?!tr\b[^>]*>))*?(String1|String2)[\s\S]*?</tr>

This is an excellent way to find a chunk of HTML with regex. It finds table rows containing either “String1” or “String2”, regardless of linefeeds, carriage returns, or other nefarious forms of whitespace, because [\s\S] matches any character, newlines included.
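A minimal sketch, assuming Python’s standard re module: the same pattern written in re.VERBOSE mode so each piece can carry a comment, then run over a small made-up table. The sample HTML and the String1/String2 placeholders are hypothetical.

```python
import re

# The regex from the post, in verbose form. Whitespace and "#" comments are
# ignored by re.VERBOSE, so this is the same pattern as the one-line version.
ROW_PATTERN = re.compile(
    r"""
    <tr\b[^>]*>               # opening <tr> tag, with or without attributes
    ([\s\S](?!tr\b[^>]*>))*?  # any characters (newlines included), lazily, but
                              # no character may be followed by "tr...>", which
                              # stops the match from running past a <tr> or </tr>
    (String1|String2)         # the text we're looking for
    [\s\S]*?                  # anything after it, lazily...
    </tr>                     # ...up to the nearest closing </tr>
    """,
    re.VERBOSE,
)

# Hypothetical sample: one row without either string, one row with the string
# split across lines, and one row on a single line.
html = """<table>
<tr class="a">
  <td>nothing here</td>
</tr>
<tr>
  <td>
    String2
  </td>
</tr>
<tr><td>String1</td></tr>
</table>"""

# group(2) is the alternation (which string was found); group(0) is the whole row.
rows = [(m.group(2), m.group(0)) for m in ROW_PATTERN.finditer(html)]
for found, row in rows:
    print(found, "->", repr(row))
```

Only the second and third rows come back: the first row never mentions either string, and the lookahead keeps the lazy match from running past its </tr> into the next row.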

It’s adapted from one of the simpler incarnations in Steve Levithan’s post about evolving a regex to find innermost HTML elements. It could be more complete and/or efficient, but at least this one is fairly easy (kinda) to understand.
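One place where the pattern is incomplete: a row like <tr>Str<b>String1</b></tr> (the corner case Steve raises in the comments) is missed, because the lookahead (?!tr\b[^>]*>) also fires on ordinary text such as "Str<b>". A rough sketch, again assuming Python’s re module, comparing the original pattern with a version whose lookahead is anchored on a real tag, (?!</?tr\b[^>]*>), as suggested in the comments:

```python
import re

# Original pattern from the post.
ORIGINAL = re.compile(r"<tr\b[^>]*>([\s\S](?!tr\b[^>]*>))*?(String1|String2)[\s\S]*?</tr>")

# Same pattern, but the lookahead only rejects an actual <tr...> or </tr> tag.
FIXED = re.compile(r"<tr\b[^>]*>([\s\S](?!</?tr\b[^>]*>))*?(String1|String2)[\s\S]*?</tr>")

corner_case = "<tr>Str<b>String1</b></tr>"
print(ORIGINAL.search(corner_case))          # prints None
print(FIXED.search(corner_case).group(0))    # prints <tr>Str<b>String1</b></tr>

# The fixed version still refuses to span two rows: the match starts at the
# second <tr>, not the first.
two_rows = "<tr><td>a</td></tr><tr><td>String2</td></tr>"
print(FIXED.search(two_rows).group(0))       # prints <tr><td>String2</td></tr>
```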



2 Responses to “Finding HTML Stuff with Regular Expressions”

  1. Steve Says:

    Very cool. One minor suggestion is to change ([\s\S](?!tr\b[^>]*>))*? to ([\s\S](?</?!tr\b[^>]*>))*? so it will also match some corner cases like <tr>Str<b>String1</b></tr>

    Or is there something I’m missing?

  2. Steve Says:

    Oops, I screwed that up while entering the HTML entities. I meant to change it to ([\s\S](?!</?tr\b[^>]*>))*?
