html - Finding a regexp pattern not preceded by something


I have the following HTML file structure:

<table>
   <tr class="heading">
      <td colspan="2">
         <h2 class="groupheader">public types</h2>
         <!-- don't want that! we're in table.-->
      </td>
   </tr>
   <tr>...</tr>
</table>
<h2 class="groupheader">detailed description</h2>
<!-- want until next h2-->
<div class="textblock"><p>provides functions control generation of single data log file. </p>
   <h4>example</h4>
   <div class="fragment"><div class="line">test <a href="aaa">stuff</a>();</div>
      <div class="line">...</div>
      <div class="line">...</div>
   </div>
</div>
<!-- end of first result -->
<h2 class="groupheader">member</h2>
<!-- want until next h2 or hr-->
<a class="anchor"></a>
<div class="memitem">
<div class="memproto">
      <table class="memname">
        <tr>
          <td class="memname">enum <a class="el" href="...">test</a></td>
        </tr>
      </table>
</div><div class="memdoc">
<hr><!-- end of 2nd result -->

With a regexp, I need to get the content between each title and the next title or hr tag, except if it's in a table.

So far, I've got the h2 -> h2|hr content. It goes like:

(?s)(<h2 class="groupheader">.*?)(<h2|<hr) 

How can I skip the content under an h2 that is contained in a table? I've tried noodling with a negative lookbehind, but I'm not getting anywhere.

Thank you for your help.

Note that HTML should be parsed with an appropriate parser.

Now, since you are left with HTML-looking input, and the task is

to get the content between each title and the next title or hr tag, except if it's in a table

let me show how it can be done.

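You can obtain the substrings you need with the help of a tempered greedy token ((?:(?!<\/table|<h2|<hr)(?:<table\b[^<]*>.*?<\/table>|.))*) and a positive lookahead at the end. The token matches any symbol that does not start one of the alternatives in the negative lookahead, and it matches inner tables whole, from <table to </table>. Because the token stops at </table, a heading that sits inside a table reaches the closing table tag before any <h2 or <hr, so the final lookahead fails and that heading is skipped. The lookahead also leaves the next <h2 unconsumed, so the next match can start right there: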

(?s)<h2 class="groupheader">[^<]*<\/h2>\s*((?:(?!<\/table|<h2|<hr)(?:<table\b[^<]*>.*?<\/table>|.))*)(?=<h2|<hr) 

See the demo.
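As a rough illustration, here is a minimal Python sketch that applies the pattern above. The sample markup is a trimmed-down version of the HTML in the question, and the names SECTION_RE, SAMPLE_HTML and extract_sections are made up for this example; a real HTML parser is still the safer choice for anything beyond input this regular.

```python
import re

# The pattern from the answer, split over three lines for readability.
# (?s) lets "." match newlines; \/ is just an escaped "/".
SECTION_RE = re.compile(
    r'(?s)<h2 class="groupheader">[^<]*<\/h2>\s*'
    r'((?:(?!<\/table|<h2|<hr)(?:<table\b[^<]*>.*?<\/table>|.))*)'
    r'(?=<h2|<hr)'
)

# Trimmed-down sample (an assumption, not the full markup from the question).
SAMPLE_HTML = """
<table>
  <tr class="heading">
    <td colspan="2"><h2 class="groupheader">public types</h2></td>
  </tr>
  <tr><td>row</td></tr>
</table>
<h2 class="groupheader">detailed description</h2>
<div class="textblock"><p>some text</p>
  <h4>example</h4>
</div>
<h2 class="groupheader">member</h2>
<a class="anchor"></a>
<div class="memitem">
  <table class="memname"><tr><td class="memname">enum test</td></tr></table>
</div>
<hr>
"""


def extract_sections(html):
    """Return the body that follows each groupheader h2 outside a table."""
    return [m.group(1).strip() for m in SECTION_RE.finditer(html)]


sections = extract_sections(SAMPLE_HTML)
for body in sections:
    print(body)
    print("-----")
```

With this sample, the heading inside the outer table yields no match, while the table nested inside the "member" section is kept as part of that section's body.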

Note that instead of h2 you can use h\d+ to support any level of heading.

