It seems like tdom's HTML parsing doesn't work well with partial HTML
strings that don't include the full doctype/head/body/etc. tags. tdom
seems to return a node only for the first tag and not for the rest. For
example, if there are two "<p>" tags in sequence, it processes only the
first one.
That is fine if this is the expected behavior, but if not, what is the
correct way to do this?
I find that I always have better results with tdom parsing if I use the "-html5" option. Are you using that?
On 4/25/2023 1:34 PM, Ted Nolan <tednolan> wrote:
> I find that I always have better results with tdom parsing if I use the
> "-html5" option. Are you using that?
No, I am not. However, tdom doesn't recognize this option. I just reviewed
the tdom docs and found no mention of it.
For reference, this is what I have:
% package req tdom
0.9.1
% dom parse -html "<p>hello</p> <p>there</p>"
domDoc010BC518
It's a compile option:
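One way to check whether a particular tdom build includes this support is to ask the library before using the option. The following is only a minimal sketch: it assumes that the installed tdom provides the `dom featureinfo` method with an `html5` feature name, and the fallback message is purely illustrative.

```tcl
package require tdom

# Assumption: this tdom build provides "dom featureinfo" and knows the
# "html5" feature name; it reports whether -html5 support was compiled in.
if {[dom featureinfo html5]} {
    # The HTML5 parser is available, so try it on the fragment.
    set doc [dom parse -html5 "<p>hello</p> <p>there</p>"]
    puts [$doc asHTML]
    $doc delete
} else {
    # Illustrative message only; -html5 would raise an error on this build.
    puts "this tdom build was compiled without -html5 support"
}
```

Checking first avoids the "option not recognized" error on builds that were compiled without the HTML5 parser.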
> It seems like tdom html parsing doesn't work well with partial html
> strings that don't necessarily include the full doctype/head/body/etc.
> tags. tdom seems to return nodes only for the first tag and not the
> rest; meaning that if there are two "<p>" tags in sequence for
> example, it processes only the first one.
> That is fine if this is the expected behavior but if not, what is the
> correct way to do this?
But what DOM tree do you expect to get from that? That document or
fragment doesn't have a single root, as HTML or XML documents must. So if
you are fine with getting a DOM _forest_ instead of a DOM tree, just do:
package require tdom 0.9.3
dom parse -html -forest "<p>hello</p> <p>there</p>" doc
puts [$doc asXML]
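For tdom versions older than 0.9.3, a possible workaround is to give the fragment a single root yourself before parsing. This is only a sketch under assumptions: the `<div>` wrapper and the variable names are arbitrary choices for illustration, and it has not been checked against every tdom release.

```tcl
package require tdom

set fragment "<p>hello</p> <p>there</p>"

# Hypothetical wrapper: enclose the fragment in one container element so
# the parser sees a single root instead of two sibling top-level elements.
set doc [dom parse -html "<div>$fragment</div>"]
set root [$doc documentElement]

# Select every <p> element anywhere in the parsed document and print its text.
foreach p [$root selectNodes {//p}] {
    puts [$p text]
}

$doc delete
```

The XPath expression `//p` is used rather than walking the wrapper's children directly, so the loop does not depend on whether the HTML parser adds further enclosing elements around the wrapper.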