TTMSFNCWebBrowser: access HTML code of the displayed web page

Hello,

is it possible to access the HTML code of a displayed web page? Something like OuterHTML in the "old" days? Sometimes I need to parse the content of a page for certain items.

Many thanks

Hi,

Yes, please look at this thread:

Many thanks, works nicely

Good morning,

may I ask some follow-up questions:

  1. unescape is deprecated in JavaScript. Is it still the preferred option or would you recommend an alternative?
  2. What is the background for parsing AValue as TJSONObject? As far as I can see, you could also directly use the string returned by JavaScript?
  3. Would any of the functions from FNCUtils be preferable in that case?

Many thanks and have a good day
Gernot

Hi,

  1. You can use an alternative such as decodeURI, which replaces unescape.
  2. It makes sure the encoding in a string is handled properly. As the body may contain special characters and escaped characters, we used this method. You can of course return a plain string, but the string encoding will then not be handled properly. For strings that don't contain escaped characters, you don't need the unescape/decodeURI approach.
  3. You could use TTMSFNCUtils.ParseJSON and handle it at client level instead of letting the browser handle the encoding. We haven't tested this approach, so the outcome is unclear.
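
To make the first point concrete, here is a minimal JavaScript sketch of how the non-deprecated decoders behave. The sample string and the `safeDecode` helper are hypothetical and not part of the thread or of any TMS API. Unlike unescape, both decodeURI and decodeURIComponent interpret percent-escapes as UTF-8, and both throw a URIError on a bare `%`, which plain HTML can easily contain.

```javascript
// Hypothetical sample: percent-escaped markup with a non-ASCII character.
const escaped = '%3Cp%3Ecaf%C3%A9%20%26%20bar%3C%2Fp%3E';

// decodeURIComponent decodes every percent-escape.
const full = decodeURIComponent(escaped);

// decodeURI keeps escapes of reserved URI characters (here %26 and %2F).
const partial = decodeURI(escaped);

// Hypothetical helper: text that was never escaped can contain a bare '%',
// which makes the decoder throw a URIError, so fall back to the input.
function safeDecode(text) {
  try {
    return decodeURIComponent(text);
  } catch (e) {
    if (e instanceof URIError) return text;
    throw e;
  }
}

console.log(full);
console.log(partial);
console.log(safeDecode('<p>100% done</p>'));
```

If the page content was never percent-escaped in the first place, skipping the decode step altogether is probably the simpler choice.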

TMSFNCEdgeWebBrowser

  TMSFNCEdgeWebBrowser1.ExecuteJavascript(
  '''
    function GetHTML()
    {
      var Link;
      var Title;
      var HTML_Text= '<!DOCTYPE html><html lang="en" xmlns="http://www.w3.org/1999/xhtml"><head><meta charset="utf-8" /><title><title/></head><body>';
      var Elements = document.querySelectorAll('.item-inner__title')

      Elements.forEach(element => {
        const href = element.innerHTML;
        HTML_Text += '<div><p>' + href + '</p></div>';});
      HTML_Text += '</body></html>';
      return decodeURI(HTML_Text);
    } GetHTML();
  '''
  //TMSFNCEdgeWebBrowser1.ExecuteJavascript('function GetHTML(){return unescape(document.documentElement.innerHTML);} GetHTML();',
  ,
  procedure(const AValue: String)

-> ExecuteJavascript returns this:

'"\u003C!DOCTYPE html>\u003Chtml lang="en" xmlns="http://www.w3.org/1999/xhtml\">\u003Chead>\u003Cmeta charset="utf-8" />\u003Ctitle>\u003Ctitle/>\u003C/head>\u003Cbody>\u003Cdiv>\u003Cp>\u003Ca href ...

Should be:

<!DOCTYPE html><html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta charset="utf-8" />
  <title><title/>
</head>
<body>
  <div><p>

Is there a workaround to avoid this?

AdvMemo with HTML Styler freezes with the returned HTML.

This is an encoding issue.
You might be able to use the following code:

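  // The script result appears to arrive as a JSON string (quoted, with \uXXXX escapes)
  retrievedHTML := TMSFNCEdgeWebBrowser1.ExecuteJavaScriptSync('document.documentElement.outerHTML');
  // Value yields the decoded text; ToString would return the quoted JSON form again
  EncodedHTML := TJSONObject.ParseJSONValue(TEncoding.UTF8.GetBytes(retrievedHTML), 0).Value;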
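
The raw value quoted earlier in the thread looks like a JSON-encoded string: it is wrapped in double quotes and `<` is written as `\u003C`. Assuming that is what happens, the decoding step can be illustrated in plain JavaScript. The shortened sample value and the `decodeScriptResult` helper are hypothetical, and the sketch only shows the JSON step, not the browser component itself.

```javascript
// Shortened, hypothetical stand-in for the raw value shown in the thread:
// a quoted JSON string with \u003C in place of '<'.
const returned =
  '"\\u003C!DOCTYPE html>\\u003Chtml lang=\\"en\\">\\u003Cbody>\\u003C/body>\\u003C/html>"';

// Hypothetical helper: decode the JSON string, or return the input unchanged
// when it is not valid JSON or does not hold a string.
function decodeScriptResult(raw) {
  try {
    const value = JSON.parse(raw);
    return typeof value === 'string' ? value : raw;
  } catch (e) {
    return raw;
  }
}

const html = decodeScriptResult(returned);
console.log(html);
```

Parsing the value as JSON on the Delphi side serves the same purpose: the quotes and escape sequences are removed in one step, so no manual search-and-replace on `\u003C` is needed.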