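If I run this code on an HTML page that uses a Unicode code page, the result is garbage, because TStringStream is not Unicode-aware in D7. The page may be UTF-8 encoded or encoded in some other (ANSI) code page.
How can I detect whether the content of the TStream / IPersistStreamInit is Unicode, UTF-8 or ANSI?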
How can I make this function always return the correct WideString?

    function GetWebBrowserHTML(const WebBrowser: TWebBrowser): WideString;
If I replace TStringStream with TMemoryStream and save the TMemoryStream to a file, the file is fine, whether it is Unicode, UTF-8 or ANSI. But I always want to return the stream content as a WideString:
    function GetWebBrowserHTML(const WebBrowser: TWebBrowser): WideString;
    var
      // LStream: TStringStream;
      LStream: TMemoryStream;
      Stream : IStream;
      LPersistStreamInit : IPersistStreamInit;
    begin
      if not Assigned(WebBrowser.Document) then exit;
      // LStream := TStringStream.Create('');
      LStream := TMemoryStream.Create;
      try
        LPersistStreamInit := WebBrowser.Document as IPersistStreamInit;
        Stream := TStreamAdapter.Create(LStream,soReference);
        LPersistStreamInit.Save(Stream,true);
        // result := LStream.DataString;
        LStream.SaveToFile('c:\test\test.txt'); // test only - file is ok
        Result := ??? // WideString
      finally
        LStream.Free();
      end;
    end;
Edit: I found this article: How to load and save documents in TWebBrowser in a Delphi-like way
It does exactly what I need, but it only works with the Delphi Unicode compilers (D2009 and later). See the Conclusion section:
There is obviously a lot more we could do. A couple of things
immediately spring to mind. We retro-fit some of the Unicode
functionality and support for non-ANSI encodings to the pre-Unicode
compiler code. The present code when compiled with anything earlier
than Delphi 2009 will not save document content to strings correctly
if the document character set is not ANSI.
The magic is obviously in the TEncoding class (TEncoding.GetBufferEncoding), but D7 has no TEncoding. Any ideas?
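TEncoding.GetBufferEncoding essentially looks for a byte-order mark (BOM) at the start of the buffer. A minimal D7-compatible sketch of that check is below; the unit name `BomDetect`, the type `TBomEncoding` and the function `DetectBomEncoding` are hypothetical names for illustration. It only helps if the saved stream actually starts with a BOM, which is an assumption: a stream without one comes back as `beNone`, and you still need another source (such as the document's declared charset) to pick the code page.

```delphi
unit BomDetect; // hypothetical unit name

interface

uses
  Classes;

type
  // Hypothetical type: the encoding implied by a byte-order mark, if any
  TBomEncoding = (beNone, beUTF8, beUTF16LE, beUTF16BE);

function DetectBomEncoding(Stream: TStream): TBomEncoding;

implementation

// Sniffs the first bytes of Stream for a BOM, similar in spirit to
// TEncoding.GetBufferEncoding in D2009+. A UTF-32LE BOM (FF FE 00 00)
// is not distinguished and is reported as UTF-16LE.
// On return, Stream.Position points just past the BOM (or at 0 if none).
function DetectBomEncoding(Stream: TStream): TBomEncoding;
var
  B: array[0..2] of Byte;
  N: Integer;
begin
  Result := beNone;
  FillChar(B, SizeOf(B), 0);
  Stream.Position := 0;
  N := Stream.Read(B, SizeOf(B));
  if (N >= 3) and (B[0] = $EF) and (B[1] = $BB) and (B[2] = $BF) then
  begin
    Result := beUTF8;
    Stream.Position := 3;
  end
  else if (N >= 2) and (B[0] = $FF) and (B[1] = $FE) then
  begin
    Result := beUTF16LE;
    Stream.Position := 2;
  end
  else if (N >= 2) and (B[0] = $FE) and (B[1] = $FF) then
  begin
    Result := beUTF16BE;
    Stream.Position := 2;
  end
  else
    Stream.Position := 0; // no BOM: leave the stream at the start
end;

end.
```

Because the stream is left positioned after the BOM, the caller can read the remaining bytes and decode them with UTF8Decode (UTF-8), a plain copy into a WideString (UTF-16LE), or MultiByteToWideChar (ANSI code pages).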
I use GpTextStream to handle the conversion (it should work with all Delphi versions):

    uses
      Windows, SysUtils, Classes, ActiveX, MSHTML, SHDocVw, GpTextStream;

    // Maps an HTML charset name (as reported by IHTMLDocument2.charset)
    // to a Windows code page; returns 0 if the charset is not recognized.
    function GetCodePageFromHTMLCharSet(Charset: WideString): Word;
    const
      WIN_CHARSET = 'windows-';
      ISO_CHARSET = 'iso-';
    var
      S: string;
    begin
      Result := 0;
      Charset := WideLowerCase(Charset); // charset names are case-insensitive
      if Charset = 'unicode' then
        Result := CP_UNICODE
      else if Charset = 'utf-8' then
        Result := CP_UTF8
      else if Pos(WIN_CHARSET, Charset) <> 0 then
      begin
        S := Copy(Charset, Length(WIN_CHARSET) + 1, Maxint);
        Result := StrToIntDef(S, 0);
      end
      else if Pos(ISO_CHARSET, Charset) <> 0 then // ISO-8859 (e.g. iso-8859-1 => 28591)
      begin
        S := Copy(Charset, Length(ISO_CHARSET) + 1, Maxint);
        S := Copy(S, Pos('-', S) + 1, 2);
        if S = '13' then           // ISO-8859-13 (Baltic)
          Result := 28603
        else if S = '15' then      // ISO-8859-15 (Latin 9)
          Result := 28605
        else if Length(S) = 1 then // iso-8859-1 .. iso-8859-9 => 28591 .. 28599
          Result := StrToIntDef('2859' + S, 0);
      end;
    end;

    function GetWebBrowserHTML(WebBrowser: TWebBrowser): WideString;
    var
      LStream: TMemoryStream;
      Stream: IStream;
      LPersistStreamInit: IPersistStreamInit;
      TextStream: TGpTextStream;
      Charset: WideString;
      Buf: WideString;
      CodePage: Word;
      N: Integer;
    begin
      Result := '';
      if not Assigned(WebBrowser.Document) then
        Exit;
      LStream := TMemoryStream.Create;
      try
        LPersistStreamInit := WebBrowser.Document as IPersistStreamInit;
        Stream := TStreamAdapter.Create(LStream, soReference);
        if Failed(LPersistStreamInit.Save(Stream, True)) then
          Exit;
        // Ask MSHTML which charset the document uses and decode with it
        Charset := (WebBrowser.Document as IHTMLDocument2).charset;
        CodePage := GetCodePageFromHTMLCharSet(Charset);
        N := LStream.Size;
        if N = 0 then
          Exit;
        LStream.Position := 0; // rewind after Save
        SetLength(Buf, N);     // decoded text never has more WideChars than source bytes
        TextStream := TGpTextStream.Create(LStream, tsaccRead, [], CodePage);
        try
          // Read counts bytes of UTF-16 output; convert that to WideChars
          N := TextStream.Read(Buf[1], N * SizeOf(WideChar)) div SizeOf(WideChar);
          SetLength(Buf, N);
          Result := Buf;
        finally
          TextStream.Free;
        end;
      finally
        LStream.Free();
      end;
    end;
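If you would rather not depend on GpTextStream, the code page returned by GetCodePageFromHTMLCharSet can be fed straight to the Win32 MultiByteToWideChar function. The sketch below assumes D7 (where AnsiString is a plain byte string); the unit name `CodePageConv`, the function `StreamToWideString` and the constant `CP_UTF16LE` are hypothetical names, not part of GpTextStream or the VCL.

```delphi
unit CodePageConv; // hypothetical unit name

interface

uses
  Windows, Classes;

function StreamToWideString(Stream: TStream; CodePage: Word): WideString;

implementation

const
  // Hypothetical name for the UTF-16LE code page identifier (1200)
  CP_UTF16LE = 1200;

// Converts the whole content of Stream from CodePage to a WideString.
// CodePage = 0 is CP_ACP, i.e. the system ANSI code page, which is also
// what GetCodePageFromHTMLCharSet returns for an unknown charset.
function StreamToWideString(Stream: TStream; CodePage: Word): WideString;
var
  Raw: AnsiString;
  Len: Integer;
begin
  Result := '';
  SetLength(Raw, Stream.Size);
  if Length(Raw) = 0 then
    Exit;
  Stream.Position := 0;
  Stream.ReadBuffer(Raw[1], Length(Raw));
  if CodePage = CP_UTF16LE then
  begin
    // Already UTF-16LE: copy the bytes straight into the WideString
    SetLength(Result, Length(Raw) div SizeOf(WideChar));
    if Length(Result) > 0 then
      Move(Raw[1], Result[1], Length(Result) * SizeOf(WideChar));
  end
  else
  begin
    // First call asks for the required length, second call converts
    Len := MultiByteToWideChar(CodePage, 0, PAnsiChar(Raw), Length(Raw), nil, 0);
    SetLength(Result, Len);
    if Len > 0 then
      MultiByteToWideChar(CodePage, 0, PAnsiChar(Raw), Length(Raw),
        PWideChar(Result), Len);
  end;
  // Drop a leading byte-order mark (U+FEFF) if the saved data had one
  if (Length(Result) > 0) and (Result[1] = WideChar($FEFF)) then
    Result := Copy(Result, 2, MaxInt);
end;

end.
```

In GetWebBrowserHTML, the TGpTextStream block would then shrink to `Result := StreamToWideString(LStream, CodePage);`. Code pages such as 28591 only convert if they are installed on the machine; otherwise MultiByteToWideChar returns 0 and the result is an empty string.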