Is there an existing class in C# that can convert Quoted-Printable encoding to String? Click on the above link to get more information on the encoding.
The following is quoted from the above link for your convenience.
Any 8-bit byte value may be encoded with 3 characters, an "=" followed by two hexadecimal digits (0–9 or A–F) representing the byte's numeric value. For example, a US-ASCII form feed character (decimal value 12) can be represented by "=0C", and a US-ASCII equal sign (decimal value 61) is represented by "=3D". All characters except printable ASCII characters or end of line characters must be encoded in this fashion.
All printable ASCII characters (decimal values between 33 and 126) may be represented by themselves, except "=" (decimal 61).
ASCII tab and space characters, decimal values 9 and 32, may be represented by themselves, except if these characters appear at the end of a line. If one of these characters appears at the end of a line it must be encoded as "=09" (tab) or "=20" (space).
If the data being encoded contains meaningful line breaks, they must be encoded as an ASCII CR LF sequence, not as their original byte values. Conversely if byte values 13 and 10 have meanings other than end of line then they must be encoded as =0D and =0A.
Lines of quoted-printable encoded data must not be longer than 76 characters. To satisfy this requirement without altering the encoded text, soft line breaks may be added as desired. A soft line break consists of an "=" at the end of an encoded line, and does not cause a line break in the decoded text.
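To make the rules above concrete, here is a minimal C# sketch of an encoder that follows them. It is an illustration only: the class name QpEncoderSketch is hypothetical, and I am assuming that CR LF pairs in the input are meaningful line breaks and that end of data counts as end of line. It breaks lines at 75 characters plus the soft-break "=" so no line exceeds 76.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the framework.
public static class QpEncoderSketch
{
    public static string Encode(byte[] data)
    {
        const int MaxLine = 76;
        var sb = new StringBuilder();
        int lineLen = 0;

        for (int i = 0; i < data.Length; i++)
        {
            byte b = data[i];

            // Assumption: a CR LF pair is a meaningful line break, emitted as-is.
            if (b == 13 && i + 1 < data.Length && data[i + 1] == 10)
            {
                sb.Append("\r\n");
                lineLen = 0;
                i++; // skip the LF
                continue;
            }

            // Tab and space stay literal unless they are the last byte on a line.
            bool atLineEnd = i + 1 == data.Length
                || (data[i + 1] == 13 && i + 2 < data.Length && data[i + 2] == 10);
            bool literal = (b >= 33 && b <= 126 && b != 61)
                || ((b == 9 || b == 32) && !atLineEnd);

            string token = literal ? ((char)b).ToString() : "=" + b.ToString("X2");

            // Keep one column free for the "=" of a soft line break.
            if (lineLen + token.Length > MaxLine - 1)
            {
                sb.Append("=\r\n");
                lineLen = 0;
            }

            sb.Append(token);
            lineLen += token.Length;
        }

        return sb.ToString();
    }
}
```

With this sketch, a form feed byte comes out as "=0C", an equals sign as "=3D", and a space directly before a CR LF as "=20".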
12 solutions
#1
19
There is functionality in the framework libraries to do this, but it doesn't appear to be cleanly exposed. The implementation is in the internal class System.Net.Mime.QuotedPrintableStream. This class defines a method called DecodeBytes which does what you want. The method appears to be used by only one method which is used to decode MIME headers. This method is also internal, but is called fairly directly in a couple of places, e.g., the Attachment.Name setter. A demonstration:
    using System;
    using System.Net.Mail;

    namespace ConsoleApplication1
    {
        class Program
        {
            static void Main(string[] args)
            {
                Attachment attachment = Attachment.CreateAttachmentFromString("", "=?iso-8859-1?Q?=A1Hola,_se=F1or!?=");
                Console.WriteLine(attachment.Name);
            }
        }
    }

Produces the output:
¡Hola,_señor!
You may have to do some testing to ensure that carriage returns and the like are treated correctly, although they seemed to be in a quick test I did. However, it may not be wise to rely on this functionality unless your use case is close enough to decoding a MIME header string that you don't think it will be broken by changes to the library. You might be better off writing your own quoted-printable decoder.
#2
16
I extended the solution of Martin Murphy and I hope it will work in every case.
    private static string DecodeQuotedPrintables(string input, string charSet)
    {
        if (string.IsNullOrEmpty(charSet))
        {
            var charSetOccurences = new Regex(@"=\?.*\?Q\?", RegexOptions.IgnoreCase);
            var charSetMatches = charSetOccurences.Matches(input);
            foreach (Match match in charSetMatches)
            {
                charSet = match.Groups[0].Value.Replace("=?", "").Replace("?Q?", "");
                input = input.Replace(match.Groups[0].Value, "").Replace("?=", "");
            }
        }

        Encoding enc = new ASCIIEncoding();
        if (!string.IsNullOrEmpty(charSet))
        {
            try
            {
                enc = Encoding.GetEncoding(charSet);
            }
            catch
            {
                enc = new ASCIIEncoding();
            }
        }

        //decode iso-8859-[0-9]
        var occurences = new Regex(@"=[0-9A-Z]{2}", RegexOptions.Multiline);
        var matches = occurences.Matches(input);
        foreach (Match match in matches)
        {
            try
            {
                byte[] b = new byte[] { byte.Parse(match.Groups[0].Value.Substring(1), System.Globalization.NumberStyles.AllowHexSpecifier) };
                char[] hexChar = enc.GetChars(b);
                input = input.Replace(match.Groups[0].Value, hexChar[0].ToString());
            }
            catch { }
        }

        //decode base64String (utf-8?B?)
        occurences = new Regex(@"\?utf-8\?B\?.*\?", RegexOptions.IgnoreCase);
        matches = occurences.Matches(input);
        foreach (Match match in matches)
        {
            byte[] b = Convert.FromBase64String(match.Groups[0].Value.Replace("?utf-8?B?", "").Replace("?UTF-8?B?", "").Replace("?", ""));
            string temp = Encoding.UTF8.GetString(b);
            input = input.Replace(match.Groups[0].Value, temp);
        }

        input = input.Replace("=\r\n", "");
        return input;
    }

#3
5
I wrote this up real quick.
    public static string DecodeQuotedPrintables(string input)
    {
        // Hex digits only go up to F (the original pattern used A-H).
        var occurences = new Regex(@"=[0-9A-F]{2}", RegexOptions.Multiline);
        var matches = occurences.Matches(input);
        foreach (Match match in matches)
        {
            char hexChar = (char)Convert.ToInt32(match.Groups[0].Value.Substring(1), 16);
            input = input.Replace(match.Groups[0].Value, hexChar.ToString());
        }
        return input.Replace("=\r\n", "");
    }

#4
4
If you are decoding quoted-printable text that uses UTF-8, be aware that you cannot decode each quoted-printable sequence one at a time, as the other answers do, when several sequences appear in a row.
For example, if you decode the sequence =E2=80=99 one byte at a time with UTF-8, you get three "weird" characters. If you instead build an array of the three bytes and convert it with the UTF-8 encoding, you get a single apostrophe.
With ASCII, decoding one at a time is no problem, but decoding whole runs means your code will work regardless of the text encoding used.
Oh and don't forget =3D is a special case that means you need to decode whatever you have one more time... That is a crazy gotcha!
Hope that helps
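A minimal sketch of the run-aware approach described in this answer, assuming the input string is 7-bit ASCII as quoted-printable requires: it collects every decoded byte first and converts the whole buffer with the chosen encoding in one go. QpDecoderSketch is a hypothetical name, and replacing stray non-ASCII characters with "?" is my own choice. Because the input is scanned once and decoded output is never rescanned, "=3D" here simply becomes a literal "=".

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the framework.
public static class QpDecoderSketch
{
    public static string Decode(string input, Encoding encoding)
    {
        using (var bytes = new MemoryStream())
        {
            for (int i = 0; i < input.Length; i++)
            {
                char c = input[i];
                if (c == '=')
                {
                    // "=XX": one encoded byte.
                    if (i + 2 < input.Length
                        && Uri.IsHexDigit(input[i + 1]) && Uri.IsHexDigit(input[i + 2]))
                    {
                        bytes.WriteByte(Convert.ToByte(input.Substring(i + 1, 2), 16));
                        i += 2;
                        continue;
                    }
                    // "=" followed by CR LF: soft line break, produces no output.
                    if (i + 2 < input.Length && input[i + 1] == '\r' && input[i + 2] == '\n')
                    {
                        i += 2;
                        continue;
                    }
                    // Anything else is malformed; this sketch keeps the "=" literally.
                }
                // Assumption: anything outside 7-bit ASCII is invalid input.
                bytes.WriteByte(c < 128 ? (byte)c : (byte)'?');
            }
            // Decode the whole buffer at once so multi-byte sequences stay intact.
            return encoding.GetString(bytes.ToArray());
        }
    }
}
```

For example, Decode("=E2=80=99", Encoding.UTF8) yields the single character U+2019 rather than three separate characters.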
#5
2
This Quoted Printable Decoder works great!
    public static byte[] FromHex(byte[] hexData)
    {
        if (hexData == null)
        {
            throw new ArgumentNullException("hexData");
        }
        if (hexData.Length < 2 || (hexData.Length / (double)2 != Math.Floor(hexData.Length / (double)2)))
        {
            throw new Exception("Illegal hex data, hex data must be in two bytes pairs, for example: 0F,FF,A3,... .");
        }
        MemoryStream retVal = new MemoryStream(hexData.Length / 2);
        // Loop hex value pairs
        for (int i = 0; i2
    private string quotedprintable(string data, string encoding)
    {
        data = data.Replace("=\r\n", "");
        for (int position = -1; (position = data.IndexOf("=", position + 1)) != -1;)
        {
            string leftpart = data.Substring(0, position);
            System.Collections.ArrayList hex = new System.Collections.ArrayList();
            hex.Add(data.Substring(1 + position, 2));
            while (position + 31
The only one that worked for me.
http://sourceforge.net/apps/trac/syncmldotnet/wiki/Quoted%20Printable
If you just need to decode quoted-printable text, copy these three functions from the link above into your code:
    HexDecoderEvaluator(Match m)
    HexDecoder(string line)
    Decode(string encodedText)
And then just:
    var humanReadable = Decode(myQPString);
Enjoy
#8
1
I was looking for a dynamic solution and spent 2 days trying different approaches. This solution supports Japanese characters and other standard character sets.
    private static string Decode(string input, string bodycharset)
    {
        var i = 0;
        var output = new List();
        while (i
Then you can use it like this:
    Decode("=E3=82=AB=E3=82=B9=E3", "utf-8")
This was originally found here
#9
1
Better solution
    private static string DecodeQuotedPrintables(string input, string charSet)
    {
        Encoding enc; // declaration was missing in the original
        try
        {
            enc = Encoding.GetEncoding(charSet);
        }
        catch
        {
            enc = new UTF8Encoding();
        }
        var occurences = new Regex(@"(=[0-9A-Z]{2}){1,}", RegexOptions.Multiline);
        var matches = occurences.Matches(input);
        foreach (Match match in matches)
        {
            try
            {
                byte[] b = new byte[match.Groups[0].Value.Length / 3];
                for (int i = 0; i0
    public static string DecodeQuotedPrintables(string input, Encoding encoding)
    {
        var regex = new Regex(@"\=(?[0-9A-Z]{2})", RegexOptions.Multiline);
        var matches = regex.Matches(input);
        var bytes = new byte[matches.Count];
        for (var i = 0; i0
Sometimes a string in an EML file is made up of several encoded parts. This function applies Dave's method to those cases:
    public string DecodeQP(string codedstring)
    {
        Regex codified;
        codified = new Regex(@"=\?((?!\?=).)*\?=", RegexOptions.IgnoreCase);
        // The original passed an undefined variable "cadena" here.
        MatchCollection setMatches = codified.Matches(codedstring);
        if (setMatches.Count > 0)
        {
            Attachment attdecode;
            codedstring = "";
            foreach (Match match in setMatches)
            {
                attdecode = Attachment.CreateAttachmentFromString("", match.Value);
                codedstring += attdecode.Name;
            }
        }
        return codedstring;
    }

#12
0
Please note: solutions that use "input.Replace" are all over the Internet, and they are still not correct.
If you decode ONE symbol and then call "Replace", EVERY occurrence of that sequence in "input" is replaced, and all subsequent decoding is broken.
A more correct solution:
    public static string DecodeQuotedPrintable(string input, string charSet)
    {
        Encoding enc;
        try
        {
            enc = Encoding.GetEncoding(charSet);
        }
        catch
        {
            enc = new UTF8Encoding();
        }
        input = input.Replace("=\r\n=", "=");
        input = input.Replace("=\r\n ", "\r\n ");
        input = input.Replace("= \r\n", " \r\n");
        var occurences = new Regex(@"(=[0-9A-Z]{2})", RegexOptions.Multiline); //{1,}
        var matches = occurences.Matches(input);
        foreach (Match match in matches)
        {
            try
            {
                byte[] b = new byte[match.Groups[0].Value.Length / 3];
                for (int i = 0; i
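One way to avoid the "Replace" problem entirely is to let Regex.Replace substitute each match at its own position through a MatchEvaluator. The following is a sketch under my own assumptions, not the code this answer originally contained: QpRegexSketch is a hypothetical name, soft line breaks are removed up front, and each run of "=XX" sequences is decoded as one byte array, with a UTF-8 fallback for unknown charset names.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the framework.
public static class QpRegexSketch
{
    private static readonly Regex SoftBreak = new Regex(@"=\r?\n");
    private static readonly Regex HexRun = new Regex(@"(?:=[0-9A-Fa-f]{2})+");

    public static string Decode(string input, string charSet)
    {
        Encoding enc;
        try { enc = Encoding.GetEncoding(charSet); }
        catch (ArgumentException) { enc = Encoding.UTF8; } // unknown or null name

        // Remove soft line breaks first so split lines are rejoined.
        string joined = SoftBreak.Replace(input, "");

        // Each match is replaced where it was found; the output is not rescanned.
        return HexRun.Replace(joined, m =>
        {
            var bytes = new byte[m.Length / 3];
            for (int i = 0; i < bytes.Length; i++)
            {
                bytes[i] = Convert.ToByte(m.Value.Substring(i * 3 + 1, 2), 16);
            }
            return enc.GetString(bytes);
        });
    }
}
```

With the input "=3D41 =41", a Replace-based decoder can turn the "=41" produced by decoding "=3D" into "A" as well; this sketch returns "=41 A".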