PHP 字符实体转换及使用注意事项

正文
参考资料

正文

为了数据安全，有时我们要对用户输入的字符进行转义，但用什么函数呢？

下面是比较：

htmlspecialchars(string,flags,character-set,double_encode);
// htmlspecialchars() 把预定义的字符转换为 HTML 实体
// 预定义的字符：& 、 " 、 ' 、 < 、 >
htmlspecialchars_decode(string,flags);
// htmlspecialchars_decode()htmlspecialchars()的反函数

htmlentities(string,flags,character-set,double_encode);
// htmlentities() 把字符转换为 HTML 实体，注意编码，否则中文会乱码
// htmlspecialchars()的预定义字符包括在内，但一般常用htmlspecialchars()
html_entity_decode(string,flags,character-set);
// html_entity_decode()是htmlentities()的反函数

// 文档说明：
// html_entity_decode — Convert all HTML entities to their applicable characters
// htmlentities — Convert all applicable characters to HTML entities
// htmlspecialchars_decode — Convert special HTML entities back to characters
// htmlspecialchars - Convert special characters to HTML entities 

htmlspecialchars() 执行转换:

字符     替换后
& (& 符号)     &amp;
" (双引号)     &quot;，除非设置了 ENT_NOQUOTES
' (单引号)     设置了 ENT_QUOTES 后， &#039; (如果是 ENT_HTML401) ，或者 &apos; (如果是 ENT_XML1、 ENT_XHTML 或 ENT_HTML5)。
< (小于)     &lt;
> (大于)     &gt;

htmlentities() 各方面都和 htmlspecialchars() 一样，除了以上， htmlentities() 会转换所有具有 HTML 实体的字符。

strip_tags() 剥去字符串中的 HTML、XML 以及 PHP 的标签

HTML 字符实体

显示结果   描述   实体名称   实体编号
      空格     &nbsp;     &#160;
<     小于号     &lt;     &#60;
>     大于号     &gt;     &#62;
&     和号     &amp;     &#38;
"     引号     &quot;     &#34;
'     撇号      &apos; (IE不支持)     &#39;
￠     分     &cent;     &#162;
£     镑     &pound;     &#163;
¥     人民币/日元     &yen;     &#165;
€     欧元     &euro;     &#8364;
§     小节     &sect;     &#167;
©     版权     &copy;     &#169;
®     注册商标     &reg;     &#174;
™     商标     &trade;     &#8482;
×     乘号     &times;     &#215;
÷     除号     &divide;     &#247;

HTML 实体参考手册

现代的浏览器支持的字符集：

ASCII 字符集
标准 ISO 字符集
数学符号、希腊字母、其他符号

ISO-8859-1

ISO-8859-1 是大多数浏览器默认的字符集。

ISO-8859-1 的较低部分（从 1 到 127 之间的代码）是最初的 ASCII 字符集（0-9 的数字，大写和小写英文字母表，以及一些特殊字符）。

ISO-8859-1 的较高部分（从 160 到 255 之间的代码）包含了一些西欧国家使用的字符和一些被广泛使用的特殊字符，它们全都有实体名称。

这些符号中的大多数都可以在不进行实体引用的情况下使用，但是实体名称或实体编号为那些不容易通过键盘键入的符号提供了表达的方法。

HTML 预留字符

HTML and XHTML 预留了一些字符。比如，您不能使用包含这些字符的文本，因为浏览器可能会误以为是 HTML 标签。

HTML and XHTML 中央处理器必须识别以下表格所列举的五种特殊字符：

字符     实体编号     实体名称     描述
      &#160;     &nbsp;     非间断空格（non-breaking space）
¡     &#161;     &iexcl;     倒置感叹号（inverted exclamation mark）
¢     &#162;     &cent;     美分符号（cent）
£     &#163;     &pound;     英镑符号（pound）
¤     &#164;     &curren;     货币符号（currency）
¥     &#165;     &yen;     人民币/日元符号（yen）
¦     &#166;     &brvbar;     间断的竖杠（broken vertical bar）
§     &#167;     &sect;     小节号（section）
¨     &#168;     &uml;     分音符号（spacing diaeresis）
©     &#169;     &copy;     版权所有（copyright）
ª     &#170;     &ordf;     阴性序数记号（feminine ordinal indicator）
«     &#171;     &laquo;     左双角引号（angle quotation mark (left)）
¬     &#172;     &not;     否定符号（negation）
­     &#173;     &shy;     软连字符（soft hyphen）
®     &#174;     &reg;     注册商标（registered trademark)
¯     &#175;     &macr;     长音符号（spacing macron）
°     &#176;     &deg;     度符号（degree）
±     &#177;     &plusmn;     加减号/正负号（plus-or-minus）
²     &#178;     &sup2;     上标 2（superscript 2）
³     &#179;     &sup3;     上标 3（superscript 3）
´     &#180;     &acute;     尖音符号（spacing acute）
µ     &#181;     &micro;     微米符号（micro）
¶     &#182;     &para;     段落符号（paragraph）
·     &#183;     &middot;     中间点（middle dot）
¸     &#184;     &cedil;     变音符号（spacing cedilla）
¹     &#185;     &sup1;     上标 1（superscript 1）
º     &#186;     &ordm;     阳性序数记号（masculine ordinal indicator）
»     &#187;     &raquo;     右双角引号（angle quotation mark (right)）
¼     &#188;     &frac14;     1/4 分数（fraction 1/4）
½     &#189;     &frac12;     1/2 分数（fraction 1/2）
¾     &#190;     &frac34;     3/4 分数（fraction 3/4）
¿     &#191;     &iquest;     倒置问号（inverted question mark）

ISO 8859-1 字符实体

字符     实体编号     实体名称     描述
À     &#192;     &Agrave;     大写字母 A，重音（grave accent）
Á     &#193;     &Aacute;     大写字母 A，尖音（acute accent）
Â     &#194;     &Acirc;     大写字母 A，抑扬音（circumflex accent）
Ã     &#195;     &Atilde;     大写字母 A，腭化（tilde）
Ä     &#196;     &Auml;     大写字母 A，带有变音符号标记（umlaut mark）
Å     &#197;     &Aring;     大写字母 A，带有上圆圈（ring）
Æ     &#198;     &AElig;     大写字母 AE
Ç     &#199;     &Ccedil;     大写字母 C，变音（cedilla）
È     &#200;     &Egrave;     大写字母 E，重音（grave accent）
É     &#201;     &Eacute;     大写字母 E，尖音（acute accent）
Ê     &#202;     &Ecirc;     大写字母 E，抑扬音（circumflex accent）
Ë     &#203;     &Euml;     大写字母 E，带有变音符号标记（umlaut mark）
Ì     &#204;     &Igrave;     大写字母 I，重音（grave accent）
Í     &#205;     &Iacute;     大写字母 I，尖音（acute accent）
Î     &#206;     &Icirc;     大写字母 I，抑扬音（circumflex accent）
Ï     &#207;     &Iuml;     大写字母 I，带有变音符号标记（umlaut mark）
Ð     &#208;     &ETH;     冰岛语大写字母 eth
Ñ     &#209;     &Ntilde;     大写字母 N，腭化（tilde）
Ò     &#210;     &Ograve;     大写字母 O，重音（grave accent）
Ó     &#211;     &Oacute;     大写字母 O，尖音（acute accent）
Ô     &#212;     &Ocirc;     大写字母 O，抑扬音（circumflex accent）
Õ     &#213;     &Otilde;     大写字母 O，腭化（tilde）
Ö     &#214;     &Ouml;     大写字母 O，带有变音符号标记（umlaut mark）
×     &#215;     &times;     乘号（multiplication）
Ø     &#216;     &Oslash;     大写字母 O，带有斜线（slash）
Ù     &#217;     &Ugrave;     大写字母 U，重音（grave accent）
Ú     &#218;     &Uacute;     大写字母 U，尖音（acute accent）
Û     &#219;     &Ucirc;     大写字母 U，抑扬音（circumflex accent）
Ü     &#220;     &Uuml;     大写字母 U，带有变音符号标记（umlaut mark）
Ý     &#221;     &Yacute;     大写字母 Y，尖音（acute accent）
Þ     &#222;     &THORN;     冰岛语大写字母 THORN
ß     &#223;     &szlig;     德语小写字母 sharp s
à     &#224;     &agrave;     小写字母 a，重音（grave accent）
á     &#225;     &aacute;     小写字母 a，尖音（acute accent）
â     &#226;     &acirc;     小写字母 a，抑扬音（circumflex accent）
ã     &#227;     &atilde;     小写字母 a，腭化（tilde）
ä     &#228;     &auml;     小写字母 a，带有变音符号标记（umlaut mark）
å     &#229;     &aring;     小写字母 a，带有上圆圈（ring）
æ     &#230;     &aelig;     小写字母 ae
ç     &#231;     &ccedil;     小写字母 c，变音（cedilla）
è     &#232;     &egrave;     小写字母 e，重音（grave accent）
é     &#233;     &eacute;     小写字母 e，尖音（acute accent）
ê     &#234;     &ecirc;     小写字母 e，抑扬音（circumflex accent）
ë     &#235;     &euml;     小写字母 e，带有变音符号标记（umlaut mark）
ì     &#236;     &igrave;     小写字母 i，重音（grave accent）
í     &#237;     &iacute;     小写字母 i，尖音（acute accent）
î     &#238;     &icirc;     小写字母 i，抑扬音（circumflex accent）
ï     &#239;     &iuml;     小写字母 i，带有变音符号标记（umlaut mark）
ð     &#240;     &eth;     冰岛语小写字母 eth
ñ     &#241;     &ntilde;     小写字母 n，腭化（tilde）
ò     &#242;     &ograve;     小写字母 o，重音（grave accent）
ó     &#243;     &oacute;     小写字母 o，尖音（acute accent）
ô     &#244;     &ocirc;     小写字母 o，抑扬音（circumflex accent）
õ     &#245;     &otilde;     小写字母 o，腭化（tilde）
ö     &#246;     &ouml;     小写字母 o，带有变音符号标记（umlaut mark）
÷     &#247;     &divide;     除号（division）
ø     &#248;     &oslash;     小写字母 o，带有斜线（slash）
ù     &#249;     &ugrave;     小写字母 u，重音（grave accent）
ú     &#250;     &uacute;     小写字母 u，尖音（acute accent）
û     &#251;     &ucirc;     小写字母 u，抑扬音（circumflex accent）
ü     &#252;     &uuml;     小写字母 u，带有变音符号标记（umlaut mark）
ý     &#253;     &yacute;     小写字母 y，尖音（acute accent）
þ     &#254;     &thorn;     冰岛语小写字母 thorn
ÿ     &#255;     &yuml;     小写字母 y，带有变音符号标记（umlaut mark）

函数原型

htmlspecialchars:

/**
 * Convert special characters to HTML entities
 * @link https://php.net/manual/en/function.htmlspecialchars.php
 * @param string $string <p>
 * The {@link https://secure.php.net/manual/en/language.types.string.php string} being converted.
 * </p>
 * @param int $flags [optional] <p>
 * A bitmask of one or more of the following flags, which specify how to handle quotes,
 * invalid code unit sequences and the used document type. The default is
 * <em><b>ENT_COMPAT | ENT_HTML401</b></em>.
 * </p><table>
 * <caption><b>Available <em>flags</em> constants</b></caption>
 * <thead>
 * <tr>
 * <th>Constant Name</th>
 * <th>Description</th>
 * </tr>
 *
 * </thead>
 *
 * <tbody>
 * <tr>
 * <td><b>ENT_COMPAT</b></td>
 * <td>Will convert double-quotes and leave single-quotes alone.</td>
 * </tr>
 *
 * <tr>
 * <td><b>ENT_QUOTES</b></td>
 * <td>Will convert both double and single quotes.</td>
 *</tr>
 *
 * <tr>
 * <td><b>ENT_NOQUOTES</b></td>
 * <td>Will leave both double and single quotes unconverted.</td>
 * </tr>
 *
 * <tr>
 * <td><b>ENT_IGNORE</b></td>
 * <td>
 * Silently discard invalid code unit sequences instead of returning
 * an empty string. Using this flag is discouraged as it
 * {@link https://unicode.org/reports/tr36/#Deletion_of_Noncharacters »&nbsp;may have security implications}.
 * </td>
 * </tr>
 *
 * <tr>
 * <td><b>ENT_SUBSTITUTE</b></td>
 * <td>
 * Replace invalid code unit sequences with a Unicode Replacement Character
 * U+FFFD (UTF-8) or &amp;#FFFD; (otherwise) instead of returning an empty string.
 * </td>
 * </tr>
 *
 * <tr>
 * <td><b>ENT_DISALLOWED</b></td>
 * <td>
 * Replace invalid code points for the given document type with a
 * Unicode Replacement Character U+FFFD (UTF-8) or &amp;#FFFD;
 * (otherwise) instead of leaving them as is. This may be useful, for
 * instance, to ensure the well-formedness of XML documents with
 * embedded external content.
 * </td>
 * </tr>
 *
 * <tr>
 * <td><b>ENT_HTML401</b></td>
 * <td>
 * Handle code as HTML 4.01.
 * </td>
 * </tr>
 *
 * <tr>
 * <td><b>ENT_XML1</b></td>
 * <td>
 * Handle code as XML 1.
 * </td>
 * </tr>
 *
 * <tr>
 * <td><b>ENT_XHTML</b></td>
 * <td>
 * Handle code as XHTML.
 * </td>
 * </tr>
 *
 * <tr>
 * <td><b>ENT_HTML5</b></td>
 * <td>
 * Handle code as HTML 5.
 * </td>
 * </tr>
 *
 * </tbody>
 *
 * </table>
 * @param string $encoding [optional] <p>
 * Defines encoding used in conversion.
 * If omitted, the default value for this argument is ISO-8859-1 in
 * versions of PHP prior to 5.4.0, and UTF-8 from PHP 5.4.0 onwards.
 * </p>
 * <p>
 * For the purposes of this function, the encodings
 * <em>ISO-8859-1</em>, <em>ISO-8859-15</em>,
 * <em>UTF-8</em>, <em>cp866</em>,
 * <em>cp1251</em>, <em>cp1252</em>, and
 * <em>KOI8-R</em> are effectively equivalent, provided the
 * <em><b>string</b></em> itself is valid for the encoding, as
 * the characters affected by  <b>htmlspecialchars()</b> occupy
 * the same positions in all of these encodings.
 * </p>
 * @param bool $double_encode [optional] <p>
 * When <em><b>double_encode</b></em> is turned off PHP will not
 * encode existing html entities, the default is to convert everything.
 * </p>
 * @return string The converted string.
 */
function htmlspecialchars ($string, $flags = ENT_COMPAT | ENT_HTML401, $encoding = 'UTF-8', $double_encode = true) {}

/**
 * Convert special HTML entities back to characters
 * @link https://php.net/manual/en/function.htmlspecialchars-decode.php
 * @param string $string <p>
 * The string to decode
 * </p>
 * @param int $quote_style [optional] <p>
 * The quote style. One of the following constants:
 * <table>
 * quote_style constants
 * <tr valign="top">
 * <td>Constant Name</td>
 * <td>Description</td>
 * </tr>
 * <tr valign="top">
 * <td>ENT_COMPAT</td>
 * <td>Will convert double-quotes and leave single-quotes alone
 * (default)</td>
 * </tr>
 * <tr valign="top">
 * <td>ENT_QUOTES</td>
 * <td>Will convert both double and single quotes</td>
 * </tr>
 * <tr valign="top">
 * <td>ENT_NOQUOTES</td>
 * <td>Will leave both double and single quotes unconverted</td>
 * </tr>
 * </table>
 * </p>
 * @return string the decoded string.
 */
function htmlspecialchars_decode ($string, $quote_style = null) {}

htmlentities:

/**
 * Convert all applicable characters to HTML entities
 * @link https://php.net/manual/en/function.htmlentities.php
 * @param string $string <p>
 * The input string.
 * </p>
 * @param int $quote_style [optional] <p>
 * Like htmlspecialchars, the optional second
 * quote_style parameter lets you define what will
 * be done with 'single' and "double" quotes. It takes on one of three
 * constants with the default being ENT_COMPAT:
 * <table>
 * Available quote_style constants
 * <tr valign="top">
 * <td>Constant Name</td>
 * <td>Description</td>
 * </tr>
 * <tr valign="top">
 * <td>ENT_COMPAT</td>
 * <td>Will convert double-quotes and leave single-quotes alone.</td>
 * </tr>
 * <tr valign="top">
 * <td>ENT_QUOTES</td>
 * <td>Will convert both double and single quotes.</td>
 * </tr>
 * <tr valign="top">
 * <td>ENT_NOQUOTES</td>
 * <td>Will leave both double and single quotes unconverted.</td>
 * </tr>
 * </table>
 * </p>
 * @param string $charset [optional] <p>
 * Like htmlspecialchars, it takes an optional
 * third argument charset which defines character
 * set used in conversion.
 * Presently, the ISO-8859-1 character set is used as the default.
 * </p>
 * &reference.strings.charsets;
 * @param bool $double_encode [optional] <p>
 * When double_encode is turned off PHP will not
 * encode existing html entities. The default is to convert everything.
 * </p>
 * @return string the encoded string.
 */
function htmlentities ($string, $quote_style = null, $charset = null, $double_encode = true) {}

/**
 * Convert all HTML entities to their applicable characters
 * @link https://php.net/manual/en/function.html-entity-decode.php
 * @param string $string <p>
 * The input string.
 * </p>
 * @param int $quote_style [optional] <p>
 * The optional second quote_style parameter lets
 * you define what will be done with 'single' and "double" quotes. It takes
 * on one of three constants with the default being
 * ENT_COMPAT:
 * <table>
 * Available quote_style constants
 * <tr valign="top">
 * <td>Constant Name</td>
 * <td>Description</td>
 * </tr>
 * <tr valign="top">
 * <td>ENT_COMPAT</td>
 * <td>Will convert double-quotes and leave single-quotes alone.</td>
 * </tr>
 * <tr valign="top">
 * <td>ENT_QUOTES</td>
 * <td>Will convert both double and single quotes.</td>
 * </tr>
 * <tr valign="top">
 * <td>ENT_NOQUOTES</td>
 * <td>Will leave both double and single quotes unconverted.</td>
 * </tr>
 * </table>
 * </p>
 * @param string $charset [optional] <p>
 * The ISO-8859-1 character set is used as default for the optional third
 * charset. This defines the character set used in
 * conversion.
 * </p>
 * &reference.strings.charsets;
 * @return string the decoded string.
 */
function html_entity_decode ($string, $quote_style = null, $charset = null) {}

参考资料

HTML 字符实体 https://www.runoob.com/html/html-entities.html

HTML 实体参考手册 https://www.runoob.com/tags/ref-entities.html

htmlspecialchars https://www.php.net/htmlspecialchars/

htmlentities https://www.php.net/manual/zh/function.htmlentities.php

PHP strip_tags() 函数 https://www.runoob.com/php/func-string-strip-tags.html

htmlspecialchars_decode 与 html_entity_decode https://blog.csdn.net/gaoce227/article/details/72785221

http://blog.csdn.net/agangdi/article/details/8217045

https://www.cnblogs.com/A-Song/archive/2011/12/20/2294599.html

http://www.w3school.com.cn/php/func_string_htmlspecialchars.asp

http://blog.csdn.net/zhuizhuziwo/article/details/50623014

http://www.w3school.com.cn/php/func_string_htmlentities.asp

http://www.bubuko.com/infodetail-2100074.html

http://blog.csdn.net/xinyflove/article/details/51719538

http://blog.csdn.net/xiaoxiaoniaoer1/article/details/7717669

http://www.cnblogs.com/suihui/archive/2012/09/20/2694751.html