07/07/2006 (last updated on 12/18/2006)
Online Tool for URL Encoding/Decoding
This web page allows you to encode/decode a string for a URL according to RFC 3986 and RFC 3629.
What Is URL Encoding?
URL encoding means replacing certain characters in a URL with one or more character triplets, each consisting of the percent character "%" followed by two hexadecimal digits. The two hexadecimal digits of each triplet represent the numeric value of one byte of the replaced character.
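The triplet construction described above can be sketched in a few lines of Python. The function names are made up for this illustration and are not part of the tool on this page; the sketch only handles single ASCII characters.

```python
def percent_encode_byte(value: int) -> str:
    """Return the triplet for one byte: '%' plus two uppercase hex digits."""
    if not 0 <= value <= 255:
        raise ValueError("a triplet represents exactly one byte (0-255)")
    return "%{:02X}".format(value)


def percent_encode_ascii_char(ch: str) -> str:
    """Encode a single ASCII character as one triplet (hypothetical helper)."""
    if len(ch) != 1 or ord(ch) > 127:
        raise ValueError("expected exactly one ASCII character")
    return percent_encode_byte(ord(ch))


print(percent_encode_ascii_char(" "))  # %20
print(percent_encode_ascii_char("/"))  # %2F
```

The space character has the numeric value 32, which is 20 in hexadecimal, hence "%20".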
The term URL encoding is a bit inexact because the encoding procedure is not limited to URLs (Uniform Resource Locators), but can also be applied to any other URIs (Uniform Resource Identifiers) such as URNs (Uniform Resource Names). Therefore, the term percent-encoding should be preferred.
What Is URL Encoding Good For?
In two cases, the URL encoding mechanism is used to represent information in a component of a URL:
- The character that corresponds with the information to be represented is outside the set of characters allowed to be used in a URL. For example, the space character is outside the allowed set. Therefore, it has to be URL encoded as "%20".
- The character that corresponds with the information to be represented is a reserved character that has special meaning in a certain context. If it is necessary to use that character for some other purpose than the reserved purpose, then the character must be URL encoded. For example, the reserved character "/" has the reserved purpose of being a delimiter between path segments. Compare the URL for the Wikipedia article "Percent-encoding":
  http://en.wikipedia.org/wiki/Percent-encoding
  But if the character "/" needs to be in a path segment, then it has to be URL encoded. Compare the encoded URL for the fictitious Wikipedia article "URL-/Percent-encoding":
  http://en.wikipedia.org/wiki/URL-%2FPercent-encoding
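Both cases can be tried out with Python's standard library. This is a minimal sketch and not the code behind the tool on this page; it assumes Python 3.7 or later and passes `safe=""` to `urllib.parse.quote` so that "/" is encoded instead of being kept as a delimiter.

```python
from urllib.parse import quote, unquote

# Case 1: the space character is outside the allowed set.
print(quote("two words", safe=""))  # two%20words

# Case 2: "/" is reserved, so inside a path segment it has to be encoded.
segment = "URL-/Percent-encoding"
encoded_segment = quote(segment, safe="")
print(encoded_segment)  # URL-%2FPercent-encoding

# Building the URL of the fictitious article from the encoded segment.
url = "http://en.wikipedia.org/wiki/" + encoded_segment
print(url)

# Decoding restores the original segment.
print(unquote(encoded_segment))  # URL-/Percent-encoding
```

By default, `quote` leaves "/" untouched (its `safe` parameter defaults to "/"), which suits whole paths but not a single path segment that contains a slash.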
Which Characters Are Allowed in a URL?
According to RFC 3986, the characters in a URL have to be taken from a defined set of unreserved and reserved ASCII characters. Any other characters are not allowed in a URL.
The unreserved characters can be encoded, but should not be encoded. The unreserved characters are:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z
0 1 2 3 4 5 6 7 8 9 - _ . ~
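The reserved characters have to be encoded only under certain circumstances (see above). The reserved characters are:
! * ' ( ) ; : @ & = + $ , / ? # [ ]
The percent character "%" is not part of the reserved set in RFC 3986. It introduces a triplet, so a literal percent character always has to be encoded as "%25".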
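The character classes can be written down as plain sets, which makes it easy to check a single character. This is a sketch with hypothetical names. It treats "%" as its own class because RFC 3986 uses it as the escape indicator and does not list it among the reserved characters.

```python
import string

# Unreserved characters: letters, digits and - _ . ~
UNRESERVED = set(string.ascii_letters + string.digits + "-_.~")

# Reserved characters according to RFC 3986 (gen-delims and sub-delims).
RESERVED = set("!*'();:@&=+$,/?#[]")


def classify(ch: str) -> str:
    """Classify one character (hypothetical helper for illustration)."""
    if ch in UNRESERVED:
        return "unreserved"
    if ch in RESERVED:
        return "reserved"
    if ch == "%":
        return "escape indicator"
    return "not allowed"


for ch in ["a", "~", "/", "%", " ", "ä"]:
    print(repr(ch), classify(ch))
```

Characters classified as "not allowed" always have to be encoded, reserved characters only when they are not used for their reserved purpose.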
Encoding/Decoding a Piece of Text
RFC 3986 does not prescribe a single character encoding table for encoding non-ASCII characters (e.g. the umlauts ä, ö, ü). As URL encoding involves a pair of hexadecimal digits and as a pair of hexadecimal digits is equivalent to 8 bits, it would theoretically be possible to use one of the 8-bit code pages for non-ASCII characters (e.g. ISO-8859-1 for umlauts).
On the other hand, many languages have their own 8-bit code page, and handling all of these different code pages would be cumbersome. Some languages do not even fit into an 8-bit code page (e.g. Chinese). Therefore, the UTF-8 character encoding defined in RFC 3629 is commonly used for non-ASCII characters, and RFC 3986 recommends it for new URI schemes. The following tool takes this into account and lets you choose between the ASCII character encoding table and the UTF-8 character encoding table. If you opt for the ASCII character encoding table, a warning message will pop up if the URL encoded/decoded text contains non-ASCII characters.
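The difference between an 8-bit code page and UTF-8 shows up directly in the encoded result. The following sketch uses Python's `urllib.parse`; it does not reproduce the tool on this page, and `encode_ascii_only` is a hypothetical helper that only imitates the warning for the ASCII table.

```python
from urllib.parse import quote, unquote

# The umlaut ä is one byte in ISO-8859-1 but two bytes in UTF-8.
print(quote("ä", safe="", encoding="iso-8859-1"))  # %E4
print(quote("ä", safe="", encoding="utf-8"))       # %C3%A4

# A Chinese character does not fit into an 8-bit code page;
# in UTF-8 it takes three bytes, so three triplets.
print(quote("中", safe="", encoding="utf-8"))       # %E4%B8%AD

# Decoding has to use the same table that was used for encoding.
print(unquote("%C3%A4", encoding="utf-8"))         # ä
print(unquote("%E4", encoding="iso-8859-1"))       # ä


def encode_ascii_only(text: str) -> str:
    """Encode with the ASCII table and reject non-ASCII input."""
    if not text.isascii():
        raise ValueError("text contains non-ASCII characters")
    return quote(text, safe="", encoding="ascii")


print(encode_ascii_only("a b"))  # a%20b
```

Decoding "%E4" as UTF-8 does not yield "ä", because the single byte E4 is not a complete UTF-8 sequence. This is why the encoder and the decoder have to agree on the table.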
Note: This tool is released under the GNU General Public License (GPL) and requires JavaScript to be enabled in your browser.
External Links
- More information about percent-encoding (Wikipedia)
- URL encoding with Java (UTF-8 character encoding, source code available)