URL Encoding Explained: How Percent Encoding Works

The Debuggers

When you see a string of characters like %20 or %26 in a web address, that is URL encoding at work. It is a fundamental part of the technical infrastructure of the internet. It ensures that data travels safely between your browser and the web server without being corrupted or misinterpreted.

Web browsers and servers rely on strict standards to pass information back and forth. If you want a deep dive into reading decoded strings, try our free URL Encoder and Decoder tool to instantly convert text.

What Is URL Encoding?

URLs can only be sent over the internet using a limited subset of the ASCII character set. This limitation exists because some characters have special meanings within the context of a URL. For example, the colon (:) separates the protocol from the rest of the address, and the forward slash (/) separates directory paths.

If you tried to include one of these special characters as part of your actual data, the web server could misread it as a delimiter. To solve this, any character that is not in the safe "unreserved" list, and is not being used as a delimiter, must be converted into a valid format before the browser sends the request.

This conversion process is officially called percent-encoding.

The History and Standards of URL Encoding

The standard for URL encoding has evolved over several decades. The most significant and current standard is defined in RFC 3986. This document outlines exactly which characters are safe to use and how the encoding process should handle everything else.

In the earliest days of the web, encoding rules were somewhat inconsistent between different browsers and operating systems. This led to data loss when users tried to transmit non-English characters or complex symbols. With the adoption of RFC 3986 and the rise of the UTF-8 encoding, URL encoding has become far more reliable. It now supports virtually every character from every human language, provided each one is converted to UTF-8 bytes and percent-encoded.

Detailed Character Sets: Reserved vs Unreserved

To understand when a character needs to be encoded, we must categorize them into two groups.

Unreserved Characters

These are characters that are allowed in a URL without any encoding. They always represent their literal selves. According to modern standards, the unreserved set includes:

  • Uppercase letters (A to Z)
  • Lowercase letters (a to z)
  • Numbers (0 to 9)
  • Hyphens, underscores, periods, and tildes (-, _, ., ~)

Reserved Characters

These characters have a specific job inside a URL. If you want to use them for their literal value instead of their structural job, you must encode them.

  • Structural characters: :, /, ?, #, [, ], @
  • Sub-delimiters: !, $, &, ', (, ), *, +, ,, ;, =

When an unsafe or reserved character appears as data in a URL, it is replaced by a percent sign (%) followed by two hexadecimal digits. These digits are the character's byte value, which for ASCII characters is simply the ASCII code. For instance, a space becomes %20 because 20 is the hexadecimal representation of the ASCII code 32.
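The rule above can be sketched in a few lines of Python. This is a minimal illustration, not a full implementation: `percent_encode_ascii` is a hypothetical helper name, it only handles ASCII input, and it encodes every character outside the unreserved set (which is what you want for a single data value, not for a whole URL).

```python
import string
from urllib.parse import quote

# Unreserved characters per RFC 3986: letters, digits, and - _ . ~
UNRESERVED = set(string.ascii_letters + string.digits + "-_.~")

def percent_encode_ascii(text: str) -> str:
    """Hypothetical helper: encode every ASCII character outside the unreserved set."""
    return "".join(
        ch if ch in UNRESERVED else "%{:02X}".format(ord(ch))
        for ch in text
    )

encoded = percent_encode_ascii("a b&c=d")
print(encoded)  # a%20b%26c%3Dd

# The standard library produces the same result when no extra characters are marked safe.
print(encoded == quote("a b&c=d", safe=""))  # True
```

In real code you would normally call `urllib.parse.quote` directly; the hand-written version is only there to show where the hex digits come from.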

Common URL Encoded Characters

You encounter these all the time when clicking links, submitting search queries, or browsing e-commerce sites.

Character | Percent-Encoded Value | Common Use Case
Space ( ) | %20 | Separating words in query strings or file names.
Ampersand (&) | %26 | Safe transmission of data containing literal ampersands.
Equals (=) | %3D | Safe transmission of data containing equals signs.
Plus (+) | %2B | Sending a literal plus sign, since an unencoded + in a query string is often decoded as a space.
Question Mark (?) | %3F | When the literal question mark is part of the data.
Hash (#) | %23 | Used when an actual hash character is part of a password or query.

The Plus Sign (+) versus %20 for Spaces

One of the most frequent sources of confusion in web development is whether to use a plus sign or %20 to represent a space.

The plus sign is a historical artifact from the early days of HTML form submissions. When you submit a form with method="GET", the browser encodes spaces in the query string as plus signs, following the application/x-www-form-urlencoded format. RFC 3986 itself gives the plus sign no such meaning. Under that standard, a space is encoded as %20.

The rule of thumb for modern developers is clear: use %20 for paths and use either plus or %20 for query strings. If you want to be perfectly compliant with current standards, always opt for %20. Modern decoding functions like our URL Decoder handle both variations automatically to ensure you never lose data.
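As a rough illustration of the two conventions, Python's standard library exposes both: `quote` and `unquote` follow the %20 style, while `quote_plus` and `unquote_plus` follow the form-encoding style. The sample text is made up for this sketch.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

text = "red shoes"

path_style = quote(text)       # RFC 3986 style, suitable for paths
form_style = quote_plus(text)  # form style, query strings only

print(path_style)  # red%20shoes
print(form_style)  # red+shoes

# Decoding must match the convention: unquote leaves a plus sign alone,
# while unquote_plus turns it into a space. Both understand %20.
print(unquote("red+shoes"))         # red+shoes
print(unquote_plus("red+shoes"))    # red shoes
print(unquote_plus("red%20shoes"))  # red shoes
```

The practical consequence is that a decoder which does not know about the plus convention will hand you a literal plus, so it helps to know which style produced the string.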

URL Encoding vs HTML Encoding

It is easy to confuse URL encoding with HTML entity encoding, but they solve entirely different problems.

URL encoding ensures web addresses are structurally valid for HTTP transmission. It targets headers and the address bar. HTML encoding, which uses strings like &amp; or &lt;, prevents characters from breaking the structure of an HTML document in the browser window.

If you have a literal ampersand in your data, it should be %26 in the URL. If that same data is then displayed inside a paragraph on your website, it should be converted to &amp; to ensure the browser does not mistake it for the start of another HTML entity.
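The two layers can be seen side by side in a short Python sketch. The URL and the sample value are assumptions for illustration; `quote` handles the URL layer and `html.escape` handles the HTML layer.

```python
from html import escape
from urllib.parse import quote

value = "Tom & Jerry"

# URL layer: the ampersand becomes %26 so it cannot split the query string.
url = "https://example.com/search?q=" + quote(value, safe="")
print(url)  # https://example.com/search?q=Tom%20%26%20Jerry

# HTML layer: the ampersand becomes &amp; so it cannot start an entity.
link_html = '<a href="{}">{}</a>'.format(escape(url), escape(value))
print(link_html)
# <a href="https://example.com/search?q=Tom%20%26%20Jerry">Tom &amp; Jerry</a>
```

Each encoding is applied at the point where the data enters that context: URL encoding when building the address, HTML escaping when writing the markup.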

Internationalization and Non-Latin Characters

As the internet has become a global tool, the need to support non-ASCII characters has grown. Characters from Chinese, Arabic, Russian, and other scripts cannot be represented directly in ASCII.

To handle this, browsers use a two-step process. First, the character is converted into its UTF-8 byte sequence. Then, each byte in that sequence is percent-encoded. For example, the two-character Chinese word 欢迎 ("welcome") takes three bytes per character in UTF-8, so it turns into the long string %E6%AC%A2%E8%BF%8E. This ensures that even though the URL only contains ASCII characters (the percent sign and hex digits), the original meaning is preserved for the server to decode.
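The two steps can be reproduced by hand in Python. In this sketch the sample string %E6%AC%A2%E8%BF%8E from above is rebuilt from the word 欢迎, first manually and then with the standard library for comparison.

```python
from urllib.parse import quote, unquote

word = "欢迎"

# Step 1: convert the text to its UTF-8 byte sequence (three bytes per character here).
utf8_bytes = word.encode("utf-8")
print(len(utf8_bytes))  # 6

# Step 2: percent-encode each byte as % followed by two hex digits.
manual = "".join("%{:02X}".format(b) for b in utf8_bytes)
print(manual)  # %E6%AC%A2%E8%BF%8E

# urllib.parse.quote performs both steps and gives the same result.
print(manual == quote(word))  # True

# Decoding reverses the process.
print(unquote(manual))  # 欢迎
```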

encodeURI vs encodeURIComponent in JavaScript

If you are a JavaScript developer, you will frequently use two built-in functions: encodeURI() and encodeURIComponent(). Using the wrong one can break your application or create security vulnerabilities.

Use encodeURI() when you have a complete, working URL and you just need to ensure any rogue spaces or illegal characters are escaped. It explicitly ignores structural characters like slashes, colons, and question marks so the link still functions as a valid address.

Use encodeURIComponent() when you are taking raw user input or dynamic data and injecting it into a query string. This function encodes every character except letters, digits, and the marks - _ . ! ~ * ' ( ). That includes reserved delimiters, so if a user types a character like & or =, it does not accidentally create a new, broken query parameter.

// Using encodeURI for a full URL
const fullUrl = "https://example.com/search files";
console.log(encodeURI(fullUrl)); 
// Output: https://example.com/search%20files

// Using encodeURIComponent for a parameter value
const userQuery = "100% free & open source";
const url = `https://example.com/search?q=${encodeURIComponent(userQuery)}`;
console.log(url);
// Output: https://example.com/search?q=100%25%20free%20%26%20open%20source

URL Encoding in PHP

PHP offers two functions for nearly the same purpose: urlencode() and rawurlencode().

The main difference is historical and follows the space encoding rules we discussed earlier. urlencode() encodes spaces as a plus sign (+), following the early application/x-www-form-urlencoded standard. rawurlencode() complies with the modern RFC 3986 standard and encodes spaces as %20.

Modern applications should generally prefer rawurlencode() because it provides more predictable results across different systems and programming languages.

URL Encoding in Python

Python developers typically turn to the urllib.parse module to handle percent-encoding.

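The quote() function percent-encodes a single string. By default it leaves the forward slash unescaped, which suits paths, and passing safe="" makes it behave more like encodeURIComponent(). The urlencode() function takes a Python dictionary and converts it into a safely encoded query string, using plus signs for spaces by default.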

import urllib.parse

# Convert a dictionary to a safe query string
query_args = {"q": "python tutorial", "lang": "en"}
print(urllib.parse.urlencode(query_args))
# Output: q=python+tutorial&lang=en
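The behavior of quote() depends heavily on its safe parameter, which the following sketch tries to make visible. The sample strings are made up, and `ENCODE_URI_SAFE` is a hypothetical constant that only approximates the characters JavaScript's encodeURI() leaves alone; it is not an official equivalent.

```python
from urllib.parse import quote

# Default: "/" is kept, so this works well for a path.
path = quote("/docs/my file?.txt")
print(path)  # /docs/my%20file%3F.txt

# safe="" encodes "/" as well, which suits a single parameter value.
value = quote("a/b&c", safe="")
print(value)  # a%2Fb%26c

# Approximation of encodeURI(): keep the reserved delimiters of a full URL.
ENCODE_URI_SAFE = ";/?:@&=+$,#!*'()"
whole = quote("https://example.com/search files", safe=ENCODE_URI_SAFE)
print(whole)  # https://example.com/search%20files
```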

Security Implications: URL Encoding and Attacks

URL encoding is not just about functionality. It is also a critical consideration for security. Attackers often use multiple layers of encoding to hide malicious scripts.

One common technique is called Double Encoding. An attacker might encode an unsafe character twice (like %253C for <) to bypass simple security filters that only look for a single level of encoding. If the server then decodes the data twice without validation, the malicious script could be executed.

Always ensure your application decodes data once and validates it thoroughly before displaying it or using it in a database query. Our URL Parser can help you inspect the raw and decoded versions of suspicious links to understand exactly what parameters are being passed.
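To make the double-encoding problem concrete, here is a small Python sketch. The payload is illustrative, and `decode_once` is a hypothetical and deliberately strict helper: it rejects anything that still contains a percent escape after one decode, which may be too aggressive for inputs that legitimately contain text like "%41".

```python
import re
from urllib.parse import unquote

payload = "%253Cscript%253E"

once = unquote(payload)
twice = unquote(once)
print(once)   # %3Cscript%3E  (a filter looking for "<" sees nothing)
print(twice)  # <script>      (a second decode reveals the tag)

PERCENT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")

def decode_once(raw: str) -> str:
    """Hypothetical policy: decode exactly once and reject leftover escapes."""
    decoded = unquote(raw)
    if PERCENT_ESCAPE.search(decoded):
        raise ValueError("input still contains percent escapes after one decode")
    return decoded

print(decode_once("hello%20world"))  # hello world

try:
    decode_once(payload)
except ValueError as err:
    print("rejected:", err)
```

Decoding once is only half of the advice above. The decoded value still needs validation and context-appropriate escaping before it reaches HTML or a database query.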

Common URL Encoding Mistakes to Avoid

  1. Not encoding query parameter values. If a user types an ampersand into a search box and your script appends it directly to the URL without encoding, the server will interpret it as a new parameter field.
  2. Encoding the entire URL with encodeURIComponent. If you do this, your slashes and colons will be turned into %2F and %3A, which makes the URL unclickable and unresolvable by the browser.
  3. Using plus signs for path spaces. A plus sign is conventionally decoded as a space only in form-encoded query strings. In a URL path segment it is a literal plus, so spaces in paths must always be written as %20.
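The mistakes above can be demonstrated, and avoided, with Python's standard library. This is a sketch under assumed inputs: the host, the file name, and the search text are invented for illustration.

```python
from urllib.parse import parse_qs, quote, urlencode

search = "salt & pepper"

# Mistake 1: appending raw input. The ampersand splits the value in two.
broken_query = "q=" + search
print(parse_qs(broken_query))  # {'q': ['salt ']}

# Fix: encode the value. urlencode uses + for spaces by default.
safe_query = urlencode({"q": search})
print(safe_query)            # q=salt+%26+pepper
print(parse_qs(safe_query))  # {'q': ['salt & pepper']}

# Mistakes 2 and 3: encode each piece separately, and use %20 in the path.
path_segment = quote("annual report.pdf", safe="")
query = urlencode({"q": search}, quote_via=quote)  # %20 instead of +
url = "https://example.com/files/{}?{}".format(path_segment, query)
print(url)
# https://example.com/files/annual%20report.pdf?q=salt%20%26%20pepper
```

Encoding each component on its own, then joining the pieces with the literal delimiters, keeps the structural characters intact while protecting the data.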

How to Decode a URL

Decoding is the process of taking a percent-encoded string and turning it back into its original form.

If you need to turn a large block of encoded text back into readable form, paste it into our URL Encoder and Decoder. This tool supports both modern and historical encoding styles.

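If you are writing JavaScript, wrap your string in the native decodeURIComponent() function to get the readable result. Note that it leaves plus signs untouched and throws a URIError on malformed sequences such as a lone percent sign.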

const readableStr = decodeURIComponent("hello%20world%21");
console.log(readableStr); // Output: hello world!
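For comparison, a rough Python counterpart is shown below. It decodes a single string with `unquote` and, as a more typical case, splits and decodes a whole query string with `parse_qsl`. The URL is an assumed example.

```python
from urllib.parse import parse_qsl, unquote, urlsplit

# Decoding a single percent-encoded string.
readable = unquote("hello%20world%21")
print(readable)  # hello world!

# Decoding a full URL: split first, then decode each parameter.
url = "https://example.com/search?q=100%25%20free%20%26%20open%20source&lang=en"
parts = urlsplit(url)
params = parse_qsl(parts.query)
print(params)  # [('q', '100% free & open source'), ('lang', 'en')]
```

Splitting before decoding matters: the encoded %26 inside the value stays part of the value, while the literal & between parameters is treated as the separator.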

Understanding how web addresses are structured is vital for modern web development and technical SEO. If you want to optimize your web addresses for search engines, try our URL Slug Generator to automatically format your titles. If you need to break down a giant tracking link into its structural components, drop it into the URL Parser for a full visual breakdown and validation report.
