If you build or maintain an online store, URL encoding is one of those topics that seems trivial until it breaks something in production. A category filter returns zero products, a payment callback fails signature validation, or a tracking parameter silently loses its value. In most of these cases, the root cause is a string that was encoded twice, decoded once too often, or never encoded at all.
This guide walks through how URL encoding actually works, where it bites ecommerce systems in particular, and the habits that keep your URLs clean and your integrations stable.
A URL may only contain a limited set of characters without escaping: letters, digits, and a handful of symbols like hyphens, underscores, dots, and tildes. Everything else that appears inside a value (spaces, ampersands, question marks, non-Latin characters) must be percent-encoded: each byte is written as a percent sign followed by two hexadecimal digits. A space becomes %20, an ampersand becomes %26, and the German letter "ü" becomes %C3%BC in UTF-8.
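To make the mechanics concrete, here is a minimal Python sketch that percent-encodes a string by hand. The helper name percent_encode is made up for this illustration; in real code the standard library's urllib.parse.quote does the same job.

```python
from urllib.parse import quote

# Characters that may appear in a URL without escaping (RFC 3986 "unreserved").
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)


def percent_encode(text: str) -> str:
    """Illustrative helper: escape every character outside the unreserved set."""
    out = []
    for ch in text:
        if ch in UNRESERVED:
            out.append(ch)
        else:
            # One %XX triplet per UTF-8 byte, so "ü" yields two triplets.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)


space = percent_encode(" ")
ampersand = percent_encode("&")
u_umlaut = percent_encode("ü")
```

With safe="" passed to quote, the hand-written helper and the standard library should agree on these inputs.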
The reason is simple: characters like ? and & have structural meaning in a URL. The question mark starts the query string, the ampersand separates parameters, and the equals sign binds a parameter to its value. If a customer searches for "black & white sneakers" and your store passes that string into a query parameter without encoding, the server sees an extra parameter boundary where none was intended. The search term arrives truncated, and the customer sees wrong or empty results.
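The truncation is easy to reproduce. The following Python sketch parses the same search term once unencoded and once encoded; the parameter name q is an assumption, not something any particular store platform prescribes.

```python
from urllib.parse import parse_qs, quote, urlencode

term = "black & white sneakers"

# Unencoded: the ampersand is read as a parameter separator.
broken_query = "q=" + term
broken = parse_qs(broken_query)

# Encoded: the ampersand travels as %26 and survives parsing.
safe_query = urlencode({"q": term}, quote_via=quote)
fixed = parse_qs(safe_query)

print(broken)      # the search term is cut off at the ampersand
print(safe_query)
print(fixed)
```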
Product and category URLs. Stores with international catalogs often carry accented characters or non-Latin scripts in product names. If your URL keys are generated from those names without normalization, you end up with percent-encoded slugs that look fine in the browser bar but become fragile the moment another system (a feed exporter, a CDN rule, a redirect map) handles them with different encoding assumptions. The safer pattern is to transliterate slugs to plain ASCII at creation time and treat the encoded form purely as a transport detail.
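A rough sketch of ASCII slug generation in Python, using only the standard library. The function name ascii_slug and the small German substitution table are assumptions made for this example; non-Latin scripts such as Cyrillic or CJK need a real transliteration library, because this approach simply drops characters it cannot decompose.

```python
import re
import unicodedata

# Hypothetical substitution table; extend it per language as needed.
SPECIAL = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def ascii_slug(name: str) -> str:
    """Turn a product name into a lowercase ASCII slug."""
    # Compose first so the table matches precomposed letters reliably.
    text = unicodedata.normalize("NFC", name).lower()
    for src, dst in SPECIAL.items():
        text = text.replace(src, dst)
    # Decompose remaining accents (é becomes e plus a combining mark),
    # then drop everything that is not ASCII.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Collapse any run of other characters into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", text).strip("-")


german = ascii_slug("Grüne Lederschuhe Größe 42")
french = ascii_slug("Café Crème")
```

Because the result contains only letters, digits, and hyphens, it needs no percent-encoding at all, which is the point of doing this at creation time.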
Layered navigation and filters. Filter URLs concatenate many parameters: color, size, price ranges, and sort order. Price ranges often contain commas or currency symbols, and multi-select filters chain values with separators. Every one of those values must be encoded individually before the query string is assembled, not afterward. Encoding the finished query string is the classic double-encoding mistake: %20 turns into %2520, and the decoding side now sees literal percent signs.
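The difference between encoding values and encoding the finished string can be shown in a few lines of Python. The filter names and values are invented for illustration.

```python
from urllib.parse import quote, urlencode

# Raw, unencoded filter values (hypothetical parameter names).
filters = {"color": "navy blue", "price": "10,50", "sort": "price asc"}

# Correct: each value is encoded individually while the string is assembled.
query = urlencode(filters, quote_via=quote)

# Mistake: encoding the finished query string escapes the percent signs
# and the separators a second time.
double_encoded = quote(query, safe="")

print(query)
print(double_encoded)
```

In the second output every %20 has become %2520, and the = and & separators have been escaped as well, so the receiving side no longer sees three parameters.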
Payment gateway callbacks. This is the expensive one. Most payment providers sign their callback payloads, and the signature is computed over an exact byte sequence. If your callback handler decodes parameters a second time, or your framework normalizes plus signs and percent sequences differently than the gateway expects, the computed signature no longer matches, and valid payments get rejected. When debugging signature mismatches, compare the raw query string your server actually received, before any framework magic touches it, with what the gateway documentation says it sends.
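The sketch below shows why re-normalizing a payload breaks verification. Every gateway defines its own signing scheme, so treat all of this as hypothetical: the HMAC-SHA256 over the raw query string, the secret, and the parameter names are assumptions chosen only to demonstrate the effect.

```python
import hashlib
import hmac
from urllib.parse import parse_qsl, quote, urlencode

SECRET = b"demo-secret"  # hypothetical shared secret


def sign(raw_payload: bytes) -> str:
    """Hypothetical scheme: HMAC-SHA256 over the exact bytes sent."""
    return hmac.new(SECRET, raw_payload, hashlib.sha256).hexdigest()


def verify(raw_payload: bytes, signature: str) -> bool:
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(sign(raw_payload), signature)


# What the (imaginary) gateway sent and signed: note the plus sign.
raw = b"order=1001&note=gift+wrap%2Fexpress"
signature = sign(raw)

# What a decode-then-re-encode round trip produces: "+" became "%20".
renormalized = urlencode(parse_qsl(raw.decode("ascii")), quote_via=quote).encode("ascii")

ok_raw = verify(raw, signature)
ok_renormalized = verify(renormalized, signature)
```

Both byte strings decode to the same parameters, yet only the untouched one verifies, which is why the handler should capture the raw bytes before any parsing happens.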
Tracking and campaign parameters. Marketing URLs routinely embed full destination URLs inside parameters: a redirect target, a deep link, or a return URL. A URL inside a URL must be fully encoded, including its own scheme and slashes. If it is not, everything after the first unencoded ampersand belongs to the outer URL, and your campaign attribution quietly breaks.
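Here is a small Python illustration of a nested URL, with and without encoding. The hostnames and the parameter name target are placeholders.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

inner = "https://shop.example/sale?utm_source=mail&utm_campaign=spring"

# Correct: the inner URL is treated as an opaque value and fully encoded.
outer = "https://track.example/r?" + urlencode({"target": inner}, quote_via=quote)
recovered = parse_qs(urlsplit(outer).query)

# Mistake: plain concatenation lets the inner ampersand split the outer query.
broken_outer = "https://track.example/r?target=" + inner
leaked = parse_qs(urlsplit(broken_outer).query)

print(outer)
print(leaked)  # utm_campaign now belongs to the outer URL
```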
Modern languages give you correct primitives. In JavaScript, encodeURIComponent encodes a single value, while encodeURI is meant for complete URLs and deliberately leaves structural characters alone. Mixing the two up is one of the most common JavaScript URL bugs.
In PHP, rawurlencode follows RFC 3986 and encodes spaces as %20, while the older urlencode produces plus signs for spaces, a convention that belongs to form-encoded data. Pick the RFC 3986 variant for anything that ends up in a URL path or query.
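The same split between %20 and plus signs exists in Python's standard library, which makes it a convenient place to see the two conventions side by side: quote behaves like the RFC 3986 style, quote_plus like the form-encoding style. The sample value is made up.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

value = "gift wrap & bow"

rfc3986_style = quote(value, safe="")  # spaces become %20
form_style = quote_plus(value)         # spaces become +

# Decoding with the wrong counterpart leaves the plus signs in place.
mismatched = unquote(form_style)
matched = unquote_plus(form_style)

print(rfc3986_style)
print(form_style)
print(mismatched)
```

Whichever convention you pick, the encoder and the decoder have to agree on it, otherwise literal plus signs leak into the decoded value.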
Two rules prevent most incidents. First, encode at the last moment, when the value is placed into the URL, and decode at the first moment, when the value is read out. Everything in between should work with the raw, unencoded value. Second, never encode or decode the same string twice. If you cannot tell whether a value is already encoded, that is an architecture smell: the boundary between raw and encoded data is unclear somewhere upstream.
When something does look off, a quick round-trip through an online encoder and decoder is often the fastest way to see what a string really contains, especially with mixed UTF-8 sequences, where two visually identical strings differ in their byte representation.
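The "visually identical, different bytes" case usually comes from Unicode normalization. A short Python check, using the letter ü in its precomposed and decomposed forms, shows how the two encode differently:

```python
import unicodedata
from urllib.parse import quote

composed = "\u00fc"     # ü as a single code point
decomposed = "u\u0308"  # u followed by a combining diaeresis

encoded_composed = quote(composed, safe="")
encoded_decomposed = quote(decomposed, safe="")

# Normalizing to one form (NFC here) before encoding removes the difference.
normalized = quote(unicodedata.normalize("NFC", decomposed), safe="")

print(encoded_composed)
print(encoded_decomposed)
```

Both strings render as the same letter on screen, but only one of them matches a slug or redirect rule that was stored in the other form.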
Encoding bugs are easiest to diagnose when the rest of the stack behaves predictably. Reverse proxies, CDNs, and web application firewalls all touch URLs: they normalize percent sequences, enforce length limits, and sometimes block requests with suspicious-looking encoded payloads. A redirect rule that works on a developer machine can behave differently behind a CDN that decodes before matching.
For stores running Magento, where layered navigation, multi-store URLs, and checkout callbacks all stress URL handling at scale, a provider that manages this layer carefully matters; managed platforms like MGT Commerce hosting tune the full Nginx and Varnish chain so that the URL normalization stays consistent from edge to application, which removes a whole class of hard-to-reproduce encoding issues.
Generate ASCII slugs at creation time instead of encoding Unicode slugs on the fly. Encode every parameter value individually with an RFC 3986-compliant function. Never encode a finished query string a second time. Treat URLs embedded in parameters as values and encode them completely. When validating payment callbacks, work on the raw received bytes. And when in doubt, decode the string step by step until it stops changing: the number of steps tells you how many times it was encoded.
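The last tip in that checklist, decoding until the string stops changing, can be sketched as a small diagnostic helper. The name decoding_steps and the max_rounds guard are assumptions for this example. Treat the result as a hint rather than proof, since a raw value that legitimately contains text like %20 would be counted as one extra layer.

```python
from urllib.parse import unquote


def decoding_steps(value: str, max_rounds: int = 10) -> int:
    """Count how many decode passes it takes until the value is stable."""
    rounds = 0
    current = value
    while rounds < max_rounds:
        decoded = unquote(current)
        if decoded == current:
            break  # nothing left to decode
        current = decoded
        rounds += 1
    return rounds


plain = decoding_steps("black shoes")
once = decoding_steps("black%20shoes")
twice = decoding_steps("black%2520shoes")
```

A result of 2 or more for a value read straight from a request is a strong sign that something upstream encoded an already encoded string.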
URL encoding is not glamorous, but it sits underneath search, navigation, payments, and tracking, four functions an online store cannot afford to get wrong. Treat encoded strings as a transport format, keep the raw-versus-encoded boundary explicit, and most of these bugs never reach production.