Everyone knows that on most systems you can specify IPv4 addresses just 4 decimal numbers separated with periods (dots). Example:
Useful when you for example want to ping your local wifi router and similar. “ping 192.168.0.1”
The IPv4 string is usually parsed by the
inet_addr() function or at times it is passed straight into the name resolver function like
This address parser supports more ways to specify the address. You can for example specify each number using either octal or hexadecimal.
Write the numbers with zero-prefixes to have them interpreted as octal numbers:
Write them with 0x-prefixes to specify them in hexadecimal:
You will find that ping can deal with all of these.
As a 32 bit number
An IPv4 address is a 32 bit number that when written as 4 separate numbers are split in 4 parts with 8 bits represented in each number. Each separate number in “a.b.c.d” is 8 bits that combined make up the whole 32 bits. Sometimes the four parts are called quads.
The typical IPv4 address parser however handles more ways than just the 4-way split. It can also deal with the address when specified as one, two or three numbers (separated with dots unless its just one).
If given as a single number, it treats it as a single unsigned 32 bit number. The top-most eight bits stores what we “normally” with write as the first number and so on. The address shown above, if we keep it as hexadecimal would then become:
And you can of course write it in octal as well:
and plain old decimal:
As two numbers
If you instead write the IP address as two numbers with a dot in between, the first number is assumed to be 8 bits and the next one a 24 bit one. And you can keep on mixing the bases as you see like. The same address again, now in a hexadecimal + octal combo:
This allows for some fun shortcuts when the 24 bit number contains a lot of zeroes. Like you can shorten “
127.0.0.1” to just “
127.1” and it still works and is perfectly legal.
As three numbers
Now the parts are supposed to be split up in bits like this: 8.8.16. Here’s the example address again in octal, hex and decimal:
All of these versions shown above work with most tools that accept IPv4 addresses and sometimes you can bypass filters and protection systems by switching to another format so that you don’t match the filters. It has previously caused problems in node and perl packages and I’m guessing numerous others. It’s a feature that is often forgotten, ignored or just not known.
It begs the question why this very liberal support was once added and allowed but I’ve not been able to figure that out – maybe because of how it matches class A/B/C networks. The support for this syntax seems to have been introduced with the inet_aton() function in the 4.2BSD release in 1983.
IPv4 in URLs
URLs has a host name in them and it can be specified as an IPv4 address.
The RFC 3986 URL specification’s section 3.2.2 says an IPv4 address must be specified as:
dec-octet "." dec-octet "." dec-octet "." dec-octet
… but in reality very few clients that accept such URLs actually restrict the addresses to that format. I believe mostly because many programs will pass on the host name to a name resolving function that itself will handle the other formats.
The WHATWG URL Spec
The Host Parsing section of this spec allows the many variations of IPv4 addresses. (If you’re anything like me, you might need to read that spec section about three times or so before that’s clear).
Since the browsers all obey to this spec there’s no surprise that browsers thus all allow this kind of IP numbers in URLs they handle.
curl has traditionally been in the camp that mostly accidentally somewhat supported the “flexible” IPv4 address formats. It did this because if you built curl to use the system resolver functions (which it does by default) those system functions will handle these formats for curl. If curl was built to use c-ares (which is one of curl’s optional name resolver backends), using such address formats just made the transfer fail.
The drawback with allowing the system resolver functions to deal with the formats is that curl itself then works with the original formatted host name so things like HTTPS server certificate verification and sending
Host: headers in HTTP don’t really work the way you’d want.
Starting in curl 7.77.0 (since this commit ) curl will “natively” understand these IPv4 formats and normalize them itself.
There are several benefits of doing this ourselves:
- Applications using the URL API will get the normalized host name out.
- curl will work the same independently of selected name resolver backend
- HTTPS works fine even when the address is using other formats
- HTTP virtual hosts headers get the “correct” formatted host name
Fun example command line to see if it works:
curl -L 16843009
16843009 gets normalized to 184.108.40.206 which then gets used as
http://220.127.116.11 (because curl will assume HTTP for this URL when no scheme is used) which returns a 301 redirect over to
-L makes curl follow…