- Toby Inkster
- domains; php; programming; oop
A question that comes up often on Usenet is how to programmatically determine the "domain" of a particular hostname, that is, the hostname minus the components traditionally thought of as subdomains. For example, groups.google.com and www.google.com both have a domain of google.com.
Invariably, one answer comes back stating that you just need to chop off everything from the front, leaving only the last two components. But then someone will chime in pointing out that groups.google.co.uk would be left as just co.uk that way, when what is really wanted is google.co.uk. And the eventual resolution of the argument will be "it just can't be done".
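To make the failure concrete, here is a minimal sketch of the "last two components" answer. The function name naive_domain() is made up for this illustration and is not part of the class presented later.

```php
<?php
// Naive approach: keep only the last two dot-separated labels.
// naive_domain() is a hypothetical helper used purely for illustration.
function naive_domain(string $host): string
{
    $labels = explode('.', strtolower($host));

    // A negative offset takes labels from the end of the array.
    return implode('.', array_slice($labels, -2));
}

echo naive_domain('groups.google.com'), "\n";   // google.com
echo naive_domain('groups.google.co.uk'), "\n"; // co.uk (google.co.uk was wanted)
```

The second call shows exactly the objection raised above: the function cannot know that co.uk is a suffix under which names are registered, not a domain in its own right.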
The problem is that there's technically no difference between a domain and a subdomain: it's simply a matter of convention. Fortunately, the distinction matters a great deal to browser programmers, because cookie security depends on it: browsers must allow subdomains within a domain to share cookie data, but must not allow cookies to be passed from one domain to another. And so the Mozilla project has created the Public Suffix List, a codified list of these conventions.
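The idea behind using such a list can be sketched in a few lines. This is a simplified illustration, not the algorithm used by browsers or by the class below: the three-entry SAMPLE_SUFFIXES array is a hand-picked stand-in for the real list, registrable_domain() is a hypothetical name, and the wildcard and exception rules that the real list contains (such as *.ck and !www.ck) are ignored.

```php
<?php
// Hypothetical, hard-coded excerpt. The real Public Suffix List is far
// longer and also has wildcard and exception rules, which are ignored here.
const SAMPLE_SUFFIXES = ['com', 'uk', 'co.uk'];

// Illustrative helper: returns the longest matching suffix plus one more
// label, or null when the host is itself a suffix or nothing matches.
function registrable_domain(string $host, array $suffixes = SAMPLE_SUFFIXES): ?string
{
    $labels = explode('.', strtolower(trim($host, '.')));

    // Walk from the longest candidate to the shortest, so that "co.uk"
    // is found before "uk".
    for ($i = 0; $i < count($labels); $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        if (in_array($candidate, $suffixes, true)) {
            // A host that is only a suffix has no registrable domain.
            return $i === 0 ? null : implode('.', array_slice($labels, $i - 1));
        }
    }

    return null;
}

echo registrable_domain('groups.google.co.uk'), "\n"; // google.co.uk
echo registrable_domain('www.google.com'), "\n";      // google.com
```

Checking the longest candidate first is what keeps groups.google.co.uk from being cut down to co.uk.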
The following PHP class can be used to download the latest Public Suffix List and store it in your temp directory, and then find the domain name for a particular host. You may use it as follows:
<?php
include "Domain.class.php";

// Build a Domain object from a full URL.
$url = "http://ophelia.goddamn.co.uk/?foo=bar";
$domain = Domain::from_url($url);
echo $domain->get_reg_domain(); // goddamn.co.uk

// Build a Domain object directly from a hostname.
$domain2 = new Domain("british-library.uk");
echo $domain2->get_etld(); // uk
?>
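The download-and-store step mentioned above might look roughly like the following. This is a sketch under stated assumptions, not the code inside Domain.class.php: the function name cached_suffix_list(), the one-week cache lifetime and the injectable $fetch callback are all invented for illustration, and the URL is assumed to be where the list is currently published.

```php
<?php
// Assumed location of the published list.
const PSL_URL = 'https://publicsuffix.org/list/public_suffix_list.dat';

// Hypothetical helper: refreshes a cached copy of the list when it is
// missing or older than $maxAge seconds, then returns the rule lines.
function cached_suffix_list(string $cacheFile, int $maxAge = 604800, ?callable $fetch = null): array
{
    // The fetcher is injectable so the function can be tried offline.
    $fetch = $fetch ?? function (): string {
        $data = file_get_contents(PSL_URL);
        if ($data === false) {
            throw new RuntimeException('Could not download the suffix list');
        }
        return $data;
    };

    if (!is_file($cacheFile) || time() - filemtime($cacheFile) > $maxAge) {
        file_put_contents($cacheFile, $fetch());
    }

    $rules = [];
    foreach (file($cacheFile, FILE_IGNORE_NEW_LINES) as $line) {
        $line = trim($line);
        // Skip blank lines and "//" comment lines.
        if ($line === '' || strpos($line, '//') === 0) {
            continue;
        }
        $rules[] = $line;
    }

    return $rules;
}

// Example call (performs a network request on a cold cache):
// $rules = cached_suffix_list(sys_get_temp_dir() . '/public_suffix_list.dat');
```

Caching in the temp directory keeps repeated lookups from downloading the list on every request.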