UUID generator in PL/pgSQL
While writing an SQL script to upgrade the MusicBrainz database for the last release, I needed a way to generate new UUIDs from SQL. PostgreSQL has a native UUID data type and a contrib module for generating UUIDs since version 8.3, but this wouldn’t help me, because I needed it to work with at least version 8.1. I had this idea to write PL/pgSQL functions to generate UUIDs, so I skimmer over the RFC 4122 that documents them and found out that it isn’t actually that hard.
MusicBrainz uses random-based UUIDs (version 4) for all it’s new IDs, so the first idea was to implement the same. I know I can’t use this code in the end, because I need a good pseudo-random number generator, but I couldn’t resist to write it anyway. Messing with bits in high-level languages is always fun :) Here is the result (because of the use of the random()
function, don’t use the code for anything serious):
CREATE OR REPLACE FUNCTION generate_uuid_v4() RETURNS uuid
AS $$
DECLARE
value VARCHAR(36);
BEGIN
value = lpad(to_hex(ceil(random() * 255)::int), 2, '0');
value = value || lpad(to_hex(ceil(random() * 255)::int), 2, '0');
value = value || lpad(to_hex(ceil(random() * 255)::int), 2, '0');
value = value || lpad(to_hex(ceil(random() * 255)::int), 2, '0');
value = value || '-';
value = value || lpad(to_hex(ceil(random() * 255)::int), 2, '0');
value = value || lpad(to_hex(ceil(random() * 255)::int), 2, '0');
value = value || '-';
value = value || lpad((to_hex((ceil(random() * 255)::int & 15) | 64)), 2, '0');
value = value || lpad(to_hex(ceil(random() * 255)::int), 2, '0');
value = value || '-';
value = value || lpad((to_hex((ceil(random() * 255)::int & 63) | 128)), 2, '0');
value = value || lpad(to_hex(ceil(random() * 255)::int), 2, '0');
value = value || '-';
value = value || lpad(to_hex(ceil(random() * 255)::int), 2, '0');
value = value || lpad(to_hex(ceil(random() * 255)::int), 2, '0');
value = value || lpad(to_hex(ceil(random() * 255)::int), 2, '0');
value = value || lpad(to_hex(ceil(random() * 255)::int), 2, '0');
value = value || lpad(to_hex(ceil(random() * 255)::int), 2, '0');
value = value || lpad(to_hex(ceil(random() * 255)::int), 2, '0');
RETURN value::uuid;
END;
$$ LANGUAGE 'plpgsql';
It turned out that we need deterministic IDs to be generated from the script, so V4 was out of question. That was good, because we would need a better PNRG for the final version.
The next idea was to create the URL on which the new rows will be server and generate name-based UUIDs using the URL namespace. The idea is to concatenate a namespace and a name, calculate a cryptographic hash of the result, and use it’s bits to generate the UUID. There are two options for hashing, either MD5 (version 3) or SHA-1 (version 5). SHA-1 is preferred by the RFC, but PostgreSQL only has a built-in function for MD5, so the decision for us was easy. The code doesn’t depend any random numbers, so it’s good enough to use in production.
CREATE OR REPLACE FUNCTION from_hex(t text) RETURNS integer
AS $$
DECLARE
r RECORD;
BEGIN
FOR r IN EXECUTE 'SELECT x'''||t||'''::integer AS hex' LOOP
RETURN r.hex;
END LOOP;
END
$$ LANGUAGE plpgsql IMMUTABLE STRICT;
CREATE OR REPLACE FUNCTION generate_uuid_v3(namespace varchar, name varchar) RETURNS uuid
AS $$
DECLARE
value varchar(36);
bytes varchar;
BEGIN
bytes = md5(decode(namespace, 'hex') || decode(name, 'escape'));
value = substr(bytes, 1+0, 8);
value = value || '-';
value = value || substr(bytes, 1+2*4, 4);
value = value || '-';
value = value || lpad(to_hex((from_hex(substr(bytes, 1+2*6, 2)) & 15) | 48), 2, '0');
value = value || substr(bytes, 1+2*7, 2);
value = value || '-';
value = value || lpad(to_hex((from_hex(substr(bytes, 1+2*8, 2)) & 63) | 128), 2, '0');
value = value || substr(bytes, 1+2*9, 2);
value = value || '-';
value = value || substr(bytes, 1+2*10, 12);
return value::uuid;
END;
$$ LANGUAGE 'plpgsql' IMMUTABLE STRICT;
This code should be easy enough to modify to generate UUIDv5, if you have a way to calculate SHA-1 hashes. To use the function, you need to pass it a namespace and a name. The namespace itself is a UUID, it can be anything, but there are a few well-known options:
- URL
'6ba7b8119dad11d180b400c04fd430c8'
- DNS (fully-qualified domain name)
'6ba7b8109dad11d180b400c04fd430c8'
- ISO OID
'6ba7b8129dad11d180b400c04fd430c8'
- X.500 DN (in DER or a text output format)
'6ba7b814-9dad-11d1-80b4-00c04fd430c8'
The URL one is probably the most useful. So, to generate UUIDv3 for http://www.example.com/foo/1
, you can use the following:
SELECT generate_uuid_v3('6ba7b8119dad11d180b400c04fd430c8', 'http://www.example.com/foo/1');