UUIDv47: Store UUIDv7 in DB, emit UUIDv4 outside (SipHash-masked timestamp) (github.com)

194 points by aabbdev 143 days ago | 88 comments

aabbdev 143 days ago [-]

Hi, I’m the author of uuidv47. The idea is simple: keep UUIDv7 internally for database indexing and sortability, but emit UUIDv4-looking façades externally so clients don’t see timing patterns.

How it works: the 48-bit timestamp is XOR-masked with a keyed SipHash-2-4 stream derived from the UUID’s random field. The random bits are preserved, the version flips between 7 (inside) and 4 (outside), and the RFC variant is kept. The mapping is injective: (ts, rand) → (encTS, rand). Decode is just encTS ⊕ mask, so round-trip is exact.
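
For readers who want to see the shape of this, here is a minimal Python sketch of the masking, not the library's C code. The 10-byte message layout and the choice of the low 48 bits of the SipHash output as the mask are assumptions made for illustration; `siphash24`, `encode`, and `decode` are hypothetical names.

```python
import uuid

M64 = (1 << 64) - 1

def _rotl(x: int, b: int) -> int:
    return ((x << b) | (x >> (64 - b))) & M64

def siphash24(key: bytes, msg: bytes) -> int:
    """SipHash-2-4 with a 16-byte key, returning a 64-bit integer."""
    k0 = int.from_bytes(key[:8], "little")
    k1 = int.from_bytes(key[8:16], "little")
    v = [k0 ^ 0x736F6D6570736575, k1 ^ 0x646F72616E646F6D,
         k0 ^ 0x6C7967656E657261, k1 ^ 0x7465646279746573]

    def sipround() -> None:
        v[0] = (v[0] + v[1]) & M64
        v[1] = _rotl(v[1], 13) ^ v[0]
        v[0] = _rotl(v[0], 32)
        v[2] = (v[2] + v[3]) & M64
        v[3] = _rotl(v[3], 16) ^ v[2]
        v[0] = (v[0] + v[3]) & M64
        v[3] = _rotl(v[3], 21) ^ v[0]
        v[2] = (v[2] + v[1]) & M64
        v[1] = _rotl(v[1], 17) ^ v[2]
        v[2] = _rotl(v[2], 32)

    # Pad to a multiple of 8 bytes; the final byte carries len(msg) mod 256.
    padded = msg + bytes(7 - len(msg) % 8) + bytes([len(msg) & 0xFF])
    for i in range(0, len(padded), 8):
        m = int.from_bytes(padded[i:i + 8], "little")
        v[3] ^= m
        sipround()
        sipround()
        v[0] ^= m
    v[2] ^= 0xFF
    for _ in range(4):
        sipround()
    return v[0] ^ v[1] ^ v[2] ^ v[3]

def _mask48(key: bytes, b: bytes) -> int:
    # Message built only from the random bits (version and variant bits are
    # masked out), so it is the same for the v7 form and the v4 facade.
    # ASSUMPTION: this exact byte layout is illustrative.
    msg = bytes([b[6] & 0x0F, b[7], b[8] & 0x3F]) + bytes(b[9:16])
    return siphash24(key, msg) & 0xFFFFFFFFFFFF

def _flip(key: bytes, u: uuid.UUID, version: int) -> uuid.UUID:
    b = bytearray(u.bytes)
    ts = int.from_bytes(b[:6], "big") ^ _mask48(key, b)  # XOR is its own inverse
    b[:6] = ts.to_bytes(6, "big")
    b[6] = (version << 4) | (b[6] & 0x0F)  # swap only the version nibble
    return uuid.UUID(bytes=bytes(b))

def encode(key: bytes, v7: uuid.UUID) -> uuid.UUID:
    return _flip(key, v7, 4)

def decode(key: bytes, facade: uuid.UUID) -> uuid.UUID:
    return _flip(key, facade, 7)

key = bytes(range(16))  # demo key only
v7 = uuid.UUID("017f22e2-79b0-7cc3-98c4-dc0c0c07398f")
facade = encode(key, v7)
assert facade.version == 4 and decode(key, facade) == v7
```

Encode and decode are the same operation apart from the version nibble, which is why the round trip is exact.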

Security: SipHash is a PRF, so observing façades doesn’t leak the key. Wrong key = wrong timestamp. Rotation can be done with a key-ID outside the UUID.

Performance: one SipHash over 10 bytes + a couple of 48-bit loads/stores. Nanosecond overhead, header-only C11, no deps, allocation-free.

Tests: SipHash reference vectors, round-trip encode/decode, and version/variant invariants.

Curious to hear feedback!

JimDabell 143 days ago [-]

I like the idea.

UUIDs are often generated client-side. Am I right in thinking that this isn’t possible with this approach? Even if you let clients give you UUIDs and you gave them back the masked versions, wouldn't you be vulnerable to a client providing two UUIDs with different ts and the same rand? So this is only designed for when you are generating the UUIDv7s yourself?

move-on-by 143 days ago [-]

Any version of UUID except v4 on the client side would be a mistake, as you are relying on it to provide extra information such as a timestamp, which might be manipulated.

Of course, UUIDv4 on the client side is not without risk either: you need to validate uniqueness and make sure it isn't a re-use of some other ID. For UUIDv7 on the client side, you could add some sanity validation, but really I think it’s best avoided.

JimDabell 143 days ago [-]

There’s a whole bunch of use-cases where the ability for a user to mess with the timestamp is not a problem. Who cares if a user screws up the ordering of items in a collection only they see? But if you can attack the private key by generating many different ciphertexts for the same rand, that might let you defeat the purpose of this masking.
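
To make the concern concrete, a small sketch, assuming the server masks whatever (ts, rand) pair a client submits. Because the mask depends only on rand and the key, two facades with the same rand XOR to ts1 ^ ts2, and a client who knows its own ts recovers the mask for that rand. That exposes PRF outputs for chosen inputs, which is different from exposing the key. `mask48` uses keyed BLAKE2b from hashlib as a stand-in PRF, not SipHash-2-4, and every name here is hypothetical.

```python
import hashlib

def mask48(key: bytes, rand: bytes) -> int:
    # Stand-in keyed PRF (BLAKE2b), NOT the SipHash-2-4 used by uuidv47.
    digest = hashlib.blake2b(rand, key=key, digest_size=8).digest()
    return int.from_bytes(digest, "big") & 0xFFFFFFFFFFFF

def masked_ts(key: bytes, ts: int, rand: bytes) -> int:
    return ts ^ mask48(key, rand)

key = b"server-secret-key"   # unknown to the client
rand = bytes(10)             # client reuses the same random field
ts1, ts2 = 1_700_000_000_000, 1_700_000_123_456

f1 = masked_ts(key, ts1, rand)
f2 = masked_ts(key, ts2, rand)

# The mask cancels out, leaving the XOR of the two plaintext timestamps.
assert f1 ^ f2 == ts1 ^ ts2
# Knowing its own ts1, the client learns the mask for this rand.
assert f1 ^ ts1 == mask48(key, rand)
```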

move-on-by 142 days ago [-]

I concede my above comment was speaking in generalities and has plenty of exceptions. However, I do prefer to fall back on safe defaults, and letting the client choose the UUIDv7 could certainly have some unsafe consequences.

lazide 141 days ago [-]

What if a broken client implementation uses the same client ‘generated’ UUID (or very similar) for all client requests?

knome 143 days ago [-]

creating your uuids client side has a risk of clients toying with the uuids.

creating them server-side risks a network error leaving the client without the id of a resource it asked to create: the request succeeds, the response is lost, and you end up with double submissions and generally bad recovery options in the UI.

if you need users to provide uuids for consistent network operations, you can have an endpoint responsible for generating signed uuids that expire after a short interval, thereby controlling uuid-time drift (must be used within 1-5 minutes, perhaps), ensuring the client can't forge them to mess with your backend, and still provide a nice and stable client-side-uuid system.
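
a rough sketch of what such an endpoint could hand out, assuming an HMAC-SHA256 signature over the uuid plus its issue time and a 5 minute window. `SECRET`, `issue`, and `verify` are hypothetical names, and the token format is made up for illustration.

```python
import hashlib
import hmac
import time
import uuid

SECRET = b"demo-only-secret"  # hypothetical server-side signing key

def _sign(body: str) -> str:
    return hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()

def issue(now=None) -> str:
    """Return 'uuid.issued_at.signature' for the client to send back later."""
    issued = int(time.time() if now is None else now)
    body = f"{uuid.uuid4()}.{issued}"
    return f"{body}.{_sign(body)}"

def verify(token: str, max_age: int = 300, now=None):
    """Return the UUID if the signature is valid and unexpired, else None."""
    try:
        uid, issued, sig = token.split(".")
        if not hmac.compare_digest(sig, _sign(f"{uid}.{issued}")):
            return None
        age = (time.time() if now is None else now) - int(issued)
        return uuid.UUID(uid) if 0 <= age <= max_age else None
    except (ValueError, TypeError):
        return None  # malformed token
```

the server never has to store issued ids: anything it did not sign, or signed too long ago, is rejected.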

for the uuidv47 thing, you would apply their XOR trick prior to sending the UUID to the user. you presumably just reverse the XOR trick to get the UUIDv7 back from the UUIDv4 you passed them.

Lvl999Noob 143 days ago [-]

Why not have a transient client generated ID for idempotency but a server generated ID for long term reference and storage?

ycombinatrix 143 days ago [-]

>UUIDs are often generated client-side

since when?

darkr 143 days ago [-]

It’s not uncommon. Google AIP spec requires it for example. I think the main driver for it is implicit idempotency.

eadmund 142 days ago [-]

The client’s ID for a resource and the server’s ID for that resource need not be the same.

Of course, adding two IDs for a resource complicates things. But so too does trusting client-generated IDs to be universally unique.

the_mitsuhiko 143 days ago [-]

Two pieces of feedback here:

1. You implicitly take away someone else's hypothetical benefit of leveraging UUID v7, which is disappointing for any consumer of your API.

2. By storing the UUIDs differently on your API service from internally, you're going to make your life just a tiny bit harder because now you have to go through this indirection of conversion, and I'm not sure if this is worth it.

whatevaa 143 days ago [-]

1. Unless the API explicitly guarantees that property, relying on it is a bad idea. I wouldn't.

throw0101a 143 days ago [-]

> 1. Unless API explicitly guarantees that property, relying on that is bad idea. I wouldn't.

    With a sufficient number of users of an API,
    it does not matter what you promise in the contract:
    all observable behaviors of your system
    will be depended on by somebody.

* https://www.hyrumslaw.com

the_mitsuhiko 143 days ago [-]

Sure, but that's not really the point, is it? If you get a UUID you can store it as a UUID. If the UUID happens to come around as a v7 you get some better behavior in your database, and if it does not, then it does not, but there is nothing you can do about it.

hnav 143 days ago [-]

depends on the database, famously DynamoDB used to suffer from hotspotting when dealing with monotonically increasing keys

the_mitsuhiko 143 days ago [-]

You're missing the point here. You can always go from ordered to randomness. You cannot go from randomness to ordered. So by intentionally removing the useful properties of UUIDv7, you're taking away some external API consumers' hypothetical possibility to leverage benefits. If I know (as an API consumer) that I have a database that for whatever reason prefers evenly distributed primary keys or something similar, I can always accomplish that by hashing. I just can never go the other way.
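
As a sketch of the "ordered to random" direction, assuming SHA-256 as the hash: the function below derives an evenly distributed 16-byte key from any UUID. `scatter_key` is a hypothetical name, and the result is a storage key, not a valid UUID.

```python
import hashlib
import uuid

def scatter_key(u: uuid.UUID) -> bytes:
    """Derive a uniformly distributed 16-byte key from an ordered UUID."""
    return hashlib.sha256(u.bytes).digest()[:16]

a = uuid.UUID("017f22e2-79b0-7cc3-98c4-dc0c0c07398f")
b = uuid.UUID(int=a.int + 1)  # the very next id in sort order

# Neighbouring inputs land on unrelated keys; there is no way back to the order.
print(scatter_key(a).hex())
print(scatter_key(b).hex())
```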

Lvl999Noob 143 days ago [-]

Never use someone else's synthetic key as your primary key. If you want ordered keys, even if the API is giving out sequential integers, you should still use your own sequential IDs.

avemg 143 days ago [-]

I take your point, but I think your hypothetical is a wonderful example of Hyrum's Law. And for that reason, if I was going to go to the trouble of mapping my internal v7 uuids into something more random for public consumption, then I'd be sure to generate something that doesn't look like a uuid at all so nobody gets any funny ideas about what they can do with it.

tart-lemonade 143 days ago [-]

Just to clarify, do you mean that UUIDv4 in general is worse, or just this 7->4 obfuscation?

the_mitsuhiko 143 days ago [-]

I'm not saying anything about better or worse. I'm saying that UUID v4 by definition has high entropy and UUID v7 does not. You can always go from low to high entropy, but not the other way around.

aabbdev 143 days ago [-]

You can always treat IDs as UUIDv4, while actually storing them as UUIDv7—combining the benefits of both. From your perspective, they’re just UUIDv4

kevlened 143 days ago [-]

One impact of the_mitsuhiko's second point is during debugging.

Usually if you see an id in your http logs you can simply search your database for that id. The v4 to v7 indirection creates a small inconvenience.

The mismatch may be resolved if this was available as a fully transparent database optimization.

aabbdev 143 days ago [-]

A Postgres extension is currently in development to provide transparent database optimization with custom type uuid47 and optional helpers ;)

the_mitsuhiko 143 days ago [-]

That would generally be nice to have. I would love to have base62 encoded IDs with prefixes but store it internally as UUID.

nightpool 143 days ago [-]

Not just a small inconvenience—because there's no human readable way to tell the difference between v4 and v7 IDs, you have to guess and check whether or not the ID your server process is logging is a pre-conversion or post-conversion ID

Dylan16807 143 days ago [-]

The human readable way to tell the difference is you look at whether the third group starts with a 4 or a 7.
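
In a log-grepping script the same check can be done with Python's standard `uuid` module. This is a sketch; `side` is a hypothetical helper name.

```python
import uuid

def side(s: str) -> str:
    """Classify an id from a log line by its version nibble."""
    v = uuid.UUID(s).version
    return {7: "internal (v7)", 4: "external facade (v4)"}.get(v, f"other (v{v})")

print(side("017f22e2-79b0-7cc3-98c4-dc0c0c07398f"))  # third group starts with 7
print(side("919108f7-52d1-4320-9bac-f847db4148a8"))  # third group starts with 4
```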

kbumsik 142 days ago [-]

It is really easy to tell the difference btw. You will always see "4" or "7" in the middle.

thunderfork 143 days ago [-]

This seems like the kind of tool you would only use where you have the following needs:

1. Not leaking timestamp data (security/regulations)

2. Having easily time-sortable primary keys (DB performance/etc.)

If you don't have both of these needs, the tool is an unnecessary indirection, as you've identified in (2).

However, where you do have both needs, some indirection is necessary. Whether this is the correct one is a different question.

Similarly, if you _must not_ leak timestamps for some real-world reason, (1) is an intrinsic requirement, consumers be damned.

the_mitsuhiko 143 days ago [-]

If you must not leak timestamps then you also cannot really have timestamp ordering internally, because you will start to leak it in other ways through collection-based endpoints.

JimDabell 143 days ago [-]

Not necessarily. For instance, in situations where unprivileged users can only see single items but privileged users can see collections. But yeah, time-ordering leaks information to people who can see the collection.

inopinatus 143 days ago [-]

This scheme potentially leaks timestamp, serialisation, and record-correlation data because the specification of UUIDv7 allows for partial timestamps and incrementing counters in the so-called random bits, which are passed through undisturbed.

So it is not generally fit for that purpose either.

AprilArcus 143 days ago [-]

Those seem like standard needs for any kind of CRUD app, so I would call this approach pretty useful. Currently I do something similar by keeping a private primary uuidv7 key with a btree index (a sortable index), and a separate public uuidv4 with a hash index (a lookup index), which is a workable but annoying arrangement. This solution achieves the same effect and is simpler.

nightpool 143 days ago [-]

Why can't you leak timestamp data? What timestamp data is sensitive to your system?

Also, why use UUIDs in that case?

inopinatus 143 days ago [-]

My biggest concern is the entropic quality of the random bits, since the design of UUIDv7 is fundamentally more concerned with collisions than predictability; consequently, although the standard says SHOULD for their nonguessability it isn't a MUST, and leaves room for implementations that use a weak PRNG, or that increment a counter, or even place additional clock data in the apparently random bits (ref. RFC9562 s6.2 & s6.9).

So there's definitely some gotchas with relying on rand_a and rand_b in UUIDv7 for seeding a PRF, and when ingesting data from devices outside of your trust boundary (as may be the case with high-volume telemetry), even if you wrote the code they basically can't be trusted for this purpose, and if those bits are undisturbed in the output it's certainly a problem if the idea was to obfuscate serialisation, timing, or correlation.

Even generations we might assume are safe may not be completely safe; for example, the new uuidv7() in PostgreSQL 18 fills rand_a entirely from the high precision part of the timestamp, and this is RFC compliant. So if an import routine generates a big batch of such UUIDs, this v7-to-v4 scheme discloses output bits that can be used to relate individual records as part of the same group. That might be fine for data points pertaining to a vehicle engine. It might not be fine for identifiers that relate to people.
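
A sketch of what an observer could read from such a facade, assuming the generator put a 12-bit sub-millisecond clock fraction in rand_a as described above. The conversion to microseconds is my reading of that layout, and `rand_a` and `submilli_us` are hypothetical names.

```python
import uuid

def rand_a(u: uuid.UUID) -> int:
    """The 12 bits right after the version nibble, untouched by the masking."""
    return (u.int >> 64) & 0xFFF

def submilli_us(u: uuid.UUID) -> float:
    # ASSUMPTION: rand_a holds (fraction of a millisecond) * 4096.
    return rand_a(u) * 1000 / 4096

internal = uuid.UUID("017f22e2-79b0-7cc3-98c4-dc0c0c07398f")
facade = uuid.UUID("9d3a41f0-55aa-4cc3-98c4-dc0c0c07398f")  # timestamp masked

# The masked timestamp differs, but rand_a survives and still orders a batch.
assert rand_a(internal) == rand_a(facade) == 0xCC3
```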

So, since not all UUIDv7 is created alike, I'd add a strong caveat: unless you generate the rand_a and rand_b bits entirely yourself, with a high degree of confidence in their nonguessability, this scheme may still leak information regarding timing, sequence, or correlation of records, and you will have to read the source code of your UUIDv7 implementation to know for sure.

machinate 141 days ago [-]

Very pertinent, good eye.

If they are suitably random then this scheme seems to check out, but you're going to need some barbed wire and some inspiration from these https://en.wikipedia.org/wiki/Long-term_nuclear_waste_warnin... on anything that can generate v7 IDs.

sergeyprokhoren 143 days ago [-]

Bad idea. In PostgreSQL 18 the optional parameter shift will shift the computed timestamp by the given interval

https://www.postgresql.org/docs/18/functions-uuid.html

michelpp 143 days ago [-]

That still exposes the timestamp, and the shift just drops precision, so I'm not sure what you're going for here.

sergeyprokhoren 142 days ago [-]

If you shift the timestamp forward by 5 thousand years, it can hardly be called just a decrease in precision.

chrismorgan 143 days ago [-]

A few years ago I made a scheme whereby you could use sequential numeric IDs in your database, but expose them as short random strings (length 4–20 step 2, depending on numeric value and sparsity configuration). It used some custom instances of the Speck cipher family, and I think it’s robust and rather neat.

Although I finished it, I never quite published it properly for some reason, probably partly because I shelved the projects where I had been going to use it (I might unshelve one of them next year).

Well, I might as well share it, because it’s quite relevant here and interesting:

https://temp.chrismorgan.info/2025-09-17-tesid/

My notes on its construction, pros and cons are fairly detailed.

Maybe I’ll go back and publish it properly next year.

austinjp 143 days ago [-]

Nice. See also sqids (previously known as hashids)

https://sqids.org/

chrismorgan 143 days ago [-]

I would not recommend it to anyone for any purpose: https://temp.chrismorgan.info/2025-09-17-tesid/more/#hashids

(Ah, it’s fun reading through that document a bit again. A few things I’d need to update now, like the Hashids name, or in the UUID section how UUIDv7 is no longer a draft, and of sidenote 12 I moved to India and got married and so took a phone number ending in 65536, replacing my Australian 32768. :-) )

9rx 143 days ago [-]

> I would not recommend it to anyone for any purpose

The most likely purpose for this kind of encoding is to discourage users (as in other developers) from trying to derive meaning from the values that is not actually there.

This happens all the time: Another developer using your API observes sequential IDs, for example, and soon they start building their software on top of that observation, assuming it to be an intended property of the system. It even works perfectly for a while... until you want to change your implementation and break those assumptions. Which you now can't do, because breaking users is the cardinal sin of software development, leaving you forever beholden to implementation details that were never intended to leak out. That's not a good place to be. Making the IDs "opaque" indicates to the user that there is no other meaning.

That they are guessable doesn't matter. I dare say it may even be beneficial to be able to easily reverse the strings back into their original form to aid with things like debugging. Software development is primarily about communicating with other people, and using IDs that, at first glance, look random communicates a lot — even if they aren't actually random.

There may be a time and place for actually secure IDs, but more often than not you don't really need them. What you do regularly need, though, especially in large organizations, is a way to effectively work with others who don't read the documentation.

> It’s just bad

This is the first I've heard of Hashids, so I'll take your word for it, but I'm not sure you actually articulated why. I'll grant you that excluding profanity is a stupid need, but it is understandable why one might have to accept that as a necessary feature even if ultimately ridiculous.

Bjartr 143 days ago [-]

I'd never use hashids/sqids for anything secure. It's reversible by design.

However, it is fit for purpose if your purpose is showing user-facing ids that can't be trivially incremented. For example, in a url, or in an api response. It does, in fact, "protect" against the "attack" of "Oh, I see in the url that my id is 19563, I wonder what I get if I change it to 19564.”

Now, the system should absolutely have authorization boundaries around data, but that doesn't mean there's no value in avoiding putting an "attractive nuisance" in front of users.

sedatk 143 days ago [-]

> "protect" against the "attack"

If it's not a real attack, it's not worth protecting against even in the slightest. If it's a real attack, it doesn't matter if it's trivial or not, does it?

9rx 143 days ago [-]

It very much can be worth protecting so that your users don't become dependent on thinking that incrementing IDs is a feature. It's not a security concern in that context, but it is a future maintainability concern where you don't intend to provide that as a feature in environments where you don't have a tight leash on how users are using your APIs.

bflesch 143 days ago [-]

Hey Chris, that's a really nice blogpost. Not only the content but also the design / sidenotes. What kind of software stack do you run your blog with?

chrismorgan 143 days ago [-]

https://chrismorgan.info/blog/2019-website/

It’s lasted for three years of use and three years of disuse, and I hope to replace it with something utterly different (stylistically and technically) by the end of this year, though it may slip to next year. The replacement will be based on handwriting.

bflesch 143 days ago [-]

Thanks. I like it very much, perfect dark mode. The serif font could be a tiny bit bigger for readability. Not a fan of handwriting fonts but you do you :-)

chrismorgan 143 days ago [-]

Who said handwriting fonts?

(I’m not a fan of handwriting fonts either. They’re never truly satisfying, though some with quite a few variants for each character get past the point of feeling transparently inauthentic. But when you can write and draw what you choose, where you choose, that’s liberating.)

hubert_magni 143 days ago [-]

Here is a Ruby gem for generating and managing pretty, human-readable keys in ActiveRecord models - uses sqids and a ticket table:

https://github.com/noreastergroup/active_record_pretty_key

bflesch 143 days ago [-]

Oh thanks for sharing this. Many years ago I was asked to code such a thing during an interview and I totally screwed it up, and of course I forgot the name of this technique.

I wanted to use it many times in projects for non-iterable IDs but never found it again.

inopinatus 143 days ago [-]

I was interested in something similar with Speck for obfuscating bigserial PKIDs but the shortage of cross-platform implementations - especially in pgcrypto - led to choosing base58(AES_K1(id{8} || HMAC_K2(id{8})[0..7])) instead, which we could implement in almost anything and is performant enough, albeit longer output (typically 22 characters)

chrismorgan 143 days ago [-]

For this specific use case, you don’t need anything fancy like a constant time implementation, and I found it easy enough to implement from the paper—except that, mindbogglingly, they didn’t address endianness at all, even though you have to take it into account; so you need to read https://www.spinics.net/lists/arm-kernel/msg633602.html as well.

Look at https://git.chrismorgan.info/tesid/blob/HEAD:/rust/src/fpeck..., it’s very simple.

chuckadams 143 days ago [-]

I remember doing something similar, but I just used two columns, a public uuid, and a bigint primary key that wasn't exposed to the api (this was long before uuidv7). Lacked a lot of the conveniences of using uuid everywhere, but it still handled the use case of merging different DB dumps as long as PKs were stripped out first.

And maybe I misunderstand how the hashing works, but it seems if you're looking things up by the hashed uuid, you're still going to want two columns anyway.

connicpu 143 days ago [-]

The conversion is reversible using the secret cryptographic key so you can turn the uuidv4s from requests into your db uuidv7s.

miningape 143 days ago [-]

This is interesting, but is almost something I'd rather have the DB handle for me - i.e. I can cast a UUIDv7 to "UUIDv4" (and vice versa) and I could use both in queries (with explicit syntax to annotate which kind is being used / expected)

tracker1 143 days ago [-]

Interesting project... just out of curiosity, could you give something resembling a couple practical examples of the risk of exposing the time portion of a v7 UUID?

NortySpock 143 days ago [-]

Suppose it's something where the user may be accused of doing something nefarious if a sequence or pattern of behavior is exposed.

- "Ex-spouse: I looked you up on a dating website, and your userID indicates it was created while you were at Tom's party where you swear nothing happened."

- "You say you are in XYZ timezone, but all your imageIDs (that are unique to the image upon creation) are timestamped at what would be 3am in your timezone)"

Granted, for individual messages that are near-real-time, or for transactions that need to be timestamped anyway, it's probably fine, but for user-account-creation or "evergreen" asset-creation, it could leak the time to a sufficiently curious individual (or an organized group that is doing data-trawling and cross-correlation)

0x457 143 days ago [-]

> - "You say you are in XYZ timezone, but all your imageIDs (that are unique to the image upon creation) are timestamped at what would be 3am in your timezone)"

Can you expand on this? I don't see a situation where it's actually leaking. Either you have a photo with EXIF data, or the image ID was generated when the post was created, and created_at is usually exposed anyway.

bangaladore 143 days ago [-]

I've done CTFs in the past where a UUID is used to brute force an AES key. The key was derived partially from the time source, so by knowing the system time close to when the data was encrypted you could pretty easily brute force the key.

A simpler example is a URL for, say, a file / photo share service. You allow users to upload images, and you return them back website.com/GUID. That's it. You don't provide a way to see when that photo / file was updated, but because you use a UUIDv7 you just did.
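
A sketch of how little work that takes: the first 48 bits of a UUIDv7 are milliseconds since the Unix epoch, so anyone holding the URL can do the following. The function names are hypothetical; the sample value is the UUIDv7 example from RFC 9562.

```python
import uuid
from datetime import datetime, timezone

def v7_timestamp_ms(u: uuid.UUID) -> int:
    """Top 48 bits of a UUIDv7: Unix time in milliseconds."""
    return u.int >> 80

def v7_created_at(u: uuid.UUID) -> datetime:
    return datetime.fromtimestamp(v7_timestamp_ms(u) / 1000, tz=timezone.utc)

example = uuid.UUID("017f22e2-79b0-7cc3-98c4-dc0c0c07398f")
print(v7_created_at(example).isoformat())  # 2022-02-22T19:22:22+00:00
```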

Is this a security risk? Maybe or maybe not? But it's an unintended disclosure of information.

thunderfork 143 days ago [-]

Let's say you've got a system that collects medical data - like "store the results of the MRI right after it happens".

For analysis reasons, you want to share this dataset (e.g. for diagnostics on the machine) but first must strip it of potentially identifying information.

The uuidv7 timestamp could be used to re-identify the data through correlation - "I know this person got an MRI on this day, there's only one record with a matching datestamp, thus I know it's their MRI."

tracker1 143 days ago [-]

Fair enough, thanks... I've got more experience in education/elearning, banking and elections, all of which are likely to have separate timestamp records required anyway, so this kind of scenario didn't really jump out at me.

ericyd 143 days ago [-]

I'm not sold on this example. If you're already preprocessing data for analysis purposes, why not just remove the ID altogether? I can't imagine a specific record ID being required for analytics

bangaladore 143 days ago [-]

Good example.

It's pretty simple, unless when you provide a GUID to a party you are also willing to provide the timestamp when it was created, use UUIDv4.

sgarland 143 days ago [-]

This is cool, but the entire “OMG you can’t leak timestamps” has always reeked of security theater to me, as has the argument that if you expose sequential IDs, you’re opening vectors of attack, exposing business information, etc.

Add some random large value to your ints periodically - they’ll still be monotonic, but you’ll throw off the dastardly spies stealing your super valuable business intelligence.

danhau 142 days ago [-]

You're not exposing business information, you're exposing client information. The information a system leaks might not be intrinsically valuable, but it can be used to deduce other data, especially over larger sets or time.

For example, by only scraping the date and author of an online newspaper's articles over a period of time, you can deduce when every author is typically on vacation. Compare that against every other author and you can find patterns indicating, say, workplace affairs.

Source: a talk by David Kriesel called SpiegelMining (in German), or at least what I remember.

danhau 141 days ago [-]

Found an English translation of the mentioned video: https://www.youtube.com/watch?v=bYviBstTUwo

bismark 143 days ago [-]

My biggest issue w/ UUIDv7 is how challenging they are to visually diff when looking at a list. Having some sort of visual translation layer in psql that would render them with the random bits first while maintaining the time sorting underneath would be a major UX boost...

phs2501 143 days ago [-]

I just taught myself to look at the end of the UUID, rather than the beginning.

nine_k 143 days ago [-]

Write a function that does that, use it in your queries. E.g. simple hex representation + string reversal should help. Or a reversed base64 representation for shorter output.
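
A sketch of the hex-plus-reversal variant in Python, for a client-side display helper rather than a database function. `tail_first` is a hypothetical name.

```python
import uuid

def tail_first(u: uuid.UUID) -> str:
    """Hex digits in reverse order, so the random tail is what the eye sees first."""
    return u.hex[::-1]

u = uuid.UUID("017f22e2-79b0-7cc3-98c4-dc0c0c07398f")
print(tail_first(u))  # starts with the reversed tail "f893..."
```

Reversing the displayed string a second time gives back the original hex, so nothing is lost for copy and paste.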

funcimp 142 days ago [-]

This is super cool. I decided to code up a Go implementation with the help of dchest's excellent siphash library.

https://github.com/n2p5/uuid47

refs: https://github.com/dchest/siphash

g-mork 143 days ago [-]

Vaguely related technique with similar goals (but I love the one posted here) http://blog.notdot.net/2007/9/Damn-Cool-Algorithms-Part-2-Se...

timando 143 days ago [-]

Why does it use version 4 instead of version 8? Version 4 implies that it's random bits, but it's actually not random. Version 8 doesn't imply anything about what the bits mean.

flowerthoughts 142 days ago [-]

I can't answer that, but as long as it's a high entropy algorithm, this seems fair game. You could see it as a seeded PRNG. The whole point of the exercise is to make it look random to the outside. Perhaps v8 stands out too much.

devnull3 142 days ago [-]

Why not use a different encryption key per session and stamp encrypted ids (or whatever info) to the outside world?

This way the DBs can use simple sequence numbers instead of timestamp based IDs.

conradludgate 142 days ago [-]

You have to know what key to use to decrypt the timestamp bits of the token. If you change keys regularly you have the problem of keeping lots of keys, as well as somehow determining the right key

taminka 143 days ago [-]

i'm curious, if you're doing single header, why not also do the stb-style IMPL block + definitions block such that you avoid the issues from accidentally including the header multiple times?

LeicaLatte 143 days ago [-]

Mobile apps often sort by creation time in the UI (chat messages, activity feeds). Since clients only see the masked version, there might be a need to expose a separate timestamp field.

londons_explore 142 days ago [-]

Before using this....

Consider what you'll do if someone ever gets root in your web server and leaks the key.

Suddenly all your UUIDs need to be replaced. That tends to be impossible since they're probably part of published URLs etc.

Big companies have made similar mistakes - that's probably why for example all private YouTube videos and Google docs had their links invalidated a few years back when the key security of a decade-old key couldn't be certain and the key wasn't rotatable.

TL;DR: Never use anything where you cannot rotate a key, including this.

gwbas1c 143 days ago [-]

I started encrypting database IDs and deriving GUIDs from that.

themafia 143 days ago [-]

Why not just use UUIDv8? The format allows you to use the upper bits for a timestamp and the lower bits for any value you like, including just a random value.

michelpp 143 days ago [-]

Because then you leak the timestamp. The idea is, present what looks like v4 random uuids externally, but they are stored internally with v7 which greatly improves locality and index usability. The conversion back and forth happens with a secret key.

themafia 143 days ago [-]

What problem does leaking the timestamp cause?

UUIDv8 gives you timestamp + counter + random.

The advantage is that lexical order and chronological order are the same and you still retain enough random bits that guessing the next generated timestamp is not easy.

michelpp 143 days ago [-]

uuidv8 does not contain a timestamp or counter unless you put them in there, it only contains a version and variant field. It's a very broad format that lets you contain whatever bits you want.
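
A sketch of that point: building a v8 value only means stamping the version and variant fields onto 128 bits whose remaining 122 bits are whatever you choose. `make_v8` is a hypothetical helper.

```python
import uuid

def make_v8(payload: int) -> uuid.UUID:
    """Stamp version 8 and the RFC variant onto an arbitrary 128-bit payload."""
    n = payload & ((1 << 128) - 1)
    n = (n & ~(0xF << 76)) | (0x8 << 76)  # version nibble = 8
    n = (n & ~(0x3 << 62)) | (0x2 << 62)  # variant bits = 10
    return uuid.UUID(int=n)

print(make_v8(0))  # 00000000-0000-8000-8000-000000000000
```

Nothing in the result says whether the payload was a timestamp, a counter, or random bits.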

This library converts a uuidv7 into a cryptographically random but deterministic uuidv4 recoverable with a shared key. For all intents and purposes the external view is a uuidv4, the internal representation is a v7, which has better index block locality and orderability.

pluto_modadic 143 days ago [-]

this is solved by reading the repo's README: hiding timing information.

jppope 143 days ago [-]

Sounds like it's trying to achieve something similar to what ULID is going for: https://github.com/ulid/spec

timestamp + readability

mcdonje 143 days ago [-]

Except the timestamp is in the ULID for anyone to read. UUIDv47 hides that from external parties.

Loading comments...

aabbdev 143 days ago [-]

Hi, I’m the author of uuidv47. The idea is simple: keep UUIDv7 internally for database indexing and sortability, but emit UUIDv4-looking façades externally so clients don’t see timing patterns.

Security: SipHash is a PRF, so observing façades doesn’t leak the key. Wrong key = wrong timestamp. Rotation can be done with a key-ID outside the UUID.

Performance: one SipHash over 10 bytes + a couple of 48-bit loads/stores. Nanosecond overhead, header-only C11, no deps, allocation-free.

Tests: SipHash reference vectors, round-trip encode/decode, and version/variant invariants.

Curious to hear feedback!

JimDabell 143 days ago [-]

I like the idea.

move-on-by 143 days ago [-]

Any version of UUID except v4 on the client side would be a mistake- as you are relying on it to provide extra information such as a timestamp which might be manipulated.

JimDabell 143 days ago [-]

move-on-by 142 days ago [-]

lazide 141 days ago [-]

What if a broken client implementation uses the same client ‘generated’ UUID (or very similar) for all client requests?

knome 143 days ago [-]

creating your uuids client side has a risk of clients toying with the uuids.

for the uuidv47 thing, you would apply their XOR trick prior to sending the UUID to the user. you presumably just reverse the XOR trick to get the UUIDv7 back from the UUIDv4 you passed them.

Lvl999Noob 143 days ago [-]

Why not have a transient client generated ID for idempotency but a server generated ID for long term reference and storage?

ycombinatrix 143 days ago [-]

>UUIDs are often generated client-side

since when?

darkr 143 days ago [-]

It’s not uncommon. Google AIP spec requires it for example. I think the main driver for it is implicit idempotency.

eadmund 142 days ago [-]

The client’s ID for a resource and the server’s ID for that resource need not be the same.

Of course, adding two IDs for a resource complicates things. But so too does trusting client-generated IDs to be universally unique.

the_mitsuhiko 143 days ago [-]

Two pieces of feedback here:

1. You implicitly take away someone else's hypothetical benefit of leveraging UUID v7, which is disappointing for any consumer of your API.

whatevaa 143 days ago [-]

1. Unless the API explicitly guarantees that property, relying on it is a bad idea. I wouldn't.

throw0101a 143 days ago [-]

> 1. Unless the API explicitly guarantees that property, relying on it is a bad idea. I wouldn't.

    With a sufficient number of users of an API,
    it does not matter what you promise in the contract:
    all observable behaviors of your system
    will be depended on by somebody.

* https://www.hyrumslaw.com

the_mitsuhiko 143 days ago [-]

hnav 143 days ago [-]

depends on the database, famously DynamoDB used to suffer from hotspotting when dealing with monotonically increasing keys

the_mitsuhiko 143 days ago [-]

Lvl999Noob 143 days ago [-]

Never use someone else's synthetic key as your primary key. If you want ordered keys, even if the API is giving out sequential integers, you should still use your own sequential IDs.

avemg 143 days ago [-]

tart-lemonade 143 days ago [-]

Just to clarify, do you mean that UUIDv4 in general is worse, or just this 7->4 obfuscation?

the_mitsuhiko 143 days ago [-]

I'm not saying anything about better or worse. I'm saying that UUID v4 by definition has high entropy and UUID v7 does not. You can always go from low to high entropy, but not the other way around.

aabbdev 143 days ago [-]

You can always treat IDs as UUIDv4, while actually storing them as UUIDv7—combining the benefits of both. From your perspective, they’re just UUIDv4

kevlened 143 days ago [-]

One impact of the_mitsuhiko's second point is during debugging.

Usually if you see an id in your http logs you can simply search your database for that id. The v4 to v7 indirection creates a small inconvenience.

The mismatch may be resolved if this was available as a fully transparent database optimization.

aabbdev 143 days ago [-]

A Postgres extension is currently in development to provide transparent database optimization with custom type uuid47 and optional helpers ;)

the_mitsuhiko 143 days ago [-]

That would generally be nice to have. I would love to have base62 encoded IDs with prefixes but store it internally as UUID.

nightpool 143 days ago [-]

Dylan16807 143 days ago [-]

The human readable way to tell the difference is you look at whether the third group starts with a 4 or a 7.

kbumsik 142 days ago [-]

It is really easy to tell the difference btw. You will always see "4" or "7" in the middle.
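If you want that check in code rather than by eye, a small Python sketch (the function name `uuid_kind` and the labels it returns are made up for illustration):

```python
import uuid


def uuid_kind(text: str) -> str:
    # The version is the first hex digit of the third group.
    version = uuid.UUID(text).version
    if version == 7:
        return "v7 (internal form)"
    if version == 4:
        return "v4 (external facade)"
    return f"other (version {version})"
```

For example, `uuid_kind("00000000-0000-4000-8000-000000000000")` reports the v4 form, and the same string with `7000` as the third group reports v7.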

thunderfork 143 days ago [-]

This seems like the kind of tool you would only use where you have the following needs:

1. Not leaking timestamp data (security/regulations)

2. Having easily time-sortable primary keys (DB performance/etc.)

If you don't have both of these needs, the tool is an unnecessary indirection, as you've identified in (2).

However, where you do have both needs, some indirection is necessary. Whether this is the correct one is a different question.

Similarly, if you _must not_ leak timestamps for some real-world reason, (1) is an intrinsic requirement, consumers be damned.

the_mitsuhiko 143 days ago [-]

If you must not leak timestamps, then you also cannot really have timestamp ordering internally, because you will start leaking it in other ways through collection-based endpoints.

JimDabell 143 days ago [-]

inopinatus 143 days ago [-]

So it is not generally fit for that purpose either.

AprilArcus 143 days ago [-]

nightpool 143 days ago [-]

Why can't you leak timestamp data? What timestamp data is sensitive to your system?

Also, why use UUIDs in that case?

inopinatus 143 days ago [-]

machinate 141 days ago [-]

Very pertinent, good eye.

sergeyprokhoren 143 days ago [-]

Bad idea. In PostgreSQL 18, uuidv7() accepts an optional shift parameter that shifts the computed timestamp by the given interval.

https://www.postgresql.org/docs/18/functions-uuid.html

michelpp 143 days ago [-]

That still exposes the timestamp, and the shift just drops precision, so I'm not sure what you're going for here.

sergeyprokhoren 142 days ago [-]

If you shift the timestamp forward by 5 thousand years, it can hardly be called just a decrease in precision.

chrismorgan 143 days ago [-]

Although I finished it, I never quite published it properly for some reason, probably partly because I shelved the projects where I had been going to use it (I might unshelve one of them next year).

Well, I might as well share it, because it’s quite relevant here and interesting:

https://temp.chrismorgan.info/2025-09-17-tesid/

My notes on its construction, pros and cons are fairly detailed.

Maybe I’ll go back and publish it properly next year.

austinjp 143 days ago [-]

Nice. See also sqids (previously known as hashids)

https://sqids.org/

chrismorgan 143 days ago [-]

I would not recommend it to anyone for any purpose: https://temp.chrismorgan.info/2025-09-17-tesid/more/#hashids

9rx 143 days ago [-]

> I would not recommend it to anyone for any purpose

The most likely purpose for this kind of encoding is to discourage users (as in other developers) from trying to derive meaning from the values that is not actually there.

> It’s just bad

Bjartr 143 days ago [-]

I'd never use hashids/sqids for anything secure. It's reversible by design.

Now, the system should absolutely have authorization boundaries around data, but that doesn't mean there's no value in avoiding putting an "attractive nuisance" in front of users.

sedatk 143 days ago [-]

> "protect" against the "attack"

If it's not a real attack, it's not worth protecting against even in the slightest. If it's a real attack, it doesn't matter if it's trivial or not, does it?

9rx 143 days ago [-]

bflesch 143 days ago [-]

Hey Chris, that's a really nice blogpost. Not only the content but also the design / sidenotes. What kind of software stack do you run your blog with?

chrismorgan 143 days ago [-]

https://chrismorgan.info/blog/2019-website/

bflesch 143 days ago [-]

Thanks. I like it very much, perfect dark mode. The serif font could be a tiny bit bigger for readability. Not a fan of handwriting fonts but you do you :-)

chrismorgan 143 days ago [-]

Who said handwriting fonts?

hubert_magni 143 days ago [-]

Here is a Ruby gem for generating and managing pretty, human-readable keys in ActiveRecord models - uses sqids and a ticket table:

https://github.com/noreastergroup/active_record_pretty_key

bflesch 143 days ago [-]

Oh thanks for sharing this. Many years ago I was asked to code such a thing during an interview and I totally screwed it up, and of course I forgot the name of this technique.

I wanted to use it many times in projects for non-iterable IDs but never found it again.

inopinatus 143 days ago [-]

chrismorgan 143 days ago [-]

Look at https://git.chrismorgan.info/tesid/blob/HEAD:/rust/src/fpeck..., it’s very simple.

chuckadams 143 days ago [-]

And maybe I misunderstand how the hashing works, but it seems if you're looking things up by the hashed uuid, you're still going to want two columns anyway.

connicpu 143 days ago [-]

The conversion is reversible using the secret cryptographic key so you can turn the uuidv4s from requests into your db uuidv7s.

miningape 143 days ago [-]

tracker1 143 days ago [-]

Interesting project... just out of curiosity, could you give something resembling a couple practical examples of the risk of exposing the time portion of a v7 UUID?

NortySpock 143 days ago [-]

Suppose it's something where the user may be accused of doing something nefarious if a sequence or pattern of behavior is exposed.

- "Ex-spouse: I looked you up on a dating website, and your userID indicates it was created while you were at Tom's party where you swear nothing happened."

- "You say you are in XYZ timezone, but all your imageIDs (which are unique to the image upon creation) are timestamped at what would be 3am in your timezone."

0x457 143 days ago [-]

> - "You say you are in XYZ timezone, but all your imageIDs (which are unique to the image upon creation) are timestamped at what would be 3am in your timezone."

Can you expand on this? I don't see a situation where it's actually leaking. You either have a photo with EXIF data, or an image ID that was generated when the post was created, and created_at is usually exposed anyway.

bangaladore 143 days ago [-]

Is this a security risk? Maybe or maybe not? But it's an unintended disclosure of information.

thunderfork 143 days ago [-]

Let's say you've got a system that collects medical data - like "store the results of the MRI right after it happens".

For analysis reasons, you want to share this dataset (e.g. for diagnostics on the machine) but first must strip it of potentially identifying information.

tracker1 143 days ago [-]

ericyd 143 days ago [-]

I'm not sold on this example. If you're already preprocessing data for analysis purposes, why not just remove the ID altogether? I can't imagine a specific record ID being required for analytics

thunderfork 143 days ago [-]

[dead]

bangaladore 143 days ago [-]

Good example.

It's pretty simple: unless you are also willing to give a party the creation timestamp whenever you give them a GUID, use UUIDv4.

sgarland 143 days ago [-]

Add some random large value to your ints periodically - they’ll still be monotonic, but you’ll throw off the dastardly spies stealing your super valuable business intelligence.

danhau 142 days ago [-]

Source: a talk by David Kriesel called SpiegelMining (in German), or at least what I remember.

danhau 141 days ago [-]

Found an English translation of the mentioned video: https://www.youtube.com/watch?v=bYviBstTUwo

bismark 143 days ago [-]

phs2501 143 days ago [-]

I just taught myself to look at the end of the UUID, rather than the beginning.

nine_k 143 days ago [-]

Write a function that does that, use it in your queries. E.g. simple hex representation + string reversal should help. Or a reversed base64 representation for shorter output.
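A tiny Python sketch of the hex-plus-reversal variant, assuming the goal is to put the random tail of a UUIDv7 first so that IDs minted close together look different at a glance (the name `tail_first` is made up):

```python
import uuid


def tail_first(u: uuid.UUID) -> str:
    # 32 hex digits, reversed: the random tail leads and the
    # shared timestamp prefix ends up at the far right.
    return u.hex[::-1]
```

Reversing the output again gives back `u.hex`, so nothing is lost; it is purely a display helper.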

funcimp 142 days ago [-]

This is super cool. I decided to code up a Go implementation with the help of dchest's excellent siphash library.

https://github.com/n2p5/uuid47

refs: https://github.com/dchest/siphash

g-mork 143 days ago [-]

Vaguely related technique with similar goals (but I love the one posted here) http://blog.notdot.net/2007/9/Damn-Cool-Algorithms-Part-2-Se...

timando 143 days ago [-]

Why does it use version 4 instead of version 8? Version 4 implies that it's random bits, but it's actually not random. Version 8 doesn't imply anything about what the bits mean.

flowerthoughts 142 days ago [-]

devnull3 142 days ago [-]

Why not use a different encryption key per session and stamp encrypted IDs (or whatever info) to the outside world?

This way the DBs can use simple sequence numbers instead of timestamp based IDs.

conradludgate 142 days ago [-]

You have to know what key to use to decrypt the timestamp bits of the token. If you change keys regularly you have the problem of keeping lots of keys, as well as somehow determining the right key

taminka 143 days ago [-]

i'm curious, if you're doing single header, why not also do the stb-style IMPL block + definitions block such that you avoid the issues from accidentally including the header multiple times?

LeicaLatte 143 days ago [-]

Mobile apps often sort by creation time in the UI (chat messages, activity feeds). Since clients only see the masked version, there might be a need to expose a separate timestamp field.

londons_explore 142 days ago [-]

Before using this....

Consider what you'll do if someone ever gets root in your web server and leaks the key.

Suddenly all your UUIDs need to be replaced. That tends to be impossible since they're probably part of published URLs etc.

TL;DR: Never use anything where you cannot rotate a key, including this.

gwbas1c 143 days ago [-]

I started encrypting database IDs and deriving GUIDs from that.

themafia 143 days ago [-]

Why not just use UUIDv8? The format allows you to use the upper bits for a timestamp and the lower bits for any value you like, including just a random value.

michelpp 143 days ago [-]

themafia 143 days ago [-]

What problem does leaking the timestamp cause?

UUIDv8 gives you timestamp + counter + random.

The advantage is that lexical order and chronological order are the same and you still retain enough random bits that guessing the next generated timestamp is not easy.

michelpp 143 days ago [-]

uuidv8 does not contain a timestamp or counter unless you put them in there, it only contains a version and variant field. It's a very broad format that lets you contain whatever bits you want.
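To make that concrete, a Python sketch in which only the version and variant bits are fixed and everything else is whatever the caller supplies. The helper names `make_v8` and `timestamped_v8` are made up, and the timestamp-plus-random layout is just one possible choice, not something the format prescribes.

```python
import os
import time
import uuid


def make_v8(custom: bytes) -> uuid.UUID:
    # Take 16 caller-chosen bytes and overwrite only the 4 version bits
    # and 2 variant bits; the other 122 bits are left untouched.
    if len(custom) != 16:
        raise ValueError("need exactly 16 bytes")
    b = bytearray(custom)
    b[6] = 0x80 | (b[6] & 0x0F)  # version 8
    b[8] = 0x80 | (b[8] & 0x3F)  # RFC variant
    return uuid.UUID(bytes=bytes(b))


def timestamped_v8() -> uuid.UUID:
    # One possible layout (an assumption, chosen by us): 48-bit
    # millisecond timestamp up front, random bytes after it.
    ts = time.time_ns() // 1_000_000
    return make_v8(ts.to_bytes(6, "big") + os.urandom(10))
```

`make_v8(bytes(16))` yields `00000000-0000-8000-8000-000000000000`: nothing in it but the version and variant.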

pluto_modadic 143 days ago [-]

this is solved by reading the repo's README: hiding timing information.

jppope 143 days ago [-]

Sounds like it's trying to achieve something similar to what ULID is going for: https://github.com/ulid/spec

timestamp + readability

mcdonje 143 days ago [-]

Except the timestamp is in the ULID for anyone to read. UUID47 hides that from external parties.