Neil Madden

  • Were URLs a bad idea?

    When I was writing Rating 26 years of Java changes, I started reflecting on the new HttpClient library in Java 11. The old way of fetching a URL was to use URL.openConnection(). This was intended to be a generic mechanism for retrieving the contents of any URL: files, web resources, FTP servers, etc. It was a pluggable mechanism that could, in theory, support any type of URL at all. This was the sort of thing that was considered a good idea back in the 90s/00s, but has a bunch of downsides:

    • Fetching different types of URLs can have wildly different security and performance implications, and wildly different failure cases. Do I really want to accept a mailto: URL or a javascript: “URL”? No, never.
    • The API was forced to be lowest-common-denominator, so if you wanted to set options that are specific to a particular protocol then you had to cast the returned URLConnection to a more specific sub-class (and therefore lose generality).

    The new HttpClient in Java 11 is much better at doing HTTP, but it’s also specific to HTTP/HTTPS. And that seems like a good thing?

    In fact, in the vast majority of cases the uniformity of URLs is no longer a desirable aspect. Most apps and libraries are specialised to handle essentially a single type of URL, and are better off because of it. Are there still cases where it is genuinely useful to be able to accept a URL of any (or nearly any) scheme?
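
    To make the contrast concrete, here is a minimal sketch (mine, not from the original posts) of the old generic approach next to the Java 11 client; the URL is a placeholder:

    // Old: generic URL handling, cast needed for any HTTP-specific options
    URLConnection conn = new URL("https://example.com/").openConnection();
    ((HttpURLConnection) conn).setRequestMethod("GET");
    try (InputStream in = conn.getInputStream()) {
        // read the response bytes...
    }

    // Java 11+: HTTP-specific, typed request/response, no casting
    HttpClient client = HttpClient.newHttpClient();
    HttpRequest request = HttpRequest.newBuilder(URI.create("https://example.com/")).GET().build();
    HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());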

    12 November, 2025
    URLs, Web

  • Monotonic Collections: a middle ground between immutable and fully mutable

    This post covers several topics around collections (sets, lists, maps/dictionaries, queues, etc) that I’d like to see someone explore more fully. To my knowledge, there are many alternative collection libraries for Java and for many other languages, but I’m not aware of any that provide support for monotonic collections. What is a monotonic collection, I hear you ask? Well, I’m about to answer that. Jesus, give me a moment.

    It’s become popular, in the JVM ecosystem at least, for collections libraries to provide parallel class hierarchies for mutable and immutable collections: Set vs MutableSet, List vs MutableList, etc. I think this probably originated with Scala, and has been copied by Kotlin, and various alternative collection libraries, e.g. Eclipse Collections, Guava, etc. There are plenty of articles out there on the benefits and drawbacks of each type. But the gulf between fully immutable and fully mutable objects is enormous: they are polar opposites, with wildly different properties, performance profiles, and gotchas. I’m interested in exploring the space between these two extremes. (Actually, I’m interested in someone else exploring it, hence this post). One such point is the idea of monotonic collections, and I’ll now explain what that means.

    By monotonic I mean here logical monotonicity: the idea that any information that is entailed by some set of logical formulas is also entailed by any superset of those formulas. For a collection data structure, I would formulate that as follows:

    If any (non-negated) predicate is true of the collection at time t, then it is also true of the collection at any time t’ > t.

    For example, if c is a collection and c.contains(x) returns true at some point in time, then it must always return true from then onwards.

    To make this concrete, a MonotonicList (say) would have an append operation, but not insert, delete, or replace operations. More subtly, monotonic collections cannot have any aggregate operations: i.e., operations that report statistics/summary information on the collection as a whole. For example, you cannot have a size method, as the size will change as new items are added (and thus the predicate c.size() == n can become false). You can have (as I understand it) map and filter operations, but not a reduce/fold.
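
    To illustrate with a sketch (hypothetical API, following the definitions above): a reader can rely on a positive contains() observation forever, but could not rely on a size() check:

    MonotonicList<String> seen = ...;   // the matching ListAppender is held elsewhere

    boolean present = seen.contains("job-42");
    // ... arbitrarily later, even after concurrent appends ...
    assert !present || seen.contains("job-42");  // monotonicity: once true, always true

    // By contrast, a check like seen.size() == 3 could be invalidated by any append,
    // which is why aggregate operations such as size() are excluded.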

    So why are monotonic collections an important category to look at? Firstly, monotonic collections can have some of the same benefits as immutable data structures, such as simplified concurrency. Secondly, monotonic collections are interesting because they can be (relatively) easily made distributed, per the CALM principle: Consistency as Logical Monotonicity (insecure link, sorry). This says that monotonic collections are strongly eventually consistent without any need for coordination protocols. Providing such collections would thus somewhat simplify making distributed systems.

    Class hierarchies and mutability

    Interestingly, Kotlin decided to make their mutable collection classes sub-types of the immutable ones: MutableList is a sub-type of List, etc. (They also decided to make the arrows go the other way from normal in their inheritance diagram, crazy kids). This makes sense in one way: mutable structures offer more operations than immutable ones. But it seems backwards from my point of view: it says that all mutable collections are immutable, which is logically false. (But then they don’t include the word Immutable in the super types). It also means that consumers of a List can’t actually assume it is immutable: it may change underneath them. Guava seems to make the opposite decision: ImmutableList extends the built-in (mutable) List type, probably for convenience. Both options seem to have drawbacks.

    I think the way to resolve this is to entirely separate the read-only view of a collection from the means to update it. On the view-side, we would have a class hierarchy consisting of ImmutableList, which inherits from MonotonicList, which inherits from the general List. On the mutation side, we’d have a ListAppender and ListUpdater classes, where the latter extends the former. Creating a mutable or monotonic list would return a pair of the read-only list view, and the mutator object, something like the following (pseudocode):

    ImmutableList<T> list = ImmutableList.of(....); // normal
    Pair<MonotonicList<T>, ListAppender<T>> mono = MonotonicList.of(...);
    Pair<List<T>, ListUpdater<T>> mut = List.of(...);
    

    The type hierarchies would look something like the following:

    import java.util.function.BiFunction;
    import java.util.function.Consumer;
    import java.util.function.Function;
    import java.util.function.Predicate;

    interface List<E> {
        void forEach(Consumer<E> action);
        ImmutableList<E> snapshot();
    }
    
    interface MonotonicList<E> extends List<E> {
        boolean contains(E element);
        // Positive version of isEmpty():
        boolean containsAnything(); 
        <T> MonotonicList<T> map(Function<E, T> f);
        MonotonicList<E> filter(Predicate<E> p);
    }
    
    interface ImmutableList<E> extends MonotonicList<E> {
        int size();
        <T> T reduce(BiFunction<E, T, T> f, T initial);
    }
    
    interface ListAppender<E> {
        void append(E element);
    }
    
    interface ListUpdater<E> extends ListAppender<E> {
        E remove(int index);
        E replace(int index, E newValue);
        void insert(int index, E newValue);
    }
    

    This seems to satisfy allowing the natural sub-type relationships between types on both sides of the divide. It’s a sort of CQRS at the level of data structures, but it seems to solve the issue that the inheritance direction for read-only consumers is the inverse of the natural hierarchy for mutating producers. (This has a relationship to covariant/contravariant subtypes, but I’m buggered if I’m looking that stuff up again on my free time).
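
    Usage of this design might look something like the following sketch (hypothetical API; the Pair accessors first()/second() are placeholders):

    var mono = MonotonicList.<String>of();
    MonotonicList<String> view = mono.first();       // share freely with readers
    ListAppender<String> appender = mono.second();   // keep on the writing side

    appender.append("hello");
    assert view.contains("hello");                   // true now, and forever after

    ImmutableList<String> frozen = view.snapshot();  // safe to size() and reduce() over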

    Anyway, these thoughts are obviously pretty rough, but maybe some inklings of ideas if anyone is looking for an interesting project to work on.

    11 November, 2025
    data structures, Functional Programming, immutability, Java, programming

  • Fluent Visitors: revisiting a classic design pattern

    It’s been a while since I’ve written a pure programming post. I was recently implementing a specialist collection class that contained items of a number of different types. I needed to be able to iterate over the collection performing different actions depending on the specific type. There are lots of different ways to do this, depending on the school of programming you prefer. In this article, I’m going to take a look at a classic “Gang of Four” design pattern: The Visitor Pattern. I’ll describe how it works, provide some modern spins on it, and compare it to other ways of implementing the same functionality. Hopefully even the most die-hard anti-OO/patterns reader will come away thinking that there’s something worth knowing here after all.

    (Design Patterns? In this economy?)

    (more…)

    4 November, 2025
    Design Patterns, Functional Programming, Java, programming, Visitor Pattern

  • Rating 26 years of Java changes

    I first started programming Java at IBM back in 1999 as a Pre-University Employee. If I remember correctly, we had Java 1.1.8 installed at that time, but were moving to Java 1.2 (“Java 2”), which was a massive release—I remember engineers at the time grumbling that the ever-present “Java in a Nutshell” book had grown to over 600 pages. I thought I’d take a look back at 26 years of Java releases and rate some of the language and core library changes (Java SE only) that have occurred over this time. It’s a very different language to what I started out with!

    I can’t possibly cover every feature of those releases, as there are just way too many. So I’m just going to cherry-pick some that seemed significant at the time, or have been in retrospect. I’m not going to cover UI- or graphics-related stuff (Swing, Java2D etc), or VM/GC improvements. Just language changes and core libraries. And obviously this is highly subjective. Feel free to put your own opinions in the comments! The descriptions are brief and not intended as an introduction to the features in question: see the links from the Wikipedia page for more background.

    NB: later features are listed from when they were first introduced as a preview.

    Java 2 – 1998

    The Collections Framework: before the collections framework, there were just raw arrays, Vector, and Hashtable. It gets the job done, but I don’t think anyone thinks the Java collections framework is particularly well designed. One of the biggest issues was a failure to distinguish between mutable and immutable collections, plus strange inconsistencies like why Iterator has a remove() method (but not, say, update or insert), and so on. Various improvements have been made over the years, and I do still use it in preference to pulling in a better alternative library, so it has stood the test of time in that respect. 4/10

    Java 1.4 – 2002

    The assert keyword: I remember being somewhat outraged at the time that they could introduce a new keyword! I’m personally quite fond of asserts as an easy way to check invariants without having to do complex refactoring to make things unit-testable, but that is not a popular approach. I can’t remember the last time I saw an assert in any production Java code. 3/10

    Regular expressions: Did I really have to wait 3 years to use regex in Java? I don’t remember ever having any issues with the implementation they finally went for. The Matcher class is perhaps a little clunky, but gets the job done. Good, solid, essential functionality. 9/10
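
    For reference, the shape of the API hasn’t changed much since: compile a Pattern, get a Matcher, loop. A minimal sketch:

    Pattern datePattern = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");
    Matcher matcher = datePattern.matcher("order 1234 shipped 2024-06-01");
    if (matcher.find()) {
        String year = matcher.group(1);   // "2024"
        String month = matcher.group(2);  // "06"
    }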

    “New” I/O (NIO): Provided non-blocking I/O for the first time, but really just a horrible API (still inexplicably using 32-bit signed integers for file sizes, limiting files to 2GB, confusing interface). I still basically never use these interfaces except when I really need to. I learnt Tcl/Tk at the same time that I learnt Java, and Java’s I/O always just seemed extraordinarily baroque for no good reason. Has barely improved in 2 and a half decades. 0/10

    Also notable in this release was the new crypto APIs: the Java Cryptography Extensions (JCE) added encryption and MAC support to the existing signatures and hashes, and we got JSSE for SSL. Useful functionality, dreadful error-prone APIs. 1/10

    Java 5 – 2004

    Absolutely loads of changes in this release. This feels like the start of modern Java to me.

    Generics: as Go discovered on its attempt to speed-run Java’s mistakes all over again, if you don’t add generics from the start then you’ll have to retrofit them later, badly. I wouldn’t want to live without them, and the rapid and universal adoption of them shows what a success they’ve been. They certainly have complicated the language, and there are plenty of rough edges (type erasure, reflection, etc), but God I wouldn’t want to live without them. 8/10.

    Annotations: sometimes useful, sometimes overused. I know I’ve been guilty of abusing them in the past. At the time it felt like they were ushering in a new age of custom static analysis, but that doesn’t really seem to be used much. Mostly just used to mark things as deprecated or when overriding a method. Meh. 5/10

    Autoboxing: there was a time when, if you wanted to store an integer in a collection, you had to manually convert to and from the primitive int type and the Integer “boxed” class. Such conversion code was everywhere. Java 5 got rid of that, by getting the compiler to insert those conversions for you. You get brevity, but it’s no less inefficient underneath. 7/10
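
    Roughly, the compiler now inserts the conversions that we used to write by hand (sketch):

    List<Integer> counts = new ArrayList<>();
    counts.add(42);              // compiler inserts Integer.valueOf(42)
    int first = counts.get(0);   // compiler inserts .intValue()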

    Enums: I’d learned Haskell by this point, so I couldn’t see the point of introducing enums without going the whole hog and doing algebraic datatypes and pattern-matching. (Especially as Scala launched about this time). Decent feature, and a good implementation, but underwhelming. 6/10

    Vararg methods: these have done quite a lot to reduce verbosity across the standard library. A nice small improvement that’s been a good quality-of-life enhancement. I still never really know when to put @SafeVarargs annotations on things though. 8/10

    The for-each loop: cracking, use it all the time. Still not a patch on Tcl’s foreach (which can loop over multiple collections at once), but still very good. Could be improved and has been somewhat replaced by Streams. 8/10

    Static imports: Again, a good simple change. I probably would have avoided adding * imports for statics, but it’s quite nice for DSLs. 8/10

    Doug Lea’s java.util.concurrent etc: these felt really well designed. So well designed that everyone started using them in preference to the core collection classes, and they ended up back-porting a lot of the methods. 10/10

    Java 7 – 2011

    After the big bang of Java 5, Java 6 was mostly performance and VM improvements, I believe, so we had to wait until 2011 for more new language features.

    Strings in switch: seems like a code smell to me. Never use this, and never see it used. 1/10

    Try-with-resources: made a huge difference in exception safety. Combined with the improvements in exception chaining (so root cause exceptions are not lost), this was a massive win. Still use it everywhere. 10/10
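
    A minimal sketch of why it stuck: the resource is closed on every exit path, and an exception from close() is attached as a suppressed exception rather than clobbering the original:

    // path is some java.nio.file.Path
    try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
        return reader.readLine();
    }   // reader.close() runs automatically, even if readLine() throws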

    Diamond operator for type parameter inference: a good minor syntactic improvement to cut down the visual noise. 6/10

    Binary literals and underscores in literals: again, minor syntactic sugar. Nice to have, rarely something I care about much. 4/10

    Path and Filesystem APIs: I tend to use these over the older File APIs, but just because it feels like I should. I couldn’t really tell you if they are better or not. Still overly verbose. Still insanely hard to set file permissions in a cross-platform way. 3/10

    Java 8 – 2014

    Lambdas: somewhat controversial at the time. I was very in favour of them, but only use them sparingly these days, due to ugly stack traces and other drawbacks. Named method references provide most of the benefit without being anonymous. Deciding to exclude checked exceptions from the various standard functional interfaces was understandable, but also regularly a royal PITA. 4/10

    Streams: Ah, streams. So much potential, but so frustrating in practice. I was hoping that Java would just do the obvious thing and put filter/map/reduce methods onto Collection and Map, but they went with this instead. The benefits of functional programming weren’t enough to carry the feature, I think, so they had to justify it by promising easy parallel computing. This scope creep enormously over-complicated the feature, made it hard to debug issues, and yet I almost never see parallel streams being used. What I do still see quite regularly is resource leaks from people not realising that the stream returned from Files.lines() has to be close()d when you’re done—but doing so makes the code a lot uglier. Combine that with ugly hacks around callbacks that throw checked exceptions, the non-discoverable API (where are the static helper functions I need for this method again?), and the large impact on lots of very common code, and I have to say I think this was one of the largest blunders in modern Java. I blogged what I thought was a better approach 2 years earlier, and I still think it would have been better. There was plenty of good research that different approaches were better, since at least Oleg Kiselyov’s work in the early noughties. 1/10
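
    The Files.lines() leak, and the uglier-but-correct fix, look something like this (sketch):

    // Leaks an open file handle: the returned Stream wraps the underlying file
    long errors = Files.lines(path).filter(l -> l.contains("ERROR")).count();

    // Correct: the stream must be closed when you're done
    try (Stream<String> lines = Files.lines(path)) {
        long errorCount = lines.filter(l -> l.contains("ERROR")).count();
    }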

    Java Time: Much better than what came before, but I have barely had to use much of this API at all, so I’m not in a position to really judge how good this is. Despite knowing how complex time and dates are, I do have a nagging suspicion that surely it doesn’t all need to be this complex? 8/10

    Java 9 – 2017

    Modules: I still don’t really know what the point of all this was. Enormous upheaval for minimal concrete benefit that I can discern. The general advice seems to be that modules are (should be) an internal detail of the JRE and best ignored in application code (apart from when they spuriously break things). Awful. -10/10 (that’s minus 10!)

    jshell: cute! A REPL! Use it sometimes. Took them long enough. 6/10

    Java 10 – 2018

    The start of time-based releases, and a distinct ramp-up of features from here on, trying to keep up with the kids.

    Local type inference (“var”): Some love this, some hate it. I’m definitely in the former camp. 9/10

    Java 11 – 2018

    New HTTP Client: replaced the old URL.openStream() approach by creating something more like Apache HttpClient. It works for most purposes, but I do find the interface overly verbose. 6/10

    This release also added TLS 1.3 support, along with djb-suite crypto algorithms. Yay. 9/10

    Java 12 – 2019

    Switch expressions: another nice mild quality-of-life improvement. Not world changing, but occasionally nice to have. 6/10

    Java 13 – 2019

    Text blocks: on the face of it, what’s not to like about multi-line strings? Well, apparently there’s a good reason that injection attacks remain high on the OWASP Top 10, as the JEP introducing this feature seemed intent on getting everyone writing SQL, HTML and JavaScript using string concatenation again. Nearly gave me a heart attack at the time, and still seems like a pointless feature. Text templates (later) are trying to fix this, but seem to be currently in limbo. 3/10
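
    For what it’s worth, the safe way to combine a text block with user input is still a parameterised query; a minimal sketch (connection and userSuppliedEmail are assumed to exist):

    String query = """
            SELECT id, name FROM users
            WHERE email = ?
            """;
    try (PreparedStatement stmt = connection.prepareStatement(query)) {
        stmt.setString(1, userSuppliedEmail);   // bound parameter, not concatenation
        ResultSet results = stmt.executeQuery();
    }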

    Java 14 – 2020

    Pattern matching in instanceof: a little bit of syntactic sugar to avoid an explicit cast. But didn’t we all agree that using instanceof was a bad idea decades ago? I’m really not sure who was doing the cost/benefit analysis on these kinds of features. 4/10

    Records: about bloody time! Love ‘em. 10/10

    Better error messages for NullPointerExceptions: lovely. 8/10

    Java 15 – 2020

    Sealed classes: in principle I like these a lot. We’re slowly getting towards a weird implementation of algebraic datatypes. I haven’t used them very much so far. 8/10

    EdDSA signatures: again, a nice little improvement in the built-in cryptography. Came with a rather serious bug though… 8/10

    Java 16 – 2021

    Vector (SIMD) API: this will be great when it is finally done, but still baking several years later. ?/10

    Java 17 – 2021

    Pattern matching switch: another piece of the algebraic datatype puzzle. Seems somehow more acceptable than instanceof, despite being largely the same idea in a better form. 7/10

    Java 18 – 2022

    UTF-8 by default: Fixed a thousand encoding errors in one fell swoop. 10/10

    Java 19 – 2022

    Record patterns: an obvious extension, and I think we’re now pretty much there with ADTs? 9/10
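
    Putting sealed types, records, and pattern-matching switch together does indeed give something very close to an algebraic datatype (minimal sketch):

    sealed interface Shape permits Circle, Rect {}
    record Circle(double radius) implements Shape {}
    record Rect(double width, double height) implements Shape {}

    static double area(Shape shape) {
        return switch (shape) {
            case Circle(double r) -> Math.PI * r * r;
            case Rect(double w, double h) -> w * h;
            // no default needed: the compiler knows the hierarchy is sealed
        };
    }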

    Virtual threads: being someone who never really got on with async/callback/promise/reactive stream-based programming in Java, I was really happy to see this feature. I haven’t really had much reason to use them in anger yet, so I don’t know how well they’ve been done. But I’m hopeful! ?/10

    Java 21 – 2023

    String templates: these are exactly what I asked for in A few programming language features I’d like to see, based on E’s quasi-literal syntax, and they fix the issues I had with text blocks. Unfortunately, the first design had some issues, and so they’ve gone back to the drawing board. Hopefully not for too long. I really wish they’d not released text blocks without this feature. 10/10 (if they ever arrive).

    Sequenced collections: a simple addition that adds a common super-type to all collections that have a defined “encounter order”: lists, deques, sorted sets, etc. It defines convenient getFirst() and getLast() methods and a way to iterate items in the defined order or in reverse order. This is a nice unification, and plugs what seems like an obvious gap in the collections types, if perhaps not the most pressing issue? 6/10
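
    A quick sketch of what the new SequencedCollection interface adds:

    List<String> names = new ArrayList<>(List.of("alice", "bob", "carol"));
    String first = names.getFirst();          // "alice"
    String last = names.getLast();            // "carol"
    for (String name : names.reversed()) {    // "carol", "bob", "alice"
        System.out.println(name);
    }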

    Wildcards in patterns: adds the familiar syntax from Haskell and Prolog etc of using _ as a non-capturing wildcard variable in patterns when you don’t care about the value of that part. 6/10

    Simplified console applications: Java finally makes simple programs simple for beginners, about a decade after universities stopped teaching Java to beginners… Snark aside, this is a welcome simplification. 8/10

    This release also adds support for KEMs, although in the simplest possible form only. Meh. 4/10

    Java 22 – 2024

    The only significant change in this release is the ability to have statements before a call to super() in a constructor. Fine. 5/10

    Java 23 – 2024

    Primitive types in patterns: plugs a gap in pattern matching. 7/10

    Markdown javadoc comments: Does anyone really care about this? 1/10

    Java 24 – 2025

    The main feature here from my point of view as a crypto geek is the addition of post-quantum cryptography in the form of the newly standardised ML-KEM and ML-DSA algorithms, and support in TLS.

    Java 25 – 2025

    Stable values: this is essentially support for lazily-initialised final variables. Lazy initialisation is often trickier than it should be in Java, so this is a welcome addition. Remembering Alice ML, I wonder if there is some overlap between the proposed StableValue and a Future? 7/10?

    PEM encoding of cryptographic objects is welcome from my point of view, but someone will need to tell me why this is not just key/cert.getEncoded(“PEM”)? Decoding support is useful though, as that’s a frequent reason I have to grab Bouncy Castle still. 7/10

    Well, that brings us pretty much up to date. What do you think? Agree, disagree? Are you a passionate defender of streams or Java modules? Have at it in the comments.

    12 September, 2025
    Java

  • No, no, no. You’re still not doing REST right!

    OK, so you’ve made your JSON-over-HTTP API. Then someone told you that it’s not “really” REST unless it’s hypertext-driven. So now all your responses contain links, and you’re defining mediatypes properly and all that stuff. But I’m here to tell you that you’re still not doing it right. What you’re doing now is just “HYPE”. Now I’ll let you in on the final secret to move from HYPE to REST.

    OK, I’m joking here. But there is an aspect of REST that doesn’t seem to ever get discussed despite the endless nitpicking over what is and isn’t really REST. And it’s an odd one, because it’s literally the name: Representational State Transfer. I remember this being quite widely discussed in the early 2000s when REST was catching on, but it seems to have fallen by the wayside in favour of discussion of other architectural decisions.

    If you’re familiar with OO design, then when you come to design an API you probably think of some service that encapsulates a bunch of state. The service accepts messages (method calls) that manipulate the internal state, from one consistent state to another. That internal state remains hidden and the service just returns bits of it to clients as needed. Clients certainly don’t directly manipulate that state. If you need to perform multiple manipulations then you make multiple requests (multiple method calls).

    But the idea of REST is to flip that on its head. If a client wants to update the state, it makes a request to the server, which generates a representation of the state of the resource and sends it to the client. The client then makes whatever changes it wants locally, and sends the updated representation back to the server. Think of checking out a file from Git, making changes and then pushing the changes back to the server. (Can you imagine instead having to send individual edit commands to make changes?)
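
    In HTTP terms the whole interaction is just a GET followed by a PUT of the modified representation. A minimal sketch using the Java 11 HttpClient (the URL and the JSON tweak are placeholders):

    HttpClient client = HttpClient.newHttpClient();
    URI resource = URI.create("https://api.example.com/widgets/42");

    // 1. Fetch a representation of the resource's current state
    HttpResponse<String> current = client.send(
            HttpRequest.newBuilder(resource).GET().build(),
            HttpResponse.BodyHandlers.ofString());

    // 2. Modify it locally, however you like
    String updated = current.body().replace("\"status\":\"draft\"", "\"status\":\"published\"");

    // 3. Send the updated representation back
    client.send(
            HttpRequest.newBuilder(resource)
                    .header("Content-Type", "application/json")
                    .PUT(HttpRequest.BodyPublishers.ofString(updated))
                    .build(),
            HttpResponse.BodyHandlers.discarding());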

    This was a stunning “strange inversion of reasoning” to me at the time, steeped as I was in OO orthodoxy. My first reaction was largely one of horror. But I’d missed the key word “representation” in the description. Returning a representation of the state doesn’t mean it has to directly represent the state as it is stored on the server, it just has to be some logically appropriate representation. And that representation doesn’t have to represent every detail: it can be a summary, or more abstract representation.

    Is it a good idea? I’ll leave that for you to decide. I think it makes sense in some cases, not in others. I’m more just interested in how this whole radical aspect of REST never gets mentioned anymore. It suggests to me a much more declarative conception of API design, whereas even the most hypertext-driven APIs I see tend to still have a very imperative flavour. Thoughts?

    9 July, 2025
    API, REST, Web

  • Streaming public key authenticated encryption with insider auth security

    Note: this post will probably only really make sense to cryptography geeks.

    In “When a KEM is not enough”, I described how to construct multi-recipient (public key) authenticated encryption. A naïve approach to this is vulnerable to insider forgeries: any recipient can construct a new message (to the same recipients) that appears to come from the original sender. For some applications this is fine, but for many it is not. Consider, for example, using such a scheme to create auth tokens for use at multiple endpoints: A and B. Alice gets an auth token for accessing endpoints A and B and it is encrypted and authenticated using the scheme. The problem is, as soon as Alice presents this auth token to endpoint A, that endpoint (if compromised or malicious) can use it to construct a new auth token to access endpoint B, with any permissions it likes. This is a big problem IMO.

    I presented a couple of solutions to this problem in the original blog post. The most straightforward is to sign the entire message, providing non-repudiation. This works, but as I pointed out in “Digital signatures and how to avoid them”, signature schemes have lots of downsides and unintended consequences. So I developed a weaker notion of “insider non-repudiation”, and a scheme that achieves it: we use a compactly-committing symmetric authenticated encryption scheme to encrypt the message body, and then include the authentication tag as additional authenticated data when wrapping the data encryption key for each recipient. This prevents insider forgeries, but without the hammer of full blown outsider non-repudiation, with the problems it brings.

    I recently got involved in a discussion on Mastodon about adding authenticated encryption to Age (a topic I’ve previously written about), where abacabadabacaba pointed out that my scheme seems incompatible with streaming encryption and decryption, which is important in Age use-cases as it is often used to encrypt large files. Age supports streaming for unauthenticated encryption, so it would be useful to preserve this for authenticated encryption too. Doing this with signatures is fairly straightforward: just sign each “chunk” individually. A subtlety is that you also need to sign a chunk counter and “last chunk” bit to prevent reordering and truncation, but as abacabadabacaba points out these bits are already in Age, so it’s not too hard. But can you do the same without signatures? Yes, you can, and efficiently too. In this post I’ll show how.
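
    For concreteness, the signature-based variant described above looks something like the following sketch (my illustration of the straightforward approach, not the signature-free scheme this post goes on to construct; the Signature object is assumed to be initialised for signing):

    // Sign chunk number `counter` as Sig(counter || isLast || chunk), so that chunks
    // cannot be reordered, dropped, or truncated without detection.
    static byte[] signChunk(Signature signer, long counter, boolean isLast, byte[] chunk)
            throws SignatureException {
        ByteBuffer header = ByteBuffer.allocate(9)
                .putLong(counter)
                .put((byte) (isLast ? 1 : 0));
        signer.update(header.array());
        signer.update(chunk);
        return signer.sign();
    }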

    (more…)

    2 July, 2025
    authenticated encryption, cryptography, public key encryption, streaming encryption

  • Are we overthinking post-quantum cryptography?

    tl;dr: yes, contra thingamajig’s law of wotsits.

    Before the final nail has even been hammered on the coffin of AI, I hear the next big marketing wave is “quantum”. Quantum computing promises to speed up various useful calculations, but is also potentially catastrophic to widely-deployed public key cryptography. Shor’s algorithm for a quantum computer, if realised, will break the hard problems underlying RSA, Diffie-Hellman, and Elliptic Curve cryptography—i.e., most crypto used for TLS, SSH and so on. Although “cryptographically-relevant” quantum computers (CRQCs) still seem a long way off (optimistic roadmap announcements and re-runs of previously announced “breakthroughs” notwithstanding), for some applications the risk is already real. In particular, if you are worried about nation-states or those with deep pockets, the threat of “store-now, decrypt-later” attacks must be considered. It is therefore sensible to start thinking about deploying some form of post-quantum cryptography that protects against these threats. But what, exactly?

    (more…)

    20 June, 2025
    cryptography, cybersecurity, jwt, oauth, post-quantum cryptography, public key encryption, quantum-computing, Security, technology

  • A look at CloudFlare’s AI-coded OAuth library

    I decided today to take a look at CloudFlare’s new OAuth provider library, which they apparently coded almost entirely with Anthropic’s Claude LLM:

    This library (including the schema documentation) was largely written with the help of Claude, the AI model by Anthropic. Claude’s output was thoroughly reviewed by Cloudflare engineers with careful attention paid to security and compliance with standards. Many improvements were made on the initial output, mostly again by prompting Claude (and reviewing the results). Check out the commit history to see how Claude was prompted and what code it produced.

    […]

    To emphasize, this is not “vibe coded”. Every line was thoroughly reviewed and cross-referenced with relevant RFCs, by security experts with previous experience with those RFCs. I was trying to validate my skepticism. I ended up proving myself wrong.

    I have done a fair amount of LLM-assisted “agentic” coding of this sort recently myself. I’m also an expert in OAuth, having written API Security in Action, been on the OAuth Working Group at the IETF for years, and previously been the tech lead and then security architect for a leading OAuth provider. (I also have a PhD in AI from an intelligent agents group, but that predates the current machine learning craze). So I was super interested to see what it had produced, and I took a look while sitting in some meetings today. Disclaimer: I’ve only had a brief look and raised a few bugs, not given it a full review.

    Initially, I was fairly impressed by the code. The code is all in one file, which is common in my experience with LLM coding, but it’s fairly well structured, without too many of the useless comments that LLMs love to sprinkle over a codebase, and with some actual classes and higher-level organisation.

    There are some tests, and they are OK, but they are woefully inadequate for what I would expect of a critical auth service. Testing every MUST and MUST NOT in the spec is a bare minimum, not to mention as many abuse cases as you can think of, but none of that is here from what I can see: just basic functionality tests. (From a cursory look at the code, I’d say there are probably quite a few missing MUST checks, particularly around validating parameters, which is pretty light in the current implementation).

    The first thing that stuck out for me was what I like to call “YOLO CORS”, and is not that unusual to see: setting CORS headers that effectively disable the same origin policy almost entirely for all origins:

    private addCorsHeaders(response: Response, request: Request): Response {
        // Get the Origin header from the request
        const origin = request.headers.get('Origin');
    
        // If there's no Origin header, return the original response
        if (!origin) {
          return response;
        }
    
        // Create a new response that copies all properties from the original response
        // This makes the response mutable so we can modify its headers
        const newResponse = new Response(response.body, response);
    
        // Add CORS headers
        newResponse.headers.set('Access-Control-Allow-Origin', origin);
        newResponse.headers.set('Access-Control-Allow-Methods', '*');
        // Include Authorization explicitly since it's not included in * for security reasons
        newResponse.headers.set('Access-Control-Allow-Headers', 'Authorization, *');
        newResponse.headers.set('Access-Control-Max-Age', '86400'); // 24 hours
    
        return newResponse;
      }
    

    There are cases where this kind of thing is OK, and I haven’t looked in detail at why they’ve done this, but it looks really suspicious to me. You should almost never need to do this. In this case, the commit log reveals that it was the humans that decided on this approach, not the LLM. They haven’t enabled credentials at least, so the sorts of problems this usually results in probably don’t apply.

    Talking of headers, there is a distinct lack of standard security headers in the responses produced. Many of these don’t apply to APIs, but some do (and often in surprising ways). For example, in my book I show how to exploit an XSS vulnerability against a JSON API: just because you’re returning well-formed JSON doesn’t mean that’s how a browser will interpret it. I’m not familiar with CloudFlare Workers, so maybe it adds some of these for you, but I’d expect at least an X-Content-Type-Options: nosniff header and HTTP Strict Transport Security to protect the bearer tokens being used.

    There are some odd choices in the code, and things that lead me to believe that the people involved are not actually familiar with the OAuth specs at all. For example, this commit adds support for public clients, but does so by implementing the deprecated “implicit” grant (removed in OAuth 2.1). This is absolutely not needed to support public clients, especially when the rest of the code implements PKCE and relaxes CORS anyway. The commit message suggests that they didn’t know what was needed to support public clients and so asked Claude and it suggested the implicit grant. The implicit grant is hidden behind a feature flag, but that flag is only checked in an entirely optional helper method for parsing the request, not at the point of token issuance.

    Another hint that this is not written by people familiar with OAuth is that they have implemented Basic auth support incorrectly. This is a classic bug in OAuth provider implementations because people (and LLMs, apparently) assume that it is just vanilla Basic auth, but OAuth adds a twist of URL-encoding everything first (because charsets are a mess). Likewise, the code has a secondary bug if you have a colon in the client secret (allowed by the spec). I don’t think either of these are issues for this specific implementation, because it always generates client IDs and secrets and so can control the format, but I haven’t looked in detail.
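
    For reference, RFC 6749 §2.3.1 says the client id and secret are form-URL-encoded before being joined with a colon and Base64-encoded; a minimal sketch of the correct encoding:

    static String clientSecretBasic(String clientId, String clientSecret) {
        String id = URLEncoder.encode(clientId, StandardCharsets.UTF_8);
        String secret = URLEncoder.encode(clientSecret, StandardCharsets.UTF_8);
        byte[] credentials = (id + ":" + secret).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(credentials);
    }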

    A more serious bug is that the code that generates token IDs is not sound: it generates biased output. This is a classic bug when people naively try to generate random strings, and the LLM spat it out in the very first commit as far as I can see. I don’t think it’s exploitable: it reduces the entropy of the tokens, but not far enough to be brute-forceable. But it somewhat gives the lie to the idea that experienced security professionals reviewed every line of AI-generated code. If they did and they missed this, then they were way too trusting of the LLM’s competence. (I don’t think they did: according to the commit history, there were 21 commits directly to main on the first day from one developer, no sign of any code review at all).
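
    The usual shape of that bug is mapping random bytes onto an alphabet with a modulo, which over-represents some characters. Encoding raw random bytes avoids the bias entirely (a sketch, not the library's code):

    // Biased: 256 is not a multiple of the alphabet size, so something like
    //   chars[randomByte % chars.length]
    // makes some characters more likely than others.

    static String unbiasedToken() {
        byte[] bytes = new byte[32];   // 256 bits of entropy
        new SecureRandom().nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }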

    I had a brief look at the encryption implementation for the token store. I mostly like the design! It’s quite smart. From the commit messages, we can see that the design came from the human engineers, but I was quite impressed by the implementation. It’s worth reproducing the commit message from this work here, which shows the engineer’s interactions with Claude to get the desired code implemented:

    Ask Claude to store the props encrypted.

    prompt: I would like to encrypt the `props` stored in `Grant` and `Token` records. It should be encrypted such that you need a valid token to decrypt. This is a bit tricky since there are multiple valid tokens over time: there’s the authorization code, the refresh tokens (which rotate), and individual access tokens. We don’t want to repeatedly re-encrypt `props`. Instead, we should encrypt in once, with a symmetric key, and then we should store that key “wrapped” for each token, while the token is valid. Please use WebCrypto to implement all cryptography.

    Claude started on the wrong track making me realize I forgot an important design consideration:

    prompt: One thing I forgot to note: The `listUserGrants()` helper function will no longer be able to return the `props`, since it doesn’t have any token with which to decript it. That’s OK: `props` need only be delivered to the app upon an authorized API request. We should actually change `listUserGrants()` to make it return a narrower representation of a grant. Right now it returns the entire grant record from storage, but we really only need it to return `id`, `clientId`, `userId`, `scope`, `metadata`, and `createdAt`. We don’t need to return any of the token IDs or code challenge information.

    Claude produced beautiful code with one big flaw.

    prompt: There’s a security flaw in the way you wrap keys for tokens: You used a SHA-256 hash of the token as the key material for the wrapping. However, SHA-256 is also how we compute “token IDs”. With this construction, someone would be able to unwrap the keys using only the token ID, which is stored alongside the wrapped keys, hence all keys can be trivially unwrapped. To fix this, we need to compute the hash differently when computing the key material for wrapping, in such a way that it’s not possible to derive the key material from the token ID.

    Claude initially tried to solve this by switching to using PBKDF2 with 100,000 iterations to derive the key material.

    prompt: PDKDF2 with 100000 iterations would be very expensive. This would be important if the input were a low-entropy password, but is not necessary for high-entropy input. Instead of PBKDF2, let’s use a SHA-256 HMAC, with a static HMAC key (which essentially acts as the “salt”).

    Claude produced code that used a string “OAUTH_PROVIDER_WRAPPING_KEY_HMAC_v1” as the HMAC key.

    prompt: This looks pretty good, but for performance, let’s define WRAPPING_KEY_HMAC_KEY as a 32-byte array, so that it doesn’t have to be encoded or hashed down to the right size (as HMAC would do for larger keys). Here are 32 bytes of hex which I have chosen randomly, to use as the HMAC key: 22 7e 26 86 8d f1 e1 6d 80 70 ea 17 97 5b 47 a6 82 18 fa 87 28 ae de 85 b5 1d 4a d9 96 ca ca 43

    (NB: using a hard-coded “key” here is fine: it’s essentially HKDF-Extract with a fixed random salt, which is fine and dandy for this use-case. The security property we’re looking for here is that the two uses are independent random oracles, for which this is a decent design. I would maybe use the same approach for generating the token ID too, with a different salt, but that’s a minor tweak).

    What this interaction shows is how much knowledge you need to bring when you interact with an LLM. The “one big flaw” Claude produced in the middle would probably not have been spotted by someone less experienced with crypto code than this engineer obviously is. And likewise, many people would probably not have questioned the weird choice to move to PBKDF2 as a response: LLMs really do not “reason” in any real way.

    Closing Thoughts

    As a first cut of an OAuth library, it’s not bad, but I wouldn’t really recommend it for use yet. In my experience, it is very hard to build a correct and secure OAuth provider implementation, and it deserves way more time and attention than has clearly gone into this one (yet). IMO, it’s not an appropriate domain for testing out an LLM. At ForgeRock, we had hundreds of security bugs in our OAuth implementation, and that was despite having 100s of thousands of automated tests run on every commit, threat modelling, top-flight SAST/DAST, and extremely careful security review by experts. The idea that you can get an LLM to knock one up for you is not serious.

    The commit history of this project is absolutely fascinating. The engineers clearly had a good idea of many aspects of the design, and the LLM was tightly controlled and produced decent code. (LLMs are absolutely good at coding in this manner). But it still tried to do some stupid stuff, some of which were caught by the engineers, some were not. I’m sure some are still in there. Is this worse than if a human had done it? Probably not. Many of these same mistakes can be found in popular Stack Overflow answers, which is probably where Claude learnt them from too. But I know many engineers who would have done a better job, because they are extremely diligent. Code like this needs careful attention. Details matter. Yes, this does come across as a bit “vibe-coded”, despite what the README says, but so does a lot of code I see written by humans. LLM or not, we have to give a shit.

    What I am taking away from my experience with LLMs, and from reviewing this project is this: you need to have a clear idea in your head of the kind of code you’re expecting the LLM to produce to be able to judge whether it did a good job. Often, to really know what that looks like, and engage your “System 2” thinking (so you’re not just accepting what’s in front of you as the best way to do things), you need to have built one yourself first. For trivial things where I don’t really care how it’s done, then sure, I’m happy to let an LLM do whatever it likes. But for important things, like my fucking auth system, I’d much rather do it myself and be sure that I really thought about it.

    6 June, 2025
    API security, artificial intelligence, LLMs, oauth, Security

  • The square roots of all evil

    Every programmer knows Donald Knuth’s famous quote that “premature optimization is the root of all evil”, from his 1974 Turing Award lecture (pdf). A fuller quotation of the surrounding context gives a rounder view:

    I am sorry to say that many people nowadays are condemning program efficiency, telling us that it is in bad taste. The reason for this is that we are now experiencing a reaction from the time when efficiency was the only reputable criterion of goodness, and programmers in the past have tended to be so preoccupied with efficiency that they have produced needlessly complicated code; the result of this unnecessary complexity has been that net efficiency has gone down, due to difficulties of debugging and maintenance.

    The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.

    A lot has been written about this topic and exactly what constitutes “premature” optimisation. I don’t want to rehash those debates, but instead to rephrase them. It is not premature optimisation as such that is the root of all evil, but rather premature specialisation.

    Optimisation almost always involves making some assumptions about the context in which a program will run. At a micro level, concerns about CPU architecture, memory hierarchy, caches, what features have hardware acceleration, etc. At a macro level, even choice of big-O algorithm may require assumptions to be made about data: e.g., these data points are all integers in a certain range so I can sort them in linear time: but will that always be true? Maybe I need to dynamically pick which code to use based on what environment I’m running in, so that I can run the most efficient code for that CPU/etc. These assumptions all lead to special-case code, tailored to that context. It is that specialisation that leads to complexity and inflexibility, rather than optimisation per se.

    Programming is frequently accidentally quadratic, and we were all taught in school that quadratic equations can have two roots. So it is with evil in programming, and I would like to propose that there is another root of all evil that deserves to be called out: premature generalisation.

    Premature generalisation is the flip side of premature specialisation, but can be just as dangerous to the health of a codebase. We have all laughed at Enterprise FizzBuzz jokes, with their FactoryBuilderFactoryStrategies and so on. And I’m sure many of us have seen codebases affected by such monstrosities, where an “enterprise architect” applied a Pokémon “gotta catch ‘em all” approach to Martin Fowler’s blog. But this is just an extreme (although sadly common) example.

    Examples of premature generalisation are everywhere. As a security and cryptography engineer, I see the same mindset in tools like PGP or JWTs (JOSE): overly-complex and easy to screw up footguns because they try to cover too many use-cases in a single tool. The reaction has been the development of special-purpose tools like Age or minisign, that follow the Unix philosophy of “do one thing well”.

    On a perhaps more controversial level, I would also point to the (over-)use of category theory as an example of premature generalisation. Sure, maybe your widget could be a Monad, but does that actually solve a concrete problem you have right now? Might you want to change the implementation in future in ways that make it harder to maintain the Monad contract?

    When I was a PhD student, I was interested in logical approaches to AI. The big deal at the time was the semantic web and description logics. As part of my work I was developing an ontology of events in online games to describe what was going on in those environments. I found myself endlessly wanting to generalise and unify concepts as much as possible. I wasted a lot of time trying to develop a perfect “upper level ontology”, as these things were called then. Eventually my adviser gently nudged me to just solve the problem in front of me.

    That is probably the biggest lesson I have learned in programming, and the only way I know to try to walk the fine line between premature specialisation and premature generalisation. Solve the problem in front of you. Don’t try and solve a more general version of the problem, and don’t try and “optimise” until you are fairly sure something is going to be a performance issue.

    This is not to say you should just live in the moment and write code with no thought at all beyond that specific problem. If you are writing code to handle a FooWidget and you know there are 200 other types of widget you’re going to have to code up later, it’s not premature to think about how those might share functionality. If you are designing a programming language or database, it is not premature to make it very general (and please do take a look at category theory and other unifying frameworks). If you know you’ll have to process millions of FooWidgets per hour then it’s not premature to think about efficiency. And likewise, if it’s extremely unlikely that you’re ever going to move off PostgreSQL as your database then it is not unreasonable to use database-specific features if it makes other code simpler.

    There’s a lot more that could be said on this topic, including trade-offs between individual tool/component complexity vs whole-system/ecosystem complexity, and accidental vs necessary complexity. But then I’d be making this article more complex. Better to leave that for other articles and leave this one just right.

    Goldilocks runs from the bears of premature generalisation and specialisation

    3 December, 2024
    complexity, programming, software engineering

  • Digital signatures and how to avoid them

    Wikipedia’s definition of a digital signature is:

    A digital signature is a mathematical scheme for verifying the authenticity of digital messages or documents. A valid digital signature on a message gives a recipient confidence that the message came from a sender known to the recipient.

    —Wikipedia

    They also have a handy diagram of the process by which digital signatures are created and verified:

    Source: https://commons.m.wikimedia.org/wiki/File:Private_key_signing.svg#mw-jump-to-license (CC-BY-SA)

    Alice signs a message using her private key and Bob can then verify that the message came from Alice, and hasn’t been tampered with, using her public key. This all seems straightforward and uncomplicated and is probably most developers’ view of what signatures are for and how they should be used. This has led to the widespread use of signatures for all kinds of things: validating software updates, authenticating SSL connections, and so on.

    But cryptographers have a different way of looking at digital signatures that has some surprising aspects. This more advanced way of thinking about digital signatures can tell us a lot about what are appropriate, and inappropriate, use-cases.

    (more…)

    18 September, 2024
    authenticated encryption, cryptography, misuse resistance, signatures
