Speeding up random row selection in Postgres by 8x with F# and Entity Framework

Date: 2024-05-08 | create | tech | fullstack-projects | fsharp | entity-framework | postgres |

I'm currently building FullstackProjects.xyz - a site that generates fullstack project ideas. The initial version is a simple MVP that serves up randomized ideas to look through.

These ideas are stored in a Postgres DB, so each time I need to serve new ideas I need to query the database. My initial attempt at this led to queries taking ~800ms, which is much too slow for such a simple, core feature.

In this post we'll walk through how I optimized these random row selection queries in Postgres to achieve an 8x improvement (800ms -> 100ms) with the same UX and architecture.

FullstackProjects Overview

FullstackProjects - Screenshot

Fullstack Projects is a simple SSR website hosted on a modulith. My initial version used in-code lists as a datastore, which was very fast, but I've since moved to a database to facilitate further evolution of the app.

FullstackProjects - System Design

This means I'm now doing frequent retrievals of randomized items from a moderately sized SQL database table. When done naively (as we'll see) this can lead to very poor performance.

In my case this was taking ~800ms per query, which was much too slow for this UX and unlikely to scale to more users.

This system architecture is how I start most of my apps. I call it The HAM Stack and have built an F# project boilerplate (CloudSeed) to make it easy to get started.

Naive Randomized Queries

I'm using Entity Framework as an ORM for my F# app, which allows me to query my database from F# - without writing raw SQL. This can be both a good and a bad thing depending on the situation.

For my initial version I did the dumbest thing I could think of (and what my AI recommended): simply order the table by a random value (in this case a Guid) and take the top N items I wanted.

let getRandomProjectIdeas 
    (serviceTree: FullstackProjectsServiceTree)
    (count: int)
    : Async<ProjectIdea List>
    =
    async {
        use dbContext = serviceTree.DbContext()

        let! results = 
            dbContext
                .ProjectIdea
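                // order every row by a freshly generated Guid, then take the first `count` rows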
                .OrderBy(fun e -> Guid.NewGuid())
                .Take(count)
                .ToListAsync()
            |> Async.AwaitTask
        
        return results
    }

This actually works pretty well.

  • ~Real random
  • Simple to write and understand
  • Uses EF - no raw SQL

The downside, of course, is that these queries were very slow - taking ~800ms to populate the page.

Postgres - Slow Random Queries

The likely reason for this slowness is that we're ordering the ENTIRE table just to pull a few rows out. Worse - we're ordering on new, randomized values the db knows nothing about ahead of time, so it can't lean on indexes or other heuristics to avoid the full sort.

Even if we were doing this in-memory on something like a list, it's clear this would be slow - O(n log n) for the sort at best, which is expensive for a common, hot path.
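
If you want to confirm what's actually being sent to Postgres, EF Core 5+ can print the SQL a LINQ query translates to via ToQueryString. Here's a small debugging sketch - the helper name is mine; the service tree and entity types are the same ones used in the snippets above:

open System
open System.Linq
open Microsoft.EntityFrameworkCore

let printRandomOrderSql 
    (serviceTree: FullstackProjectsServiceTree)
    (count: int)
    =
    use dbContext = serviceTree.DbContext()

    let sql = 
        dbContext
            .ProjectIdea
            .OrderBy(fun e -> Guid.NewGuid())
            .Take(count)
            // ToQueryString (EF Core 5+) returns the SQL this query would execute.
            // With the Npgsql provider the randomized OrderBy shows up as an
            // ORDER BY over a freshly generated uuid for every row, plus a LIMIT -
            // i.e. a full-table sort on values Postgres has never seen before.
            .ToQueryString()

    printfn "%s" sql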

8x Faster Randomized Queries

800ms was too slow for a frequent UX flow like this - ideally it would be <250ms to feel snappy. So I started searching around for alternatives that worked well and were simple to implement.

The top answer seemed to be Postgres' TABLESAMPLE BERNOULLI, which keeps each row with a given probability (5% below) and allows much faster retrieval since the table never has to be sorted.

SELECT * 
FROM your_table_name 
TABLESAMPLE BERNOULLI(5)
LIMIT n;

This is probably the bulletproof option for prod and is what I would choose if we were running at scale and this was a bottleneck. But we're not there yet, and I wanted to stay inside EF as much as was reasonable early on, so I looked for other options.
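
For reference, raw SQL like this can still be embedded in an EF query via FromSqlRaw and composed with LINQ on top. A rough, untested sketch of that route - the table name project_ideas and the 5% sample rate here are just assumptions:

open System.Linq
open Microsoft.EntityFrameworkCore

let getSampledProjectIdeas 
    (serviceTree: FullstackProjectsServiceTree)
    (count: int)
    : Async<ProjectIdea List>
    =
    async {
        use dbContext = serviceTree.DbContext()

        let! results =
            dbContext
                .ProjectIdea
                // BERNOULLI(5) keeps each row with ~5% probability - still a
                // sequential scan, but no sort. EF wraps this SQL in a subquery
                // and composes Take(count) on top as a LIMIT.
                .FromSqlRaw("SELECT * FROM project_ideas TABLESAMPLE BERNOULLI (5)")
                .Take(count)
                .ToListAsync()
            |> Async.AwaitTask

        return results
    }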

A naive option popped up: simply take a contiguous chunk of items starting at a randomized index using Skip and Take. This has the downside of not being fully random (we're taking a random chunk rather than random individual items) but the upside that it's simple, fast, and can be done in EF. For my case full randomness doesn't matter, and it's unlikely anyone would see / notice chunks vs items, so this was appealing.

let getRandomProjectIdeas 
    (serviceTree: FullstackProjectsServiceTree)
    (count: int)
    : Async<ProjectIdea List>
    =
    async {
        use dbContext = serviceTree.DbContext()

        let! dbCount = 
            dbContext 
                .ProjectIdea
                .CountAsync()
            |> Async.AwaitTask

        // random: a System.Random instance defined elsewhere (e.g. at module level).
        // Clamp the upper bound so Random.Next doesn't throw if the table holds
        // fewer than `count` rows.
        let randomIndexStart = 
            random.Next(Math.Max(dbCount - count, 0))

        let! results =
            dbContext
                .ProjectIdea
                .Skip(randomIndexStart)
                .Take(count)
                .ToListAsync()
            |> Async.AwaitTask
        
        return results
    }

Running this several times in prod I got ~100ms responses for essentially the same UX, so this is the approach I went with.
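
One caveat worth calling out: without an OrderBy the database makes no guarantee about row order, so the chunk that Skip/Take lands on isn't strictly well-defined, and EF Core will log a warning about paging without an ordering. If that ever matters, sorting on the primary key keeps the chunk deterministic and stays cheap thanks to the index. A sketch of that variant - the Id column name is an assumption:

open System
open System.Linq
open Microsoft.EntityFrameworkCore

// shared Random instance, assumed to live at module level
let private random = Random()

let getRandomProjectIdeasOrdered 
    (serviceTree: FullstackProjectsServiceTree)
    (count: int)
    : Async<ProjectIdea List>
    =
    async {
        use dbContext = serviceTree.DbContext()

        let! dbCount = 
            dbContext 
                .ProjectIdea
                .CountAsync()
            |> Async.AwaitTask

        let randomIndexStart = 
            random.Next(Math.Max(dbCount - count, 0))

        let! results =
            dbContext
                .ProjectIdea
                // hypothetical Id primary key - an indexed sort keeps this cheap
                // and makes the Skip/Take chunk deterministic
                .OrderBy(fun e -> e.Id)
                .Skip(randomIndexStart)
                .Take(count)
                .ToListAsync()
            |> Async.AwaitTask

        return results
    }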

Postgres - Faster Random Queries

Next

I love building little side projects cause you run into all sorts of tiny puzzles you haven't thought of before. I'm planning on playing with a few more features for FullstackProjects so stay tuned for that.

Q: How do you query random items from your datastore?

Want more like this?

The best / easiest way to support my work is by subscribing for future updates and sharing with your network.