GopherCon UK 2024


It was my first GopherCon, and I had a great time. There were some great talks, great people, great food, some great swag and a perfectly chosen hotel!

I'd wanted to go last year, but it was my first week at Elastic, and I don't think it would've been a good look 🫣

You'll notice that this blog post isn't as heavy as some of my previous conference writeups - at recent conferences (some of which still have writeups to publish 🫣) I've started transcribing less and focussing more on the talk itself, on the basis that I can listen back to the recordings later, and in the moment should just take notes on the key points.

I'm also very glad to have had this published on the same day that the conference ends 👏🏼 but I may end up coming back to add things I'd forgotten.

It was noted that it's 10 years since the first GopherCon UK, but given there was no GopherCon UK for one year due to COVID, this was the 9th edition, leading to the fun question of "is it the 10th?". I found it funny to think of it as either an off-by-one error or simply 0-indexed counting.

The Business of Go

Cameron Balahan took us through Google's investment in Go, and why it continues to be the right thing to do as a business.

Cameron shared a key responsibility of being a Product Manager:

Deliver the right experience to the right market at the right time

Throughout the talk, Cameron took us through how the evolution of Go over the last few years has changed based on the maturity lifecycle the language is in.

We heard about Go being a lot of things, but primarily focussed on building services for the Cloud, and that it's generally built for web (HTTP/RPC) services. Cameron shared some of the risks of trying to appeal too broadly, leading to incompatible constraints across different use-cases, versus focussing on a narrow set of requirements, leading to unnecessary fragmentation.

Cameron shared that Go has two stakeholders:

  • Go's users, who want to be able to productively build production-grade web services
  • Google, who want a return on investment, both for their heavy internal use of Go, as well as for external users to continue to use Google tooling and services

It was a nice, surprisingly honest, explanation, and led on to how Google naturally keeps itself central to the language and ecosystem - though it is a BSD-3-Clause Open Source project, so can be forked if needed.

Cameron shared that Go is growing in a great way (slower than Rust) and in a way that's not just "more folks learning", but that folks are actively seeking out learning Go.

I enjoyed how Google-themed the slides were, naturally, with slides using an "Ask Google" prompt, or asking Gemini questions.

Debugging Go Applications: From local headaches to production puzzles

Matt Boyle took us through a worked example of debugging a local application for reserving hotel rooms, and it's one that would be good to watch when the recording is out.

Matt started with the example of "if you have time, try and debug by eye and see if you can spot what's wrong".

From there, Matt explained how Delve (aka dlv) is excellent, and can be used to dig into issues.

Matt noted that ideally you'd be trying to reproduce an issue through tests, both to ensure that you can reproduce it consistently and to give you a regression test for the future. However, that wasn't the point of the talk.

From debugging directly, we looked at adding logs using slog, and adding some useful metadata to logs.

From here, Matt slurped the logs up with a locally running Filebeat into a local instance of Elasticsearch and Kibana, where we could then filter the logs better.

Matt shared that a good place to start is by logging errors, or exceptional cases i.e. at ERROR or WARN. Then, everything else can go into DEBUG, which is off by default and can be enabled as-and-when it's needed via environment variables.
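
A minimal sketch of that kind of setup with slog - the environment variable name and the log fields here are my own, not from the talk:

package main

import (
    "log/slog"
    "os"
)

func newLogger() *slog.Logger {
    // Default to INFO; allow DEBUG to be switched on via an environment
    // variable as-and-when it's needed.
    level := slog.LevelInfo
    if os.Getenv("LOG_LEVEL") == "DEBUG" {
        level = slog.LevelDebug
    }

    return slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
        Level: level,
    }))
}

func main() {
    logger := newLogger()

    // Exceptional cases go to ERROR/WARN, with useful metadata attached.
    logger.Error("failed to reserve room", "hotel_id", "h_123", "err", "no availability")

    // Everything else can sit at DEBUG, which is off by default.
    logger.Debug("checked availability", "hotel_id", "h_123", "rooms_free", 0)
}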

As is usual for me, I'd recommend being very cautious about what goes into your DEBUG log levels, as there are some dangerous things that really should never be in there!

Matt shared a good way of generating lots of data for debugging: a "seeder" command for the given test, which simulates the scenario by constantly re-running the same case, producing logs at a significant rate so you can start understanding the issue.

We then heard about metrics and how they should be used for things that can't be logged, or don't make sense to be, and how they can be really useful for an aggregate such as "how many 4xx errors have I received in the last 5m".

As an aside, Matt mentioned sharing the logger through the context.Context, which was an interesting thing I've not seen before. Matt noted that it's something Google do, and that it makes it less onerous to pass a *slog.Logger or similar all over the codebase, which I have to agree with!
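
As a sketch of that pattern - this isn't Matt's or Google's actual code, and the package and function names are mine:

package logctx

import (
    "context"
    "log/slog"
)

// ctxKey is unexported so other packages can't collide with it.
type ctxKey struct{}

// With returns a copy of ctx that carries the given logger.
func With(ctx context.Context, logger *slog.Logger) context.Context {
    return context.WithValue(ctx, ctxKey{}, logger)
}

// From retrieves the logger from ctx, falling back to slog.Default()
// so callers always get something usable.
func From(ctx context.Context) *slog.Logger {
    if logger, ok := ctx.Value(ctxKey{}).(*slog.Logger); ok {
        return logger
    }
    return slog.Default()
}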

As part of this, Matt shared a common httpError method that's used across the codebases Matt works on to provide a central place to set the response's status code as well as increment a metric.
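
The helper itself wasn't shared in detail, so this is just a sketch of the shape of it, assuming Prometheus for the metric - the names are mine:

package api

import (
    "log/slog"
    "net/http"
    "strconv"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

// httpErrors counts error responses, labelled by status code and route.
var httpErrors = promauto.NewCounterVec(prometheus.CounterOpts{
    Name: "http_errors_total",
    Help: "Number of error responses served.",
}, []string{"code", "route"})

// httpError gives every handler one place that logs the failure, sets the
// response's status code and increments the metric consistently.
func httpError(w http.ResponseWriter, route, msg string, code int) {
    slog.Error("request failed", "route", route, "status", code, "msg", msg)
    httpErrors.WithLabelValues(strconv.Itoa(code), route).Inc()
    http.Error(w, msg, code)
}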

Matt recommended monitoring for "RED":

  • Rate: are there too many requests/second?
  • Error: are there too many 4xx / 5xx?
  • Duration: are requests taking too long?

And as part of this, you should be considering your Service Level Objectives (SLOs) and what commitments you have to your users.
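
Matt didn't show this exact code, but as a sketch of how all three RED signals can fall out of a single Prometheus histogram (the metric name, labels and buckets are my own choices):

package api

import (
    "net/http"
    "strconv"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

// One histogram, labelled by route and status code, is enough to answer
// all three RED questions: rate (the count), errors (the count filtered
// by code) and duration (the buckets themselves).
var requestDuration = promauto.NewHistogramVec(prometheus.HistogramOpts{
    Name:    "http_request_duration_seconds",
    Help:    "Time taken to serve HTTP requests.",
    Buckets: prometheus.DefBuckets,
}, []string{"route", "code"})

type statusRecorder struct {
    http.ResponseWriter
    status int
}

func (r *statusRecorder) WriteHeader(code int) {
    r.status = code
    r.ResponseWriter.WriteHeader(code)
}

// instrument wraps a handler and records every request against the histogram.
func instrument(route string, next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
        start := time.Now()
        rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
        next.ServeHTTP(rec, req)
        requestDuration.
            WithLabelValues(route, strconv.Itoa(rec.status)).
            Observe(time.Since(start).Seconds())
    })
}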

As ever, it's important to make sure alerts are actionable and prompt an action. If they don't, they should be deleted, as snoozing or deferring alerts is going to lead to alert fatigue, which isn't a place your team want to be.

Matt also noted taking external factors into account, for instance what happens if an external API you depend on starts slowing down? What if their expected processing time is 10s, but they usually respond in 0.5s? Are you able to handle that? Matt recommended looking into what is actually slow, instrumenting what you can.

We then started looking into the use of spans, namely using OpenTelemetry, for instance adding an event to say "I have now chosen this hotel", which then allows looking in aggregate at whether e.g. a specific hotel always takes a bit longer than the others.

Matt recommended the use of standard tracing IDs (which I agree with) so you can see end-to-end tracking across the system, especially once configured in e.g. your OpenTelemetry configuration.
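
Not the code from the talk, but a minimal OpenTelemetry sketch of starting a span and attaching an event - the span and attribute names are made up:

package booking

import (
    "context"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/trace"
)

var tracer = otel.Tracer("hotel-booking")

// ReserveRoom shows the shape of the idea: start a span, annotate it with
// an event, and let the context carry the trace downstream.
func ReserveRoom(ctx context.Context, hotelID string) error {
    ctx, span := tracer.Start(ctx, "ReserveRoom")
    defer span.End()

    // ... choose the hotel ...

    span.AddEvent("hotel chosen", trace.WithAttributes(
        attribute.String("hotel.id", hotelID),
    ))

    // Downstream calls receive ctx, so they join the same trace.
    return chargeCustomer(ctx, hotelID)
}

func chargeCustomer(ctx context.Context, hotelID string) error {
    _, span := tracer.Start(ctx, "chargeCustomer")
    defer span.End()
    // ...
    return nil
}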

The journey we went through was a set of good incremental steps, and Matt noted that at each stage you could stop where you are, having already unlocked a good amount of value since the previous step:

inspect the code -> logging -> delve (local debugger) -> slog -> metrics -> tracing

Finally, Matt recommended that we weigh the cost of tools like DataDog against the cost of the outage we'd have if we didn't have observability into our systems.

The talk also reminded me that I really need to set up the ability to attach a debugger in Neovim, likely using leoluz/nvim-dap-go.

Load testing distributed web services

George Malamidis shared how loveholidays - a Europe-based travel site with variable seasonal load and, fortunately, not much non-Europe traffic to deal with - load test their production services.

As part of this, George shared how they find out how well their systems can scale before their customers do, at an inopportune time.

We heard about the way the team use their home-grown performance testing tool, ripley, to take HTTP access logs from their internal Grafana Loki setup and replay them against their -dev and -prod infrastructure.

George shared that the tools on the market for performance testing didn't quite fit their needs - needing to follow customer interactions more closely - and so they ended up needing to write their own.

Internally, they have "Owlbot", which "hunts at night" to run performance tests in the quiet hours and see if the platform responds appropriately under the load. A nice thing about this is that, given traffic is usually very low overnight, the on-call pager is disabled overnight, allowing folks to stay rested and not need to deal with anything between 2300-0700, instead picking up any issues flagged by Owlbot, or by real user interactions, at a more reasonable time of day.

George shared some good "stakeholder clickbait" with regular graphs around how much it costs to serve 1000 sessions, and how the infrastructure became more cost effective the more traffic it received.

George took us through some profiling they did on ripley, as they found that it was always a bit slower than the real customer traffic. It wasn't related to the JSON (un)marshalling, nor to settings on the HTTP client; it was actually additional code they ran between the HTTP requests to e.g. record how long requests took, which accumulated into considerable drift.

They tweaked the algorithm they were using to compensate for the drift introduced, which made the "time taken" make sense again.
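
I don't know exactly what ripley's fix looks like, but a common shape for this kind of drift compensation is to schedule each request against an absolute timeline rather than sleeping for each inter-request gap - a rough sketch:

package replay

import "time"

// Request is a replayed request plus its offset from the start of the
// original capture window.
type Request struct {
    Offset time.Duration
    Fire   func()
}

// Replay schedules each request against an absolute timeline, rather than
// sleeping for the gap between requests. Any time spent doing bookkeeping
// between requests is then absorbed, instead of accumulating as drift.
func Replay(requests []Request) {
    start := time.Now()
    for _, r := range requests {
        if wait := time.Until(start.Add(r.Offset)); wait > 0 {
            time.Sleep(wait)
        }
        go r.Fire() // don't let slow responses hold up the schedule
    }
}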

As could be expected, it was noted that testing in production didn't work nearly as nicely as in development. A few things were at fault, but largely it was the variable request/response time in production, and that ripley was expecting responses faster than it was actually receiving them.

Through more profiling, they discovered that DefaultMaxConnPerHost also needed some tuning to unblock this performance issue, and then they were able to process traffic at up to 80x the original load, which is considerable!
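
The exact knob named in the talk may differ from what I've written here, but in Go's net/http the per-host connection limits live on http.Transport (out of the box only DefaultMaxIdleConnsPerHost, i.e. 2, idle connections are kept per host) - a hedged sketch of raising them, with illustrative numbers:

package main

import (
    "net/http"
    "time"
)

// newClient builds an HTTP client with the per-host connection limits
// raised, which otherwise can throttle a load generator hammering a
// single origin. The numbers here are illustrative, not loveholidays'.
func newClient() *http.Client {
    transport := http.DefaultTransport.(*http.Transport).Clone()
    transport.MaxIdleConns = 1000
    transport.MaxIdleConnsPerHost = 1000
    transport.MaxConnsPerHost = 0 // 0 means no limit

    return &http.Client{
        Transport: transport,
        Timeout:   30 * time.Second,
    }
}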

In the Q&A after the talk, it was noted that the performance testing was only for stateless requests, and that it also didn't involve anything related to PII, which naturally would need to be stripped from logs, along with anything else that could be dangerous. They also applied an allowlist of website routes that would be performance tested, to make sure that only internal APIs were exercised, saving external suppliers from being unexpectedly load tested!

George shared that there is a write-up on the loveholidays blog for some similar information.

Embracing complexity - Entropy in software design

Shivam Acharya and Peter Chai were a great duo, and it's very surprising that this was their first conference talk!

They talked about the process of starting the Aviva Zero greenfield insurance platform, and some of the decisions they made around the way that they modelled insurance policies, and internal libraries they built around this.

As part of this, policies were modelled as immutable, with a structured means to modify them (which would produce a new policy), and that it would also need to abide by controlled change management, due to them being in a regulated industry.

One thing they noted was that they had an internal library for applying policy adjustments to a policy:

// NOTE that PolicyAdjuster is my name here
type PolicyAdjuster interface {
    Validate(adjustments []Adjustment) error
    ApplyAdjustments(policy *Policy, adjustments []Adjustment) error
}

In it, they noted that the API they'd produced was, as with all APIs, just a suggestion of how to use it.

In this case, the Validate method could simply not be called by a consumer, and ApplyAdjustments wouldn't fail as gracefully if it was called without a prior call to Validate.

As part of follow-up work they refactored their interface to remove the Validate method, making validation implicit in the call to ApplyAdjustments:

// NOTE that PolicyAdjuster is my name here
type PolicyAdjuster interface {
    ApplyAdjustments(policy *Policy, adjustments []Adjustment) error
}

This ended up breaking the usage model for some tests where engineers didn't necessarily want a fully semantically valid adjustment, just something that worked. But it made sense to break a few cases of non-production code, as the production usages needed to be stricter.

This reduced the uncertainty around the contract between the consumer and producer of the PolicyAdjuster, as the user of the interface no longer needs to know whether to call Validate - it's now implied in the implementation of ApplyAdjustments.
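
As a sketch of what that might look like inside the implementation - again the names are mine, and validate/apply are hypothetical helpers:

// NOTE these names are mine, not Aviva's
type policyAdjuster struct{}

func (p policyAdjuster) ApplyAdjustments(policy *Policy, adjustments []Adjustment) error {
    // Validation is no longer a separate step the caller can forget;
    // it always runs before any adjustment is applied.
    if err := validate(adjustments); err != nil {
        return err
    }

    for _, adjustment := range adjustments {
        if err := apply(policy, adjustment); err != nil {
            return err
        }
    }
    return nil
}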

This moved the complexity from the interface for consumers to being something the implementation needs to be concerned with.

As part of this, Shivam and Peter shared how complexity is always somewhere, and it's always worth being very considered as to where you want to push it.

There was discussion around entropy and the way it naturally tends towards a maximum, and how, as changes to a system (e.g. a library) occur, the entropy increases, and with it the probability of "something you don't want to happen" happening.

We heard about how complex != complicated, and a mention of the Cynefin framework.

They finally talked about how they made a few decisions with positives and implicit trade-offs, such as moving from a library - which needed to be updated everywhere (e.g. with Renovate) as soon as updates were published - to being managed through a central API, which means managing infrastructure and introducing a network hop for each interaction, but ensures logic can be updated atomically.

Shivam and Peter reiterated that complexity isn't a zero sum game - if it's not in the implementation, or the interface consumers use, then it's elsewhere, e.g. in your dependencies, or hidden behind syntactic sugar.

By accepting that there is entropy, and taking charge of it, you hopefully reduce the risk of it becoming a problem. By embracing increased implementation complexity, you can make the interface easier to consume, and make it clearer in your implementation that you're trying to produce a straightforward interface.

AI is coming for your job

Adam Scholey was a great speaker, and this was a talk that's worth watching, rather than me trying to recount, and missing the good delivery and content.

Event Driven Workflows

Andrew Wormald shared a library that Luno are working on, called workflow, to make it easier to manage event-driven workflows at the company.

Andrew shared the difference between an Event Driven Architecture (EDA) and Workflows, and some of the implementation decisions made to make the library work for their usages.

This notably included an interesting choice to be "technology agnostic", where you could use it with Kafka or Redux, or Postgres or MySQL, leading to not being able to e.g. wrap all interactions in a single database transaction, and having to leave a number of things to the lowest common denominator.

The Key to Go Efficiency is Just a Few Go Runtime Metrics Away!

Bartłomiej Płotka and Arianna Vespri shared some interesting ways you can profile your application, most notably using the Prometheus Go client.

Using Pact to Deploy Microservice with confidence

move fast, don't break things

Mark Bradley shared a funny and informative live demo of using Pact to introduce consumer-driven contract tests.

Aside from an unfortunate section towards the end of demos where we were no longer able to deploy our service, it was a great runthrough of how Pact can be used.

Mark noted how testing that your OpenAPI spec is implemented correctly (e.g. via Dredd) is "like marking your own homework", which I kinda agree with. But I also disagree with this take, as it gives very strong confidence that your implementation is correct with respect to your designed contract.

There is room for both sets of testing, and consumer-driven contract tests are a great way to provide the fullest confidence in your implementation.

I will say I wasn't aware that it's not recommended to use shared client libraries, and that instead each client should define their own implementation (e.g. to say "I only use these fields from the request/response"), which I'm not sure I particularly like, especially if you can avoid some of the boilerplate by auto-generating structs.

Consistency Catalyst: The story of Paddle's in-house microservice toolkit

George Wilson shared a two-year effort from Paddle to completely rebuild their core platform, and some of the things they've learned along the way.

It was interesting to hear how Paddle had gone from a "true" microservices approach, where teams were able to independently choose their tech, make decisions around their API design - whether to use RESTful API design or gRPC - and even how they wanted to structure their repos.

George noted that the company were very cautious about being too consistent, and at times a little "scared to upset people" by becoming too standardised, trying to avoid any shared design thinking.

However, this led to a number of inconsistencies around API choices, databases in use, and approaches to infrastructure, and eventually became a bit too much. George noted that at the point where onboarding new hires required speaking to specific engineers to understand "why did we design it this way", it highlighted the "lottery factor", and it was becoming clear that the cost of software maintenance was increasing, alongside system rot.

This has meant that what used to be a very inconsistent state in ~2020 is now a lot more cookie-cutter and consistent, and allows engineers to much more easily contribute to other repos, due to the very similar structure, coding style and building blocks in the services.

They found this was also felt in the external developer experience, where their API clearly felt like it had been built by multiple teams - a disjointed suite of APIs rather than a coherent, consistent experience.

The TL;DR is that they managed to pull off producing a "golden path" that teams were eager to get started with and use, and they've had very few cases of teams wanting to walk a different path. By working to solve some issues centrally, teams are now empowered to focus on the right problems to solve while working inside a great set of guardrails.

Something interesting was that their initial attempts at style guides and service frameworks, without the tooling to support them, didn't end up working. Whereas it can "be cheap" to push something top-down, you need to back it up with support for teams, be it tooling or training.

Thus the AppEx team was born with one of their key goals being to "[try] to prevent teams from re-inventing the wheel", and to empower teams to decide whether the consistent components would benefit them.

They worked to produce a number of shared libraries, such as:

  • go-scope (OAuth2 scopes)
  • go-auth (middleware validate/parse, implementing permission checks)
  • go-paginator (client-and-server for API/SQL)

Alongside this, internal frameworks and a service template provided a consistent way of building web apps, such that engineers didn't need to focus too much on trying to fit their API contract - instead the framework made it easy to ✨ just work ✨

I very much gelled with this talk, especially around some of the things the internal Application Experience (AppEx) team worked on and some of the learnings they've had.

We saw this a lot at Deliveroo with some of the things we wanted to do as the Go team, particularly in the lead-up to the soft mandate (oxymoron?) that teams should be doing new development in Go, not existing Rails apps. As part of this, having a golden path was key to getting teams up-and-running and upskilled.

I've also seen this in my now official role in Engineering Productivity at Elastic, and how we try to produce tools that work for teams, but that teams are able to opt out of if they so wish.

One aside I found interesting was the usage of BDD (via Godog) for their API tests, which I'm not a fan of.

Something that's worked well with the work at Paddle is that they've listened to what teams are doing, what debates they're having, and trying to solve that centrally, as well as following what patterns emerge, rather than going after what they think is the right thing to do.

A quite mature viewpoint was that Paddle don't want their engineers doing "CV driven development" - introducing new architecture or newfangled technologies that aren't necessarily great for the business, but could help someone improve their skills. George noted that there absolutely is a need to try new things, but they try to scope them to hack days or prototypes, allowing engineers to explore without it becoming core architecture. Instead, George noted that Paddle works to engage engineers in decisions closer to the product team, which helps keep a little bit of boredom at bay - a common cause of engineers exploring exciting new projects at work.

There was an interesting discussion in the Q&A around whether to focus on OSS-only frameworks and tooling, to improve onboarding and keeping things lean, versus a home-grown framework that greatly improves DX and targets domain-specific needs, but that is much harder to onboard, as well as more junior engineers missing out on skills that transfer outside of the company. On the latter point, I feel it's not as problematic, as a lot of companies have a lot of company-specific usages anyway, but it's important to keep a good balance.

George noted that unfortunately they didn't have any metrics prior to the work, so there's no hard evidence of the productivity gains, but from chatting with the team, it has significantly moved the needle, and that most importantly, people > practices.

I'd be very interested in some of the jobs they've got going on, it sounds like a pretty great place 👀

Being in Engineering Productivity, a lot of this talk resonated with me, and not only solidified a lot of my thoughts, but also gave me some good nuggets of knowledge.

Paddle's use of Roadie, and their Backstage plugins to provide insight into their dependencies, had a good crossover with the sort of power that dependency-management-data can provide, and it was really good chatting with them more on the stand about how they leverage the data to make useful decisions and visualisations, in particular around monitoring how many teams are behind on updates to their core libraries.

Using Go to Scale Audit logging at Cloudflare

Last up we had Arti Phugat from Cloudflare, talking about how Cloudflare improved their Audit Logging solution by leveraging Go.

In it, Arti took us through an example of a recent performance scalability issue that the Audit Logging service had, and how it was resolved. This centred around a release of audit logs across all of Cloudflare. As part of this, their Kafka topic was having ~3000 events/s published to it, but the consumers were only processing ~500 events/s, each event requiring some transformation and then an insert into a database.

This was still within the team's Service Level Objectives (SLOs), but was understandably not what the team wanted, as they'd forever be playing catch-up.

Arti explained how it was very important to measure throughput and latency, and that they then profiled the application to identify bottlenecks (a minimal pprof sketch follows the list below), which surfaced two issues, highlighted by an execution time of ~8s per event consumed:

  • data transformation took ~5s
  • database writes took ~2s
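
The talk didn't prescribe a particular profiling setup, but the standard way to grab this kind of profile in a long-running Go service is net/http/pprof - a minimal sketch, with the port being my own choice:

package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers /debug/pprof/ handlers on DefaultServeMux
)

func main() {
    // Expose the profiler on a separate, internal-only port, then grab a
    // profile with e.g.:
    //   go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()

    // ... run the consumer as normal ...
    select {}
}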

There were some tweaks that could be made to the code, making better use of goroutines, and on top of that they performed a higher level of scaling, tripling the number of Kafka partitions and providing each partition with a separate consumer.

The use of horizontal scaling led to a significant performance increase, which was further improved by bulk INSERTing rows into the database.
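
This isn't Cloudflare's code - just a sketch of the batching idea, with a made-up table, event type and Postgres-style placeholders:

package store

import (
    "context"
    "database/sql"
    "fmt"
    "strings"
    "time"
)

// AuditEvent is an illustrative stand-in for a consumed audit log event.
type AuditEvent struct {
    ActorID    string
    Action     string
    OccurredAt time.Time
}

// InsertEvents writes a batch of events with a single multi-row INSERT,
// rather than paying one database round trip per event.
func InsertEvents(ctx context.Context, db *sql.DB, events []AuditEvent) error {
    if len(events) == 0 {
        return nil
    }

    placeholders := make([]string, 0, len(events))
    args := make([]any, 0, len(events)*3)
    for i, e := range events {
        placeholders = append(placeholders,
            fmt.Sprintf("($%d, $%d, $%d)", i*3+1, i*3+2, i*3+3))
        args = append(args, e.ActorID, e.Action, e.OccurredAt)
    }

    query := "INSERT INTO audit_logs (actor_id, action, occurred_at) VALUES " +
        strings.Join(placeholders, ", ")
    _, err := db.ExecContext(ctx, query, args...)
    return err
}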

Arti took us, incrementally, through a lot of the key concepts that the talk would require, which was a great way to introduce possibly new concepts to the audience, and referenced the many other talks at GopherCon UK this year around Go's great tooling for profiling applications.

Hallway track

After my time at State of Open Con this year, where I was so busy with talks that I ended up only speaking to one sponsor, this time I wanted to be more intentional with my time at both speaking to sponsors, and attendees.

I had a very good chat with the folks at Paddle around oapi-codegen, APIs, dependency-management-data (and how it compares to their Roadie setup), Renovate and Open Source sustainability.

It was also nice to meet Andy Williams and get a good time to chat, having only chatted on #cup-o-go on Gopher Slack ☺️

I managed to speak to some users of oapi-codegen, and help explain to one that the upgrade path from v1 to v2 is super minimal, as well as having general chats with several folks.

I will say that I'm not sure I'm such a fan of going to a conference without really knowing anyone. It's nicer when you have "safe" people to fall back to at times, and although I did bump into some folks for a chat, it was a lil' bit awkward at times.

I think I'm also used to a lot of recent conferences involving being a speaker, so I sometimes have more folks coming to reach out to me.

Hotel

Not as important as the conference itself, but I feel I need to talk about it.

I'd gotten my approval from work to go fairly late in the day, at which point the trusty Travelodge in the area ended up being far above the budget 😬

Fortunately, as I was looking at other hotels nearby, I managed to find one that looked pretty close on the map, and was well in budget.

It turns out it was The Montcalm At The Brewery, which was literally attached to the venue. Very good location indeed 👏🏼

I found it super useful to be able to nip up to the room to charge my laptop over the lunch break, or for a bit of quiet time, and the fact that it was well under budget was a very nice side effect.

Nerd sniped

After chatting with the folks at Paddle around OpenAPI, I'd mentioned the OpenAPI Overlay specification as something they may be interested in.

Based on the conversation, and a recent discussion in the Gopher Slack, I got a little nerd sniped by the utility of the Overlay specification for oapi-codegen and for users who can't / would prefer not to modify their input OpenAPI specifications.

This morning - waking up early due to nearby construction work - I ended up hacking together an implementation for oapi-codegen, with thanks to Speakeasy's awesome library for Overlay functionality. I then spent a bit of the day cleaning it up and getting it ready for prime time.

I'm not super fussed about the JSON Path requirement, as you'll see from the documentation, but it's a pretty powerful option, and I'm very much looking forward to it unlocking a number of things for our users, as they can't always modify their specs at source.

Closing

The conference was good overall - there was some great food, a good level of snacks and treats, and it didn't feel too rushed moving between talks.

I think some shorter talks probably would've been nice, as the 45 minute slot + 15 minutes of questions felt a little bit long at times.

I'm looking forward to the next one, and as Andy Williams mentioned, I'll be sure to be getting my CFPs in on time this coming year!

Thanks to the organisers for running the conference, the wonderful staff, and all the great speakers! Looking forward to seeing y'all next year.

Written by Jamie Tanna on , and last updated on .

Content for this article is shared under the terms of the Creative Commons Attribution Non Commercial Share Alike 4.0 International, and code is shared under the Apache License 2.0.

#go #events #conference #gopher-con-uk.

This post was filed under articles.
