System Architecture for Edaqa’s Room

I tried explaining to a friend how my games were set up, but it quickly became confusing. Drawing all the component boxes, I’m surprised to see how complex it has become. I think it’s a decent example of modern system architecture, and I’ll go through the setup here. This is for a multiplayer game, so I’ll point out how this might differ from a more typical web application.

I could reasonably call this architecture the platform on which my game runs. A higher level of code runs on top of, and is intimately tied to, this platform.

Client

I like to start at the user’s view of the system, as it keeps me grounded in the system’s purpose. Mostly the user interacts via the website, but I also send an email confirmation on purchase. The starting point for the game can be either the direct web link or the link in the email.

I was tempted to split the client into a game and website proper, as they are fairly distinct aspects of the system. But the discussion of the website’s logical structure is better left for another article.

Note the two lines from the browser to the HTTP server. One is normal HTTP traffic, and the other is for WebSocket. Though they go through the same machines, they are handled differently. I’ll provide more detail later, but the way I handle WebSocket is specific to a multiplayer game — a need for a fast response motivates the design.

In terms of fault tolerance, it’s the client that is most likely to fail. From browser incompatibility to crashes, and slow or lost connections, the client is an endless pool of problems. The servers are virtually faultless by comparison. As this is an interactive multiplayer game, it’s vital to handle common client problems correctly. The higher-level code handles most of the faults, with this architecture supporting it.

Cloud Processing Services

The three red boxes contain the abstract aspects of the cloud service. These services are mainly configurations and I have no insight into their internal structure. They contain only transient data.

  • Content Delivery Network (CDN): The CDN serves all the static assets of the website and the game. Most of these resources use the web server as the origin, as it gives me the cleanest control over versions. The CDN provides faster loading to the client and reduces load on the host machines. I could do an entire article on the challenges of getting this working. (Service: AWS CloudFront)
  • HTTP Frontend: This takes care of the incoming connections, as well as SSL handling. It provides, when needed, a slow rollout of host upgrades. It’s a security barrier between the public world and my private hosts. Thankfully, it routes both normal HTTP and WebSocket traffic. (Service: AWS Elastic Load Balancer)
  • Email Sender: Sends purchase confirmation emails to the user. I mentioned the client layer is fault prone, and email is no exception. You absolutely want a third-party service handling the challenging requirements of modern email. (Service: AWS Simple Email Service)

Host

My host contains several microservices, which I’m grouping into one large block. With Python as the main server language, I was forced into a microservice architecture. Separate processes are the only way I can get stability and parallel processing of these services.

These are all launched as systemd services on an AWS Linux image.

  • Web Server: Handles all web requests, including static files, templates, game launchers, and APIs. These requests are stateless. (Service: Python Code with Eventlet and Flask)
  • Game Server: Implements the game message queues, which are shared message rooms per game (think of it like a chat server with channels). This is stateful per game. It handles client connections and transmits messages but does not understand the logical game state. For fault tolerance, it was vital that misbehaving clients don’t interfere with other games. (Service: Python Code with Asyncio and websockets)
  • Message Service: Migrates game messages from the live database to the long-term database store. This happens regularly to minimize the memory use of the live database, allowing more games to live on one host. (Service: Python Code)
  • Confirm Service: Sends emails when somebody purchases a game. I avoid doing any external processing in the web server itself, instead having it post a job that is handled by this service. This keeps the web server responsive and stable. (Service: Python Code)
  • Stats Service: This is a relatively fresh addition, needed for my affiliate program. I previously calculated game stats offline for analysis, but am working on features to present those at the end of the game. There is a bit of ping-pong with the web server to get this working. This is external, as it has slow DB queries and slow processing. It operates sequentially, as I do not want multiple stats running in parallel. (Service: Python Code)
  • Live Database: Contains game state for all games on this host. The game uses a sequenced message queue. For a synchronized visual response between players, it is vital this service is fast. Therefore I use a local Redis store to keep live messages, with the message service moving them offline. (Service: Redis)
  • Message Queue: Provides the message queue for these services to talk to each other. This is per-host because a few of the services need access to the Live Data for a game. The Confirm service does not need live data, and I could orchestrate the stats service to not need it either. However, having an additional shared message queue is unnecessary overhead. (Service: Redis)

The diagram shows the Live Database and Message Queue boxes as siblings, since the same process implements both. This is another point where the needs of the game dictate a local Redis server. Most web apps can probably use an off-host queue and an external DB service. When you look at my alternate design later, you’ll see I’d be happy to have this part even faster.
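The job-queue decoupling used by the Confirm Service can be sketched with an in-process asyncio.Queue standing in for the per-host Redis queue. This is an illustrative sketch, not my actual service code; the function and field names are invented:

```python
import asyncio

async def handle_purchase(jobs: asyncio.Queue, order_id: str) -> str:
    # Web server side: post the job and return to the client immediately,
    # keeping the request path free of slow external work.
    await jobs.put({"kind": "confirm_email", "order": order_id})
    return "ok"

async def confirm_worker(jobs: asyncio.Queue, sent: list) -> None:
    # Confirm-service side: drain jobs one at a time, off the request path.
    while True:
        job = await jobs.get()
        if job is None:  # shutdown sentinel
            break
        sent.append(job["order"])  # stand-in for the real email send

async def main() -> list:
    jobs: asyncio.Queue = asyncio.Queue()
    sent: list = []
    worker = asyncio.create_task(confirm_worker(jobs, sent))
    await handle_purchase(jobs, "order-1")
    await handle_purchase(jobs, "order-2")
    await jobs.put(None)  # stop the worker
    await worker
    return sent
```

The web server never waits on the email provider; it only waits on a local queue push, which is what keeps it responsive and stable.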

I estimate a host can handle at least 100 concurrent games, around 400 users, and I dream about the day when I need many hosts. I can also add region-specific hosts, providing faster turnaround for groups playing in other countries.

WebSocket

The diagram shows two different connections between the client and the HTTP Frontend, which continue to the backend.

The black HTTP connection is stateless, and it doesn’t matter which host it ends up at. Ultimately, when my dreams of high load come to fruition, I’d separate this, putting it on a different host pool, or potentially recreate it as lambda functions.

The orange WebSocket connection is stateful and must always arrive at the same machine. It’s sticky per game: all players of the same game must reach the same machine. It must be a single host to minimize turnaround time. Shared non-local queues, lambda functions, and DBs all introduce too much response lag. This is particular to a multiplayer game.
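The sticky requirement boils down to one property: the host choice must be a pure, stable function of the game key, so that every player of a game reaches the same machine. A minimal sketch of such a mapping (my real setup relies on the load balancer; this function and its names are illustrative):

```python
import hashlib

def host_for_game(game_key: str, hosts: list) -> str:
    # A stable hash of the game key: every player of one game maps to the
    # same host, and the mapping survives restarts (unlike built-in hash()).
    digest = hashlib.sha256(game_key.encode("utf-8")).digest()
    return hosts[int.from_bytes(digest[:8], "big") % len(hosts)]
```

Adding a host remaps most keys, which only matters for games in flight; consistent hashing would reduce that if it ever became a problem.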

Alternate Game Server Design

Again, I’m somewhat forced into the above architecture because of Python. Should I ever need more performance, or wish to reduce hardware needs, I’d reimplement this, likely in C++, though any statically compiled language with good threading and async IO would work.

The new server would be a single application replacing these services:

  • game server: Depending on the language and framework, this socket handling code could look very different. Much of the speed improvement though would come simply from better data parsing and encoding.
  • message service: I’d gain more control over when this runs and have an easier time reloading messages for clients.
  • stats service: I would make this a lot simpler since it wouldn’t need as much cross-process coordination to work.
  • live database: Simple in-memory collections would replace the Redis DB, providing faster turnaround but complicating persistence and fault management.
  • message queue: The remaining job messages would migrate to a shared queue, like SQS.

This alternate architecture is simpler, at least to me, and I estimate it could easily handle 100x as many games on a single host. Or rather, it’d let me handle as many games as now, but with several much smaller hosts. That would improve fault tolerance.

Added coding time keeps this on the long-term backlog. Unless some hitherto-unknown feature appears where I need this, it’ll be cheaper to keep the microservices model and spin up more hosts as required.

An intermediate solution is to rewrite just the WebSocket channels in another language, since they’re the most inefficient part. Though I recently reworked this part, still in Python, to be massively more efficient. Further rewrites are on the long-term backlog.

Storage

The storage boxes contain all the long-term data for my game. There are no game assets here; I store them on the host where I upload each game. This provides the easiest way to manage game versions.

  • Media Store: Holds large static assets which aren’t part of the game proper, such as trailers and marketing materials. I synchronize this on-demand with a local work computer. (Service: AWS S3)
  • Log Store: Collects and stores the logs from the HTTP Frontend. I analyze these offline regularly. (Service: AWS S3)
  • Database: This is the heart of my business data, storing purchase information and persisting long-term game state. (Service: Mongo)

What’s Missing

I’ve left several components out of the diagram to focus on the core experience. I’ll describe them briefly here.

I don’t show monitoring, partially because it’s incomplete, but also because it’s merely a line from every box to a monitoring agent. The structure doesn’t change for monitoring, but it’s part of the live environment.

I’ve left DNS out of the diagram for simplicity. I use multiple endpoints for the client, the web server, and the CDN, as well as for email, which adds up to many DNS entries. In AWS one has Route 53, but thankfully the individual services can configure and maintain most of their entries automatically.

I have many offline scripts that access the database and the log store. This includes accounting scripts which calculate cross-currency payments and affiliate payouts — world sales with tax are a nightmare! I also do analysis of game records to help me design future games.

There’s an additional system used to manage the mailing list. As the sign-up form is part of the website, and people can follow links from the emails to the website, it is a legitimate part of the architecture.

Layers upon layers

I’m tempted to call this the hardware architecture, but with cloud services, everything is logical. It’s a definite layer in my system. Can I call it the “DevOps Layer”?

The website on top of this is fairly standard, but the game is not. I will come back and do some articles about how the game functions. I can also show how the system architecture and game architecture work together.

Other than a few game-specific parts, the architecture is fairly standard for an internet application. I believe it’s a good approach for what I needed.

Highly inefficient invisible animations (CSS/Firefox/Chrome/React)

The cursor in my text editor was lagging. That’s quite unusual given my 8-core machine with 32GB of RAM. While tracking down the issue, I discovered that my escape game was consuming 20-30% of the CPU while idling. That’s bad! It turns out invisible elements were being rotated via CSS.

It’s a bit of a pain. It means we need to remove all those elements which fade away; otherwise they pile up and create load. Here I’ll show you my solution using React. The top layers of my game are in React, which is why I used it; I’m not suggesting you need React to solve this problem. But if you have animated HTML elements, get rid of them when they aren’t visible.

The Problem

While loading scenes, I display an indicator in the top-right corner of the screen.

This fades in when loading starts and fades out when loading is done. I wanted to avoid an abrupt transition. I handled this with CSS classes to hide and show the element. My React code looks like this:

	<SVGElement 
		url={url}
		className={RB.class_name("load-marker", className, is_loading && 'loading')}
	/>

SVGElement is my component to load SVG files and display them inline. An img tag will behave the same way for this setup. The key is the is_loading && 'loading' part of the className attribute. This adds the loading class name to the element while it’s loading. When loading finishes, I remove the class name.

This is the CSS (SCSS):

.load-marker {
	&:not(.loading) {
		animation-name: fade-out;
		animation-fill-mode: forwards;
		animation-duration: 0.5s;
		animation-timing-function: ease-in-out;
	}
	&.loading {
		animation-fill-mode: forwards;
		animation-duration: 0.5s;
		animation-timing-function: ease-in-out;
		animation-name: fade-in;
	}
	@keyframes fade-out {
		from {
			opacity: 1;
			visibility: visible;
		}
		to {
			opacity: 0;
			visibility: collapse;
		}
	}
	@keyframes fade-in {
		from {
			opacity: 0;
			visibility: collapse;
		}
		to {
			opacity: 1;
			visibility: visible;
		}
	}
}

I have an urge to digress into a rant about CSS’s animation system! I’ve written animation and layout systems before, and argh, this is acid thrown in my eyes. Those systems had clear support for add and remove animations, which would make this whole setup trivial. But this is CSS, and, alas…

When an item loses the .loading class it will transition to a transparent state. The problem, however, came from some other CSS:

.loader {
	svg {
		animation: rotation 6s infinite linear;
		overflow: visible;
		position: absolute;
		top: 20px;
		right: 20px;
		width: 70px;
		height: 70px;
	}
	@keyframes rotation {
		from {
			transform: rotate(0deg);
		}
		to {
			transform: rotate(360deg);
		}
	}
}

That infinite bit is the problem. It’s irrelevant that we’ve faded the opacity to 0; the animation is still running! Firefox still does a style and layout update each frame. Why it ends up consuming so much CPU, I have no idea. Chrome also consumed CPU, but only around 10%. Note, 10% is still ridiculous for a static screen.

I could also “solve” the problem by not spinning the item unless something is loading. This creates a rough transition where the icon abruptly stops rotating while fading away. Not good.

The Solution

I have two animated indicators, the loader and a disconnected icon, for when you lose the WebSocket connection to the server. I abstracted a common base component to handle them the same. This is how I use it, for the loader:

export function Loader({ is_loading }) {
	return <HideLoader
		url={theme.marker_loading}
		is_loading={is_loading}
		className="loader"
	/>
}

This is the implementation:

function HideLoaderImpl({ is_loading, url, className }) {
	const [ timer_id, set_timer_id ] = React.useState(0)
	
	React.useEffect(() => {
		if( !is_loading && !timer_id ) {
			const css_duration = 1000
			const new_timer_id = setTimeout( () => set_timer_id(0), css_duration )
			set_timer_id(new_timer_id)
		}
	}, [is_loading]) // only trigger on an is_loading change
	
	const visible = is_loading || timer_id
	if(!visible) {
		return null
	}
	
	return (
		<SVGElement 
			url={url}
			className={RB.class_name("load-marker", className, is_loading && 'loading')}
		/>
	)
}

const HideLoader = React.memo(HideLoaderImpl)

At first glance, it’s not obvious how this achieves a delayed removal of the element. The HTML generation is clear: when visible is false, display nothing. When true, display the element as before, with the same logic for setting the loading class name.

If is_loading is true, then visible will be true. This is the simple case. But there is the other true condition when we have a timer_id.

The setTimeout callback does nothing but clear the timer_id when it’s done. At first I suspected I’d have to track another variable, setting at the start and end of the timeout. It turns out that all I need to know is whether there is a timeout at all. So long as I have a timer, I know that I shouldn’t remove the element.

The condition list to React.useEffect is important here. I provide only is_loading, as I only want the effect to run when the value of is_loading has changed. Some style guides insist I include timer_id (and set_timer_id) in the list as well. That approach treats the second argument to useEffect as a dependency list, but this is incorrect. It’s actually a list of values which, if changed, trigger the effect to run again. The React documentation is clear about this, yet it also calls it a dependency list and recommends a lint plugin that would complain about my code. That recommendation makes sense for useCallback and useMemo, but not for useEffect.

Adding timer_id to the list would be wrong. When the timer finishes, it sets timer_id to 0. That change would cause the effect to trigger again. This is a case where we do “depend” on the timer_id value, but we shouldn’t re-execute when it changes, as that would end up creating a new timer.

In any case, this simple code now does what I want. It defers the DOM removal of the element until after the end of the animation. Well, it defers it one second, which is long enough to cover the 0.5s CSS animation. It’s complicated to keep these times in sync — more fist shaking at the CSS animation system!

If you’ve got an eye for defects, there is one there. The loader icon can be removed too early: when is_loading becomes true, then false, then within one second becomes true and false again. I don’t create a new timer if one already exists, so the deferral time will still be from the first timer. In practice, this will not likely happen, and the impact is minimal. The fix is to cancel an existing timeout and always create a new one.

My lagging cursor

I never got an obvious answer why my cursor was lagging. There were all sorts of applications, idle applications, consuming 5-10% CPU. It’s perhaps a real cost of high-level languages. More on that another day. I still hope that future apps will strive for less energy use.

For now, remove all those invisible animated HTML elements.

High-Throughput Game Message Server with Python websockets

An error came up during a competition with my game. One of the 80 players got stuck. Like really stuck: a breaking defect! The error should not have happened, could not have happened, yet it did. The only possibility left was the WebSocket stack I used. It turns out that layer didn’t work as intended, and there was no way to fix it. Alas, I sought an alternate solution.

Here I describe what I came up with: a new, more focused stack, using Python websockets. Lots of coroutines, asyncio, and queues. I have the complete working example at the end of this article. This is likely the last state in which it can stand alone; I’ll be changing it to fit even tighter with my game code.

Logical Structure and the Problem

My game is a multiplayer puzzle game coordinated by messages. I segregate each instance of the game from the others. For messages, I do this with the classic “rooms” concept. It’s the name that Flask-SocketIO uses as well, and that’s what my first implementation used.

Mostly, messages in the game don’t need a defined total order. They can arrive in a different order to the different clients. There are a few situations where this isn’t true however, places where I need some messages to have a defined total order. There aren’t many of them, and that’s likely why I didn’t notice the defect earlier.

When I first started the project, I asked whether the library, used on a single-client system and echoing in order, could maintain that order to the clients. The answer was yes, but the truth is no. Given the load is mostly network bound, it usually maintains an order. Only under stress, or through dumb luck, does it send messages out of order.

Fine, I can probably put a queue around it. Ugh, the throughput drops to an abysmal 70 msgs/s. Without the queue it was already a slow 1200 msgs/s, but that was enough for my game. After a bit of back-and-forth, the library author and I disagreed on what is acceptable throughput.

So I grabbed the websockets library instead, whipped together a proof of concept, and got 12,000 msgs/s. Yeah, that’s more like what I’d expect.

Actually, I’d expect even more. And long term, if I get enough traffic, I’ll rewrite this in C++. The throughput should be entirely network bound, but it’s still CPU bound on the server. I’ve done enough low-level networking to know I can push it higher, but for my needs now, 12K/s is far more than enough. I’d sooner scale the number of servers than worry about optimizing one of them.

On to the code!

A Python websockets Messaging server

The “websockets” module is a minimal implementation of WebSockets. That sounded like what I wanted: I didn’t want to go as low-level as handling the protocol myself. This left me writing all my high-level logic, in particular the client rooms.

The library is easy to use. I got a basic example working with little effort. Of course, then there are lots of details to cover. Here’s a quick list of things I needed to support, features first:

  • Handle an incoming client where “handle” is mainly done by the library
  • Allow a client to join a game room (each client can only join one room in my system, simplifying the code)
  • Allow another client to join the same game room
  • Allow other clients to join other rooms
  • Allow a client to send a message
  • Provide a total order to the message, with a message id
  • Dispatch that message to all clients in the room

In my final code I’ll persist the messages to a Redis store, then eventually to a MongoDB. That is not part of my example code.

And there are several situations, or errors, that I’d have to deal with.

  • A client disconnects cleanly or abruptly
  • The client sends crap
  • The client is slow
  • Clean up a room if there are no more clients in it

Structure

My server maintains a list of clients in a list of rooms:

@dataclass
class Client:
	socket: Any # What type?
	id: int
	disconnected: bool = False
	
@dataclass
class Room:
	key: str
	clients: Dict[int,Client] = field(default_factory=dict)
	new_clients: List[Client] = field(default_factory=list)
	msg_id: int = 0
	event_queue: asyncio.Queue = field(default_factory=asyncio.Queue)
	listening: bool = False
	future: Any = None # What Type?

I use type annotations for my Python, along with MyPy to check the types. Alas, for several library classes I’m unsure of types. Since many of them are created automatically, or are returned from other functions, it’s difficult to determine the type. I will eventually find out all the types.

In these data types, the socket is the only part directly connected to the “websockets” module. It tracks the incoming connection and is used to send and receive data.

In brief, the listen_socket function handles the incoming client connections and pushes all messages onto the event_queue of the Room. The listen_room function listens to this queue and sends the messages to all clients in the room.

One listener per room

I initially had a single listening queue that handled all the rooms. If I eventually write the lower-level server, in C++ for example, I’d return to that structure. When you get low-enough level, you can control a lot more details, removing the need for coroutines entirely.

But in Python there are a few reasons I’m using one listener per room:

  • not overhead (a tempting non-reason)
  • Redis
  • bad clients

Overhead is all my Python code, and the library code, surrounding the writing to the clients. It’s not a lot, but it can add up with a lot of activity. I suspect the JSON parsing and formatting is the biggest part of it. But this is not a reason I have one listener per room. Since the Python code is running as a single real thread, it is irrelevant whether this code happens in one listener, or many listeners. It’s all unavoidable computational load.
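To check the guess that JSON handling is the biggest part, the encode/decode round trip can be timed in isolation. The helpers mirror the encode_msg/decode_msg functions in the server code below; the sample message is made up:

```python
import json
import time

def encode_msg(msg: dict) -> str:
    return json.dumps(msg, ensure_ascii=False)

def decode_msg(text: str) -> dict:
    return json.loads(text)

# A made-up but representative game message.
msg = {"type": "move", "room": "abc123", "x": 10, "y": 20, "msg_id": 42}

n = 50_000
start = time.monotonic()
for _ in range(n):
    decode_msg(encode_msg(msg))
elapsed = time.monotonic() - start
print(f"{n / elapsed:,.0f} encode/decode round trips per second")
```

Comparing this number against the server’s measured msgs/s gives a rough upper bound on how much of the per-message budget JSON consumes.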

The first real reason, Redis, is the well-behaved motivator. For each outgoing message I have to create a unique message id. In the sample code I track this in Python, in the Room class. On my final server, I’ll track this in a Redis integer key. Additionally, I’ll store all messages in a Redis list. A separate process will clear this regularly and persist the messages to a MongoDB. The calls to Redis take time, time that the server could instead process messages for other rooms. Thus I want to segregate the rooms. While one room waits on Redis, the others can continue processing.

The second reason, bad clients, is an unfortunate need. It’s possible that a client gets disconnected, or fails to process messages quickly enough. For the most part, this is handled by buffers. The calls to socket.send are effectively asynchronous, at least until the queue fills up. When that happens, send will wait until there is space in the queue. With a single shared listener, all the other rooms would stall while it waits, unable to send any messages. By having one queue per room, I limit the damage of a bad client to that room only.

This won’t likely happen. First off, the websockets library has a timeout feature. Unresponsive clients will be disconnected long before the outgoing socket buffers fill up. My game simply doesn’t generate enough messages to ever fill the buffers. Extrapolating from my stress test, with an estimated average message size, there is room for 25K game messages in the standard buffers. And a typical run-through of my game, with a team, generates only 3 to 4 thousand messages.

In any case, it’s good protection to have.

clients, new_clients and memory

One advantage of having a single real thread is not needing to worry about actual data races. They simply don’t happen as they would in a multi-threaded application. Yay! No memory corruption is possible.

That doesn’t mean race conditions, which are something different, can’t happen. The logical concerns of concurrency still exist, though to a lesser degree. Thanks, cooperative threading! The most significant concern in my code is the clients collection. The queue listener iterates over the clients. If the collection is modified during the iteration, Python raises an error for modifying it mid-iteration. That is a strict no-no, as the iterator has no idea what it should do.

There are three cases where the list needs to be modified:

  • when a client disconnects in listen_socket
  • when a client disconnects in listen_room
  • when a new client joins the room

At first I handled disconnect in the listen_socket function, but through testing noticed it can be a socket.send() call that detects the disconnect first. Thus the disconnect happens in multiple places. In both cases, I merely mark the client as disconnected in the Client structure. The listen_room skips disconnected clients while sending messages. It’ll track them and safely remove them from the room after the iteration loop.
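The mark-then-sweep handling described above can be shown in miniature. This is a synchronous sketch with an invented deliver callback, not the actual listen_room loop, which awaits socket.send:

```python
from dataclasses import dataclass

@dataclass
class Client:
    id: int
    disconnected: bool = False

def broadcast(clients: dict, deliver) -> int:
    # Mark-then-sweep: never mutate `clients` while iterating over it.
    sent = 0
    dead = []
    for client in clients.values():
        if client.disconnected:
            dead.append(client.id)  # only mark; removal comes later
            continue
        deliver(client)
        sent += 1
    for cid in dead:  # safe now: the iteration has finished
        del clients[cid]
    return sent
```

Either listener can flip the disconnected flag at any time; only the sweep at the end of the loop actually removes entries, so the iteration is never invalidated.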

When a new client joins the room, listen_socket adds it to the new_clients list. listen_room will then add new clients prior to each message loop. It does this just after retrieving a message to ensure that all new clients get the message. This means that the room messages can arrive at a client prior to the “joined” response from joining the room. In my game, getting this ordering, along with sending old messages, is important for clients getting a consistent game state. I’ll likely have to adjust this code a bit.

At no point does listen_socket know if it’s safe to work with clients, since it can’t tell if listen_room is inside or outside of the loop. A lock isn’t a bad idea, but it introduces an avoidable delay on the incoming listening side, and delays in the room listener. Why lock when I don’t have to?

In retrospect, it might be a disadvantage that much of the coroutine parallelism is implicit, especially if using something like eventlet. As a programmer, it’s less apparent where the logical thread switches happen. You just have to know that every await operation, every asyncio call, and every websocket call is a potential location for a switch. It’d be nice to say you should assume a switch is possible at any time, but then I couldn’t rely on it not switching in some places, and I would require a bunch of locks.

Use locks if you aren’t sure. Performance is irrelevant if your game breaks. grumble

Stats and throughput

I added a simple Stats class to track throughput on the server. It emits timings for all the incoming and outgoing messages per room. The 12K/s is what happens if I have multiple clients connected to unique rooms. My machine hits that 12K limit with the server process pegged at 100% CPU use.

Unfortunately, I must adjust my number down to 10K. Once I moved the rooms to individual listeners, I hit far more overhead. I’m not entirely sure why — I can’t imagine it’s the extra number of coroutines. Likely there are some tweaks in the async stuff to improve it, but it’s still fast enough that I’m not concerned.

As a curiosity, I measured a single client connected to the server. It gets slightly over 5K msgs/s. Since this is a client and a server, I have two processes, both at 58% CPU use. Ideally they’d be at 50% each, since they send a message back and forth. That extra 8% is processing spent on stuff other than handling the message. Perhaps if I wrote the system in C++ it’d get closer to 50%, though never reach it completely. The throughput, however, should go up.

When I say C++ would be faster, it’s from experience. I have better control over what happens and know how to use that control. It’s also easy to get it wrong and end up with a steaming pile that is worse than the clean Python version. Server code is hard!

The stats don’t directly measure response time. But knowing the ping pong nature of the tests, I can calculate roughly what that’d be. At fractional milliseconds per message, it’ll be noise compared to the true network overhead when deployed.
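For reference, the conversion from the ping-pong rate to an approximate response time, using the roughly 5K msgs/s single-client figure from above:

```python
msgs_per_s = 5000                    # single-client ping-pong rate measured above
round_trip_ms = 1000.0 / msgs_per_s  # one message = one full round trip
print(f"~{round_trip_ms:.1f} ms per round trip")  # well under a millisecond
```

Real network latency between players will be tens of milliseconds, so the server’s share of the response time is indeed noise.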

The statistics I calculate here aren’t great. If I were trying to write a truly high-performance server, I’d track averages, standard deviations, and extremes, and record it all better. The numbers are already so far beyond my game’s needs, though, that there’s no reason for more.

Next steps

Now I need to integrate this with my existing server. I think I lose the ability to use a single port and a single Python instance. That’s not a big loss; it’s something I was intending to do at some point anyway. The game server shouldn’t be the same as the web server, both for performance and stability. Eventually, should I have enough load, I’ll need to run multiple game servers (multiple servers handling WebSocket connections). I have a plan to scale in that direction, but it’ll be a long time before I get there.

Code

Below is the code for the server, the Python client, and a sample client for the browser. While the details may change slightly, the structure will stay close to this. Given the size of the code, it’s better not to spin it into a library; I’ll adapt it directly for my game server.

This code is fairly stable and should work as a starting point for your own needs. Though of course, I can’t make any promises of that. I’ll likely discover things in my application that require fixes.

ws_server.py

from typing import *
from dataclasses import dataclass, field
import asyncio, websockets, json, time
from collections import defaultdict

#sys.path.append('../server')
#from escape.live_game_state import LiveGameState

def encode_msg(msg: Dict) -> str:
	return json.dumps(msg, ensure_ascii=False)
	
def decode_msg(text: str) -> Dict:
	return json.loads(text)


@dataclass
class Client:
	socket: Any # What type?
	id: int
	disconnected: bool = False
	
@dataclass
class Room:
	key: str
	clients: Dict[int,Client] = field(default_factory=dict)
	new_clients: List[Client] = field(default_factory=list)
	msg_id: int = 0
	event_queue: asyncio.Queue = field(default_factory=asyncio.Queue)
	listening: bool = False
	future: Any = None # What Type?

	def client_count(self) -> int:
		return len([c.id for c in self.clients.values() if not c.disconnected])

client_id_count = 0

rooms: Dict[str, Room] = {}
	

# Used to get a basic idea of throughput
class Stats:
	def __init__(self, name):
		self._name = name
		self._count = 0
		self._time = time.monotonic()
		
	def incr(self, amount = 1):
		self._count += amount
		if self._count > 5000:
			end_time = time.monotonic()
			print( f'{self._name} {self._count / (end_time-self._time)}/s' )
			self._count = 0
			self._time = end_time
			
			
async def listen_room(room):
	if room.listening:
		raise Exception(f'Already listening to {room.key}')
		
	room.listening = True
	print(f'Listen Room {room.key}')
	stats = Stats(f'Outgoing {room.key}')
	while True:
		qevent = await room.event_queue.get()
		if qevent is None:
			break
			
		# Add any new clients that have shown up. This handler must control this to avoid it
		# happening inside the loop below
		if len(room.new_clients) > 0:
			for client in room.new_clients:
				room.clients[client.id] = client
			room.new_clients = []
		
		# In my game I'll track IDs in Redis, to survive unexpected failures.
		# The messages will also be pushed there, to be picked up by another process for DB storage
		room.msg_id += 1
		qevent['msg_id'] = room.msg_id
		
		count = 0
		disconnected: List[int] = []
		for client in room.clients.values():
			if client.disconnected:
				disconnected.append(client.id)
				continue
			count += 1
			
			# There's likely some asyncio technique to do this in parallel
			try:
				await client.socket.send(encode_msg(qevent))
			except websockets.ConnectionClosed:
				print("Lost client in send")
				client.disconnected = True
				# Hoping incoming will detect disconnected as well
			
		stats.incr(count)
		
		# Remove clients that aren't there anymore. I don't really need this in my game, but it's
		# good to not let long-lived rooms build-up cruft.
		for d in disconnected:
			# Check again, in case it was already removed or the state changed
			if d in room.clients and room.clients[d].disconnected:
				del room.clients[d]
			
	print(f'Unlisten Room {room.key}')
	room.listening = False


async def listen_socket(websocket, path):
	global rooms, client_id_count
	print("connect", path)
	client_id_count += 1
	room: Optional[Room] = None
	client = Client(id=client_id_count, socket=websocket)
	
	stats = Stats('Incoming')
	try:
		async for message_raw in websocket:
			message = decode_msg(message_raw)
			if message['type'] == 'join':
				# Get/create room
				room_key = message['room']
				if not room_key in rooms:
					room = Room(key=room_key)
					rooms[room_key] = room
					
					room.future = asyncio.ensure_future(listen_room(room))
				else:
					room = rooms[room_key]
					
				# Add client to the room
				room.new_clients.append(client)
				
				# Tell the client which id they are.
				await websocket.send(encode_msg({
					'type': 'joined',
					'client_id': client.id
				}))
				
			elif room:
				# Identify message and pass it off to the room queue
				message['client_id'] = client.id
				await room.event_queue.put(message)
			else:
				# Behave as a trivial echo server if not in room (will be removed in my final version)
				await websocket.send(encode_msg(message))
			stats.incr()
	except websockets.ConnectionClosed:
		pass
	except Exception as e:
		# In case something else happens we want to ditch this client.  This won't come from
		# websockets, but likely the code above, like having a broken JSON message
		print(e)
		pass
	
	# Mark as disconnected so the queue loop skips this client
	client.disconnected = True
	if room is not None:
		# Though if zero we can kill the listener and clean up fully
		if room.client_count() == 0:
			await room.event_queue.put(None)
			del rooms[room.key]
			await room.future
			print(f"Cleaned Room {room.key}")
			
	print("disconnect", rooms)


def main() -> None:
	start_server = websockets.serve(listen_socket, "localhost", 8765, ping_interval=5, ping_timeout=5)

	asyncio.get_event_loop().run_until_complete(start_server)

	asyncio.get_event_loop().run_forever()

	
main()

ws_client.py

A simple client that validates the correct ordering of messages. Provide a room on the command line.

There’s an option to slow this client which forces the server to disconnect it when the buffers fill up.

You’ll note my client test code is rougher than my server code and lacks many type definitions. This code will not be used long-term, but the server code will be.

from typing import *
import asyncio, json, websockets, time, sys

if len(sys.argv) < 2:
	print(f"Syntax {sys.argv[0]} room (delay)")
	sys.exit(-1)
	
room = sys.argv[1]
# A non-zero slow creates a client that can't keep up. If there are other clients in the room
# it will end up breaking, causing the server to disconnect it.
slow = 0.0
if len(sys.argv) > 2:
	slow = float(sys.argv[2])

def encode_msg(msg: Dict) -> str:
	return json.dumps(msg, ensure_ascii=False)
	
def decode_msg(text: str) -> Dict:
	return json.loads(text)

# An even simpler stats tracker than the server	
trigger_count = 5000.0
if slow > 0:
	trigger_count /= (1+slow) * 100
	
	
async def reader(websocket):
	count = 0
	seq = 0
	last_time = time.monotonic()
	client_id = None
	last_msg_id = None
	
	async for message_raw in websocket:
		count += 1
		msg = decode_msg(message_raw)
		
		if msg['type'] == 'joined':
			client_id = msg['client_id']
		else:
			# Ensure the messages have a single total order
			msg_id = msg['msg_id']
			if last_msg_id is not None and msg_id != (last_msg_id+1):
				print(last_msg_id, msg_id)
				raise Exception("bad msg sequence")
			last_msg_id = msg_id
			
		if msg['type'] == 'ping' and client_id == msg['client_id']:
			# Ensure our own measures retain the order we sent them
			if msg['seq'] != seq:
				print(seq, message_raw)
				raise Exception("bad message seq")
			
		# Track rough throughput
		if count >= trigger_count:
			next_time = time.monotonic()
			print( f'{count /(next_time - last_time)}/s {room}' )
			last_time = time.monotonic()
			count = 0
			
		if client_id == msg['client_id']:
			seq += 1
			await websocket.send(encode_msg({'type': 'ping', 'seq': seq }))
			
		if slow > 0:
			await asyncio.sleep(slow)
		
		
async def hello():
	uri = "ws://localhost:8765"
	async with websockets.connect(uri) as websocket:
		print("Connect")
		await websocket.send( encode_msg({ 'type': 'join', 'room': room }) )
		consumer_task = asyncio.ensure_future(
			reader(websocket))
		done, pending = await asyncio.wait(
			[consumer_task],
			return_when=asyncio.FIRST_COMPLETED,
		)
		

asyncio.get_event_loop().run_until_complete(hello())

ws_client.html

A simple web browser client, to prove that it’s working where I need it.

<html>
<script>
function loaded() {

	console.log("Loaded")
	const socket = new WebSocket('ws://localhost:8765');

	socket.addEventListener('open', function (event) {
		socket.send(encode_msg({
			type: 'join',
			room: 'js-room',
		}));
		console.log("Opened")
	});

	socket.addEventListener('message', function (event) {
		console.log('Message from server ', event.data);
	});
	
	window.addEventListener('beforeunload', () => {
		console.log("UNLOAD")
		socket.close()
	})
	console.log("LOADED")
	
	let count = 0
	setInterval(() => {
		count = count+1
		socket.send(encode_msg({
			type: 'ping',
			seq: count,
		}))
	}, 500)
}

function encode_msg(msg) { 
	return JSON.stringify(msg)
}

function decode_msg(text) {
	return JSON.parse(text)
}

</script>
<body onload="loaded()">
	<p>Text</p>
</body>
</html>

What I look for while play-testing

I’ve completed a lot of play-testing lately for my game Carnival. It’s a fascinating experience to watch people work through the puzzles. And it’s a humbling experience as people stumble and flounder on failings in my designs. Without a doubt, focused user testing has made my game significantly better. Here I’d like to write about the principal things I’m watching for.

A friend of mine spurred this article, as she discovered I keep extensive notes during the play-test. She also tested the game and was curious to know if I wrote anything bad about her. I assured her it was all good stuff, while silently discarding the evidence behind my back.

Play-testing Process

Carnival is played online in the browser with teams that meet in an audio/video chat. While the faces can be helpful for testing, it’s mainly the audio I’m listening to. For the video, I have one participant share their screen. It’d be great if I could have all participants share their screens, but I’d probably need more monitors to make sense of it.

Hearing people play is super useful, in addition to watching what they do on screen. Where they move their mouse can be telling, but their “hmms” and “huhs”, exasperated laments, and cries of foul tell even more. All of this helps me understand their thought process in solving puzzles, which helps me ease the play for future players.

In the early rounds of play-testing, I instruct people about what to expect and how the game works. This allows me to test prior to the game being finished, avoiding some redundant work. I try to reduce the verbal preamble as quickly as possible for subsequent teams, replacing it with the in-game systems. By the late stages of testing, I start with only a few hellos, then send the teams off on their own.

I don’t record these sessions. First off, I don’t care to deal with the privacy aspect of archiving such data. But second, I’d never watch them again. It’s almost always better to test with a new group of people than labour endlessly on a single session. More people equals better testing, especially with a puzzle oriented game.

Cryptic Notes

As mentioned, I take a lot of notes during these sessions, where lots means 2-3 pages of cryptic scrawling for a roughly 90-minute test session. Each line is prefixed with a symbol showing the type of comment. At least in theory; in practice, some lines have nothing before them. If something occurs to me during the test, I note it down, even if I’m unsure it’ll be helpful.

For clarity here, I’ll replace my symbols with emojis. That way we can debate their semantic importance, rather than dissect my post-modern scribbling approach to art.

🐟 Red Herring

This became the most important symbol in testing. Red herrings are a problem in puzzle games. They can be virtually anything that misdirects the player: an item, a graphic, some dialog, a pattern. The player already has to resolve many important pieces of information in their head.

I’m talking about unintentional red herrings. These are colour patterns, or objects, that reasonably look like clues to a puzzle. They arise naturally out of the graphic design by accident. I have nothing but ire for designers who intentionally add red herrings to their games. It’s indefensibly poor design that frustrates players. There are enough unintended red herrings to deal with already.

For example, I have a puzzle in Carnival where you need to set a row of lights to the correct colours. I intend them to match another pattern on the screen, as hinted in some dialog. Lo-and-behold, some testers found another sequence of colours, in some flags, that seemed to match the lights as well. They then proceeded to match that pattern, which of course failed. This gets a big “🐟” mark in the notes.

🐟 sequence of flags matches light pattern

⭘ Obvious thing to change

Many small, sometimes large, obvious ideas crop up during the tests. These may be ways to improve the puzzles, graphics improvements, overall UX ideas, or basically anything. The ⭘ means I had a concrete thought and I should go back and improve it later — at which point I add a ✔ to it.

Each circle is a clear opportunity to improve the game. Rather than theorizing about things to improve, these all come from actual players. Fixing them would have a direct impact on some future player. This doesn’t mean they all get resolved, as priorities still play a role, and some of them are hard to fix.

One example I have is with a ticket in the game. The player acquires the ticket and must present it to get into the carnival. I thought it’d be obvious that you show the ticket to the man in the booth, but one player tried to use the ticket on the entry sign. It’s not ridiculous, since the sign, with a “No Entry” label hanging on top, is where you enter the park. What happens then is frustration. In their head the player has resolved the situation: they found a ticket and are using it to get inside. It not working is like throwing a wrench into their thought process.

The obvious resolution was to make the sign accept the ticket, and that’s the note I made:

⭘ tried show ticket to sign, accept ticket

Though, I ended up using a more humorous fix. The booth agent cracks a joke about an inanimate sign. This has the effect of affirming that the player’s action made sense, but was slightly off. Additionally, it made clear where the ticket should be used instead. Oh, I could talk at length about this leading…

🏁 Puzzle Solved

Whenever a player solves a puzzle, I note it with a little flag, and the time.

🏁 rabbit hand puppet, 23:03

This is the wall time, since it’s what I have most readily available. I record the start of the session as well, letting me calculate the time offsets between puzzles later.
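Turning the wall-time flags into offsets is simple arithmetic. A sketch of that calculation, assuming ‘HH:MM’ stamps like the ones in my notes (the function is my own illustration, not code from the game):

```python
from datetime import datetime, timedelta
from typing import List, Tuple

def puzzle_offsets(start: str, solves: List[Tuple[str, str]]) -> List[Tuple[str, int]]:
	"""Convert (puzzle, 'HH:MM' wall time) notes into minutes from session start."""
	fmt = '%H:%M'
	begin = datetime.strptime(start, fmt)
	offsets = []
	for name, stamp in solves:
		when = datetime.strptime(stamp, fmt)
		if when < begin:  # the session crossed midnight
			when += timedelta(days=1)
		offsets.append((name, int((when - begin).total_seconds() // 60)))
	return offsets
```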

My goal here is to establish a good pacing for the game. Droughts, long periods between solutions, demotivate the player and increase the likelihood they dislike the game. Given my experience with the previous game, and the numerous escape rooms I’ve played, the pacing mostly worked as is.

However, there were some clear problems on some puzzles — they took too long. In most cases, I didn’t need to know the time to see that people were frustrated. But sometimes the time helped decide the frustration was okay. Perhaps the player was being impatient, rather than having an actual problem.

Timing can also deceive. In one game the players took nearly 15 minutes for one puzzle, well beyond the typical 5 I consider a maximum — these aren’t fundamentally fixed numbers, so don’t quote me on them. But after the puzzle they both said “wow, that was great” — or something to that effect, I can’t always read my writing. This makes evaluation tricky, but at least the timed notes give some help.

❓ Hint requested

Everybody gets stuck, but an experienced team should not need hints. It’s why I treat hint requests during play-testing seriously. The game has a built-in progressive hint system, so all I do is note where they requested the hint, and how many they needed.

❓❓❓ dart game

You might think requesting a lot of hints is bad, but for experienced teams, if they need one, they typically need many. This results from the hints being progressive; the players often already know the information in the first couple of hints, and only the later ones reveal something new. Short of deciphering the players’ thoughts with electrodes plugged into their brains, there’s no real way to avoid this.

Well, there is a way to avoid some hints, and I do that. Some hints can be tied to game actions, in particular with the inventory. But the general purpose hints can’t be tied to in-game events. I can never really be certain the player already knows the hint.

Requesting hints is part of puzzle games. It’s totally fine if players ask for them, but it shouldn’t be the standard approach to solving a puzzle. And since people who play-test are mainly interested in this genre of game, it biases the results — I assume the average player will require more. It’s a good sign when some play-tests have no ❓’s on them. Past that point, I can consider each tension point more carefully.

‟ What they said or did

To give context to my notes, and to the players’ thought process, I take several notes about what they say or what they do. The latter also uses quotes, because I couldn’t quickly think of another symbol that made sense, and in practice the notes are mixed.

‟Is this random? click on light

These notes primarily serve as anchors to the other ones.

I’ll also make notes of solutions they’ve tried that have failed. These can often give ideas of how they are thinking, or, sometimes, I end up accepting alternate solutions if they seem equally valid. For some of these I’ll end up using a ⭘‟ combination in the notes.

⭘ ‟12345, hmm, doesn’t work

❗ Something is broken

I put this last since it’s not the point of this level of play-testing. I have already resolved most of the functional defects, and the game is fully playable before I begin play-testing. Several minor defects still appear, and if I’ve recently changed something, engine defects are possible (the frightening and thankfully rare ❗❗).

❗ B-girl font not converted path

The ❗ is for things that are 100% definitely technical defects. This could be a graphic that is missing, the wrong font used somewhere, or a typo in the text. I suppose I could use them for defects in the puzzles, though oddly, I’ve not had that situation come up yet. As this is a multiplayer game played over flaky networks, I’ve tried hard, from the start, to keep the logical game state consistent. That appears to work. While defects in the puzzles are still possible, I’ve likely worked them out prior to starting play-testing.

I think that’s an important point: I’ve played the game in its entirety many times before I do any play-testing. I want play-testing to focus on the things I can’t find myself. Even the initial play-testers get a game that works, albeit potentially without the hint system and with some graphics not in their final form, but functionally playable from beginning to end.

Scribbles

These symbols are a general guide to the notes I take. I still have other things written, sometimes in combination, or sometimes with no markings.

I have found though that trying to itemize my thoughts produces firmer notes. Rather than writing everything and anything, I focus on specific action items:

  • 🐟 Something is misleading
  • ❗ something is broken
  • ⭘ Point for improvement
  • ❓ a puzzle may have a problem

Where 🏁 and ‟ are then used to anchor those points, giving context when I go back later.


While not a perfectly defined process, that’s about how I did play-testing of Carnival. Watching people play, think, and laugh is fascinating. I also love watching the streams of people playing my game, as it gives another insight. I take notes from those as well.

How to write a custom selector in React

What’s an efficient way to react to global state updates in a React app? If using Redux, you’d use a selector. But I don’t use Redux for my puzzle game, as I have my own state object. It works similarly to Redux — I have an immutable state which is completely replaced on modification. All changes in the logical game state are done there.

I used React’s contexts to subscribe to state changes for the UI. This works, except parts of my UI are needlessly rerendered. The context update is sent upon any change, even if that part of the UI doesn’t care about it. In practice this isn’t too bad for my game, since I have few components listening, and pass properties down to memoized components. Still, I don’t like inefficiency, and I know useSelector from other projects.

How could I get the selector logic in my own React code? I have a game state, and I know which parts I’m interested in, so it should be easy. I thought a long time about how it should be done, a lot more time than it took to finally implement. I’ll cover what I did here, hopefully reducing the time you need to search for solutions.

What does React offer?

Somewhere in React is a subscription mechanism. That’s how the components know to update when something changes. There are two options: context and state. They are both needed to build a selector.

Using a context is well documented. Nonetheless, here’s a brief outline of how I used this prior to creating a selector. My actual code is TypeScript and has a layer of wrapping around this.

let game_manager = get_game_magically_from_global()
let GameContext = React.createContext([game_manager.get_current_state(), game_manager])

function MainComponent() {
	// I use React's state system to track the game state within this component.
	const [game_state, set_game_state] = React.useState(game_manager.get_current_state())
	
	// My game manager needs to tell me when the state changes.
	React.useEffect(() => {
		game_manager.watch_state(set_game_state)
	}, [set_game_state])

	// Provide the current state value to the context to pass down through the tree
	return (
		<GameContext.Provider value={[game_state, game_manager]}>
			<EdaqasRoomUI />
		</GameContext.Provider>
	)
}


function NiftyGameItem() {
	const [game_state, game_manager] = React.useContext(GameContext)
	
	const drop = React.useCallback(() => {
		game_manager.drop_item()
	}, [game_manager])
	
	return (
		<img onClick={drop} src={game_state.held_item.image} />
	)
}

I provide both the current game state and the game manager in the context. The state is for reading; the manager is for providing feedback. This is similar to Redux’s dispatcher; my game manager also uses messages to communicate with the state.

The State

Notice useState in that example as well. For React, updating the context is no different than any other use of the state. The extra aspect of the context is providing that value to the descendents of the component. This is what the Provider does.

State can be used without a context as well. Here’s a simple example as a reminder.

function ExpandInventory() {
	const [expanded, set_expanded] = React.useState(false)
	
	const toggle = React.useCallback(() => {
		set_expanded(!expanded)
	}, [expanded, set_expanded])
	
	return (
		<>
			<CompactView onClick={toggle} />
			{expanded && <GloriousFullView />}
		</>
	)
}

When the user clicks on the compact view, the browser calls the toggle function, which modifies the state. When the state is modified React will rerender the control.

JSX files create an illusion of close cooperative harmony between this code, the state, and the HTML DOM. The truth is a lot uglier. The HTML goes through React’s diff engine, then is assembled into the browser’s DOM tree. The callback function lives in the global heap, connected to the DOM object, as well as being a closure over the stack frame in which it was created. The closure will be called in response to a user’s click, far away from the stack in which the render code was run.

Understanding this structure is the key to making our own selectors. That set_expanded function can be called from anywhere and React will figure out how to update the component as a result.

Too many updates

Any component that needs the game state can call useContext(GameContext). The problem is that all state changes, whether they’d alter the component or not, cause the component to rerender. In my previous example, the NiftyGameItem only needs to update when held_item changes, yet currently it’ll update anytime anything in the state changes. That’s pointless and wasteful.

If I were using Redux, I’d use a selector to solve this issue.

const held_item = useSelector( game_state => game_state.held_item )

Only when game_state.held_item changes will the component rerender.

useSelector itself isn’t magical. It is essentially a layer in between the state and the control. It will listen to every update to the game state, and run the selection function. But it will only update the component if the result of the selection function changes.

I wanted the same facility for my game state.
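The trick useSelector relies on isn’t React-specific: remember the previously selected value and notify only when it changes. Here’s a minimal, framework-free sketch of that idea in Python (my actual code is TypeScript; all names here are invented for illustration):

```python
from typing import Any, Callable, Dict, List

class SelectorStore:
	"""A store where listeners subscribe with a selector and are
	only called when their selected slice of state changes."""
	def __init__(self, state: Dict):
		self._state = state
		self._listeners: List[Callable[[Dict], None]] = []

	def subscribe(self, selector: Callable[[Dict], Any],
			callback: Callable[[Any], None]) -> Callable[[], None]:
		previous = selector(self._state)
		def on_update(state: Dict) -> None:
			nonlocal previous
			current = selector(state)
			if current != previous:  # skip updates that don't affect this listener
				previous = current
				callback(current)
		self._listeners.append(on_update)
		# Return an unsubscribe function so listeners can be removed
		return lambda: self._listeners.remove(on_update)

	def set_state(self, state: Dict) -> None:
		self._state = state
		for listener in list(self._listeners):
			listener(state)
```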

My own selector

useState is the primary hook into React’s subscription system. At first, I looked for an explicit subscription API. What I wanted to do isn’t directly covered in the state docs. But as I mentioned before, understanding how the callbacks, DOM, and state connect assured me that my approach is correct.

What is the goal? This is what I want my NiftyGameItem to look like, ignoring the onClick part for a moment.

function NiftyGameItem() {
	const held_item = useGameState( gs => gs.held_item )
	
	return (
		<img src={held_item.image} />
	)
}

I only want to update when held_item changes. Let’s jump right to the almost-final code.

type game_selector<T> = ( state : GT.game_state ) => T

export function useGameState<T>( gs : game_selector<T> ): T {
	const [_, game_manager] = React.useContext(GameContext)
		
	const [ state, set_state ] = React.useState<T>(():T => gs(game_manager.current_game_state()))

	React.useEffect(() => {
		const track = {
			current: state,
 		}
		
		return game_manager.listen_game_state( (game_state: GT.game_state) => {
			const next: T = gs(game_state)
			if (track.current != next) {
				track.current = next
				set_state(next)
			}
		})
	}, [game_manager, set_state, gs])
	
	return state
}
	const [_, game_manager] = React.useContext(GameContext)

I get the game manager as I did before, but we’ll have to come back and fix something here.

	const [ state, set_state ] = React.useState<T>(():T => gs(game_manager.current_game_state()))
	...
	return state

I prep the state for the component. The game manager needs to provide the current state as it’ll be needed when the component first renders, not only when the state updates. Here I don’t track the entire game state, only the part that is of interest — the part extracted by the selector.

A selector function, gs here, takes the global state as input and returns the part to be watched. My useGameState code calls the gs selector function with the global state. The selector in my example is gs => gs.held_item, which retrieves only the held_item. In the game I have an on-screen indicator showing which item the player is currently holding.

I return the state at the end of the function. In the first call, this will be the initial state. In subsequent calls, for each new render of the control, it’ll be the current state.

		return game_manager.listen_game_state( (game_state: GT.game_state) => {

The vital piece of code inside useEffect is the call to listen_game_state. I added this subscription function to the game_manager. The game manager already knows when the state updates, since it has to update the context. Now it updates the context as well as calling all the registered listeners. I’ll show this code a bit further below.

		const track = {
			current: state,
 		}
		
		return game_manager.listen_game_state( (game_state: GT.game_state) => {
			const next: T = gs(game_state)
			if (track.current != next) {
				track.current = next
				set_state(next)
			}
		})

Each time the state updates, the caller provided selector function is called to select a part of the state. This is compared to what value it had previously, and only if it has changed do we call the set_state function. If we were to call the set_state function every time, then it’d be no better than the caller listening for every state change.

Note the return. The listen_game_state function returns an unsubscribe function, which will be called whenever the effect is reevaluated, or the component unmounts. The game manager shouldn’t hold on to components that are no longer around.

	React.useEffect(() => {
		...
	}, [game_manager, set_state, gs])

The useEffect runs once when the control is mounted (or first rendered, more correctly). I have a dependency list of [game_manager, set_state, gs] for correctness. Should one of those change the effect needs to be reevaluated to grab the new values. In practice, these dependencies never change.

useState outside of a component?

It may seem unusual to call the useState function in something other than a React component. This type of chaining is allowed and expected. There’s nothing special about calling useState directly in the component versus inside a function called by the component. React will understand which component it is in and associate it correctly.

I’ve not looked into precisely how this works. My assumption is that it’s a global value that tracks the current component. The hook functions inspect that variable to figure out where they are and to register the appropriate listeners. I don’t see another option, since useGameState is a plain TS/JS function — the JSX compiler has no chance to modify it. In threaded languages this stack would need to be thread local, but JS is single threaded (workers, etc. get their own global space, making them effectively thread local).
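That assumption can be made concrete. The following is only a toy sketch in Python, not React’s actual implementation: a module-level variable tracks the component being rendered, and a plain use_state function consults it to find its per-component slot.

```python
from typing import Any, Callable, List, Optional

_current: Optional['Component'] = None  # the component currently rendering

class Component:
	def __init__(self) -> None:
		self.slots: List[Any] = []  # one slot per use_state call, in call order
		self.cursor = 0

	def render(self, fn: Callable[[], Any]) -> Any:
		global _current
		_current = self  # hooks called inside fn will find this component
		self.cursor = 0
		try:
			return fn()
		finally:
			_current = None

def use_state(initial: Any):
	comp = _current
	assert comp is not None, "use_state called outside a render"
	i = comp.cursor
	comp.cursor += 1
	if i >= len(comp.slots):
		comp.slots.append(initial)  # first render: store the initial value
	def set_state(value: Any) -> None:
		comp.slots[i] = value  # a real framework would also schedule a rerender
	return comp.slots[i], set_state
```

This also shows why hooks must be called in the same order on every render: the slot is found by call position, not by name.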

My selector is a combination of existing React functions: useState, useEffect, and useContext.

Hold on, there’s a problem

I have an issue in the first line of the useGameState function:

	const [_, game_manager] = React.useContext(GameContext)

I reused the context from before, the one that provides the game state and the game manager. This is bad. Since it hooks into the game state context, this component will still be updated with every change of the state.

To fix this, I added a new context which contains only the game manager.

	const game_manager = React.useContext(GameManagerOnly)

This game manager never changes for the life of the game, thus no needless updates will be triggered by the call to useContext.

At this point I gain nothing by storing the game manager singleton in a context. I could simply refer to the global object from my useGameState code. However, to behave as a proper React app, I’ve left it in the context. This may also be important for your project, where the object isn’t a singleton.

Save the batteries

Performance wasn’t an issue for my game. Curiosity was part of the reason I wrote the selectors. The selectors do of course help; there were thousands of needless updates to components. Cutting back this processing time should help older machines, as well as saving battery power on tablets.

I’ll continue to make optimizations where I see them. It may be inconsequential compared to the massive browser SVG rendering overhead, but there’s nothing I can do about that. As my games get more complex, the calculations will continue to grow. Keeping things performant can only help long term.

Plus, you know, curiosity. A solid reason to do something.

Check out how this all comes together in my game Edaqa’s Room: Prototype. A collaborative online escape room full of puzzles, adventure, and probably no vampires.


Appendix: Game Manager subscription code

This is the listen_game_state code called by useEffect in useGameState. I’ve removed details about how I connect to my state object, for simplicity. If you’d like a closer examination of that part, let me know.

export type game_state_listener = (gs: GT.game_state) => void

export class GameManager implements StateChanged {

	gsl_id = 0
	game_state_listeners: Record<number,game_state_listener> = {}
	.
	.
	.
	listen_game_state( listener: game_state_listener ): ()=>void {
		this.gsl_id += 1
		const nid = this.gsl_id
		this.game_state_listeners[nid] = listener
		
		return () => {
			delete this.game_state_listeners[nid]
		}
	}

Subscription queues needn’t be complex. On updates to the game state, the function below is called (part of the `StateChanged` interface).

	game_state_changed(game_state) {
		if( this.set_game_store ) {
			this.set_game_store(game_state)
		}
		
		for (const listener of Object.values(this.game_state_listeners)) {
			listener(game_state)
		}
	}

The first line goes back to the game_manager.watch_state(set_game_state) call at the start of this article. It’s what updates the context storing the game state.

The loop is what tells all the useGameState listeners that something has changed.

I Wrote an Online Escape Game

I’m an escape room enthusiast, some may say addict, and for the past few months I’ve been missing it. A friend of mine, a true addict with over 500 rooms to his name, started organizing online competitions. After playing a few of the online games, I thought, “I want to build my own.”

So for the past couple of months I’ve been writing an online escape game — which you could say is a web puzzle game, but with the exciting flair of escape! It’s suitably called “Prototype”. I assumed that name would let me get away with some rough edges. This will be an evolving project, but the first installment is a success.

I’m proud of my game. I want to tell you how I made it.

Technology for the user

I had a few major goals for the game. These sit somewhere in the spectrum between user epics and use cases.

  • Painless experience for the user: I wanted it all in the browser. These types of games are relatively short, and needing to download something would be a pain.
  • A multi-player team experience: Real rooms admit teams of 2-6 or more players, and I wanted my game to allow the same. Additionally, a certain world crisis is an excellent motivator for remote team play.
  • Painless registration: Beyond paying, I didn’t want any registration at all. This bugged me about many other games. Just let people play as quickly as possible.

Obviously clever puzzles and a fun experience were paramount, but it’s harder to quantify those directly. Those would be the product goals, and I felt the above points were critical to supporting those goals.

Given these requirements, I set out to write my own engine, as I saw nothing that would come close to what I wanted. I was picky with my game, not letting it get away with anything I’d complain about in other games. Naturally, a few priorities chipped a few notches in that plan.

Overall, I achieved those goals. Let me know where I should elaborate — priorities again, I don’t want to be writing blindly about everything!

A puzzling web stack

A design had been floating in my head for a while before I set down any code. There was some trial and error, but the architecture was stable from the start, with only a few deviations in method.

Here are some major pieces of the stack.

  • React: Just React. No optional modules, no plugins, nothing. The core of React provided what I needed. Since I had a state machine, there was no need for something like Redux.
  • Python and Flask: The server components, and game processing, are written in Python using Flask, Flask-SocketIO, and Eventlet (there are always so many unavoidable layers).
  • Redis: A small, but essential part to coordinate the multi-player actions.
  • SVG: I’m listing this as it’s a key part of the engine. Everything is based on SVG working well in the browsers. It was a major trouble point, yet surprisingly rewarding.

There are also the typical web server bits, using Jinja templates, talking to Mongo, Paypal… Zzz. Yeah, I’ll mention these bits more, but I suspect there’s nothing novel here.

Until late in the project, this was a mass of wiggling bits! I had many stressful days trying to juggle tech in my head. Getting something working was my primary goal, and I did that in stages. Now, as I write more games, I’ll keep refining the stack, but there won’t likely be any major architectural changes.

Languages are what I do

As a good friend of mine said, “No Edaqa project would be complete unless there’s a new language.” I’m too blinded by the beauty of languages to even catch a hint of criticism there.

The most important question of the technology is: did I want to write a game engine? The answer is a resounding “no!” I wrote the engine because I wanted to achieve my primary goals, saw nothing else that fit, and knew an engine was within reach. I wanted to design games. And I wanted to not be overly burdened while designing.

Thus there’s a domain-specific language for the games. It’s a high-level declarative language. I fully expect that long-term I’ll write other engines for it. My goal was to keep the game logic clean, without being bound to the engine. I want to write games and ensure that long-term I can maintain those games.

I’ll be happy to show you how the language works, what the preprocessor does, and how the game handles the code.

As they say, “Ask me anything”

I hit a lot of knowledge pockets and defects on this project. Not everything I did is obvious — and I hesitate to say some illogical bits remain. But I’m happy to talk about all of it.

Let me know what interests you the most, and I’ll answer what I can, providing more writeups where necessary. And if you like puzzles, or escape rooms, I invite you to play the game.

I encourage you to try the game, Prototype: A Game Master is Needed. It’s an escape game I wrote, and have lots to say about.

Your 30th Year in Code

Becoming a programmer can be a daunting task. After reading Your First Year in Code, you might wonder what awaits you long term. My book, What is Programming covers more of the skills you’ll need, but here I want to share the personal challenges.

I’ve been programming over 30 years now. That’s a long time to be in this industry. I know many people who didn’t make it this far or have grown overly distraught. As the years ticked by, there are some major problems I’ve faced and seen others endure.

Stagnation

Technology changes faster than any individual can follow. The nature of programming is that tools learned this year will not likely be relevant five or ten years from now. If you haven’t noticed yet, learning is the most needed skill for a programmer. Every year, every month, every day, brings with it something new.

I’ve jumped between a lot of jobs: an employee at startups, a contractor at larger companies, and in my own side projects. My continuous exposure to new things has helped me avoid stagnation. Without this job hopping, it can be difficult to learn new things. Yet you have to, otherwise you’ll find yourself increasingly distant from modern programming. Job listings will become increasingly foreign and the best practices you see online ever more confusing.

Fortunately, the approaches to programming change a lot more slowly. These are the major themes: for example, how to manage a project and the best practices for coding. Unlike specific tools, this knowledge transfers to new projects. However, it does change, and it’s important to keep up. Many people sit on one project for ten years or longer. Over such an extended period, they can lose track entirely of what programming has become.

There’s no need to endlessly learn all the shiny new things, but you need to follow the overall trends. By taking the time to appreciate the concepts behind the tools, you’ll find your knowledge transfers easily to new tools. By showing even a passing interest in the industry as a whole, you should be able to keep abreast of major changes.

Fatigue

The cycle of learning has a toll on us. It requires a continual input of energy. We don’t get this option of riding on our current knowledge for long. I can’t say how often I would have appreciated a job where I could apply my current knowledge. In freelance contracts, I sometimes bid on projects, not because they are interesting, but solely because I won’t have to learn anything new. It saves energy.

On top of the learning is the stress of switching jobs. Changing companies is a massive change in your daily life. Even transferring within a company can be a significant change. The adjustment period adds stress. It’s a great way to avoid stagnation, but comes with a personal cost.

Our lives change as we age. The innate drive of a twenty-year-old fades as you enter your 30s and 40s. You’ll have found other ways to spend your time. Your late night hobby may no longer be fiddling with code. A life won’t be only programming. This means less time for all the learning you need to do.

On the other hand, it’s often these non-programming activities that bring energy back. I don’t think I could have maintained my energy so long without them. I’ve done writing, music, massage and more. There’s a balance to be reached, and I tended to overwork, which is a bottomless energy sink.

Physical well-being plays a significant role, and it’s not uncommon to succumb to a poor lifestyle. From my experience, the stereotypes of unfit programmers are unfortunately common. I’m not judging, but a lack of physical health brings you down. Learning is exhausting and requires a fit mind, which requires a fit body. And as you lose energy, you lose time. You lose the time to exercise and eat right, which further diminishes your energy, throwing you into a downward spiral.

I dedicate a lot of my time to health and wellness, though still perhaps not enough me time. It’s easy to get sucked back into projects. And despite my goal of having a project where I learn nothing, to conserve energy, it’s often those projects that end up draining even more energy. Once the motivation is gone, my energy dwindles.

Honestly, I’m still trying to figure this one out. I’ve had a good run of 30 years where I don’t think my motivation faded, but the energy is definitely lower. I’m facing a bit of a problem now. But fortunately, I’m writing about the first 30 years and can ignore the ones that come after.

Cynicism

And what happens when you get tired? When my motivation drops and I don’t have an interesting project, well, I become right proper sick and tired of it all. Cynicism is not a new problem, but it’s definitely a big one.

The cycle of learning and a lack of energy leads to frustration. The inevitable feeling that you’ve done this before. Why do we keep learning new tools when the old ones work? Why don’t these new tools do what the old ones did? After all these years, why hasn’t anything gotten any better?

It’s hard to watch things come and go, to watch your best work fade into obscurity and then be requested to do it all again. Keep programming long enough, and the novelty wears off. There seem to be no more interesting projects, no fields where you haven’t worked already, nothing you’re remotely interested in doing.

This attitude may often result from stagnation or waning energy. The more people close themselves off to a changing career, the more they come to despise that field. Keeping abreast of new things, and contributing to the field, can help stave off negativity.

But I think there’s more to it. Even with my diverse career, or perhaps due to it, I become overly cynical at times. I need to remind myself that though similar, the problems we’re solving now aren’t the same as those from 20 years ago. I even give presentations to this effect, about how much has truly changed in software.

Trolls and idiots don’t help here either — see, that’s my sour side talking. I get angry with people who post bad code online, give advice about bad code, or staunchly defend dumb approaches. Good thing I didn’t have Twitter when I started, otherwise I would have been pissing off some old programmer as well.

Recognizing a negative trend in your attitude is important. It’s the first step to understanding there’s a problem and finding ways to fix it. Find the programming that makes you happy. Ignore the battles that you can’t win. Remember the joy you had when you started out. Don’t let cynicism take that away.

Ageism

Given that we might be antiquated, drained of all energy, and hate the industry, it’s not surprising that many teams don’t want to work with us. Stubborn old farts are terrible for a team. They’re stuck in the past and consider themselves the best. This results in them making bad decisions and being arrogant to younger team members.

I don’t consider myself to be part of that stereotype, and I know many people who are not. I wish I could say I’ve never met the stereotypical old programmer, but I have, and often. As have my colleagues who do interviews, or are otherwise involved in hiring. I’m clueless as to the percentage of good versus bad.

Which means, even if you’ve become a great programmer, you get to deal with that stereotype. Unlike the other problems, there’s not much you can do about this one. Be aware of it and be aware that you’ll be the old person going in for the interview. If you’ve kept up-to-date, and still have a positive attitude, I don’t see age as a significant hurdle…

…except, when it comes to contracts or payment. Part of the issue of ageism is that young people are cheaper. Companies are reluctant to compensate senior programmers for their knowledge. Without an objective way to measure individuals I can understand that. I guarantee I’ll be cheaper in the long run, and that my product will be superior, but of course everybody competing for a position will say the same thing. It’s only after you work with me you’ll see the difference.

Young people are also too willing to give up their free time to a company, which is especially relevant in the startup sector. I’m willing to work long hours, for a while, if you pay me for it. My loyalty to a company exists only so long as it’s beneficial to me as well. I don’t tolerate any company-advantaged relationship. This is all part of the balance in my health, keeping fit mentally and physically. But until we find a convincing way to show companies this results in better products, we’ll be fighting the image that well-experienced programmers don’t work as much.

See how easy it is for me to let my cynicism shine through! Ageism is a real problem, and it’s due both to reality and stereotypes. Combating this is hard. I think that as long as you keep a positive attitude, and avoid stagnation, it’s possible to navigate around the problem.

Ramblings

Those are the problems I’ve confronted, that you’ll eventually face. Realistically, for your first ten years you can probably ignore these ramblings entirely. File them away in the back of your head. Once things look to be going wrong, remember it’s not you. Stagnation, fatigue, and cynicism will happen to all of us. Recognize it, then find a way to get over it.

Why switch is better than if-else

In Ben’s post, he questions whether switch statements are cleaner than if-else chains. I contend they are, because they better express the semantics of the code, allow less room for errors, reduce duplication, and potentially improve performance.

Better Semantics

Switch statements express a different meaning than a chain of if-else statements. A switch indicates that you are mapping from an input value to a piece of code.

switch( expr ) {
    case value_0:
        ...
        
    case value_1:
        ...
}

It’s clear at a glance that we’re attempting to cover all possible values of expr. Somebody reading this code can quickly determine what is supposed to be happening.

This is not as clear with the equivalent if-else chain:

if( expr == value_0) {
    ...
} else if( expr == value_1) {
    ...
}

We aren’t certain here whether we mean to cover all possible values, or only these values in particular.

Many compilers will inform you when a switch statement is missing a condition. In C++ this could be a missing case statement for an enumeration. In Rust, the equivalent match construct covers an even wider range of input, and also disallows missing coverage. This automatic checking by the compiler can prevent common defects. For example, if you add a new value to an enumeration, the compiler can tell you all the locations where you haven’t covered that new case.

Less room for errors and nonsense

One problem of an if-else chain is that it allows any comparison, with any variable. Having no restriction on the form increases the ability to hide errors.

if( expr == value_0 ) {
    ...
} else if( expr == value_1 ) {
    ...
} else if( expr2 == value_2 ) {
    ...
} else if( value_3 == expr ) {
    ...
}

Why is there an expr2 in there? It’s now unclear whether the code intends to cover all values for expr or the conditions are only coincidentally similar.

What about the reversed order of value_3 == expr? This should be corrected in code review, but it’s another possibility of creating confusion.

Refactoring can create this type of problem. An individual may modify the expressions, either fixing an error, or cleaning it up. In a parallel branch another programmer adds a new expression. During merge the two different code forms will come together, resulting in an inconsistent form.

Reduced duplication

Long chains of if-else have unnecessary syntax overhead. Redundancy is one of the principal evils of source code. From the previous example with expr2, we saw that the repeated typing of an expression can’t be ignored. You can’t glance at an if-else chain and assume it’s functioning like a switch, since it may not be. The redundancy adds cognitive load when reading the code.

The duplication could have a performance impact, as well as creating another avenue for errors. I’ve used only expr so far, but what if that expression were a function call?

if( next_obj().get_status() == state_ready ) {
    ...
} else if( next_obj().get_status() == state_pending ) {
    ...
} else {
    ...
}

The first potential error here is the call to next_obj. If the first condition is true, it will evaluate once. If the first condition is false, the next if statement makes another call to the function. Does it return the same value each time, or is it incrementing over a list?

What about get_status()? Is this a cheap or expensive function to call? Maybe it slowly calculates or invokes a database call? Calling it twice doubles whatever the cost is, which may be significant in many cases.

It’s important to store these values in a temporary to avoid both of these problems. This is unfortunately something a lot of coders forget to do, as they quickly copy-paste the first if-else, and then repeat.

state = next_obj().get_status()
if( state == state_ready ) {
    ...
} else if( state == state_pending ) {
    ...
} else {
    ...
}

You could avoid the problem using a switch statement, which only evaluates the expression once.

Better performance

In many cases a switch statement will perform better than an if-else chain. The strict structure makes it easy for an optimizer to reduce the number of comparisons that are made.

This is done by creating a binary tree of the potential options. If your switch statement contains eight cases, only three comparisons are needed to find the right case.

switch( c ) {
    case 0: ...
    case 1: ...
    case 2: ...
    case 3: ...
    case 4: ...
    case 5: ...
    case 6: ...
    case 7: ...
}

An optimizing compiler, or intelligent runtime, can reduce this to a binary search of the numbers.

if( c < 4 ) {
    if( c < 2 ) {
        if( c == 0 ) {
            //0
        } else {
            //1
        }
    } else {
        if( c == 2 ) {
            // 2
        } else {
            // 3
        }
    }
} else {
    //repeated for 4...7
}

A clever optimizer might recognize an if-else series the same way. But the potential for minor variations in the statements reduces this possibility. For example, a function call, hidden assignment, or use of an alternate variable, would all prevent this optimization.

By using a semantically significant high-level form, you give the optimizer more options to improve your code.

Language problems

Switch statements aren’t without their problems, however. In particular, the C and C++ form that requires an explicit break statement is problematic. Though, it also allows multiple cases to be packed together.

I like Python a lot, though I’m upset that it doesn’t have a switch statement. While maps and function dispatching cover several cases, they do not cover all of them.
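A minimal sketch of that map-based dispatch, with hypothetical handler names:

```python
def on_ready() -> str:
    return "starting"

def on_pending() -> str:
    return "waiting"

def on_unknown() -> str:
    return "error"

# The dictionary maps each value to its handler, much as a switch
# maps case labels to blocks of code.
dispatch = {
    "ready": on_ready,
    "pending": on_pending,
}

def handle(state: str) -> str:
    # .get with a fallback plays the role of the default: case.
    return dispatch.get(state, on_unknown)()
```

The lookup evaluates the key expression only once, but features like fall-through, ranges, or pattern matching have no direct analogue here.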

Rust has a much better match statement. It retains the high-level semantics of a switch statement, but adds much better pattern matching. Though I’m not a fan of the language, I think it has the best version of a switch statement. I should call it pattern matching, which, in language design, is the more general name for this feature. You’ll see it in other languages, like Haskell, as well.

Perhaps that’s the biggest problem with switch. It feels like a stunted version of proper pattern matching. But that’s no reason to abandon it entirely and go back to if-else. Switch statements produce cleaner code as they express semantics, avoid duplication, and reduce the chance of errors.

Fluid layout animation: Invalidation and caching

Dynamic changes in layout properties, whether from user actions or animations, require a recalculation of the layout. Maintaining a stable frame rate during recalculation is challenging, as the layout process is relatively expensive. In this article we’ll talk about the two key approaches that make fluid layout animation possible: limited invalidation and layout caching.

Locality observations

We base this approach on two key observations:

  • Most of the layout is static, only a few elements actively change
  • Changes in layout tend to result in local effects: they may cause a change in parent layout, or a few children, but rarely impact the entire tree.

Our goal in layout is to limit the recalculations to only those elements that will get a new layout.

Downward caching

Let’s start with the easy case: caching. During layout traversal a parent element will make a few requests to a child element. When something small changes, many of these child requests will be the same as a previous request.

The first location is GetMarginSize, as described in the layout protocol. Given two equivalent LayoutParams, and assuming nothing has changed in the child’s tree, the resulting size should be the same. We can cache these results so that future calls to GetMarginSize need not traverse further down the tree.

There’s a trick here though. For constraints, and the occasional multi-pass calculation, GetMarginSize is called multiple times with different LayoutParams. Caching just one result is not sufficient. It turns out that two is enough — this comes from ensuring most of the layout code can be done in a single pass. One call to get the desired size followed only by another call to do the actual layout. If setting the cache size higher helps, it may be a sign that something is doing too many passes higher up.
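As a rough sketch of that two-slot cache, with invented stand-ins for the engine’s LayoutParams and size types:

```python
from typing import Dict, Tuple

Size = Tuple[float, float]
LayoutParams = Tuple[float, float]  # stand-in: (available_width, available_height)

class Element:
    def __init__(self) -> None:
        # At most two cache slots: one for the measure pass, one for arrange.
        self._cache: Dict[LayoutParams, Size] = {}
        self.measure_count = 0  # instrumentation, to show the cache working

    def _measure(self, params: LayoutParams) -> Size:
        # Placeholder for the real (expensive) traversal of the subtree.
        self.measure_count += 1
        return (min(100.0, params[0]), min(50.0, params[1]))

    def get_margin_size(self, params: LayoutParams) -> Size:
        if params not in self._cache:
            if len(self._cache) >= 2:
                # Evict the oldest entry to keep only two slots.
                self._cache.pop(next(iter(self._cache)))
            self._cache[params] = self._measure(params)
        return self._cache[params]

    def invalidate(self) -> None:
        # Any change in the subtree makes the cached sizes stale.
        self._cache.clear()
```

Equal LayoutParams hit the cache and skip the subtree entirely; a change anywhere below simply clears the slots.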

The second optimization really isn’t a cache. A call to ArrangeMarginBox may provide the element with the same parameters and size it had before. We can avoid some local calculations if the resulting size is the same as before. Unless the LayoutParams is exactly the same though, we still need to traverse to the children.

Upward invalidation

When a layout parameter for an element changes we invalidate its layout. Since a parent’s layout may depend on the size of the child, we also mark the parent as invalid. We repeat this until we reach the root of the tree. During one frame multiple elements can be marked invalid, and they’ll each create an invalidation path to the root.
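The upward walk can be sketched in a few lines of Python (the names are hypothetical, not the engine’s actual code):

```python
from typing import Optional

class LayoutNode:
    def __init__(self, parent: Optional["LayoutNode"] = None) -> None:
        self.parent = parent
        self.layout_invalid = False

def invalidate_layout(node: Optional[LayoutNode]) -> None:
    """Mark a node, and every ancestor up to the root, as needing layout."""
    # An already-invalid ancestor means the rest of the path to the root
    # is already marked, so the walk can stop early.
    while node is not None and not node.layout_invalid:
        node.layout_invalid = True
        node = node.parent
```

Stopping at the first already-invalid ancestor keeps multiple invalidations in one frame cheap, since their paths to the root are shared.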

Each frame we check if there are any invalidations and start a new layout request from the root of the tree. The caches can speed up this part, but it’s still covering far more nodes than are strictly necessary. It’s a good place to start however, and it’s how I started with the engine. The caches, and avoiding redundant layout on elements, provide a significant boost in performance.

The next boost is avoiding invalidation all the way to the root element. As most animations, and user interactions, take place near the leaf elements, they tend not to alter the overall layout of the app. It’s silly to start the layout calculation at the root each time — even better than cached results is not calling those functions at all.

IsMarginBoxDependent

I created this virtual function on the elements:

/**    
    @return Yes if the child influences the results of ArrangeMarginBox (size or layout of this node),
        No if it cannot, and Maybe otherwise (in cases of stretching)
*/
protected virtual LayoutDependent IsMarginBoxDependent(Visual child) {
    return LayoutDependent.Maybe;
}

Ignoring the Maybe result at first, the function determines whether the layout of this element is affected by a particular child. For example, in a layered layout, the children are independent of each other, and the parent’s size is not dependent on them — it would return No for all children. An example of Yes is a StackPanel. If the size of any of its children changes, its own size changes, as well as the positions of all of its children.
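Translated loosely into Python, with hypothetical class names, those two examples look like this:

```python
from enum import Enum

class LayoutDependent(Enum):
    NO = 0
    YES = 1
    MAYBE = 2

class Visual:
    # Conservative default: without more information, assume a child
    # might affect this element's margin box.
    def is_margin_box_dependent(self, child: "Visual") -> LayoutDependent:
        return LayoutDependent.MAYBE

class LayeredPanel(Visual):
    # Layered children are independent of each other, and the panel's
    # own size does not depend on them.
    def is_margin_box_dependent(self, child: "Visual") -> LayoutDependent:
        return LayoutDependent.NO

class StackPanel(Visual):
    # Any child's size change alters the stack's own size and the
    # positions of the sibling children.
    def is_margin_box_dependent(self, child: "Visual") -> LayoutDependent:
        return LayoutDependent.YES
```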

The Maybe result is used when the answer can’t be determined locally. Consider these two examples:

Panel {Alignment=TopLeft}
    Rectangle {Color=Green}
        Text {Value="Hello"}
        
Panel {Width=200 Height=200}
    Rectangle {Color=Green}
        Text {Value="Hello"}

If the value of the Text changes it invalidates the layout of the text element. The element will call Rectangle.IsMarginBoxDependent( Text ), but the answer differs in each scenario. The Rectangle itself doesn’t know the answer, it depends on the parent Panel. In the first case the panel, thus rectangle, collapse to the inner size of the text — the Rectangle does depend on the Text size. In the second case, the Panel has a fixed size, the Rectangle expands to fill that — the Rectangle does not depend on the Text size.

A result of Maybe defers the question to a parent node. Once the definitive answer is known, the traversal goes back to the first element that said Maybe, and updates the answers.

The invalidation traversal algorithm isn’t large, but it’s complex. It must deal with multiple invalidations in the same frame — nodes that are invalidated by one element, may be invalidated by another one. My algorithm split No and Maybe to create a NoArrange and MaybeArrange. This allowed an element to say its own size is not affected, but the arrangement of the children is. A minor detail, but it brings a big boost in some common situations.

The value in the IsMarginBoxDependent function is no longer needing to start layout from the root node each time. We can localize the changes and jump down to those elements. Combined with caching this provides highly localized recalculation, fairly close to what you’d expect based on the visual changes.

To avoid another data structure, I built this tracking status into the normal invalidation flags. It still technically started at root, but it had a flag indicating whether layout was required, or to only scan the children looking for a change. If you consider the amount of pruning in the search, this is a nearly linear search, of the depth of the tree, to find the starting points. It’s tempting to store a list of entry points instead, but that gets rather costly when trying to resolve multiple invalidations in one frame.

The answer to IsMarginBoxDependent typically depends on the kind of layout a panel performs. It can be a tricky function to get right. One must err on the side of caution, essentially returning Yes by default, and No only in some specific situations.

Testing and iteration

A good thing is that this approach can be built iteratively. The caching bits can be implemented independently, as well as the tree invalidation.

IsMarginBoxDependent need not be implemented everywhere at first, the default value always works; it simply isn’t optimal. I implemented this per layout as I found situations where it’d be helpful. Indeed, most of the details in all these optimizations were based on common use-cases — the general case is too broad.

Testing is essential for invalidation. Tests need to cover all sorts of invalidation to ensure that the correct elements get invalidated, as well as verifying the optimized paths are applied. Minor changes in the code can have major effects!

As with the other articles, this article leaves out many details about how the code might look. It covers some key concepts and approaches to optimization. Precisely how this is implemented depends a lot on your overall approach. There are many things I’d do differently if implementing this again, but these core concepts would remain.

Creating a custom author box on WordPress

Customizing the author box on WordPress was more challenging than I expected it to be. I checked out several plugins and instead chose to create a PHP function in the theme. Here I’ll look at what I did.

All the plugins

I got my first guest author on Interview.Codes and wanted to ensure proper accreditation. Until now I’ve not had guest authors on my primary blogs, thus haven’t worried about the author box. Usually I’d put footer information directly into the content of the page. This time I wanted to use a proper author box, to be consistent and avoid redundancy in my source files.

Author boxes are something that appear on a lot of sites. Doing a plugin search yielded all sorts of results for customizable boxes. They mostly weren’t what I wanted, or were overly complex for my purposes.

All I really wanted was the ability to make a template for my author box and insert dynamic fields. I’m using Elementor, so I can already create the templates. However, there are no standard short-codes to get at the author data.

Author Data Short-Codes

I double-checked the plugins, to see if any offered author short-codes. A few did, but I didn’t want the bulky plugin for only those short-codes. Plus, they lacked documentation so I couldn’t determine if the codes were sufficient.

I decided to create my own short-codes. I try to stay away from low-level bits in WordPress, but this time it looked like some PHP was the correct approach.

I hopped into the theme editor to make a change. I use the Child Theme Configurator, which lets me isolate my changes in a child theme. The parent theme remains intact.

I added the following to the functions.php file.

function shortcode_author_data( $atts ){
	$atts = shortcode_atts(
		array(
			'field' => 'user_name',
		), $atts, 'author_data' );
	$field = $atts['field'];
	if( $field == 'posts_url') {
		return get_author_posts_url( get_the_author_meta( 'ID' ), get_the_author_meta( 'user_nicename' ) );
	}
	return get_the_author_meta( $field );
}
add_shortcode( 'author_data', 'shortcode_author_data' );

In my template I create an Elementor “Text Editor” component with the following HTML fragment.

<h4><a href="[author_data field='posts_url']">[author_data field='display_name']</a></h4>

[author_data field='description']
<a href="[author_data field='user_url']">[author_data field='user_url']</a>

I suppose the same could be done with templates not using a block-based editor, but I don’t know much about them. I’ve only ever used templates in conjunction with Elementor.

I combined the HTML with the author’s profile image, something Elementor provides. For the new article this creates an author box like the one below.

Fields and get_the_author_meta

The shortcode_author_data function is simple. It follows the common pattern found for shortcodes in the WordPress documentation.

The shortcode takes a field argument. The field is passed to the get_the_author_meta function, which retrieves information about the current author. Those names are standard and can be found in the documentation.

What’s missing however is a link to the user’s posts. That’s available with the function get_author_posts_url. I didn’t want to create a unique short-code, so I overloaded the same one. If the field name is posts_url I’ll use a different function.

I’ll be adding more of these special cases as my site evolves. Anything related to the user will be added to the author_data short-code.

Next Steps

As I need this on multiple sites I might look into creating a plugin. A plugin will let me use the same code in multiple places, keep the theme even cleaner, let me use a proper programming environment, and manage updates better. It’s not high on my priority list though, so it might take a while.

For now this simple function inside the theme works wonderfully.

A Failed Experiment with Python Type Annotations

I like Python, but wish it had static typing. The added safety would go a long way to improving quality and reducing development time. So today I tried to make use of type annotations and a static type-checker called mypy.

After a few basic tests, I was excited. But my glee turned to disappointment rather quickly. There are two fundamental issues that make it an unusable solution.

  • You can’t have self-referential classes in the type annotations, thus no containers
  • You can’t have inferred return type values, thus requiring extensive wasteful annotations

Let’s look at both problems.

Self-Referential

I’m trying to render some articles for Interview.Codes with my MDL processor. The parser uses a Node class to create a parse tree. This class contains Node children as part of a tree structure. Logically, that means I’d have functions like below.

class Node:

    def add_sub( self, sub : Node ):
        ...

    def get_subs( self ) -> Sequence[Node]:
        ...

mypy has no trouble understanding this, but it’s unfortunately not valid Python code. You can’t refer to Node within the Node class.

The workaround suggested is using a TypeVar.

NodeT = TypeVar( 'NodeT', bound='Node' )
class Node:
    def add_sub( self, sub : NodeT ):
        ...

    def get_subs( self ) -> Sequence[NodeT]:
        ...

This is ugly. I’m reminded of C++’s _t pattern. Part of my attraction to Python is the simplified syntax. Having to decorate classes like this makes it far less appealing. Plus, it’s boiler-plate code adding overhead for later understanding.

The limitation in Python comes from Node not yet being in the symbol table. It doesn’t make it into the symbol table until after the class is processed, meaning you can’t use Node within the class. This is a limitation of the compiler. There’s no reason this needs to be this way, except perhaps for backwards compatibility with screwy old code.
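For completeness, here is a runnable version of that workaround, with the imports it needs. Note the bound is given as the string 'Node', which is what defers the name lookup and lets it execute at all:

```python
from typing import Sequence, TypeVar

# The string bound 'Node' sidesteps the symbol-table problem:
# the name is resolved later, after the class exists.
NodeT = TypeVar('NodeT', bound='Node')

class Node:
    def __init__(self) -> None:
        self._subs: list = []

    def add_sub(self, sub: NodeT) -> None:
        self._subs.append(sub)

    def get_subs(self) -> Sequence[NodeT]:
        return self._subs

root = Node()
root.add_sub(Node())
print(len(root.get_subs()))  # 1
```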

Perhaps we can’t use the class name. But we could have a Self or Class symbol that refers to the enclosing class.

No Inferred Return Types

One of the great values of Python is not having to put types everywhere. You can write functions like below.

def get_value():
    return 123

Now, if you’re using TypeScript or C++ the compiler can happily infer the return type of functions. For unknown reasons, mypy chooses not to infer the return types of functions. Instead, if there is no type annotation it assumes the function returns Any.

This means I must annotate all functions with information the static type checker already knows. It’s redundant and messy.

You’re additionally forced to learn the names and structure of all types. Ones you could otherwise safely ignore.

def get_iter():
    return iter(sequence)  # 'sequence' is assumed defined elsewhere

def get_closure(self):
    return lambda q: self.op(q)

Why should I have to know the type that iter returns to write this function? Or do you have any idea what type get_closure returns? I know how to use the return, and can even reason it’s a function, but I’d have no idea how to specify its type. Knowing the myriad of types isn’t feasible. You’ll end up spending more time trying to tweak types than using the code.
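To make the burden concrete, here is what those functions look like once fully annotated (the names and types are illustrative, assuming a list of ints and a simple callable):

```python
from typing import Callable, Iterator

sequence = [1, 2, 3]

# What mypy effectively demands: spelling out return types that
# the checker could, in principle, infer on its own.
def get_iter() -> Iterator[int]:
    return iter(sequence)

def get_closure(op: Callable[[int], int]) -> Callable[[int], int]:
    return lambda q: op(q)

print(next(get_iter()))                   # 1
print(get_closure(lambda x: x * 2)(10))   # 20
```

Neither annotation changes the behavior of the code; they exist only to satisfy the checker.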

This complexity helped drive the introduction of the auto keyword to C++. There are many situations where writing the type information isn’t workable. This is especially true when dealing with parametric container classes.

Inferring return types is an essential feature.

Avoiding it for now

These two problems repeat throughout my codebase. I’m okay when there’s a limitation that occasionally affects the code, but this is fundamental. To use type checking, I’d have to add the redundant class declarations to every container-like class. To use type checking at all, I’d have to annotate the return value of all functions.

Static type checking should not be a tradeoff and there’s no fundamental reason these limitations can’t be lifted. When these are fixed, I’ll happily come back and use type annotations.


Image Credit: Mari Carmen

A Parade of Web Tech

Perhaps I wrote that I don’t know how to create a website, but glancing back, I certainly have built a lot. And oh boy, there’s quite the variety of technology involved.

I thought I’d catalog the sites I worked on and briefly review how each was produced. It’s a testament to how varied web programming is, and how fast things change.

I could not determine a reasonable ordering. I went with quasi-reverse chronological but with groupings. There is considerable overlap in the timeframes involved, spanning way back to 1995!

The WordPress Family

EdaqasKitchen.com — WordPress.org, Python/YAML

For my cooking website, I wished to focus on content more than technology. Yet, I didn’t want to compromise on my vision. WordPress fits this bill — WordPress is the illusion of a simple CMS marred by the billion-and-one ways to use/configure/program it.

I wasn’t able to find a good theme when I set it up. I took a basic one, but it has many problems. I installed an add-on to change the underlying PHP, and thus have a bunch of custom code fragments now.

The recipes are encoded in YAML files — actually a few files and images, spread over several directories. I use Python code to load them and produce the HTML output. This output is then pasted into the WordPress editor. Once the page is created there is a --update option that uses the wp-cli tool to update the site directly.
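The pipeline looks roughly like this (a hypothetical sketch; the real data comes from YAML files loaded with a YAML library, and a plain dict stands in for a loaded recipe here):

```python
# Stand-in for a recipe loaded from its YAML file.
recipe = {
    "title": "Lentil Soup",
    "ingredients": ["lentils", "onion", "stock"],
}

def render_recipe(r: dict) -> str:
    # Produce the HTML fragment that gets pasted into the WordPress
    # editor, or pushed directly with wp-cli via the --update option.
    items = "".join(f"<li>{i}</li>" for i in r["ingredients"])
    return f"<h1>{r['title']}</h1>\n<ul>{items}</ul>"

print(render_recipe(recipe))
```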

Configuring WordPress is a massive pain! When I get a chance, I’ll write a short introduction to terminology and concepts. There’s so much that you won’t easily discover by playing around.

Interview.Codes — WordPress.org, Elementor

My newest site, where I’ve opted for WordPress again. It seemed like a no-effort step since I can have multiple sites on SiteGround — where I host these sites. I don’t enjoy doing back-end admin when I want to focus on the content.

This time I dug around more with WordPress and came upon a new type of tool: Elementor. I found it through a theme I liked, which ended up being junk. Many themes produce pretty layouts with Elementor (or similar tools) and then try to pass them off as the theme’s own. Once I realized this, I removed the theme and used a basic one instead. Elementor is essentially a drag-and-drop design tool. It seems great for static pages, which is what I’ll have on this site.

Ultimately, something will force me to adopt a more complex solution here; it always happens. But for now, it’ll be static content, then linking off to sites like SkillShare.

mortoray.com — WordPress.com, Markdown

Mortoray.com has been my primary website for well over a decade now. When I started publishing technical articles, I wanted to focus strictly on the writing. I didn’t want the overhead of maintaining yet another site — and you’ll see from this list that I had many of them.

Over time, the complexity increased. I found it easier to write Markdown locally. But that wasn’t enough either, so I added extensions in the Python processor. I’m working on the next generation of this now, as I have my own Mortoray Document Language in development.

There’s little I can say about the technology, as everything is managed by WordPress.com. It also means I have little room for customization. I’ve wanted to change various aspects of the layout, but without upgrading to their expensive business plan, I have no way to further customize the theme nor add plugins. That’s what motivated me to use managed hosting for EdaqasKitchen.

Modern Statics

LeafLang.org — Jinja, Foundation, Extended Markdown

I said goodbye to my programming language last year, but its website is still there… for now.

Differing from other sites, this one is entirely static. I use Jinja templates, but they are processed on my machine and I upload the HTML results. The HTML / CSS uses the Foundation libraries.

As the articles are code heavy, I used my extended markdown for the documents. This is the same technology I used to write all of my programming articles on mortoray.com and dev.to. It’s being replaced now, with MDL.

This will be offline soon as I’ve let the domain name expire. It’s still sad for me. :(

lomi.land

My massage and wellness company uses essentially the same set of technology as leaflang.org, except there is no markdown. The set of documents is more limited, and not technical in nature, thus basic HTML with Jinja templates worked.

Edaqa.com — Hugo, Python/YAML

Oddly, my personal resume site is new as of last year. Until then I’ve hosted resumes and history on one of my other sites. I felt it was finally time to centralize my profile.

The domain edaqa.com was registered a lengthy time before I created the site. I used it to run experiments for myself and clients. Perhaps odd, but yes, I was okay with using my namesake domain as a throwaway playground!

I enjoy doing static sites for content — entirely appropriate for this site. There is less overhead than maintaining a CMS and no need to setup a server.

Wanting to stay modern, I tried Hugo instead of my home-brewed solutions. Overall I’m disappointed with this decision. I don’t find Hugo offers enough to justify learning it. Maybe it’s okay, but the themes are mostly garbage and hamper the learning process.

I generate some pages from data stored in YAML files. I hate repetition of any kind, so you’ll frequently find me using layers of technology.

A BrainBrain Diversion

BrainBrain was a CMS I built, complete with a social network style front-end. That front-end turned out to be uninteresting, but I re-used the code for several other projects.

EverSystems.eu — Custom Content Manager (PHP)

This will be offline soon, as I’m closing the company. I haven’t updated it in a long time.

I used my content management system on this server. Called BrainBrain, it had its own domain as well: BrainBrain.net. Initially designed as a social communications platform, it gained more flexibility as I adapted it to my projects. It was written in PHP and had a web interface to edit articles.

I’ve grown clueless about how any of this works!

This is the last act of the BrainBrain technology stack.

BigTPoker — Haxe, Flash, Custom CMS

This was a poker training site, complete with a lot of games. It’s one of the sites I’d qualify as an application more than a static website, even though there was a lot of static content.

I wrote the games in Haxe, which compiled to Flash. I used some JavaScript to integrate them with the page, to feel like a unified web app. Web apps were not yet popular, making this more challenging than it’d be now.

I had forgotten how I managed the content; it turns out I also used my BrainBrain CMS. I wonder how much time I spent developing that thing.

disemia.com — Custom XSLT Transforms, Mortar

I kind of feel ashamed of this domain. It’s so outdated. Yet, it’s my oldest domain, the one I use in my email address. And by old, I mean old: 23 years and counting!

It’s used a mishmash of technologies over the years. I started with plain HTML — yes, the kind that doesn’t even have CSS since it wasn’t an option yet!

I moved on to Mortar, which was a web site tool for Windows. I was the lead developer on this project in the 90’s. It was a good tool, but it’s long since gone.

Riding a trend wave, I switched to XSLT for the new pages. This used some M4 to add features, and was then processed by Xalan.

trostlos.org — As Above + Ruby

I maintained my record label website the same way as disemia.com. I did however add some Ruby scripts and a database for the songs. I entered all demo songs and samples in a DB, then used Ruby to generate the static content for them.

Both disemia.com and trostlos.org are/were served on HostBaby. Though, disemia.com came later, as it was previously hosted at numerous locations — back in 1995 it was hard to find hosting!

In Left Field

PuzzlePuzzle.net — WebGL, Python Flask, CoffeeScript, Jinja

An attempt at creating a game in the browser using WebGL. It was an animated 3D puzzle game: the puzzle was a scene, split across typical jigsaw pieces. I blame poor WebGL support for its failure.

I wrote the pages, both text content and game wrappers, using Jinja templates with standard HTML and CSS.

Checking back, I see that I did all the game coding in CoffeeScript, which compiles to JavaScript.

I wrote my server with Python and Flask. I still like this combination for its simplicity in writing and deployment.

This is the most app-like site I’ve ever created. There was no static content, with the entire experience focused on the game. I sure wish I had made some screenshots… I wonder if I could get it running again.

WellBook.org — BrainBrain, Persephone, PHP

For a friend I developed a medicine tracking website. It focused around a calendar to track meds and the resulting mood and state of health. It was short-lived, having not gained any kind of traction.

This should probably be under the BrainBrain section, as it used that same CMS stack. However, the focus was on the app part, with the CMS for writing your own personal notes and also for the occasional update.

Looking into this code, I see that BrainBrain was using Persephone. This is a DB abstraction software I wrote ages ago at eCircle. It built PHP interfaces based on schema and query descriptions.

Redid — Python, Flask

This site is the focus of my lightning presentation, “The Two Month Startup”, which I gave at UnternehmerTUM in Munich. It was a dynamic image hosting service.

It was one of those fancy cloud based services, hosted in AWS. I used my preferred combination of Python and Flask.

It was not a “site” per se, but only an API. It provided HTTP endpoints, and served images via the CDN, but offered no static content, not even for the admin interface. I wrote Python CLIs to use the service.

More?

I’ve not included a few websites. I had an “NRage” website, which was a kind of social experiment of letting out your anger — like if you took only the trolls from Twitter and made animated boxes. It was also using BrainBrain — I sure used that tech a lot!

I also had a short-lived writing project called PorcupineTimes, but I have no record of what technology I used. Possibly I hosted this on Medium, but I purchased a domain. I don’t know.

Plus a few static sites, like one for doulas. It’s kind of hard to keep track of all of them. My domain registries show far more domains, but most of them are variations of the above — though a few aren’t, like lfbt.org, which I have no memory of. Plus, hover.com only goes back to 2009, which is only about half of the time covered in my website journey. I think they’ve gone through various name and corporate changes. I tend to use NetIM more for domain names now.

This list focused on sites I did primarily for myself, or in close partnership with others. It does not include those I did as part of formal employment. Those account for only a couple more, but include some technologies you might notice are absent here: namely Java and NodeJS.

It’s been quite the adventure.

Terrible interview question: Swap variables without a temporary

“Clever programming tricks have no place in programming interviews. They usually involve a small amount of code and solve an innocent-sounding question like “find the loop in a linked list”. Often unfair constraints are added, such as “you may not use the language’s search functions”. They follow a general pattern of being highly specific and easily searchable. Yet they aren’t something that can be solved within an interview. These are research level questions that require random prior knowledge, lucky leaps of intuition, or a lot of interviewer prompting.”

This article has been moved to Interview.Codes.

Sorry, but I have no way to do an auto-redirect on WordPress.com. They also don’t allow canonical-url, so I can’t keep it here as well. 😞

A programmer’s introduction to user portraits


As programmers, it’s vitally important for us to understand what our code should be doing. This knowledge starts with knowing the people using our product. While we’re deep in the code, our mind tends to focus on technical details. But to truly understand the purpose of our work, we need to know how it’s relevant to our users. We need ways to keep our users as integral parts of our development process.

User portraits, or personas, are tools that direct our attention to people. These portraits are profiles of the users of our system. Beyond casual discussions, they let us record our impressions and understanding of the people using our product. User portraits let us talk about a specific person, rather than an abstract concept of a user.

If you prefer video, I cover this material in my class How to Write a Great User Story.

This article introduces a way to track the user and covers what types of things we should consider. As a programmer, it’s crucial for you to be part of this process, and have access to this information. You’ll be making decisions every day that impact the people using your software. This is true whether you’re writing the UI, or working on the back-end server. Each choice you make impacts the user in some way. Thus you should understand what the user wants from the product.

Finding out who our user is

A user portrait is a profile of a single target user. There’s nothing fancy here. It’s like writing a profile page on a blog. It includes a lot of details about who the user is, and what they do. You can write them in virtually any tool that allows for collaboration, such as a shared document or a wiki article. User portraits are a way of communicating between teams.

You can start your portrait by selecting a photograph of a person. Try a stock photo site, such as pixabay. It’s not essential, but it can help visualize who we’re talking about. It serves as the basis of the user portrait. The focus on a single person prevents our ideas from running wild — we’re not trying to describe our entire user-base.

This idea of a user portrait is called a persona in marketing. I’m shying a bit away from that term though, as it implies a lot more depth and research than what might go into a shorter user portrait. Personas often condense entire classes of users into a single person as well, rather than focusing on representative individuals. Certainly though, if your company already produces personas, don’t duplicate that effort.

Along with a picture, a user portrait should include a name, age, and other basic demographics. You want to capture all the relevant details that might influence the use of your product. For example, an application about meeting people would vary depending on whether people live in cities or rural areas. Or the person’s job is relevant if your product is targeting a specific job niche. This need only be a short bio, with some key points that define who the person is.

Working from that bio, write a description of what they are currently doing — in the context of your app. For example, if you want to help writers, you should determine how your user writes presently. What tools are they using, and what processes do they follow?

Contrasting the current state, we add a description of the desired state. That is, what would the user prefer to do differently, or what can your product offer to improve their workflow.

This comparison between the current state and the desired state is the key to addressing the user’s needs. It’s how we find ways to improve their situation, and address all levels of the UX design Pyramid. It’s what gives focus to our development.
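For programmers, the shape of a portrait can be sketched as a simple record type. The fields mirror the elements described above; the structure and the example person are illustrative, not a standard:

```python
from dataclasses import dataclass

@dataclass
class UserPortrait:
    name: str
    age: int
    bio: str            # key demographics that influence product use
    current_state: str  # what they do today
    desired_state: str  # what they would prefer to do

# A hypothetical portrait, kept as short as the ones discussed here.
portrait = UserPortrait(
    name="Ana",
    age=29,
    bio="Freelance writer working from cafes",
    current_state="Drafts articles in scattered text files",
    desired_state="Wants one place to organize drafts and deadlines",
)
print(portrait.name)
```

Whether it lives in code, a wiki, or a shared document matters less than the current-state/desired-state contrast being captured.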

Add more portraits to capture more users

Software has a variety of features, and users will be selective of the ones they use. One user portrait will describe only one experience with the system. As you cater to other users, you will create several user portraits.

A collection of user portraits lets you further analyze your system. As a programmer, this lets you design better code. By keeping the high-level desires in mind, you can determine which parts of the system need to be adaptable, and which parts are key features. You can adjust the accessibility of the product based on the abilities, or constraints, of the user.

These user portraits aid the prioritization of issues. Quite often we get caught up chasing a stream of issues we mistakenly believe to be vital. It’s easy to get trapped in a corner of the system when you spend a lot of time there. By having high-level user portraits to look at, you can always bring yourself back to the primary goals of the project. You can look at each issue from the high level to determine how much impact it would have.

The value of knowing the user

Often we talk informally about users and even carry their image in our heads while programming. I’ll admit, this is generally the way I’ve worked on projects — there have been few times when I worked with a clearly profiled user. It’s unfortunate, since even a tiny amount of effort, writing down those thoughts, gives a big boost.

There are four significant ways that user portraits benefit our development.

Communication: They provide a way for different teams to talk about the users and gain a common understanding of what the product should be doing. The language should be kept at a widely accessible level. Programmers, artists, marketing, and everybody in a development company should understand the user. Success requires strong communication and a shared mindset.

Psychology: By having specific people in the portrait, it shifts our thinking from designing for a fuzzy concept to designing for a real person. This allows us to empathize and place ourselves in the role of the user. It activates all of those neurons we use to deal with relationships and emotions.

Transformation: We often make assumptions about our users. Having several profiles lets us challenge those assumptions. We can use our real-world knowledge to decide whether our beliefs are valid, or whether we’re creating an inconsistent image.

Focus: Keeping the focus of development on the user is perhaps the most substantial benefit. There are so many ways we can get distracted. User portraits are a concrete artifact that we can continually look at. Any time we have doubts, we can look back at our users. These representations direct our effort to a common goal.

An example

This is a basic example I use in my class. I’ve kept it short to demonstrate that you don’t need a lot of detail to start. Getting this basic idea is enough to focus on the user. More information can be added over time.

Photograph of Martin

Martin is 32. He lives in a small apartment in a big city. He’s an account manager for an advertising company.

What do they do? Martin likes to take his grill and guitar outside on the weekends, near a lake or a river. He loves inviting friends and playing music for them.

What do they want to do? Martin is looking to find fellow musicians who would like to play some music on the weekends casually.

The contrast between their current situation and the new situation is the desire to find fellow musicians. Instead of going out with people he already knows, Martin wants to meet new people. This is the part of his story that is significantly different. He may not be motivated to use an app that only plans his outings unless it also helps him meet new people.

Indicating he is an account manager establishes that he has enough technical competence to use a new solution. He’ll also be familiar with communicating with new people and organizing events.

Living in a small apartment gives him the impetus to get out on the weekend. It’s an important consideration if your product suggests meeting locations. Martin would not be interested in office space to hold his gatherings. He’ll need a map that can place locations outside the city. It can influence your choice for how you implement a search feature. There’s little value in searching for business names, or possibly even streets, but a visual map selector with nearby roads marked could be useful.

With this minimal biography, we can deduce a lot about how our system should work. It’s enough to keep our focus on the user.

Moving on to user stories

User portraits are a relatively high level of detail. Even when short, they provide a focal point for development. Having them around is often enough to keep you thinking about the user while you code. They should become the roots of your user experience design.

These are not static artifacts. As you develop, you will extend and modify them. As you encounter questions in the code, you might need to get answers at the user level. Other teams may point out inconsistencies that need to be fixed. User portraits are living documents.

To go further, you’ll want to derive user stories from the portraits. These provide more details on specific user activities. You can check out the Engineering guide to writing correct User Stories. I’ll also follow up with my own take on these a bit later. For now, the key is to think from the user’s perspective. Have empathy with the people in your portraits, and consider requirements from their viewpoint.