Today I’d like to show a perhaps-not-so-obvious connection between system architecture and performance. I’ll start by setting the stage.
I’m working on a telecom system whose main component’s general architecture is like this: service logic + supporting subsystems. The service logic is all coded by many (large) finite state machines, whose states in turn are using the supporting subsystems for its services. The services of the supporting subsystems are all asynchronous, meaning that calls on its main functions aren’t blocking, so that the results of the calls are returned later, on another thread, to the event queue of the requesting finite state machine. For example, a call over the network, or even opening a file on the hard disk, is blocking (involves waiting), so it needs to be queued for later execution on another thread. The result events from the supporting subsystems are then driving the state machines forwards.
The threads driving the finite state machines are running at a high priority ("time critical" using Windows terminology), whereas the supporting subsystems’ threads are running at lower priorities. Activity logging to disk is done by a thread running at the very lowest priority.
What happens when we’re running this system under high (constant) load? (Guess what, it’s means many phone calls arriving!) Then you see, empirically, that the processor load generally doesn’t vary much over time. Graphically, it’s almost a straight horizontal line. Apparently the service logic parts distribute all blocking operations pretty evenly over time, and there’s a fixed number of threads in the supporting subsystems, who can’t do more than a fixed maximum amount of work at the same time.
Now, we could say that this situation is good for scalability, in the following two ways:
- Suppose that we’re running at 20 percent roughly constant processor load. Then, doubling the load would (note, naïvely, but see the next point!) make the resulting processor load to be 40 percent. Another system with not-so-evenly distributed processor load, but with the same average processor load, might not enjoy such scalability properties. For example, it could have 60 percent processor load during one fifth of the time, and 10 percent processor load during the remaning time. Then the peak load of the scaled-up system would go "through the roof".
- If we scale up the system by adding more finite state machines, each operation, or "tick", of those state machines is never blocking, so it actually takes a negligible amount of time. So until the sum of all those "tick" times becomes significant, we would enjoy pretty good scalability.
Scalability is never that simple, of course. But these are good properties for scalability anyway.
Now, an interesting question is whether this arrangement between finite state machines with only non-blocking operations on one hand, and asynchronous supporting subsystems on the other hand, is necessary for obtaining such an even processor load, and nice scalability properties. That is, we want to see what happens when the "ticks" of the state machines start to take some small significant amount of time, contrary to the second case above.
We’ll make a simple calculation to illustrate what happens. Suppose that we want to keep 4000 state machines running, and that each state would contain a blocking operation that takes one millisecond to perform. This would mean that if every state machine is ticked at least once per second, the ticking operations during that second would amount to four seconds. Oops! Didn’t work. That would lead to 100 percent CPU load, and probably some timeouts would be triggered, too. We could imagine situations with shorter and fewer blocking operations, but you see the general problem from this example. Many small blocking times add up to a long time when there are lots of state machines. Even if you would have several processors, you would just postpone the problem a bit. You’ll also lose some "fairness" in the state machine execution; you wouldn’t be able to expect a regular frequency of "ticks" to a state machine.
I guess that if we could know the maximum blocking time for an operation, we might be able to build an architecture like this with some blocking operations in the state machine. But typically we don’t know that. That’s usually why the operations are blocking, because we need to wait for something, like a network response. And still, it wouldn’t scale as well as the non-blocking architecture. And above all, it’s not that difficult to make synchronous operations asynchronous by performing them in a separate thread, so there’s no actual reason for having blocking operations in a state machine anyway.
So, there you’ve got some arguments for a nice "state machines + asynchronous services" architecture. I believe that this is about how close you can bring Windows to being a real-time operating system.
[Update 2005-10-17: corrected some spelling mistakes.]
Filed under: Software Development