Distributed Information in the Modern Era


I work as an engineer in the “high performance messaging middleware” space. Most people don’t know what this means, and you would think after doing it for six years I would have perfected a convenient explanation by now. Alas, I have not.

Messaging middleware is software that other software developers use to enable “quality” communication between applications. What counts as quality varies significantly by use case: sometimes applications need to send lots of data and speed isn’t very important; in other use cases the speed, or latency, of the messages is the most important aspect, even more important than the messages themselves (that might be hard to grasp, but since it is not the point of this article I will not explain it right now).

If we wanted to depict what messaging is in its simplest form, it looks like this:

Sources send messages, receivers receive the messages, and the space between the source and the receiver is the send path. Inside the message is a “payload” which is the content of the message the receiver needs, either for informational purposes or possibly an instruction to perform an operation.

Sometimes messaging systems are brokered. A broker is a piece of software in the send path between a source and receiver. It has advantages like being able to “fan-out” the data (sending it to multiple receivers), “guarantee” the messages by storing them to a redundant storage mechanism, and in some cases the broker can even transform or modify the payload based on a set of rules. A brokered system looks like this:
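To make the broker idea concrete, here is a toy sketch in Python. The class and method names are my own invention, not any real product’s API; it just shows the three behaviors described above: fan-out to multiple receivers, keeping a stored copy for “guaranteed” delivery, and transforming the payload based on a rule.

```python
class Broker:
    """A toy broker sitting in the send path between sources and receivers."""

    def __init__(self, transform=None):
        self.subscribers = []      # receiver callbacks to fan out to
        self.store = []            # redundant copy kept for "guaranteed" delivery
        self.transform = transform # optional rule that modifies the payload

    def subscribe(self, receiver):
        self.subscribers.append(receiver)

    def publish(self, payload):
        if self.transform:
            payload = self.transform(payload)  # modify based on the rule
        self.store.append(payload)             # persist before fan-out
        for receiver in self.subscribers:      # fan-out: one send, many receivers
            receiver(payload)

# Usage: two receivers and an uppercasing transform rule
broker = Broker(transform=str.upper)
a, b = [], []
broker.subscribe(a.append)
broker.subscribe(b.append)
broker.publish("hello")
# both a and b now hold ["HELLO"], and broker.store keeps a copy
```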

Seems simple, and at a high level it is quite simple. However, real-world implementations are almost never simple. Messaging systems exist in complicated computer networks where the system is often taken for granted and simply expected to deliver a consistent quality of service. For example, a lot of send paths look like this:

And this is still abstracting a lot of the ugly stuff. Most networks have many layers of switches, and routing is usually handled by big beefy servers. And firewalls are like the TSA of networks – they slow everything down, good and bad messages alike. To complicate things even more, there are multiple “protocols” over which messages are relayed, things like TCP and UDP. Protocols are like different languages for the computers and networking equipment in the system. There are standards for how these protocols are supposed to be implemented, but depending on the actual hardware/software, the implementation can still vary.

Oh, and the Internet – let’s not even go there. The cloud that makes up the Internet is understood by very few people, maybe no one in its entirety. At a very high level, it’s a lot more switches and routers, maybe a dozen or maybe a hundred from point A to point B.

Point is, networks are complicated.

Moving back to messaging, messaging itself is not without its complications. For example, messages can get “lost”. Let’s say a source publishes three messages and the receiver only gets two:

This happens. Often. So often that if it did not happen I probably would not have a job. Why did it get lost? It could be a lot of different reasons: a disconnect in the network somewhere, the message being dropped intentionally by some piece of networking equipment because of higher-priority traffic, solar radiation flipping bits on a copper wire (this really happens), or maybe a bug in the switch’s software that drops all UDP packets with a 1 in the 17th byte (this not only can happen, I’ve seen it happen). Most commonly, though, the explanation is much simpler. More often than not, a message gets lost simply because the receiver can only receive so many messages in a given period of time – let’s call this X messages per second. If a source tries to send X+1 messages in a second, the receiver simply fails to receive the +1.
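The X-messages-per-second overrun can be sketched with a bounded receive buffer. The numbers and names here are made up for illustration; the point is only that once the buffer is full, the “+1” message is silently lost.

```python
from collections import deque

class Receiver:
    """A receiver that can only hold X messages per processing interval."""

    def __init__(self, capacity):
        self.buffer = deque()
        self.capacity = capacity  # X: how many messages fit per interval
        self.dropped = 0

    def receive(self, msg):
        if len(self.buffer) >= self.capacity:
            self.dropped += 1     # the "+1" message is silently lost
        else:
            self.buffer.append(msg)

# Source sends X+1 messages in one interval; receiver capacity is X = 3
r = Receiver(capacity=3)
for i in range(4):
    r.receive(i)
# r.dropped is 1: the fourth message never made it
```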

Messaging software also has some fancy mechanisms built in, things like “filtering”. Filtering can happen at the source or the receiver, and it basically means that messages are intentionally discarded based on some kind of pre-configured criteria:
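Filtering is just a predicate applied before delivery. A minimal sketch, where the pre-configured criterion is a topic whitelist (the names and the whitelist idea are my own example, not any particular product’s feature):

```python
def make_filtering_receiver(wanted_topics, deliver):
    """Wrap a receiver so only messages on wanted topics get through."""
    def receive(topic, payload):
        if topic in wanted_topics:   # the pre-configured criterion
            deliver(payload)
        # otherwise the message is intentionally discarded
    return receive

got = []
recv = make_filtering_receiver({"sports"}, got.append)
recv("sports", "score update")       # passes the filter
recv("politics", "press release")    # discarded
# got holds only ["score update"]
```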

Messaging systems also have configurations for making sure only certain receivers get a “special” message, so the source does not waste time sending messages to receivers it thinks would not be interested:

And finally we have brokers, which are not only capable of losing messages and routing special messages, but can also modify messages based on specific configurations, changing the data before the end receiver gets the message:

Messy stuff. At a high level, this is the complexity of messaging in distributed systems. Many sources send many messages to many receivers, sometimes directly and other times indirectly via brokers, all the while dealing with the entanglement of large networking infrastructures that speak different protocols, and different implementations of the same protocols, while messages are explicitly and implicitly filtered at every node along the send path. It’s safe to say no single node in any of these systems has a complete and honest view of the rest of the system, and that is largely by design, because a single node would not be able to process the vast amounts of data that typically flow on large networks.

The Problem

All right, here comes the metaphor: our world of news and information sharing works largely the same way as distributed systems, just on a much larger and less reliable scale. We have our sources and receivers:

And we have our brokers:

The same problems we have with distributed messaging we have with our information gathering. Sources (governments) filter their messages when delivering them to both end receivers (us) and brokers (press), and brokers transform messages to appeal to a specific subset of receivers, while other nodes aren’t even listening:

The source-to-broker-to-receiver chain seems to grow longer with each passing year as information publishing gets more economical. Most people used to get their news from a direct broker of information, and even though the messages were filtered, the information was at least mostly accurate. Nowadays, people are subscribing to brokers that get their data from other brokers, further modifying messages from the original:

That’s not to say that the brokered brokers are always factually inaccurate (they sometimes are), but their messages are heavily modified. This is not inherently bad; the big problems exist at the receivers themselves. Us. We have receive-side filtering that will discard data based on emotional responses and subconscious biases, particularly if there is outright distrust of the source:

We receive our news and our talking points from more and more filtered sources and brokers, to the point that we’re building our knowledge of the system from fragmented and incomplete data. Then to fragment the data even more, some receivers get information from other receivers that have an incomplete view of the system:

If you think back to the networking example, receivers cannot receive all the data in the system because there is just too much data. Our world is no different; in fact, it is even worse. Each source sends messages on a specific topic, and the more topics we subscribe to, the more messages we receive – and the more messages we receive, the more prone to loss we are. It’s like trying to receive 10 gigabits of data on a 1 gigabit link – it’s just not physically possible:

This is obviously an over-generalization of the world today. Some people dedicate more time to gathering and processing information and can therefore construct more valuable insights than others, but that takes patience and time. Some people are more honest with themselves, can recognize their biases, and try to broaden their spectrum of sources despite the strong emotional pull to ignore them. It’s not easy, though. It requires work, it requires a certain amount of dedication, and it requires the emotional restraint not to jump to the conclusions some sources are pushing you to make.

We all see the noise on social media, the punditry masked as news, in our faces 24/7 via friends and family. Some of us are even guilty of sharing the noise without considering the possibility of other sources, or failing to research whether they even exist. It’s easy to be cynical, to lose trust in the system because of this. It’s important to realize that each node – each source, each broker, each receiver – is independent of the others. Yes, governments may try to bend messages in a specific way to shape a narrative. That’s actually why we have multiple independent brokers (the press) with the access to stay close and keep the sources honest. It’s in their interest to sniff out the noise, because the broker that uncovers the truth gets to break the story. And yes, there may be collusion between sources and brokers, but that’s why it’s important to have a broad field of independent brokers to uncover any injustice in the system. No system is perfect.

A more accurate depiction of our distributed network of information publishing probably looks more like this:


We are all interconnected in one way or another, yet we remain deaf to the majority of information. We have internal filtering mechanisms, and if you compound that with broker-based news publishing, along with broker filtering and transforming, it is no wonder so many nodes on our network are misinformed. And with so many nodes getting data from misinformed nodes, the problem cascades to the point where some nodes do not trust any of the data from brokers or original sources.

The Solution

Let’s circle back to the messaging analogy temporarily. Like I said, I spend a lot of time helping users diagnose and fix lost messages in a distributed messaging system. Many times users do not even know it’s a loss problem; they just know they have applications generating bad data and do not realize why. To fix any messaging problem, one must address the loss first and foremost, and there are generally two simple solutions:

  1. Slow down the sources / subscribe to fewer sources.
  2. Speed up message processing by moving unprocessed messages to a queue to be processed later.
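Solution 2 can be sketched in a few lines: instead of dropping what we cannot process right now, park it on a queue and drain it later at our own pace. This is a sketch under my own names and numbers, not any particular product’s API.

```python
from collections import deque

pending = deque()  # unprocessed messages wait here instead of being lost

def on_message(msg):
    pending.append(msg)   # cheap at receive time: just enqueue

def drain(process, budget):
    """Later, process up to `budget` queued messages at our own pace."""
    handled = 0
    while pending and handled < budget:
        process(pending.popleft())
        handled += 1
    return handled

# Five messages arrive, but we only have the budget to process three now
for i in range(5):
    on_message(i)
out = []
drain(out.append, budget=3)
# out holds [0, 1, 2]; the remaining two are queued, not lost
```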

The same principles apply to our information gathering. We cannot possibly keep up with all the news and information published on a daily basis; it is just not physically possible. Therefore we are oftentimes left with incomplete information, because we get fragmented information from multiple sources or filtered data from a single source.

If a topic interests you, rather than subscribing to a single broker, or a particular subset of brokers, subscribe to all of them on that particular topic. That is still going to be a lot of data, but if you queue it and take your time processing it, you will be a better-informed node in the system. Do not stop at the most recent messages on the topic either – go back and retrieve the history of that topic. The more data you can gather and process, and then share with your surrounding nodes, the better off we’ll all be. Imagine a network where every node was well informed on a specific topic and shared only what it knew about that topic.

Obviously we do not live in that world. Instead we live in a world where few of us are experts on the topics we hold strong opinions about. There is nothing wrong with that, as long as we understand the fact that our viewpoint is incomplete and lacking data. The smartest thing we can say in this case is “I don’t know.” Too many of us do not know that we do not know.

It’s a lot to ask, to maintain due diligence before passing judgment on a talking point people are passionate about. We have busy lives with family, jobs, social obligations, and even trying to maintain our own mental and physical health; it’s hard to find the time to be open-minded. However, if we do, if we take the time to consider alternatives, to research, to not share mindless information that intentionally provokes, we will all be better off.

Walter Isaacson’s Benjamin Franklin

I took a break from the incredibly short Economics in One Lesson to read Walter Isaacson’s epic biography Benjamin Franklin: An American Life, and it was well worth it. I’m not done talking about Economics in One Lesson, not by any means, but for some reason I had the urge to finally read the Benjamin Franklin biography, and since it has been sitting idle on my bookshelf for the better part of 3 years, I figured it was time.

I had read Isaacson’s work on Steve Jobs and Albert Einstein, and since I enjoyed those I always knew I would likely enjoy his take on the life of Benjamin Franklin; I just never knew I would enjoy it as much as I did. I’m not a particularly fast reader, and since his other biographies had taken me multiple weeks to get through, I was never ready to commit myself to the Franklin edition. Well, I was hooked, and plowed through it in about a week.

For anyone interested in modern-day politics, particularly in the United States, it is a great idea to go back and read about the founding fathers of America. Oftentimes, when zealots and pundits are trying to make a point, they will quote a founding father as if it were the gospel on which this country was formed, and therefore cannot be argued with. The reality, however, is that America’s founding fathers were just as divisive as politicians are today. It’s almost a miracle that America, post-Revolutionary War, was able to draw up our most sacred Constitution.

My knowledge of Franklin was limited to what I learned early in my school days: the kite and his discovery of electricity in the clouds, the first public library, and later his work on creating the country’s founding documents. What I did not know about was his earlier career as a printer, author, and newspaper editor, and the influence he was able to wield from this work. I also did not fully understand or appreciate Franklin’s role prior to the Revolutionary War as America’s first diplomat in London, spending most of his time trying to prevent a conflict. He then made a short trip back to Philadelphia to assist Thomas Jefferson in writing the Declaration of Independence, only to head back to Europe – this time France – to continue his work as a diplomat in securing America’s first alliance. After the war, as the icing on the cake, Franklin’s work during the Constitutional Convention was nothing short of brilliant, and has likely not been matched since.

If you like American history, or if you just like interesting people who, against extraordinary odds, are able to succeed at so much, read this book.

The Paradox of Henry Hazlitt in Economics in One Lesson

I’m about halfway through Economics in One Lesson, and while I appreciate the simplicity of libertarian economics and philosophy, I have come to realize the paradox at its core. At a very high level, the economic philosophy is that a dollar not spent on product A is spent on product B, which is economically equivalent, because the dollars spent on B have an equal impact on the industry in which B is produced (employment, wages, production, etc.). However, a dollar collected by government via taxes is not equal, because governments do not spend money as efficiently as the private sector. So at its core, the argument is that a dollar is not always a dollar.

But let’s think about this a little more. Where does the dollar spent on private goods end up? It ends up in someone else’s pocket, via profit, wages, or even taxes. That dollar is eventually spent again in a similar manner, on more goods or services, and round and round it goes.

What about the dollar not spent via the private sector and instead collected as a tax? That dollar, he argues, might be spent on wasteful projects, like a bridge that is not needed. While the bridge might be a waste of time, those dollars are still spent on goods and services, which end up in people’s pockets, and go round and round the same economic circle as the dollar spent in the private sector.

Yes, the bridge may have little or no productive use, but so does countless private-sector spending: money wasted on R&D projects for products that never see the market, employee boondoggles, unnecessary upper management, etc. (Did Hazlitt never work for a public company flush with cash?) Again, these are things of little productive value, but the dollars spent still end up in the same circulation where they can be re-used (unless someone saves the dollar along the way, which has not really been addressed in the book so far).

The argument is that a dollar is a dollar when spent in the private sector, but not when it is taxed and spent by government. I’m not an economist by any means, but I would love some more interpretation on this particular subject.


I have finished the book, and overall I appreciate the theory but remain skeptical. Hazlitt does briefly talk about cash savings, saying they are baked into the economy and have no real impact; this seems feasible to me. Hazlitt’s theory on trade is also thought-provoking and worth the read. He states that free trade increases production across the board because it keeps the competition “honest”, versus a system where imported products are taxed. He also states that any money that goes out of the country for imported goods must come back, since that money is only good in the country it originated from. I’m not sure how productive foreign transaction fees are to an economy, but I suppose there is some truth in this as well.

I am still left very skeptical of the core argument that government tax dollars are ineffectively spent. I would not go so far as to say that the government is perfect when it comes to spending money (far from it), but neither are people or companies outside of the government. And if I waste a dollar on something unproductive, who is to say that dollar won’t be put to productive use in its next life?

Time to Create a Personal Blog

This is something I’ve been meaning to do for a long time. I work on lots of different things, some of them interesting and worth sharing, so what better platform than proceemo.com! On the surface, proceemo is just a little price notifying service I created a while back, primarily for my wife. After a couple years, I thought it would be worth opening it up for the rest of the world, so that’s exactly what I did. Now anyone can sign up and use it, completely free. Yes, there is referral stuff built in so technically I could make some money if enough people bought stuff they get notifications for. But realistically, the volume and commissions are so small they don’t even cover my yearly server and domain costs.

Under the surface though, proceemo is much more. It’s a place where I write lots of code, some of it worth sharing, most of it not. At the moment I’m playing with stock trade data, building dynamic portfolios based on specific financial criteria. In the process I created a pretty decent framework and database for accessing and storing financial data. It is by no means a commercial product, but I’m happy to share the work done so far (proceemo.com/market).

Other than proceemo, my day job is in messaging, so I’m happy to share my experiences with that (when I can). My public work can be viewed on GitHub here: https://github.com/ultramessaging

I also make beer, and I share my recipes here: http://glassbrews.tumblr.com/

And you can find me on Twitter here: https://twitter.com/SteveMGlass – I don’t tweet often, but whatever.

I’m also on LinkedIn: https://www.linkedin.com/in/glass

That’s it for an intro. More to come in a bit.