March 7 at 1 p.m. MST, Data Bottlenecks & Outages AMA


There have been lots of data bottlenecks and outages lately. In this Ask Me Anything, ask us your questions about what is going on under the hood at Sovrn and elsewhere. We’ll explain the events, the reasons behind the issues, and what’s happening on our end to mitigate these issues going forward.

The other hot topic this AMA covers is how digital advertising companies take advantage of cloud hosting while managing risks to your business.

Online to answer your questions will be:
Brian Stockmoe, VP of Platform Engineering, Sovrn
Mick Bass, CEO, 47 Lining, AWS Partner

This will be the live thread for questions. If you cannot attend at the time of the AMA, please post your question below and we will answer it during the AMA.

Note: To post a question, you must sign up for the community.


Learn more about Sovrn’s Data Reporting and the Challenges of Scaling Growth here:


Hi- This question was sent via one of our publishers: “Just curious what the issue is, not really upset and wanting a make-good explanation, just curious the actual issue since we appreciate the challenges header bidding places on all parts of the eco system and want to be aware and sympathetic to the technicals.”


Hi – I am Brian Stockmoe, VP of Platform Engineering at sovrn. Great to see interest in the AMA on data bottlenecks, outages, and cloud risk management, given the recent business impacts of cloud outages.

A little of my background: I joined sovrn in September to spearhead our platform initiative. The sovrn team has done a great job growing the company and platform over the past years. As we continue to grow, we decided to create a platform initiative to focus on platform strategy aligned with the company’s continued success and growth.

In past lives, I have worked on digital platforms for digital health, media supply chain and distribution, a bond trading exchange, eCommerce, and B2B capital markets.

I am really excited about opportunities for digital advertising and look forward to answering your questions and networking so we can partner on solving platform challenges.


I’m Mick Bass, CEO of 47Lining. 47Lining is an Amazon Web Services Advanced Consulting Partner with Big Data Competency designation. We develop big data solutions and deliver big data managed services built from underlying AWS building blocks like Amazon Redshift, Kinesis, S3, DynamoDB, Machine Learning and Elastic MapReduce. We help customers build, operate and manage breathtaking “Data Machines” for their data-driven businesses. We architect solutions that address traditional data warehousing, Internet-of-Things analytics back-ends, predictive analytics and machine learning to open up new business opportunities. Our experience spans use cases in multiple industries including energy, life sciences, gaming, retail analytics, financial services and media & entertainment.

I’m an AWS Certified Solutions Architect - Professional and enjoy working with customers like sovrn to bridge business requirements with technology solutions. I’ve been working closely with the sovrn team since mid-2016. I’d like to thank Brian for including us in this AMA, and I look forward to the discussion over the next hour or so.


Thanks for the great question!

Header bidding greatly increased the volume of bids all exchanges receive - a good thing for our publishers!

In our case the volume of bids went up 8X. For every bid, sovrn pulls the bid details back into our “data lake”, a consolidated database where we collect all raw data for the exchange.

From there we process the data in real time to inform our real-time bid pricing algorithms. We also process the data in real time for our //meridian publisher dashboards - we believe providing real-time publisher analytics is best in class!

We also process the data in batches to run various reports, and to supply our data solutions line of business.

In this case, the amount of data coming into the “data lake” created greater “demand” than our infrastructure capacity could “supply”. The effect was that the data needed to deliver real-time dashboards and batch reporting was delayed.

It is important to note that we had ZERO data loss, because we keep multiple copies of the data.

To recover, we reconfigured the infrastructure to add capacity and adjusted software settings so processing could keep up again.

We are also rewriting the software that does real-time processing so it can scale “horizontally”, i.e. by adding more servers as we grow.
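
To illustrate the idea (this is a minimal sketch, not sovrn’s actual implementation), horizontal scaling usually means many identical workers sharing a partitioned stream. Assuming a Kafka-style stream and the kafka-python client, with hypothetical topic, group, and broker names, adding capacity is just a matter of starting more copies of the same worker:

```python
# Minimal sketch of a horizontally scalable stream worker (illustrative only).
# Topic, group, and broker names are hypothetical; the real pipeline may differ.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "bid-events",                        # hypothetical raw-bid topic
    group_id="realtime-pricing",         # all workers share this group
    bootstrap_servers=["broker1:9092"],
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

# Every additional process started with the same group_id is automatically
# assigned a share of the topic's partitions, so throughput grows simply by
# running more copies of this worker ("adding more servers as we grow").
for message in consumer:
    bid = message.value
    # ... update pricing model / dashboard aggregates here ...
    print(bid.get("bid_id"), bid.get("cpm"))
```

The key point is that no single worker has to see all of the traffic; capacity scales with the number of workers.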


Another question we received is why data outages are happening now across many data aggregators, in ad tech and other industries. Could you both please shed light on this?


The ability to provision “elastic, on-demand” capacity is critical to coping with the increases in data volumes associated with new capabilities like header bidding. This has been a major focus of our work with sovrn in enabling ad tech workloads in the cloud.
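
As a concrete, purely illustrative example of what “elastic, on-demand” capacity looks like on AWS, an Auto Scaling group can be resized with a single API call. The group name and capacity numbers below are hypothetical, not sovrn’s configuration:

```python
# Illustrative sketch: scaling out processing capacity on demand with AWS Auto Scaling.
# The Auto Scaling group name and capacity numbers are hypothetical.
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# Inspect the current size of the worker fleet.
resp = autoscaling.describe_auto_scaling_groups(
    AutoScalingGroupNames=["bid-ingest-workers"]
)
group = resp["AutoScalingGroups"][0]
print("current desired capacity:", group["DesiredCapacity"])

# Scale out when ingest lag grows; new instances launch in minutes,
# with no capital investment in dedicated hardware.
autoscaling.set_desired_capacity(
    AutoScalingGroupName="bid-ingest-workers",
    DesiredCapacity=group["DesiredCapacity"] * 2,
    HonorCooldown=False,
)
```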


Many data aggregators and ad tech companies are struggling with issues similar to sovrn’s.

That is, as supply for programmatic advertising grows, the amount of raw data collected also grows exponentially.

In turn, the architectures and infrastructure that process the bids and data, which were designed for much lower volumes, are not able to handle the peak volumes now being seen across the industry.

Therefore, failures that were not impactful in the past are now magnified. Examples are major delays in ingesting raw data into the “data lake” (i.e. the common database), and major delays in processing the data in real time due to lack of processing capacity.

Also, the effort to recover becomes much more complex because the platform must continue to service the current volume while “backfilling”, or catching up on processing of the delayed data.

A good metaphor is the old Lucy skit with the conveyor belt of candy moving faster than she can pack the boxes. In our case the candy is the data, and the boxes are the dashboards and reports. To catch up, we have to add more workers (i.e. capacity) to keep up with current data and process the backlog of data at the same time.
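
To make the catch-up idea concrete, here is a small, purely illustrative sketch: live data keeps flowing to its own worker while the delayed window is split into hourly chunks and handed to extra, temporary workers. The function names are hypothetical placeholders, not sovrn code:

```python
# Illustrative sketch of backfilling a delayed window while live processing continues.
# process_live_stream() and process_hour() are hypothetical placeholders.
import threading
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta

def process_live_stream():
    """Keep consuming and processing current data (placeholder)."""
    ...

def process_hour(hour_start):
    """Reprocess one hour of delayed data from the data lake (placeholder)."""
    print("backfilling", hour_start.isoformat())

# Live processing continues on its own thread so current data stays fresh.
threading.Thread(target=process_live_stream, daemon=True).start()

# Split the delayed window into hourly chunks and fan them out to extra workers.
outage_start = datetime(2017, 3, 1, 0, 0)   # hypothetical start of the delay
hours_behind = 12
chunks = [outage_start + timedelta(hours=h) for h in range(hours_behind)]

with ThreadPoolExecutor(max_workers=4) as pool:   # the "extra workers" packing the backlog
    list(pool.map(process_hour, chunks))
```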

There are two platform architectures companies typically use: the first is dedicated servers in a fixed set of data centers; the second is the use of “elastic cloud”, where capacity is virtualized and depends on the cloud provider for base services such as virtual servers, storage, and data processing software.

In the case of dedicated servers, companies are required to make capital investments to build out dedicated infrastructure capacity.

In the case of cloud, companies take advantage of “just in time” capacity, using automation to scale capacity up and down.

sovrn is taking a hybrid approach, meaning we use both dedicated data centers and cloud. We are also architecting our software to take advantage of both options based on workload requirements, i.e. running each workload where it is best served.

There are risks to both models.

In the case of dedicated data centers, there are constraints on the amount of physical capacity, and lead times to build out can be longer. This option typically requires some level of base investment, and can result in idle capacity during non-peak times.

In the case of cloud, there is an implied agreement that any cloud service can be unavailable at any time. This option involves variable expense, i.e. you pay for what you use, and requires careful expense forecasting and controls.

In the case of the AWS outage in the Eastern Region last week, which impacted many publishers and demand partners, the storage service was unavailable. This caused major sites to go offline because they could not connect to their data.

Although not frequent, this scenario can occur in the cloud AND dedicated data centers.

There are common reliability patterns that can be applied in both dedicated data centers and cloud hosting to ensure data is “offloaded” to geo-diverse hosting locations, and that you can automatically move processing to a secondary hosting location to avoid impact to your business.

Think of these options as your insurance policy; like other insurance, the cost is based on the probability of failure and the cost of failure, i.e. opportunity, brand, and revenue.

In the case of dedicated data centers, this involves working with your provider to set up data replication between data centers, and deciding what level of processing you want to pay for at the second site. You can make choices about how far apart you want the data centers, and whether you want “hot”, “warm”, or “cold” processing, i.e. immediate, intermediate, or extended recovery.

When hosting in the cloud, your choices involve which secondary Regions (i.e. geographic locations) you want your site to run in if the primary becomes unavailable, so that your site does not go down.

In the cloud you also have choices about which “Zones” (i.e. data centers within a Region) you want your site to use if a specific service is unavailable in one zone but not others.
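
As one concrete (and simplified) example of the “offload data to a geo-diverse location” pattern on AWS, S3 cross-region replication can keep a copy of the data lake’s raw objects in a second Region. The bucket names and IAM role below are hypothetical, and both buckets need versioning enabled; this is a sketch of the pattern, not sovrn’s setup:

```python
# Illustrative sketch: replicate raw data-lake objects to a bucket in another Region.
# Bucket names and the IAM role ARN are hypothetical; versioning must already be
# enabled on both buckets for replication to work.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="datalake-raw-us-east-1",            # primary bucket (hypothetical)
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
        "Rules": [
            {
                "ID": "replicate-raw-events",
                "Status": "Enabled",
                "Prefix": "",                    # replicate every object
                "Destination": {
                    "Bucket": "arn:aws:s3:::datalake-raw-us-west-2",  # DR copy
                },
            }
        ],
    },
)
```

If the primary Region’s storage service becomes unavailable, dashboards and reports can be rebuilt from the replicated copy - exactly the “insurance policy” trade-off described above.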

Ultimately, the goal is to ensure high reliability at massive volumes, and ensure no single point of failure takes your business down!


Indeed, because individual component failures in distributed systems are inevitable - wherever they are hosted - it is important to architect these systems to include redundancy so they continue to operate seamlessly whenever a component becomes unavailable. Cloud providers offer several “out of the box” features that make this relatively straightforward, and it is important to adopt and use those features in situations where uptime and availability are critical.

Availability and uptime of cloud services are very good (typically > 99.99%). While failures are rare, it is critically important to architect for redundancy so systems remain available even if a particular service or component is not.
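
As one simplified example of such an “out of the box” feature (all identifiers and credentials below are placeholders, not a real deployment), Amazon RDS can be launched with Multi-AZ enabled, so the database automatically fails over to a standby in another Availability Zone if a component fails:

```python
# Illustrative sketch: one "out of the box" redundancy feature - a Multi-AZ database.
# All identifiers and credentials below are placeholders.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.create_db_instance(
    DBInstanceIdentifier="reporting-db",
    DBInstanceClass="db.r4.large",
    Engine="postgres",
    MasterUsername="dbadmin",
    MasterUserPassword="REPLACE_ME",
    AllocatedStorage=100,
    MultiAZ=True,   # AWS maintains a synchronous standby in a second Availability Zone
)                    # and fails over automatically if the primary becomes unavailable.
```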


Hopefully, the information shared is helpful.

Our belief at sovrn is that the more we surface common challenges, the better the solutions we can provide to our publishers.

What challenges are you as publishers having, and how can we connect to partner on solutions?


Two other key approaches to dealing with the “data explosion” faced by Ad Tech are separation of compute from storage, and teasing apart distinct workloads.

Separation of compute from storage directs data into a distinct “object storage” subsystem, which in turn provides the data to multiple compute workloads that can scale independently from the volume of data stored.

Teasing apart distinct workloads involves separating the processing required for each part of the data architecture that Brian described. This lets each part of the flow scale independently from the others, so incremental capacity can easily be added as data volumes increase.
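
A simplified sketch of both ideas, assuming S3 as the object store (the bucket name and key layout are hypothetical): raw events land in object storage once, and separate workloads - an ingest job and a dashboard aggregation job - read from it independently and can each be scaled on their own:

```python
# Illustrative sketch: raw events go to object storage once; independent
# workloads read from it and scale separately. Bucket and key names are hypothetical.
import gzip
import json
import boto3

BUCKET = "datalake-raw-events"
s3 = boto3.client("s3")

def ingest_batch(events, hour):
    """Ingest workload: write a compressed batch of raw bid events to the data lake."""
    key = f"raw/bids/{hour}/batch-0001.json.gz"
    body = gzip.compress("\n".join(json.dumps(e) for e in events).encode("utf-8"))
    s3.put_object(Bucket=BUCKET, Key=key, Body=body)

def dashboard_job(hour):
    """Dashboard workload: aggregate revenue for publisher dashboards."""
    total = 0.0
    resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=f"raw/bids/{hour}/")
    for obj in resp.get("Contents", []):
        data = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
        for line in gzip.decompress(data).decode("utf-8").splitlines():
            total += json.loads(line).get("cpm", 0.0) / 1000.0
    return total

# A batch reporting job would read the same objects on its own schedule and with
# its own fleet size; neither workload's capacity depends on the other's.
```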


Thank you Mick and Brian for sharing why data outages and bottlenecks are occurring, and what actions are being taken to alleviate these issues.

For those of you who still have questions, please ask here and Brian and Mick will respond.