Inside Dataddo's Technology Infrastructure

By Tom Sedlacek

At Dataddo, our technology infrastructure (the services we use, how we structure them, and our plans for the future) matters as much to the quality of our customers' data as it does to the health of Dataddo as a company. Dataddo has grown considerably over the past several years, and we expect exponential growth in customer base and data volume in the years ahead, so it is essential that we build a tech infrastructure that can support that growth. This article traces our infrastructure from its beginnings through our current efforts and plans for the future.

From PHP Monolith to Specialized Microservices

In its very early days, Dataddo was a monolithic PHP application, hosted on an AWS server. 

Necessity quickly pushed us to add microservices to diversify and strengthen our processing power, which turned our basic application into an orchestrated web of moving parts. All of the following microservices are written in Go:

  • Extractor: pulls data from all the third-party services we connect to, such as Facebook Ads or Google Analytics. 
  • Writer: writes data to customers’ databases, data warehouses, and other destinations. 
  • Storage: stores data between extraction and writing as a smart cache, running on S3. 
  • Data Provider Internal (DPI): performs data transformations within Dataddo. 
  • Data Provider External (DPE): provides the API that allows dashboarding services to extract data from Dataddo itself. 
  • Dispatcher: schedules and executes actions for the other microservices, such as extracting and writing according to each customer's settings.
  • Additional services: Notificator, Statistics, Logman, and Detector.
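
To make the division of labor concrete, here is a minimal sketch of how a dispatcher might drive one job through extraction, caching, and writing. It is not Dataddo's actual code: the interfaces, the Job type, the in-memory stand-ins, and the names "example-ads" and "example-warehouse" are all hypothetical and exist only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Record is a hypothetical row of extracted data.
type Record map[string]string

// The three interfaces below are illustrative stand-ins, not Dataddo's real APIs.
type Extractor interface {
	Extract(source string) ([]Record, error)
}

type Storage interface {
	Put(key string, rows []Record) error
	Get(key string) ([]Record, error)
}

type Writer interface {
	Write(destination string, rows []Record) error
}

// Job is one scheduled flow from a source to a destination.
type Job struct{ Source, Destination string }

// Dispatcher calls the other services in order for a given job.
type Dispatcher struct {
	Extractor Extractor
	Storage   Storage
	Writer    Writer
}

// Run extracts the data, caches it, reads it back, and writes it out.
func (d Dispatcher) Run(job Job) error {
	rows, err := d.Extractor.Extract(job.Source)
	if err != nil {
		return fmt.Errorf("extract %s: %w", job.Source, err)
	}
	key := job.Source + "->" + job.Destination
	if err := d.Storage.Put(key, rows); err != nil {
		return fmt.Errorf("store %s: %w", key, err)
	}
	cached, err := d.Storage.Get(key)
	if err != nil {
		return fmt.Errorf("load %s: %w", key, err)
	}
	if err := d.Writer.Write(job.Destination, cached); err != nil {
		return fmt.Errorf("write %s: %w", job.Destination, err)
	}
	return nil
}

// In-memory implementations so the sketch runs without any network access.
type fakeExtractor struct{}

func (fakeExtractor) Extract(source string) ([]Record, error) {
	return []Record{
		{"source": source, "clicks": "10"},
		{"source": source, "clicks": "7"},
	}, nil
}

type memStorage map[string][]Record

func (s memStorage) Put(key string, rows []Record) error {
	s[key] = rows
	return nil
}

func (s memStorage) Get(key string) ([]Record, error) {
	rows, ok := s[key]
	if !ok {
		return nil, errors.New("key not found: " + key)
	}
	return rows, nil
}

type memWriter struct{ written map[string][]Record }

func (w memWriter) Write(destination string, rows []Record) error {
	w.written[destination] = append(w.written[destination], rows...)
	return nil
}

func main() {
	w := memWriter{written: map[string][]Record{}}
	d := Dispatcher{Extractor: fakeExtractor{}, Storage: memStorage{}, Writer: w}
	if err := d.Run(Job{Source: "example-ads", Destination: "example-warehouse"}); err != nil {
		fmt.Println("job failed:", err)
		return
	}
	fmt.Println("rows written:", len(w.written["example-warehouse"]))
}
```

Keeping each step behind a small interface is one way to let services like these be tested and replaced independently; the real services presumably communicate over the network rather than through in-process calls.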

3 Vertical Layers of Coded Infrastructure

Our latest infrastructure push is a complete transition to Infrastructure as Code (IaC). The main tools we use for this are Terraform and Kubernetes, with Git to track all code changes and Grafana, Kibana, and Argo CD for visualization. We can think of this architecture as three vertical layers.

The base layer contains our actual coded infrastructure, defined in HCL .tf files. This code describes every piece of the Dataddo platform as well as the configuration of our AWS servers. Before moving to IaC, we had to configure all of our AWS usage manually. Now, Terraform automatically propagates code changes to AWS without anyone having to log in and make them by hand, and ensures that our AWS instances match what we have defined in the .tf configuration files.
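
The core idea of keeping real infrastructure in line with declared configuration can be sketched as a comparison between a desired state and an actual state. The Go example below is a conceptual illustration only, not how Terraform is implemented; the Plan type, the Diff function, and the resource names and instance sizes are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// Plan lists the changes needed to make the actual state match the desired one.
type Plan struct {
	Create []string
	Update []string
	Delete []string
}

// Diff compares two maps of resource name -> configuration value.
func Diff(desired, actual map[string]string) Plan {
	var p Plan
	for name, want := range desired {
		have, ok := actual[name]
		switch {
		case !ok:
			p.Create = append(p.Create, name) // declared but missing
		case have != want:
			p.Update = append(p.Update, name) // exists but has drifted
		}
	}
	for name := range actual {
		if _, ok := desired[name]; !ok {
			p.Delete = append(p.Delete, name) // exists but no longer declared
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(p.Create)
	sort.Strings(p.Update)
	sort.Strings(p.Delete)
	return p
}

func main() {
	// Hypothetical resources and sizes, for illustration only.
	desired := map[string]string{"extractor": "large", "writer": "medium"}
	actual := map[string]string{"extractor": "small", "legacy": "micro"}
	fmt.Printf("%+v\n", Diff(desired, actual))
}
```

In this sketch, "writer" would be created, "extractor" updated, and "legacy" deleted, which mirrors the kind of create, change, and destroy summary that a declarative tool reports before applying changes.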

As we change the underlying code, Git lets us track who changed what and when, so that we can consistently measure the effectiveness of our adjustments.

The second layer of the infrastructure primarily contains Kubernetes, which we use for load balancing, balancing server instances, interconnecting our applications, and automatically healing downed services. We use Helm, the package manager for Kubernetes, to template the configuration of our Kubernetes resources.
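
As a rough illustration of what load balancing combined with health awareness means, the sketch below rotates requests across instances and skips any that are marked as down. Kubernetes handles this with its own mechanisms; the Instance and Balancer types here are hypothetical and only show the idea.

```go
package main

import "fmt"

// Instance is a hypothetical running copy of a service.
type Instance struct {
	Name    string
	Healthy bool
}

// Balancer hands out instances in round-robin order, skipping unhealthy ones.
type Balancer struct {
	instances []Instance
	next      int // index at which the next search starts
}

// Pick returns the next healthy instance name, or false if none is healthy.
func (b *Balancer) Pick() (string, bool) {
	n := len(b.instances)
	for i := 0; i < n; i++ {
		idx := (b.next + i) % n
		if b.instances[idx].Healthy {
			b.next = (idx + 1) % n
			return b.instances[idx].Name, true
		}
	}
	return "", false
}

func main() {
	b := &Balancer{instances: []Instance{
		{Name: "writer-a", Healthy: true},
		{Name: "writer-b", Healthy: false}, // simulated downed instance
		{Name: "writer-c", Healthy: true},
	}}
	for i := 0; i < 4; i++ {
		name, ok := b.Pick()
		fmt.Println(name, ok)
	}
}
```

In a real cluster, a downed instance would not just be skipped but also restarted or replaced, which is the "healing" part of the description above.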

On top of everything is our visualization layer, for which we use Argo CD and Grafana as graphical interfaces to monitor our systems in an easier, more human way. Both monitor the operation of our services in real time and let us view operational logs to ensure everything functions as it should. Grafana also tracks each layer of the infrastructure and reports how many resources a particular application needs or is using, how it uses those resources in real time, when our peak usage times occur, and so on.

Upcoming Changes: Auto-scaling, Multiple Environments

The IaC described above is the result of years of trial and error. We are still making continuous improvements as we create an infrastructure that utilizes our developers’ strengths more efficiently. 

We are already able to automatically scale our Kubernetes nodes and microservice instances up and down according to actual customer usage, which saves developer time and company resources while delivering a better experience to our platform users. We will continue to improve our infrastructure automation. In addition, we're introducing testing and development environments alongside our production environment for cleaner deployments with fewer bugs.
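
Scaling according to usage usually comes down to a small calculation. The sketch below follows the formula documented for the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current * currentMetric / targetMetric), and clamps the result to a minimum and maximum. Whether Dataddo uses this autoscaler, and with which metrics or limits, is not stated here; the function name, parameters, and numbers are assumptions for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// DesiredReplicas scales the replica count in proportion to how far the
// observed metric is from its target, then clamps it to the allowed range.
func DesiredReplicas(current int, currentMetric, targetMetric float64, minReplicas, maxReplicas int) int {
	desired := current
	if current > 0 && targetMetric > 0 {
		desired = int(math.Ceil(float64(current) * currentMetric / targetMetric))
	}
	if desired < minReplicas {
		return minReplicas
	}
	if desired > maxReplicas {
		return maxReplicas
	}
	return desired
}

func main() {
	// Hypothetical numbers: 4 replicas at 90% average CPU with a 60% target.
	fmt.Println(DesiredReplicas(4, 90, 60, 1, 10))
	// The same replicas at 30% average CPU can be scaled down.
	fmt.Println(DesiredReplicas(4, 30, 60, 1, 10))
}
```

The clamp matters in practice: without an upper bound, a traffic spike could request far more instances than the budget allows, and without a lower bound, a quiet period could scale a service down to nothing.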

Takeaway

Fundamentally, our IaC framework is rapidly moving us toward one reality, in which everything is automated, everything is measured, everything is logged, and everything is monitored. With this continuous data collection helping us understand the why behind errors and issues, our developer team is well equipped to correct errors, improve system function, prevent future errors, and ultimately invest their energy in solving more interesting problems as Dataddo continues to grow.


