Decentralize Infrastructure Knowledge through Analytical Infrastructure Encyclopedia

Timothy Agustian
Tokopedia Engineering
4 min read · Sep 8, 2023


Background

Imagine your company runs thousands of cloud resources and works with multiple cloud service providers. These resources come in many types and are distributed unevenly across the providers. As the infrastructure engineer, you need to decentralize their management so that every software engineer has enough visibility to do their work, but not so much that confidential information is exposed. What should you do?

Introducing the Infrastructure Encyclopedia

The Analytical Infrastructure Encyclopedia, or simply the Infrastructure Encyclopedia, unifies all of this information in one place and helps decentralize infrastructure knowledge across the organization.

Sources and Integrated Data

Some examples of sources and data that are usually integrated into the encyclopedia:

  1. Cloud Service Providers
    List of infrastructure resources, machine specifications, storage specifications, billing, tags, infrastructure alerts and events, etc.
  2. Observability Platform
    Infrastructure metrics such as CPU, memory, and disk usage, plus custom metrics where needed, such as total API requests and API latency.
  3. Other Platforms
    Platforms that manage engineer hierarchy and ownership, and any other platform that can increase the usefulness of the encyclopedia.

Layer Explanation

The Encyclopedia is separated into 3 layers, each with its own purpose:

  1. Integration
    This layer integrates all of the data above across the multiple sources. I think this is easily achievable with an application service that connects to each source through its API. The data can be stored in a data warehouse or another analytical data store.
  2. Process
    The integrated data is processed internally (for example cleaned, joined, and aggregated) so that it is ready before it moves to the presentation layer.
  3. Presentation
    Processed data is then delivered in whatever form is needed: a sheet report, a business insight tool such as Looker Studio, an alert, or input to another application.
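
The three layers can be sketched as three small functions. This is only a minimal illustration, not how the encyclopedia is actually implemented: the `ResourceRecord` fields, the function names, and the sample data are all assumptions, and in-memory lists and dicts stand in for the provider APIs and the data warehouse.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResourceRecord:
    # Hypothetical unified schema; real fields depend on your sources.
    resource_id: str
    provider: str
    monthly_cost_usd: float
    avg_cpu_percent: Optional[float]  # None when no metric was reported
    owner_team: str


def integrate(inventory, metrics, ownership):
    """Integration layer: join rows from each source on resource_id."""
    records = []
    for row in inventory:
        rid = row["resource_id"]
        records.append(
            ResourceRecord(
                resource_id=rid,
                provider=row["provider"],
                monthly_cost_usd=row["monthly_cost_usd"],
                avg_cpu_percent=metrics.get(rid),
                owner_team=ownership.get(rid, "unowned"),
            )
        )
    return records


def process(records):
    """Process layer: aggregate monthly cost per owning team."""
    cost_by_team = {}
    for rec in records:
        cost_by_team[rec.owner_team] = (
            cost_by_team.get(rec.owner_team, 0.0) + rec.monthly_cost_usd
        )
    return cost_by_team


def present(cost_by_team):
    """Presentation layer: render rows for a sheet report or dashboard."""
    return [
        f"{team}: ${cost:.2f}/month"
        for team, cost in sorted(cost_by_team.items())
    ]


# Sample data standing in for a cloud provider, an observability
# platform, and an ownership platform.
inventory = [
    {"resource_id": "vm-1", "provider": "provider-a", "monthly_cost_usd": 120.0},
    {"resource_id": "vm-2", "provider": "provider-b", "monthly_cost_usd": 80.0},
    {"resource_id": "db-1", "provider": "provider-a", "monthly_cost_usd": 300.0},
]
metrics = {"vm-1": 12.5, "vm-2": 71.0}
ownership = {"vm-1": "payments", "db-1": "payments", "vm-2": "search"}

report = present(process(integrate(inventory, metrics, ownership)))
for line in report:
    print(line)
```

Keeping each layer behind its own function means a source or an output format can be swapped without touching the other two layers.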

End Result

1. Infrastructure Resources Utilization Recommendation & Budgeting

The encyclopedia already holds this data:

  • List of resources and billing from the cloud service providers
  • Utilization metrics such as CPU, memory, and disk from the observability platform
  • Ownership and team hierarchy from another platform

With this data, building a recommendation system is a doable task: it gives every engineer visibility into the resources they own, alongside the expense of each resource, to accelerate optimization.
But don’t forget to implement Role-Based Access Control (RBAC), as this information is quite confidential if not targeted correctly.
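
As a rough sketch of how a recommendation and an RBAC filter could work together, the example below flags underutilized resources but only among those owned by teams the requesting engineer may view. The 20% CPU threshold, the `team_access` mapping, and all names and figures are hypothetical; a real system would read roles from your identity or ownership platform.

```python
# Hypothetical threshold: resources averaging below this CPU usage are
# treated as downsizing candidates.
LOW_CPU_PERCENT = 20.0

resources = [
    {"id": "vm-1", "team": "payments", "avg_cpu_percent": 8.0, "monthly_cost_usd": 120.0},
    {"id": "vm-2", "team": "search", "avg_cpu_percent": 71.0, "monthly_cost_usd": 80.0},
    {"id": "vm-3", "team": "search", "avg_cpu_percent": 5.0, "monthly_cost_usd": 200.0},
]

# Hypothetical mapping of engineer to the teams whose data they may view.
team_access = {
    "alice": {"payments"},
    "bob": {"search"},
}


def visible_resources(user, resources, team_access):
    """RBAC filter: keep only resources owned by teams the user may view."""
    allowed = team_access.get(user, set())
    return [r for r in resources if r["team"] in allowed]


def recommend_downsizing(user, resources, team_access):
    """Return underutilized resources visible to the user, costliest first."""
    candidates = [
        r
        for r in visible_resources(user, resources, team_access)
        if r["avg_cpu_percent"] < LOW_CPU_PERCENT
    ]
    return sorted(candidates, key=lambda r: r["monthly_cost_usd"], reverse=True)


for rec in recommend_downsizing("bob", resources, team_access):
    print(f"{rec['id']}: avg CPU {rec['avg_cpu_percent']}%, ${rec['monthly_cost_usd']:.2f}/month")
```

Applying the access filter before the recommendation logic means billing figures for other teams never reach the caller, and an unknown user simply gets an empty list.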

2. Infrastructure Resources at Risk Monitor

Some observability tools already have an agent that reports host information, including the versions of other installed agents, the OS version, etc. Combining this with the data above, we can build a dashboard that shows any software that has already reached its End of Life or is otherwise at risk, making decentralized patching much easier.
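
A minimal sketch of the check behind such a dashboard is shown below. The OS name, versions, and end-of-life dates are made up for illustration, and the one-year warning window is an assumption; real dates would come from vendor announcements or a maintained end-of-life dataset.

```python
from datetime import date, timedelta

# Hypothetical end-of-life dates for a made-up OS.
EOL_DATES = {
    ("example-os", "1.0"): date(2023, 1, 31),
    ("example-os", "2.0"): date(2024, 6, 30),
    ("example-os", "3.0"): date(2027, 12, 31),
}

# Hypothetical host facts, shaped like what an agent might report.
hosts = [
    {"host": "vm-1", "team": "payments", "os": "example-os", "os_version": "1.0"},
    {"host": "vm-2", "team": "search", "os": "example-os", "os_version": "2.0"},
    {"host": "vm-3", "team": "search", "os": "example-os", "os_version": "3.0"},
    {"host": "vm-4", "team": "payments", "os": "example-os", "os_version": "0.9"},
]


def classify(host, today, warn_window=timedelta(days=365)):
    """Classify a host by how close its OS version is to end of life."""
    eol = EOL_DATES.get((host["os"], host["os_version"]))
    if eol is None:
        return "unknown"  # no EOL data for this version; needs manual review
    if eol <= today:
        return "end_of_life"
    if eol - today <= warn_window:
        return "at_risk"
    return "ok"


today = date(2023, 9, 8)  # fixed date so the example is reproducible
statuses = {h["host"]: classify(h, today) for h in hosts}
for h in hosts:
    print(f"{h['host']} ({h['team']}): {statuses[h['host']]}")
```

Because each host row carries its owning team, the same result can be grouped per team so that every team sees and patches only its own hosts.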

3. Infrastructure Resources Summary

As we have many types of instances across cloud service providers, having visibility into all of the resources in one place makes data-driven decisions easier.
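
For illustration, a summary like this can be as simple as counting resources per provider and type once the inventory is unified. The provider names, resource types, and rows below are placeholders.

```python
from collections import Counter

# Hypothetical unified inventory rows.
resources = [
    {"id": "vm-1", "provider": "provider-a", "type": "vm"},
    {"id": "vm-2", "provider": "provider-a", "type": "vm"},
    {"id": "db-1", "provider": "provider-a", "type": "database"},
    {"id": "vm-3", "provider": "provider-b", "type": "vm"},
    {"id": "bkt-1", "provider": "provider-b", "type": "bucket"},
]

# Count resources for each (provider, type) pair.
summary = Counter((r["provider"], r["type"]) for r in resources)

for (provider, rtype), count in sorted(summary.items()):
    print(f"{provider:<12}{rtype:<10}{count}")
```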

Many more use cases can be built on top of this encyclopedia.

In Conclusion

If your company has multiple types of resources across multiple providers, storing all of the necessary information in a single platform is beneficial, as it accelerates engineering decision-making and action. The Infrastructure Encyclopedia works as a foundation for solving many complex problems that rely on infrastructure information.

At Tokopedia, we are implementing a multi-cloud strategy to get the best from each provider and further enhance the reliability and capability of our infrastructure. We use the encyclopedia as our stepping stone to decentralize large-scale engineering efforts, from tech improvement initiatives and patching to optimization and more.
