Sauron is a distributed system written in C/C++ and Java. It manages about 100 TB of data and runs on a group of 8 physical machines plus a cluster of 30 virtual machines, containers, processes, and edge devices. It comprises a distributed crawler operating at the ten-billion scale and a distributed query system. Together they form a big-data crawling and query engine over multiple Token dimensions (10 in total, including music, movies, and academic content), similar to a small search engine....