Bean Nuts Hazelnut Sauron
Abstract:
Sauron Eyes serves the Cthulhu database of the Nuts Digital Data Center. Cthulhu is designed to hold 1 PB of NLP data and structured token sets, and currently includes 100 TB of data from more than 1000 websites and open datasets.
Sauron has many minions and legions. The most important legion includes ‘Sauron Nonabyte’, ‘Sauron Eyes’, ‘Sauron LLM of Longinus [Large Logic Models]’, and others. This legion aims to understand and profile everything it can; its ideals are illumination, enlightenment, and revelation.
Sauron is a distributed big-data module that combines a massive crawler and spider system with a massive data-retrieval and database system for PB-level queries. It is founded on the Cthulhu database of the ‘Nuts Digital Data Center’ and is built with a microservice and deep-learning architecture.
Keywords: BigData, LLM, Microservice, DL.
Document (Chinese): https://docs.nutgit.com/docs/hazelnut_sauron_zh_cn
Document: https://docs.nutgit.com/docs/hazelnut_sauron_en
(1) Datasets:
The primary private dataset of Sauron is the Xenomorph subset of the Cthulhu database. Xenomorph focuses on NLP entity modeling over data from more than 1000 websites and the GPT3-Dataset (45 TB). Data selection prioritizes accuracy and breadth, and the subset currently covers Movie, Music, Game, Book, Academic, News, Medicine, Law, Finance, and other domains. For example, it includes more than 10 billion data records, 100 million songs from around the world, and academic papers from 1970 to 2023.
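To make the domain breakdown concrete, here is a minimal Python sketch of how a record tagged with one of the listed domains might be modeled and filtered. The `Record` fields, the `Domain` enum, and both helper functions are hypothetical names chosen for illustration; they are not the actual Xenomorph schema or API.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Domain(Enum):
    # Domains named in the dataset description; the real list may be longer.
    MOVIE = "Movie"
    MUSIC = "Music"
    GAME = "Game"
    BOOK = "Book"
    ACADEMIC = "Academic"
    NEWS = "News"
    MEDICINE = "Medicine"
    LAW = "Law"
    FINANCE = "Finance"


@dataclass(frozen=True)
class Record:
    """Hypothetical record shape; field names are assumptions."""
    source: str                 # website or open dataset the record came from
    domain: Domain
    title: str
    year: Optional[int] = None  # publication year, if known


def count_by_domain(records):
    """Return a Counter mapping each domain to its number of records."""
    return Counter(r.domain for r in records)


def in_year_range(records, start, end):
    """Keep records whose year lies in [start, end]; records without a year are dropped."""
    return [r for r in records if r.year is not None and start <= r.year <= end]
```

For instance, `in_year_range(records, 1970, 2023)` would select records matching the academic-paper year range mentioned above, assuming each record carries a `year` field.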
(2) Architecture: