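MNBVC (Massive Never-ending BT Vast Chinese corpus) is a very large-scale Chinese corpus, intended to match the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) text. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.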
Ip2region (2.0 - xdb) is an offline IP address manager framework and locator that supports billions of data segments with search performance on the order of ten microseconds. xdb engine implementation for many programming…
Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE
terashuf shuffles multi-terabyte text files using limited memory
Open-source Android/Desktop remake of Civ V
🗺 MapSCII is a Braille & ASCII world map renderer for your console - enter => telnet <= on Mac (brew install telnet) and Linux, connect with PuTTY on Windows
A reference use of HashiCorp's Raft implementation
Production-Grade Container Scheduling and Management
🕹️ A basic gameboy emulator with terminal "Cloud Gaming" support
Convert your ASCII diagram scribbles into happy little SVG
A toolkit with common assertions and mocks that plays nicely with the standard library
📚 Collaborative cheatsheets for console commands
Docker registry v2 command line client and repo listing generator with security checks.
MicroK8s is a small, fast, single-package Kubernetes for datacenters and the edge.
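500 Questions on Deep Learning (深度学习500问): explains frequently asked topics in probability, linear algebra, machine learning, deep learning, computer vision, and related areas in question-and-answer form, to help the author and any readers who need it. The book has 18 chapters and more than 500,000 characters. Given the author's limited expertise, readers are kindly asked to point out any shortcomings. To be continued. For collaboration, contact [email protected]. All rights reserved; infringement will be pursued. Tan 2018.06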
Syntax highlighting for thrift definition files.
Yet Another System Region and Language Simulator
Company names matching: match company names to legal names and stock symbols