Insights: jingyaogong/minimind
Overview
- 0 Merged pull requests
- 1 Open pull request
- 31 Closed issues
- 5 New issues
1 Pull request opened by 1 person
- feat: optimize imports and code style (#162, opened Feb 11, 2025)
31 Issues closed by 2 people
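- Dataset link is broken (#160, closed Feb 11, 2025)
- About train_epoch in 1-pretrain.py and 3-full-sft.py (#126, closed Feb 10, 2025)
- Is it possible to train for a code model? (#158, closed Feb 10, 2025)
- Does it work on a MacBook Pro M1? (#127, closed Feb 10, 2025)
- Would you consider using MLA? (#133, closed Feb 10, 2025)
- Will models with a large parameter count run out of GPU memory? Is megatron necessary? (#157, closed Feb 10, 2025)
- Would you consider adding Monte Carlo search? (#131, closed Feb 10, 2025)
- Hello, I would like to learn how the Seq-Monkey general text dataset was built (#128, closed Feb 10, 2025)
- How to train a model for a vertical domain (#122, closed Feb 10, 2025)
- Would you consider adding a Reasoning model? (#125, closed Feb 10, 2025)
- Is it possible to update the model to DeepSeek-V3 architecture? (#120, closed Feb 10, 2025)
- About the auxiliary loss function in the MoE version (#110, closed Feb 10, 2025)
- Some findings from running on a Mac, not an issue (#55, closed Feb 10, 2025)
- How did the author make the model so small: through model compression or parameter sharing, or just by reducing parameters such as layer and dim? (#105, closed Feb 10, 2025)
- Discussion: training time on personal GPUs (#60, closed Feb 10, 2025)
- Notebook for training and testing with Google Colab, not an issue (#50, closed Feb 10, 2025)
- Question about the "more efficient kvcache" (#156, closed Feb 10, 2025)
- The aux loss for the MoE experts appears to be unused (#140, closed Feb 10, 2025)
- Tokenizer output differs from the original text after decoding and re-encoding (#135, closed Feb 10, 2025)
- why use BOS and EOS token and not only the EOS? (#130, closed Feb 10, 2025)
- MoE shared experts (#129, closed Feb 10, 2025)
- Some cloud drive links are broken (#124, closed Feb 10, 2025)
- Can this only train Chinese models, or can English models be trained too? Looking for an explanation (#119, closed Feb 10, 2025)
- During training, input tokens are padded to the maximum length. Does that mean attention is computed at the time complexity of the full token length? (#118, closed Feb 10, 2025)
- Question about the size of the saved model (#116, closed Feb 10, 2025)
- Error during multi-GPU training with DPP (#115, closed Feb 10, 2025)
- Beginner question: I want the model to fully memorize the code of a code repository. Should that go in the pretraining stage, the SFT stage, or some other stage? (#117, closed Feb 10, 2025)
- Githubpage.io ✖️✖️ (#113, closed Feb 10, 2025)
- About the tokenizer.json vocabulary containing no Chinese (#111, closed Feb 10, 2025)
- DPO reinforcement learning raises the error 'generator' object has no attribute 'generate' (#141, closed Feb 10, 2025)
- Please help explain how the underlying code logic in MoEFFN differs between training and inference (#139, closed Feb 9, 2025)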
5 Issues opened by 5 people
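- Why is the hardware idle for roughly 50% of the training process? (#164, opened Feb 12, 2025)
- Data augmentation (TinyStories) + self-promotion (#163, opened Feb 11, 2025)
- About multi-stage SFT (#161, opened Feb 11, 2025)
- How do I switch the model being trained? (#159, opened Feb 10, 2025)
- MiniMind2 is coming soon (#142, opened Feb 6, 2025)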
4 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
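- I wrote a tutorial while studying the code; I hope it helps other learners (#73, commented on Feb 9, 2025, 0 new comments)
- Question about long-text capability (#112, commented on Feb 10, 2025, 0 new comments)
- Not an issue: some notes from my own minimind training (#26, commented on Feb 10, 2025, 0 new comments)
- This... isn't this the repo of my dreams? Please set up a WeChat group~ (#9, commented on Feb 12, 2025, 0 new comments)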