Open O1: The Open-Source o1 Is Here
An open-source o1 model takes on OpenAI's dominance!

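An open-source project called Open O1 has just been released, with the stated goal of building an open-source alternative that can rival OpenAI's O1 model. The news has drawn wide attention in the AI community: can Open O1 become a new force that challenges OpenAI's dominance?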
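Hands-On Test
The proof of the pudding is in the eating, so let's start with a test!
I tried it on two classic examples.
Case 1: How many r's are in the word "strawberry"?
Sure enough, it launched straight into a burst of frantic thinking: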
<Thought>
Alright, I need to determine how many times the letter 'r' appears in the word "strawberry." Let's start by examining the word closely.

First, let's look at the entire word: strawberry. The letters in order are S-T-R-A-W-B-E-R-R-Y.

Now, focusing solely on the letter 'r,' I'll scan through each position:
1. **S**: Not an 'r'.
2. **T**: Not an 'r'.
3. **R**: This is our first 'r'.
4. **A**: Not an 'r'.
5. **W**: Not an 'r'.
6. **B**: Not an 'r'.
7. **E**: Not an 'r'.
8. **R**: This is our second 'r'.
9. **R**: This is our third 'r'.
10. **Y**: Not an 'r'.

So, I've identified three instances of the letter 'r' in "strawberry": positions 3, 8, and 9.

But wait, could there be any other considerations? Perhaps considering uppercase versus lowercase, but since all letters here are lowercase, that doesn't affect the count. Also, checking for any hidden or less obvious forms of 'r,' like silent 'r's, but in this case, all 'r's are pronounced clearly.

Is there a possibility of multiple 'r's being represented differently? For example, could 'rr' be considered as two separate 'r's or as a single entity? In standard English orthography, 'rr' is treated as two separate letters, so both would be counted individually.

Additionally, thinking about diacritics or accents, but "strawberry" doesn't have any accents on the 'r's, so that's not a factor here.

Let me also consider if there are any exceptions or special cases in the English language where 'r' might not be counted traditionally, but generally, counting each occurrence as one seems straightforward.

To double-check, I'll count them again:
- Position 3: 'r'
- Position 8: 'r'
- Position 9: 'r'

Yes, that consistently gives me three 'r's.

Another way to approach this is by using a different method, such as counting from the end. Starting from the last letter 'Y':
1. **Y**: Not an 'r'.
2. **R**: This is our third 'r'.
3. **R**: This is our second 'r'.
4. **E**: Not an 'r'.
5. **B**: Not an 'r'.
6. **A**: Not an 'r'.
7. **W**: Not an 'r'.
8. **R**: This is our first 'r'.
9. **S**: Not an 'r'.

Again, I arrive at three 'r's.

Considering all these methods and perspectives, it seems clear that the letter 'r' appears three times in the word "strawberry."
</Thought>
<Output>
There are three letters 'r' in the word 'strawberry'.
</Output>
It also does not stingily hide its reasoning. It openly shows every step, and it gets the answer right!
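For readers who want to check this kind of answer themselves, here is a minimal Python sketch. It assumes the response uses the `<Thought>` and `<Output>` tags seen in the transcript; the helper name `split_response` and the abbreviated `sample` string are made up for illustration and are not part of the Open O1 project.

```python
import re

def split_response(text: str) -> tuple[str, str]:
    """Split a response into (thought, output) using its <Thought>/<Output> tags."""
    thought = re.search(r"<Thought>(.*?)</Thought>", text, re.DOTALL)
    output = re.search(r"<Output>(.*?)</Output>", text, re.DOTALL)
    return (
        thought.group(1).strip() if thought else "",
        output.group(1).strip() if output else "",
    )

# Abbreviated stand-in for the transcript (hypothetical, not the full model output).
sample = (
    "<Thought>The letters in order are S-T-R-A-W-B-E-R-R-Y.</Thought>"
    "<Output>There are three letters 'r' in the word 'strawberry'.</Output>"
)

thought, output = split_response(sample)

# Ground truth computed directly, independent of the model.
r_count = "strawberry".count("r")
r_positions = [i for i, ch in enumerate("strawberry", start=1) if ch == "r"]

print(output)
print(r_count, r_positions)
```

The direct count agrees with the model: three r's, at positions 3, 8, and 9.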

Case 2: Which is larger, 9.11 or 9.8?

Again, it handled this one with ease!
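Why is this question a classic trap? One common explanation (an assumption here, not something the Open O1 team states) is that "9.11" and "9.8" can be read either as decimal numbers or as version-style strings, and the two readings disagree. The short Python sketch below shows both readings; the function names are invented for illustration.

```python
from decimal import Decimal

def larger_as_decimal(a: str, b: str) -> str:
    """Compare the two strings as exact decimal numbers."""
    return a if Decimal(a) > Decimal(b) else b

def larger_as_version(a: str, b: str) -> str:
    """Compare the two strings as dotted version numbers (e.g. 9.11 vs 9.8)."""
    parts_a = tuple(int(p) for p in a.split("."))
    parts_b = tuple(int(p) for p in b.split("."))
    return a if parts_a > parts_b else b

decimal_winner = larger_as_decimal("9.11", "9.8")   # numeric reading
version_winner = larger_as_version("9.11", "9.8")   # version-number reading

print("as decimals:", decimal_winner)
print("as versions:", version_winner)
```

As numbers, 9.8 is larger (9.80 > 9.11); as version strings, 9.11 comes after 9.8. The question as posed expects the numeric reading.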
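Call that a passed interview. Now let's look at how this o1 was trained 👇
From O1 Style to O1 Strength
The Open O1 project aims for far more than imitating O1's style. According to the team, they trained LLaMA- and Qwen-based models on carefully curated O1-style thinking data, giving these smaller models stronger long-form reasoning and problem-solving abilities.
However, project lead Ge Zhang admits that the current version is still well short of true O1 capability. He wrote in a tweet:
"Although the first version's reasoning ability is still far behind O1, we do see positive signs of generating explicit reasoning chains and exploration. We will keep improving data quality and optimizing the training process!"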
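Performance Comparison: A Mixed Picture
To demonstrate Open O1's strength, the team published a zero-shot comparison against llama3.1-8b-instruct on several benchmarks. Open O1 comes out ahead on most of them:
GSM8K: Open O1 scores 85.82 versus 84.00 for llama3.1-8b-instruct (+1.82)
MATH: Open O1 scores 52.88 versus 47.42 for llama3.1-8b-instruct (+5.46, the largest gap)
MMLU: Open O1 scores 70.45 versus 67.95 for llama3.1-8b-instruct (+2.50)
ARC-C: Open O1 scores 86.52 versus 83.87 for llama3.1-8b-instruct (+2.65)
BBH: Open O1 scores 58.43 versus 53.64 for llama3.1-8b-instruct (+4.79)
Notably, on Hellaswag, Open O1's 67.77 falls slightly behind llama3.1-8b-instruct's 68.43 (-0.66).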
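The gaps are easy to recompute. The Python sketch below uses only the scores quoted in this article; the dictionary layout and variable names are just one convenient way to organize them.

```python
# (Open O1 score, llama3.1-8b-instruct score), as quoted in this article.
scores = {
    "GSM8K": (85.82, 84.00),
    "MATH": (52.88, 47.42),
    "MMLU": (70.45, 67.95),
    "ARC-C": (86.52, 83.87),
    "BBH": (58.43, 53.64),
    "Hellaswag": (67.77, 68.43),
}

# Point difference per benchmark; positive means Open O1 is ahead.
deltas = {
    name: round(open_o1 - baseline, 2)
    for name, (open_o1, baseline) in scores.items()
}
wins = [name for name, delta in deltas.items() if delta > 0]

for name, delta in deltas.items():
    print(f"{name:<10}{delta:+.2f}")
print(f"Open O1 ahead on {len(wins)} of {len(scores)} benchmarks")
```

Open O1 leads on five of the six benchmarks, with gains ranging from about 1.8 to 5.5 points, and trails by less than a point on Hellaswag.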

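Community Reaction: Anticipation and Skepticism
Open O1 has sparked lively discussion among AI researchers. Some are excited about the project's potential, while others question its current performance.
@MrAnderson452 shared a screenshot from the Open O1 demo and wrote:
"I got very bad results in the demo, even for an 8B-parameter model, but I will keep following this project with great interest."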

@hckinz pointed out the system's slow response time:
"Looks good, but the response time is a bit slow 🫠🍓"

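Roadmap
In response to community feedback, the Open O1 team has laid out its next steps.
According to Ge Zhang, the project will next clean up its existing data, release the dataset, and launch a platform for collecting the ShareO1 dataset (a long chain-of-thought dataset).
He also called on developers and researchers to join the project and contribute to data collection, development, or research. The project has already set up communities on Discord and Slack, and anyone interested is welcome to take part.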
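Open-Source Opportunities and Challenges
Open O1 undoubtedly brings fresh energy to open-source AI. But to truly challenge OpenAI's O1, it still has a long way to go.
As the team itself acknowledges, the current version mainly exhibits O1-style reasoning traits and broad, search-like thinking, and it remains far from fully achieving O1-level capability.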

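Even so, the project's ambition deserves credit. It aims not only to provide a strong open-source alternative, but also to lead the way in test-time scaling and make advanced AI capabilities available to everyone.
That is, to be open in practice, not open in name and closed in deed!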

So, are you optimistic about Open O1?
Copyright notice:
Author: shadowrocket
Link: https://www.shadowrocket8.top/144.html
Source: Shadowrocket official site
Copyright belongs to the author. Do not repost without permission.

