Run any Falcon model at up to 16k context without losing sanity. Current Falcon inference speed on a consumer GPU: up to 54+ tokens/sec for 7B and 18-25 tokens/sec for 40B at 3-6 bit quantization, roughly 38/sec and ...
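As a rough illustration of the kind of setup this result describes, below is a minimal sketch of loading a Falcon checkpoint with 4-bit quantization for consumer-GPU inference. It assumes the Hugging Face transformers and bitsandbytes packages; the checkpoint ID, prompt, and generation settings are illustrative assumptions, not details taken from the result above.

# Minimal sketch: 4-bit quantized Falcon inference on a consumer GPU.
# Assumes transformers + bitsandbytes are installed; model ID is an example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "tiiuae/falcon-7b-instruct"  # assumed example checkpoint

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit weights to fit consumer VRAM
    bnb_4bit_compute_dtype=torch.bfloat16,  # run compute in bf16 for speed
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",                      # place layers on the available GPU(s)
)

prompt = "Explain rotary position embeddings in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))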
NVIDIA's Rubin platform debuts the Vera Rubin GPU, rated at 35 PF for training and 50 PF for inference, at CES 2026, helping teams plan for faster production AI.
Xplace is a fast and extensible GPU-accelerated global placement framework developed by the research team supervised by Prof. Evangeline F. Y. Young at The Chinese University of Hong Kong (CUHK). It ...
Nous Research's NousCoder-14B is an open-source coding model landing right in the Claude Code moment
NousCoder-14B, an open-source AI coding model trained in four days on Nvidia B200 GPUs, ships with its full reinforcement-learning stack published, as Claude Code hype underscores the accelerating race to automate software ...