Hacker News
TOMDM
50 days ago
on:
Accelerating Gemma 4: faster inference with multi-...
As long as you're not bound on parallelism or bandwidth, it's "free". If you're constrained on either resource, your lighter predictor model just needs to save you more cycles, on average, than it takes up.
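A rough way to put numbers on that break-even, as a sketch rather than anything from the linked post: assume each drafted token is accepted independently with probability accept_rate, that the predictor costs draft_cost of a full-model pass per drafted token, and that checking the whole draft costs verify_cost full-model passes (1.0 when there is spare parallelism and bandwidth, more when there isn't). All names and parameters here are hypothetical.

```python
def expected_tokens(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per verification step.

    Assumes each drafted token is accepted independently with
    probability accept_rate, and that the step always yields at
    least one token from the full model.
    """
    if accept_rate >= 1.0:
        return draft_len + 1.0
    return (1.0 - accept_rate ** (draft_len + 1)) / (1.0 - accept_rate)


def speedup(accept_rate: float, draft_len: int,
            draft_cost: float, verify_cost: float = 1.0) -> float:
    """Tokens per unit of work, relative to plain one-token-per-pass decoding.

    draft_cost:  cost of one predictor step, as a fraction of a full-model pass.
    verify_cost: cost of verifying the whole draft, in full-model passes
                 (assumed 1.0 when parallelism and bandwidth are not the limit).
    """
    step_cost = draft_len * draft_cost + verify_cost
    return expected_tokens(accept_rate, draft_len) / step_cost


def pays_off(accept_rate: float, draft_len: int,
             draft_cost: float, verify_cost: float = 1.0) -> bool:
    """True when the predictor saves more work than it adds, on average."""
    return speedup(accept_rate, draft_len, draft_cost, verify_cost) > 1.0


if __name__ == "__main__":
    # Unconstrained case: verification costs about one normal pass.
    print(speedup(0.8, 4, draft_cost=0.05))
    # Constrained case: verification competes for the same resources.
    print(pays_off(0.8, 4, draft_cost=0.2, verify_cost=3.0))
```

The "free" case is verify_cost near 1.0 with a small draft_cost. Once either grows, the acceptance rate has to rise to keep the ratio above 1.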