← Back to Product Feed

Hacker News Show HN: sllm – Split a GPU node with other developers, unlimited tokens

Enables developers to share dedicated GPU nodes for LLM inference, offering cost-effective access to large models (e.g., DeepSeek V3) at low token rates (15-25 tok/s) with complete privacy and an OpenAI-compatible API.

132
Traction Score
66
Discussions
Apr 4, 2026
Launch Date
View Origin Link

Product Positioning & Context

AI Executive Synthesis
Enables developers to share dedicated GPU nodes for LLM inference, offering cost-effective access to large models (e.g., DeepSeek V3) at low token rates (15-25 tok/s) with complete privacy and an OpenAI-compatible API.
sllm addresses a significant economic barrier for developers and small teams: the prohibitive cost of dedicated high-end GPUs for large LLM inference. By enabling shared access to powerful hardware (e.g., 8xH100 GPUs for $14k/month models) at a fraction of the cost, it democratizes access to advanced AI capabilities. The "cohort" model and "pay-only-when-full" mechanism reduce financial risk for users. Crucially, the OpenAI-compatible API and vLLM integration simplify adoption, allowing seamless integration into existing workflows. The emphasis on complete privacy (no traffic logging) directly tackles a major enterprise concern. This service represents a compelling solution for cost-effective, private, and scalable LLM inference, critical for broader AI development and deployment.
Running DeepSeek V3 (685B) requires 8×H100 GPUs which is about $14k/month. Most developers only need 15-25 tok/s. sllm lets you join a cohort of developers sharing a dedicated node. You reserve a spot with your card, and nobody is charged until the cohort fills. Prices start at $5/mo for smaller models.The LLMs are completely private (we don't log any traffic).The API is OpenAI-compatible (we run vLLM), so you just swap the base URL. Currently offering a few models.
GPU node DeepSeek V3 (685B) 8×H100 GPUs tok/s cohort of developers dedicated node LLMs are completely private don't log any traffic

Related Ecosystem & Alternatives

Discover adjacent products, open-source repositories, and developer tools sharing similar technical architecture.

Deep-Dive FAQs

What is sllm – Split a GPU node with other developers, unlimited tokens?
sllm – Split a GPU node with other developers, unlimited tokens is analyzed by our AI as: Enables developers to share dedicated GPU nodes for LLM inference, offering cost-effective access to large models (e.g., DeepSeek V3) at low token rates (15-25 tok/s) with complete privacy and an OpenAI-compatible API.. It focuses on sllm addresses a significant economic barrier for developers and small teams: the prohibitive cost of dedicated high-end GPUs for large LLM inferen...
Where did sllm – Split a GPU node with other developers, unlimited tokens originate?
Data for sllm – Split a GPU node with other developers, unlimited tokens was aggregated directly from the Hacker News community ecosystem, representing raw developer and early-adopter sentiment.
When was sllm – Split a GPU node with other developers, unlimited tokens publicly launched?
The initial public indexing or launch date for sllm – Split a GPU node with other developers, unlimited tokens within our tracked developer communities was recorded on April 4, 2026.
How popular is sllm – Split a GPU node with other developers, unlimited tokens?
sllm – Split a GPU node with other developers, unlimited tokens has achieved measurable traction, logging over 132 traction score and facilitating 66 recorded discussions or engagements.
Which technical categories define sllm – Split a GPU node with other developers, unlimited tokens?
Based on metadata extraction, sllm – Split a GPU node with other developers, unlimited tokens is categorized under topics such as: GPU node, DeepSeek V3 (685B), 8×H100 GPUs, tok/s.
What are some commercial alternatives to sllm – Split a GPU node with other developers, unlimited tokens?
Our semantic intelligence engine identifies potential commercial alternatives in the SaaS space, such as Databerry, which offers overlapping value propositions.
How does the creator describe sllm – Split a GPU node with other developers, unlimited tokens?
The original author or development team describes the product as follows: "Running DeepSeek V3 (685B) requires 8×H100 GPUs which is about $14k/month. Most developers only need 15-25 tok/s. sllm lets you join a cohort of developers sharing a dedicated node. You reserve a s..."

Community Voice & Feedback

spencer9714 • Apr 4, 2026
Interesting concept. One thing I’m curious about if I’m in a cohort for something like DeepSeek V3 and another user spins up a heavy 24/7 job, how do you keep TTFT from degrading? vLLM’s continuous batching helps, but there’s still a physical limit with shared VRAM/compute. I’ve been grappling with this exact 'noisy neighbor' issue while building Runfra. We actually ended up moving toward a credit per task model on idle GPUs specifically to avoid that resource contention entirely.Curious how you’re thinking about isolation here. Is there any hard guarantee on a 'slice' of the GPU, or is it mostly just handled by the vLLM scheduler?
avereveard • Apr 4, 2026
Interesting there's a trickle of low intensity job one can always get running but like glm own plan is $30/mo and something about 300tps now I know that one is subsidized but still.
tensor-fusion • Apr 4, 2026
Interesting direction. One adjacent pattern we've been working on is a bit less about partitioning a shared node for more tokens, and more about letting developers keep a local workflow while attaching to an existing remote GPU via a share link / CLI / VS Code path. In labs and small teams we've found the pain is often not just allocation, but getting access into the everyday workflow without moving code + environment into a full remote VM flow. Curious whether your users mostly want higher GPU utilization, or whether they also want workflow portability from laptops and homelabs. I'm involved with GPUGo / TensorFusion, so that's the lens I'm looking through.
p_m_c • Apr 4, 2026
Do you own the GPUs or are you multiplexing on a 3rd party GPU cloud?
QuantumNomad_ • Apr 4, 2026
> How does billing work?> When you join a cohort, your card is saved but not charged until the cohort fills. Stripe holds your card information — we never store it. Once the cohort fills, you are charged and receive an API key for the duration of the cohort.Have any cohorts filled yet?I’m interested in joining one, but only if it’s reasonable to assume that the cohort will be full within the next 7 days or so. (Especially because in a little over a week I’m attending an LLM-centered hackathon where we can either use AWS LLM credits provided by the organizer, or we can use providers of our own choosing, and I’d rather use either yours or my own hardware running vLLM than the LLM offerings and APIs from AWS.)I’d be pretty annoyed if I join a cohort and then it takes like 3 months before the cohort has filled and I can begin to use it. By then I will probably have forgotten all about it and not have time to make use of the API key I am paying you for.
varunr89 • Apr 4, 2026
$40/mo for deepseek r1 seems steep compared to a pro sub on open ai /claude unless you run 24x7. im not sure how sharing is making this affirdable.
freedomben • Apr 4, 2026
This is an excellent idea, but I worry about fairness during resource contention. I don't often need queries, but when I do it's often big and long. I wouldn't want to eat up the whole system when other users need it, but I also would want to have the cluster when I need it. How do you address a case like this?
kaoD • Apr 4, 2026
How is the time sharing handled? I assume if I submit a unit of work it will load to VRAM and then run (sharing time? how many work units can run in parallel?)How large is a full context window in MiB and how long does it take to load the buffer? I.e. how many seconds should I expect my worst case wait time to take until I get my first token?
vova_hn2 • Apr 4, 2026
1. Is the given tok/s estimate for the total node throughput, or is it what you can realistically expect to get? Or is it the worst case scenario throughput if everyone starts to use it simultaneously?2. What if I try to hog all resources of a node by running some large data processing and making multiple queries in parallel? What if I try to resell the access by charging per token?Edit: sorry if this comment sounds overly critical. I think that pooling money with other developers to collectively rent a server for LLM inference is a really cool idea. I also thought about it, but haven't found a satisfactory answer to my question number 2, so I decided that it is infeasible in practice.
mmargenot • Apr 4, 2026
This is a great idea! I saw a similar (inverse) idea the other day for pooling compute (https://github.com/michaelneale/mesh-llm). What are you doing for compute in the backend? Are you locked into a cohort from month to month?

Discovery Source

Hacker News Hacker News

Aggregated via automated community intelligence tracking.

Tech Stack Dependencies

No direct open-source NPM package mentions detected in the product documentation.

Media Tractions & Mentions

No mainstream media stories specifically mentioning this product name have been intercepted yet.

Deep Research & Science

No direct peer-reviewed scientific literature matched with this product's architecture.