anyone here host language / vision / image models locally? if so on what hardware? what frameworks and for what tasks?
Yeah, you can run smaller (~8-20B) LLMs on decently cheap hardware, as well as most image and TTS/voice cloning models, but for the big guns (medium-large LLMs, video gen, 3d model gen, etc.) you need to shell out big money.
If you're strapped for cash, renting a GPU at runpod.io or similar pay-per-instance services are a great way to test things out and figure out what you like. Here's what I recommend based on what I've used:
---------------
[LLMs]
Where to get models: Huggingface: I would ask one of the GPTs which models are the best for your use case and compute constraints, don't get a model just because its on the front page
Where to run models: oobabooga/text-generation-webui: for most workflows (only supports GGUF extension with llama.cpp inference engine, great for text-only workflows, just works)
Open WebUI: used for more cutting edge workflows, experimental community tools, natively multimodal inference support with vLLM inference engine
[Video Gen]
It's been about a year since I've done this, so this might be outdated but almost certainly still works
ComfyUI is a frontend for stable diffusion. When you pick a model from huggingface they usually have a config you can upload with nodes already preconfigured, don't stress yourself out trying to do it yourself. You will regret it and it will not work
I believe vLLM has support for video gen now so you can probably find a frontend for newer/more capable models going this route. Don't quote me
That's pretty much all I've played around with. Just figure out what other people are doing and copy that until you find something that works, then tinker until you're happy with it
If you're strapped for cash, renting a GPU at runpod.io or similar pay-per-instance services are a great way to test things out and figure out what you like. Here's what I recommend based on what I've used:
---------------
[LLMs]
Where to get models: Huggingface: I would ask one of the GPTs which models are the best for your use case and compute constraints, don't get a model just because its on the front page
Where to run models: oobabooga/text-generation-webui: for most workflows (only supports GGUF extension with llama.cpp inference engine, great for text-only workflows, just works)
Open WebUI: used for more cutting edge workflows, experimental community tools, natively multimodal inference support with vLLM inference engine
[Video Gen]
It's been about a year since I've done this, so this might be outdated but almost certainly still works
ComfyUI is a frontend for stable diffusion. When you pick a model from huggingface they usually have a config you can upload with nodes already preconfigured, don't stress yourself out trying to do it yourself. You will regret it and it will not work
I believe vLLM has support for video gen now so you can probably find a frontend for newer/more capable models going this route. Don't quote me
That's pretty much all I've played around with. Just figure out what other people are doing and copy that until you find something that works, then tinker until you're happy with it
Same person as last reply, lol
Also I can run ~8B models on an ASUS TUF Gaming A14 laptop from 2020 with an RTX 2060 mobile GPU. If you have something better than that (with 8GB VRAM, preferably more) then you can do a bit more than I could
Also I can run ~8B models on an ASUS TUF Gaming A14 laptop from 2020 with an RTX 2060 mobile GPU. If you have something better than that (with 8GB VRAM, preferably more) then you can do a bit more than I could
Post a Reply