Tom Fenton moves from local AI concepts to hands-on tools for matching LLMs to hardware, running local chatbots with Ollama and benchmarking AI performance.
OpenAI inference cost reduction cut ChatGPT guest traffic from tens of thousands of Nvidia GPUs to just a couple hundred, ...