Old dual Xeon servers for LLM inference?
On eBay there are a lot of used servers with two decent Xeons and something like 128 GB of 8-channel DDR4 RAM going cheap (at least here in the UK). How many tokens per second could you expect with NUMA and llama.cpp on these systems?
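
My rough back-of-envelope, since single-token decode on CPU is usually memory-bandwidth-bound: tokens/s ≈ effective RAM bandwidth / bytes read per token (roughly the model's weight size). All the numbers in this sketch are assumptions (DDR4-2666 DIMMs, ~60% of theoretical bandwidth actually achieved, a ~40 GB quantized model), not measurements:

```python
# Back-of-envelope: decode is bandwidth-bound, so
# tokens/s ~= effective bandwidth / bytes per token (~model size).
# All constants below are assumptions, not measurements.

CHANNELS = 8            # memory channels per socket on these boards
DDR4_MT_S = 2666        # assumed DDR4-2666 DIMMs
BYTES_PER_TRANSFER = 8  # 64-bit channel width
EFFICIENCY = 0.6        # assumed real-world fraction of theoretical peak

theoretical_gb_s = CHANNELS * DDR4_MT_S * BYTES_PER_TRANSFER / 1000  # ~170 GB/s
effective_gb_s = theoretical_gb_s * EFFICIENCY                       # ~100 GB/s

model_gb = 40  # e.g. a 70B model at ~4-bit quantization

tokens_per_s = effective_gb_s / model_gb
print(f"theoretical peak: {theoretical_gb_s:.0f} GB/s per socket")
print(f"effective:        {effective_gb_s:.0f} GB/s at {EFFICIENCY:.0%} efficiency")
print(f"~{tokens_per_s:.1f} tok/s for a {model_gb} GB model "
      "(per socket, ignoring cross-socket NUMA penalties)")
```

That lands in the low single digits of tok/s for a big quantized model, and smaller models scale up proportionally. From what I understand llama.cpp also has a `--numa` option (e.g. `--numa distribute`) that's meant to help on dual-socket boards, but I haven't benchmarked how much it recovers of the cross-socket penalty.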