LLM Inference Engineer

NEW
Veqto · Remote · US/EU · Senior · 1h ago
Salary band: $220k–$300k
Timezone: US/EU
Level: Senior
vLLM · CUDA · Rust
the role

Make our serving stack faster and cheaper — kernels, continuous batching, speculative decoding, end to end.

what you'll own
  • Optimize the serving path end to end
  • Implement continuous batching + speculative decoding
  • Own GPU utilization and cost per token
requirements
  • Deep CUDA or Triton experience
  • Comfortable in Rust or C++
  • Track record of measurable perf wins
about Veqto

Veqto serves open-weight models to developers. Profitable, ~30 people, infrastructure-obsessed.

more remote roles

Retrieval & Search Engineer

NEW
Tensor Harbor · Remote · US/EU · $185k–$245k · 1h ago
Elasticsearch · pgvector · Python

Applied AI Engineer (LLM Apps)

NEW
Sebble · Remote · Global · $170k–$230k · 1h ago
OpenAI · LangChain · TypeScript

MLOps / Inference Platform Engineer

NEW
Looplytic · Remote · US/EU · $190k–$260k · 1h ago
Kubernetes · Triton · Python