Hardware LLM Taalas Reaches >14,000 TPS on Llama 3.1 8B



Taalas HC1 Technology Demonstrator

Runs the Llama 3.1 8B model | TSMC 6nm | 815 mm² | 53B transistors | 2.5 kW server

Instantaneous Inference

HC1 demonstrates the power of Taalas hardcore model silicon technology, delivering 17k tokens per second per user on the Llama 3.1 8B model.

Source: Llama 3.1 8B model | Nvidia baseline (H200) and B200 measured by Taalas | Groq, SambaNova, and Cerebras performance from Artificial Analysis | Taalas performance measured in Taalas labs | Input/output sequence length: 1k/1k
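To put a per-user throughput figure in latency terms, the sketch below converts tokens per second into the average gap between output tokens and the time to stream a 1k-token response. It is a back-of-the-envelope sketch only: it assumes a constant decode rate, ignores prefill (prompt processing), network and batching overhead, and assumes "1k/1k" means 1k output tokens. The function names are hypothetical illustrations, not part of any Taalas API. At 17k tokens per second this works out to about 59 µs per token, so a 1k-token response would stream in roughly 59 ms.

```python
# Back-of-the-envelope conversion from per-user decode throughput to latency.
# Assumptions (not from the source): constant decode rate; prefill time,
# network overhead, and batching effects are ignored.

def per_token_latency_us(tokens_per_second: float) -> float:
    """Average time between consecutive output tokens, in microseconds."""
    return 1e6 / tokens_per_second


def decode_time_ms(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream `output_tokens` tokens at a constant rate, in milliseconds."""
    return 1e3 * output_tokens / tokens_per_second


# Both figures quoted on the page: 17k (body text) and >14,000 (headline).
for label, tps in [("HC1, 17k TPS", 17_000), ("HC1, 14k TPS", 14_000)]:
    print(
        f"{label}: {per_token_latency_us(tps):.1f} us/token, "
        f"{decode_time_ms(1_000, tps):.1f} ms for 1k output tokens"
    )
```

The same two functions can be applied to any per-user throughput number in the comparison, which makes it easier to see what a given tokens-per-second gap means for an interactive user.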
