Case study

AI that works despite limitations

Development of a Softmax operator for an embedded environment.

Industry:
Research and Development (R&D)
Customer Size:
Global technology company developing language models
Team:
5 people
The role of our specialist:
Principal Engineer responsible for development and implementation of Softmax operator
Start of cooperation:
August 2023 (still in progress)
Cooperation model:
Time and Material
Technology:
C++
Python
Qt
Deep learning
Node.js
JavaScript
  • Client: global technology company (R&D, AI, embedded)
  • Problem: low precision of the INT8 Softmax operator on resource-constrained hardware
  • Solution: new operator implementation to increase accuracy without exceeding hardware limits
  • Effect: component integrated into the architecture and used in the further development of LLM models
  • The role of Team Connect: Principal Engineer who independently identified the problem and developed the solution

Context

Our Principal Engineer joined an international team developing a toolchain for building AI models. The project was carried out within the R&D center of a global technology company, and its goal was to enable running large language models (LLMs) on devices with limited computing resources.

The team operated with a high degree of autonomy — without a rigidly assigned backlog, which left room for independent analysis and technical input.

Problem

Early in the cooperation, the specialist noticed a potential risk area: the behavior of the Softmax operator, which converts the raw outputs of an AI model into a probability distribution.

In the past, Softmax was used mainly at the end of classification models (e.g. image recognition). Today it also plays a key role in modern language models, notably in transformer architectures, where it is responsible for weighing the relationships between words.

On devices with limited computing power, the operator had to run in a quantized, numerically simplified form. This came at the cost of precision, which degraded the quality of the results generated by the models.
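To illustrate the kind of precision loss involved, here is a minimal Python sketch (not the project's code) that computes a standard float softmax and then round-trips its values through symmetric INT8 quantization; the scale of 1/127 is a hypothetical choice for the example:

```python
import math

def softmax(logits):
    # Standard float softmax; subtracting the max keeps exp() from overflowing.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def quantize_int8(values, scale):
    # Symmetric INT8 quantization: scale, round, clamp to [-128, 127].
    return [max(-128, min(127, round(v / scale))) for v in values]

def dequantize(q_values, scale):
    return [q * scale for q in q_values]

probs = softmax([2.0, 1.0, 0.1])
scale = 1.0 / 127                      # hypothetical quantization step
restored = dequantize(quantize_int8(probs, scale), scale)
# Per-element round-trip error is bounded by scale / 2 when values are in range.
max_err = max(abs(a - b) for a, b in zip(probs, restored))
```

Each probability can shift by up to half a quantization step; with only 8 bits, small probabilities lose most of their relative precision, which is exactly where attention-style computations are sensitive.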

Analysis and diagnosis

Instead of waiting for an assigned task, the Principal Engineer analyzed on his own initiative how Softmax behaves in its quantized version. He identified the conditions under which the inaccuracy arises and how it affects the models' final results, especially on embedded devices.

Solution

Based on his own observations and tests, he developed an alternative implementation of the Softmax component that:

  • increased the precision of results in the quantized environment,
  • did not exceed the available hardware resources,
  • could be used within the existing architecture without rearchitecting it.

For a long time there was no certainty that the component would be used, but the specialist continued to develop the solution, considering it potentially relevant. It was eventually integrated into the main line of the project's development.

Effects

The improved Softmax became genuine support for developing AI models on resource-limited devices.

Thanks to the specialist's proactivity and technical curiosity, the project gained not only a ready-made solution but also a development approach based on anticipating challenges rather than merely reacting to them.

Technical background

  • Softmax is an operator used in language models (LLMs), among other places, to transform neural network outputs into a probability distribution.
  • On low-power embedded devices, models run in INT8 quantization mode, i.e. they operate on integers in the range -128 to 127 instead of floating-point numbers.
  • Quantization results in a loss of precision — especially important in the calculations that determine the model's context and decisions.
  • Our specialist proposed and implemented a new implementation of the Softmax operator, optimized for: accuracy (improving the final values), operation in a constrained environment, and ease of further integration into the compiler architecture.
  • The technologies used in the project included, among others, C++, Python, Qt, Node.js, and areas related to Deep Learning and neural network processing.
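The case study does not disclose the actual implementation. For illustration only, a common technique in integer-only inference runtimes is to subtract the maximum logit and approximate exp with a precomputed fixed-point lookup table, accumulating in a wider integer type before normalizing. The sketch below is a generic textbook-style version, assuming hypothetical INT8 logits with a quantization step of 1/16:

```python
import math

def int_softmax(q_logits, lut_bits=8):
    # Illustrative integer-friendly softmax, NOT the client's implementation.
    # Assumes q_logits are INT8 values with a (hypothetical) scale of 1/16.
    SCALE = 1 << lut_bits                        # fixed-point "1.0" (256)
    # Precompute exp(-i / 16) as fixed-point integers; exp arguments are
    # non-positive after max subtraction, so one table covers all cases.
    lut = [round(math.exp(-i / 16.0) * SCALE) for i in range(256)]

    m = max(q_logits)
    shifted = [m - x for x in q_logits]          # non-negative offsets
    exps = [lut[min(s, 255)] for s in shifted]   # table lookup, clamped
    total = sum(exps)                            # wide accumulator
    # Normalize to integer probabilities in [0, 127] (INT8-friendly output).
    return [(e * 127) // total for e in exps]
```

Keeping the accumulation in a wider integer and confining the approximation error to the lookup table is one way to recover accuracy without leaving the integer domain, which keeps the operator within tight hardware limits.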

The effects of cooperation at a glance

  • Improved calculation accuracy - Softmax works more accurately in a quantized environment without increasing the load.
  • AI models on limited hardware - the improvement made it possible to run LLMs where it was previously difficult or impossible.
  • A solution ready ahead of time - the component was ready before the need for it became obvious.
  • Integration with the project architecture - the implemented code became part of the main line of development.
  • Self-directed identification and delivery - the specialist recognized the problem and delivered the solution on his own initiative.

Technology is not just code - it is the decisions that shape the direction of development.

Our engineers do not just write code - they actively shape solutions based on AI and Machine Learning.

See how we support technology teams or contact us if you are looking for a technology partner.