Attention mechanisms are used in deep learning to tell a model to focus more on certain parts of its input than others. They direct the model to spend more of its processing capacity on the portions of the input data that have been assigned higher priority. Attention is widely used in natural language processing and computer vision, and can be incorporated into recurrent neural networks. The most common attention mechanisms rely on dot-product attention or multi-head attention.

Dot-product attention uses the dot product between vectors, typically a query vector and a set of key vectors, to determine what deserves attention. In encoder-decoder models, the alignment score is computed from the dot product of the decoder's hidden state with each encoder hidden state. Multi-head attention, on the other hand, runs several attention operations (heads) in parallel, each typically a dot-product attention with its own learned projections of the input, and combines their outputs. Rather than relying on a single view of the input, a multi-head approach lets the network attend to several different aspects of the data at once.
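To make the two mechanisms concrete, here is a minimal NumPy sketch of scaled dot-product attention and a simple multi-head wrapper around it. The function names and the random projection matrices are illustrative assumptions; in a trained model the projections would be learned parameters rather than random draws.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(Q, K, V):
    """Scaled dot-product attention.
    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v).
    Returns the attended values (n_queries, d_v) and the attention weights."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # alignment scores from dot products
    weights = softmax(scores, axis=-1)        # weights for each query sum to 1
    return weights @ V, weights

def multi_head_attention(Q, K, V, num_heads, rng):
    """Run several dot-product attention heads in parallel, each on its own
    projection of the input, then concatenate the results."""
    d_model = Q.shape[-1]
    d_head = d_model // num_heads
    outputs = []
    for _ in range(num_heads):
        # Hypothetical random projections; a real network learns these.
        Wq = rng.normal(size=(d_model, d_head))
        Wk = rng.normal(size=(d_model, d_head))
        Wv = rng.normal(size=(d_model, d_head))
        out, _ = dot_product_attention(Q @ Wq, K @ Wk, V @ Wv)
        outputs.append(out)
    return np.concatenate(outputs, axis=-1)   # (n_queries, d_model)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                   # 5 tokens, model dimension 8
context, attn = dot_product_attention(x, x, x)
print(attn.shape)                                                  # (5, 5)
print(multi_head_attention(x, x, x, num_heads=2, rng=rng).shape)   # (5, 8)
```

Each head sees a lower-dimensional projection of the same input, so the heads can specialize in different relationships before their outputs are concatenated.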

Attention mechanisms are often used in natural language processing to make sure that computational effort is spent where it matters. When processing language, the attention unit typically sits between the encoder and the decoder. As the input sentence is processed over a number of time steps, attention weights are computed over the encoder's hidden states, and a weighted sum of those states is passed to the decoder. Rather than following a fixed order, the weights reflect how relevant each input word is to the output word being generated at the current step. This is especially important in machine translation and in recurrent encoder-decoder architectures. Translation services often use a sequence-to-sequence approach, where the output of one step becomes part of the context for the next, and decisions about the output are made based on the alignment scores and the weighted encoder states. Attention in a sequence-to-sequence model helps ensure that output words appear in an appropriate order even when the two languages arrange words differently.
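The following sketch shows one decoder time step of this idea: the decoder's hidden state is scored against every encoder hidden state, the scores are normalized into attention weights, and the weighted sum (the context vector) is what gets passed to the decoder. The dot-product scoring and the random vectors are assumptions for illustration; a real seq2seq model would produce these states with trained recurrent networks.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def attention_context(decoder_state, encoder_states):
    """One decoder time step of dot-product attention in a seq2seq model.
    decoder_state: (d,) decoder hidden state at the current step.
    encoder_states: (T, d) encoder hidden states, one per input word.
    Returns the context vector (weighted sum) and the attention weights."""
    scores = encoder_states @ decoder_state    # alignment score per input word
    weights = softmax(scores)                  # attention weights sum to 1
    context = weights @ encoder_states         # weighted sum of encoder states
    return context, weights

rng = np.random.default_rng(1)
encoder_states = rng.normal(size=(6, 16))      # 6 input words, hidden size 16
decoder_state = rng.normal(size=16)
context, weights = attention_context(decoder_state, encoder_states)
print(weights.round(2))   # highest weights mark the input words attended to
print(context.shape)      # (16,) context vector handed to the decoder
```

At each output step the decoder recomputes these weights, so different input words can dominate the context vector for different output words.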

Attention mechanisms are helpful in machine learning because they improve:

  • Translations between languages: As described above, attention mechanisms have been shown to significantly improve automated translation between languages.
  • Translating speech to text: Attention mechanisms help speech-to-text systems in much the same way they help translation between languages. By focusing the model's attention, they can increase both the speed and the accuracy of speech-to-text software.
  • Computer vision: Attention mechanisms can help a computer vision model determine which areas of an image are most important to process. For example, if a question is asked about the color of the car in the center of an image, the model can focus on just the center region of the image to make its determination (see the sketch after this list).
  • Anomaly detection: By giving higher weights to the types of anomalies that matter most, attention mechanisms can improve detection, such as fraud detection. If a company is trying to identify fraudulent credit card transactions, it may choose to weight unexpected locations more heavily than large purchase amounts, for example.
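As a rough illustration of the computer vision case above, the sketch below scores a grid of image-region features against a question vector and pools the regions by their attention weights. The `attend_to_regions` function, the 3x3 grid of features, and the question encoding are all hypothetical placeholders; a real visual question answering model would produce them with trained image and text encoders.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def attend_to_regions(question_vec, region_features):
    """Hypothetical spatial-attention sketch: score each image region against
    a question vector, then pool the regions by their attention weights."""
    scores = region_features @ question_vec    # one relevance score per region
    weights = softmax(scores)                  # which regions to focus on
    pooled = weights @ region_features         # attended image representation
    return pooled, weights

rng = np.random.default_rng(2)
regions = rng.normal(size=(9, 32))   # e.g. a 3x3 grid of region features
question = rng.normal(size=32)       # encoding of a question about the image
pooled, weights = attend_to_regions(question, regions)
print(weights.argmax())   # index of the region the model focuses on
print(pooled.shape)       # (32,) question-weighted image representation
```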