Face Emotion Recognition using Intel oneAPI and Keras [AI for Autism]

Usha Rengaraju
6 min read · Mar 4, 2024


Autism Spectrum Disorder (ASD) is a developmental disorder characterized by difficulties in social interaction, communication, and repetitive behaviors. One of the challenges faced by individuals with ASD is understanding and interpreting facial expressions, which are important cues for recognizing emotions. Recent advancements in Artificial Intelligence (AI) have led to the development of facial emotion recognition technology, which has the potential to support individuals with ASD in improving their social skills and emotional understanding.


Understanding Face Emotion Recognition

Facial emotion recognition (FER) involves the use of AI algorithms to analyze facial expressions and identify the emotions expressed. These algorithms can detect subtle changes in facial features, such as the eyes, mouth, and eyebrows, to determine the underlying emotions. For individuals with ASD, who may have difficulty interpreting facial expressions, this technology can provide valuable insights into the emotions of others.

How Does FER Work?

FER systems typically operate in three main stages, sketched in code after the list:

  1. Face detection: The system locates and isolates faces within an image or video frame.
  2. Feature extraction: Key facial features such as eyebrows, eyes, mouth, and their relative positions are identified and extracted.
  3. Emotion classification: Using a trained AI model, the extracted features are analyzed to predict the most likely emotion being expressed.
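
The following is a minimal sketch of that three-stage pipeline, not code from the original post: it assumes OpenCV is available for face detection and that a trained Keras emotion classifier exists (the file name emotion_model.h5 and the input image photo.jpg are placeholders). In modern deep-learning systems, stages 2 and 3 are usually fused inside a single CNN.

import cv2
import numpy as np
import keras

EMOTIONS = ['anger', 'contempt', 'disgust', 'fear', 'happy', 'neutral', 'sad', 'surprise']

# 1. Face detection: locate faces with a Haar cascade
detector = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
frame = cv2.imread('photo.jpg')                          # placeholder input image
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# 2. Feature extraction + 3. Emotion classification, handled here by one CNN per face crop
classifier = keras.models.load_model('emotion_model.h5')  # hypothetical trained model
for (x, y, w, h) in faces:
    crop = cv2.resize(frame[y:y + h, x:x + w], (224, 224))
    probs = classifier.predict(np.expand_dims(crop, axis=0))[0]
    print(EMOTIONS[int(np.argmax(probs))])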

It’s crucial to remember that FER technology is still under development, and its accuracy can be influenced by various factors, including lighting, angle, and individual differences in facial expressions. Additionally, emotions are complex and multifaceted, often involving a combination of various internal states.


Benefits of Face Emotion Recognition for Individuals with ASD

  1. Improved Social Skills: By accurately recognizing facial expressions, AI can help individuals with ASD better understand the emotions of others, leading to improved social interactions and relationships.
  2. Emotion Regulation: AI can also help individuals with ASD learn to regulate their own emotions by providing feedback on their facial expressions. This can help them develop strategies for expressing their emotions in a more socially appropriate manner.
  3. Personalized Support: By analyzing patterns in facial expressions, AI can help identify individual differences in emotional processing among individuals with ASD. This information can be used to tailor interventions and support strategies to meet the specific needs of each individual.
  4. Early Intervention: Facial emotion recognition technology can be used as a tool for early intervention in children with ASD, helping them develop emotional intelligence skills at a young age.

Ethical Considerations and the Road Ahead

The development and application of FER technology raise several ethical concerns that need careful consideration:

  • Privacy and Data Security: The collection and use of facial data raise concerns about individual privacy and potential biases within the data used to train AI models.
  • Accuracy and Misinterpretation: As mentioned earlier, FER is not foolproof, and misinterpretations of emotions could lead to misunderstandings and further social difficulties.
  • Over-reliance on Technology: It’s crucial to emphasize that FER should be used as a supportive tool, not a replacement for developing genuine social understanding and empathy.

Intel oneAPI

Intel oneAPI is a unified programming model that simplifies the development of applications for heterogeneous architectures. It provides a single set of tools and libraries that developers can use to target a wide range of hardware platforms. Intel has been actively involved in optimizing popular machine learning frameworks such as TensorFlow and XGBoost to leverage the performance benefits of Intel architecture, including Intel Xeon CPUs and accelerators such as Intel GPUs and FPGAs. These optimizations aim to improve the speed and efficiency of machine learning workloads, enabling developers to train and deploy models faster and more cost-effectively.

In this blog, we use Intel oneAPI to show how efficiently models can be trained and used for inference.
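
As a quick illustration of the pattern used later in this post, the Intel Extension for Scikit-learn is enabled by calling patch_sklearn() before the scikit-learn estimators are imported. The snippet below is only a minimal sketch; the SVC estimator and the synthetic data are placeholders, not part of the emotion-recognition pipeline.

from sklearnex import patch_sklearn
patch_sklearn()  # swap in Intel-optimized implementations of scikit-learn estimators

# Import scikit-learn estimators *after* patching so the accelerated versions are used
from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
clf = SVC().fit(X, y)          # runs on the Intel-optimized backend
print(clf.score(X, y))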

Dataset

The Young AffectNet HQ dataset serves as a valuable resource for researchers and developers working on facial emotion recognition (FER) technology, particularly when focusing on younger age groups.

Building upon the foundation of the AffectNet-HQ dataset, Young AffectNet HQ addresses a specific limitation: the lack of data specifically representing the faces of children and adolescents. This is crucial, as facial expressions and the ways emotions manifest can differ significantly between younger and older individuals.

import keras

train_ds = keras.utils.image_dataset_from_directory(
    directory="/young-affectnet-hq/",
    labels='inferred',
    label_mode='categorical',
    batch_size=32,
    image_size=(224, 224))
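
Before training, it can help to sanity-check the class folders and enable prefetching. This is an optional addition, not part of the original post; it assumes the dataset directory contains one folder per emotion class.

import tensorflow as tf

print(train_ds.class_names)     # one folder per emotion class (eight classes expected here)
print(train_ds.cardinality())   # number of batches in the dataset

train_ds = train_ds.prefetch(tf.data.AUTOTUNE)  # overlap input preprocessing with training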

Model

This blog uses the DAN model for feature extraction and XGBoost for classification.

Distract your Attention Network (DAN) is based on two key observations in biological visual perception. First, multiple facial expression classes share an inherently similar underlying facial appearance, and their differences can be subtle. Second, facial expressions manifest simultaneously across multiple facial regions, so recognition calls for a holistic approach that encodes high-order interactions among local features. To address these issues, DAN has three key components: the Feature Clustering Network (FCN), the Multi-head Attention Network (MAN), and the Attention Fusion Network (AFN). Specifically, the FCN extracts robust features by adopting a large-margin learning objective to maximize class separability. The MAN instantiates a number of attention heads to simultaneously attend to multiple facial areas and build attention maps on these regions. Finally, the AFN distracts these attentions to multiple locations before fusing the feature maps into a comprehensive one.

import tensorflow as tf
from keras import Model, Sequential
from keras import layers as nn
from keras.layers import Layer


class ChannelAttn(Layer):
    """Channel attention head: global pooling followed by a small gating MLP."""
    def __init__(self, c=512) -> None:
        super(ChannelAttn, self).__init__()
        self.gap = nn.AveragePooling2D(7)
        self.attention = Sequential([
            nn.Dense(32),
            nn.BatchNormalization(),
            nn.ReLU(),
            nn.Dense(c, activation='sigmoid')]
        )

    def call(self, x):
        x = self.gap(x)              # (batch, 7, 7, c) -> (batch, 1, 1, c)
        x = nn.Flatten()(x)          # -> (batch, c)
        y = self.attention(x)        # sigmoid gate over channels
        return x * y


class SpatialAttn(Layer):
    """Spatial attention head: multi-scale convolutions collapsed into a spatial map."""
    def __init__(self, c=512):
        super(SpatialAttn, self).__init__()
        self.conv1x1 = Sequential([
            nn.Conv2D(256, 1),
            nn.BatchNormalization()]
        )
        self.conv_3x3 = Sequential([
            nn.ZeroPadding2D(padding=(1, 1)),
            nn.Conv2D(512, 3, 1),
            nn.BatchNormalization()]
        )
        self.conv_1x3 = Sequential([
            nn.ZeroPadding2D(padding=(0, 1)),
            nn.Conv2D(512, (1, 3)),
            nn.BatchNormalization()]
        )
        self.conv_3x1 = Sequential([
            nn.ZeroPadding2D(padding=(1, 0)),
            nn.Conv2D(512, (3, 1)),
            nn.BatchNormalization()]
        )
        self.norm = nn.ReLU()

    def call(self, x):
        y = self.conv1x1(x)
        y = self.norm(self.conv_3x3(y) + self.conv_1x3(y) + self.conv_3x1(y))
        # Sum over the channel axis (channels-last in Keras) to form a spatial attention map
        y = tf.math.reduce_sum(y, axis=-1, keepdims=True)
        return x * y


class CrossAttnHead(Layer):
    def __init__(self, c=512):
        super(CrossAttnHead, self).__init__()
        self.sa = SpatialAttn(c)
        self.ca = ChannelAttn(c)

    def call(self, x):
        return self.ca(self.sa(x))
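
As a quick shape check (my addition, not from the original post), the two attention layers compose as follows: SpatialAttn preserves the (7, 7, 512) feature map, while ChannelAttn pools it into a single 512-dimensional vector per image.

# Sanity check on a random ResNet-style feature map: batch of 2, 7x7 spatial, 512 channels
dummy = tf.random.normal((2, 7, 7, 512))
head = CrossAttnHead()
print(SpatialAttn()(dummy).shape)   # (2, 7, 7, 512): spatially re-weighted features
print(head(dummy).shape)            # (2, 512): pooled, channel-gated feature vector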


@keras.saving.register_keras_serializable(package='custom')
class DAN(Model):
    """DAN backbone: frozen ResNet-50 features fused by multiple cross-attention heads."""
    def __init__(self, num_classes=8, trainable=True, dtype='float32'):
        super(DAN, self).__init__()
        self.mod = keras.applications.ResNet50(
            include_top=False,
            weights="imagenet",
            input_shape=(224, 224, 3)
        )
        self.mod.trainable = False            # keep the ImageNet backbone frozen
        self.num_head = 4
        self.hd = []                          # one cross-attention head per facial region
        for i in range(self.num_head):
            self.hd.append(CrossAttnHead())
        self.features = nn.Conv2D(512, 1, padding='same')   # project 2048 -> 512 channels
        self.fc = nn.Dense(num_classes)
        self.bn = nn.BatchNormalization()

    def call(self, x):
        x = self.mod(x)
        x = self.features(x)
        heads = []
        for h in self.hd:
            heads.append(h(x))

        heads = tf.transpose(tf.stack(heads), perm=(1, 0, 2))   # (batch, num_head, 512)
        heads = keras.ops.log_softmax(heads, axis=1)            # attention fusion across heads
        return tf.math.reduce_sum(heads, axis=1)                # (batch, 512) fused feature

backbone = DAN()
model = Sequential([
    backbone,
    nn.Dense(8, activation='softmax')
])

model.compile(optimizer='adam',
              loss=keras.losses.CategoricalCrossentropy(),
              metrics=['accuracy'])
model.fit(train_ds, epochs=20)
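
Optionally (my addition), a single-batch forward pass confirms the wiring before committing to a full training run; it assumes train_ds is the dataset built above.

# Take one batch and confirm the end-to-end shapes: (batch, 224, 224, 3) -> (batch, 8)
for images, labels in train_ds.take(1):
    probs = model(images, training=False)
    print(images.shape, '->', probs.shape)   # class probabilities per image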

After fine-tuning the DAN model, we use the trained backbone for feature extraction.

# Extract 512-dimensional DAN features for every training image
tx = backbone.predict(train_ds)

import sklearn
from sklearnex import patch_sklearn, unpatch_sklearn
patch_sklearn()  # enable the Intel-optimized scikit-learn routines

# Import pandas index types before xgboost to avoid a pandas.Int64Index FutureWarning
from pandas import MultiIndex, Int16Dtype
import xgboost as xgb
from time import perf_counter

xgb_params = {
    'objective': 'multi:softprob',   # eight emotion classes, so a multi-class objective
    'predictor': 'gpu_predictor',
    'disable_default_eval_metric': 'true',
}

# train_y: integer class labels aligned with the rows of tx
# (e.g. collected from the same directory structure used to build train_ds)

# Train the model
t1_start = perf_counter()  # time the fit call
model_xgb = xgb.XGBClassifier(**xgb_params)
model_xgb.fit(tx, train_y)
t1_stop = perf_counter()
print("It took", t1_stop - t1_start, "seconds to fit.")

t1_start = perf_counter()  # time the predict call
model_xgb.predict(tx[:10000])
t1_stop = perf_counter()
print("It took", t1_stop - t1_start, "seconds to predict.")

model_xgb.save_model('xgb.json')

Then we train and run inference without the Intel optimizations.

unpatch_sklearn()  # revert to stock scikit-learn

xgb_params = {
    'objective': 'multi:softprob',
    'predictor': 'cpu_predictor',
    'disable_default_eval_metric': 1,
}

t1_start = perf_counter()  # time the fit call
model_xgb = xgb.XGBClassifier(**xgb_params)
model_xgb.fit(tx[:10000], train_y[:10000])
t1_stop = perf_counter()
print("It took", t1_stop - t1_start, "seconds to fit.")

t1_start = perf_counter()  # time the predict call
model_xgb.predict(tx[:10000])
t1_stop = perf_counter()
print("It took", t1_stop - t1_start, "seconds to predict.")

Demo

A Hugging Face demo of the model is also available.

Challenges and Future Directions

While facial emotion recognition technology shows promise for supporting individuals with ASD, several challenges remain. These include the need for more diverse and representative datasets, as well as the development of algorithms that can accurately detect subtle changes in facial expressions. Future research in this area will focus on addressing these challenges and further enhancing the effectiveness of AI-based approaches for supporting individuals with ASD.

Conclusion

Facial emotion recognition technology has the potential to significantly impact the lives of individuals with ASD by improving their social skills, emotional understanding, and overall quality of life. By leveraging the power of AI, we can provide individuals with ASD with the tools they need to better navigate the social world around them.
