A Study into the Limitations of CNN Recognition on Isolated Bengali Compound Characters


Tasnim Zia
Ankur Datta
Mohammad Raghib Noor
M Ashraful Amin
Amin Ahsan Ali
A K M Mahbubur Rahman

Abstract

There are over 265 million native and non-native Bangla speakers, yet progress in
Bangla Optical Character Recognition lags behind that of other languages because of
a broader set of complex characters, multiple handwriting styles, and a lack of datasets.
Convolutional Neural Network (CNN) models have been highly successful in recognizing
handwritten alphabet scripts. However, we found that two-stage detectors, such as
CNN-RNN models, encoder-decoders, and Vision Transformers, now perform much better than
pure CNNs in pattern recognition and Bengali compound character recognition. To
understand why, we chose five commonly used pretrained CNN models from PyTorch
(VGG-16, ResNet-50, ResNet-101, Wide ResNet-50-2, and ResNeXt-50-32x4d) to classify
the characters and compared their performance. Grad-CAM and Grad-CAM++ were used to
generate heatmaps showing the key areas that the models focused on while classifying.
We found problematic patterns in Bangla compound characters, along with problematic
perceptions in our fine-tuned CNNs, both of which we list with detailed analysis.
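
To illustrate how a Grad-CAM heatmap of the kind mentioned above is formed, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the feature maps of one convolutional layer and the gradients of the class score with respect to those maps are already available as nested lists; the function name `grad_cam` and the final scaling to [0, 1] are illustrative assumptions. Each channel is weighted by the spatial mean of its gradient, the weighted maps are summed, and negative values are clipped to zero.

```python
def grad_cam(activations, gradients):
    """Combine feature maps into a Grad-CAM heatmap.

    activations, gradients: lists of K channels, each an H x W nested list.
    Both are assumed to come from the same convolutional layer.
    """
    height = len(activations[0])
    width = len(activations[0][0])
    heatmap = [[0.0] * width for _ in range(height)]

    for act, grad in zip(activations, gradients):
        # Channel weight: global average of the gradient over all positions.
        alpha = sum(sum(row) for row in grad) / (height * width)
        for i in range(height):
            for j in range(width):
                heatmap[i][j] += alpha * act[i][j]

    # ReLU: keep only regions that push the class score up.
    heatmap = [[max(0.0, v) for v in row] for row in heatmap]

    # Assumption: scale to [0, 1] for display; skipped if the map is all zero.
    peak = max(max(row) for row in heatmap)
    if peak > 0:
        heatmap = [[v / peak for v in row] for row in heatmap]
    return heatmap


# Toy example with two 2x2 channels (values are made up for illustration).
acts = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-2.0, -2.0], [-2.0, -2.0]]]
cam = grad_cam(acts, grads)
```

In practice the activations and gradients would be captured from a fine-tuned network, and the resulting map would be upsampled to the input image size before being overlaid on the character.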
