Career Overview
My career bridges academic research, professional experience, and community contributions in AI and machine learning. In my PhD at WPI, I focus on multimodal AI systems that integrate wearable sensors, signal processing, and large language models (LLMs) to address real-world challenges. Previously, as a Machine Learning Engineer, I developed NLP-driven conversational AI for Bengali speakers and contributed to computer vision projects.

During my undergraduate studies, I conducted research in audio, vision, and biomedical signal processing, and I actively participated in technical competitions and IEEE volunteering initiatives. For a detailed summary, please refer to my CV. You can also explore my projects in the attached document and find my publications in the Publications section or on Google Scholar.
Work Experience
July 2023 - Present
Research and Teaching Assistant
Worcester Polytechnic Institute
- Served as a TA for the following courses:
  - Graduate: CS 525 - DS 595 - ECE 579 On-Device Deep Learning
  - Undergraduate: ECE 2312 Discrete-Time Signal and System Analysis, ECE 2029 Introduction to Digital Circuit Design, ECE 2019 Sensors, Circuits, and Systems
- Worked as a Summer RA for the Bringing Awareness through Systems for Humans (BASH) Lab on the following projects:
  - LLaSA: Large Multimodal Agent for Human Activity Analysis Through Wearable Sensors
  - Uncertainty-Minimizing Early Exit for On-Device Deep Learning
February 2021 - July 2023
Machine Learning Engineer
Hishab Technologies Ltd.
- Built telephony conversational AI services for Bengali speakers in Python, covering NLP and speech tasks (automatic speech recognition (ASR), conversational AI, NER, text classification, grapheme-to-phoneme (G2P) conversion, language modeling, TTS) as well as MLOps and DevOps.
- Used GCP, AWS, Jira, Bitbucket, Confluence, SonarQube, CML, pdoc3, Kafka, and Jenkins for the development workflow and project organization.
- Worked with Hishab's partner Chowa Giken on a computer vision project for one month.
September 2020 - December 2020
Machine Learning Engineer (Intern)
Socian Ltd.
- Improved NLP and speech processing skills during a four-month internship program.
- Tools: NLTK, CRF, FlairNLP, Sequitur, Phonetisaurus, FastText, CMUSphinx, Kaldi, KenLM
- Tasks: POS tagging, NER, sentiment analysis, topic classification, G2P, ASR
April 2018 - March 2019
CTO
Gyanjam Ltd.
- Developed and maintained an e-commerce and e-learning site
- Instructed undergraduate students of BUET in a C programming course
- Designed PCBs
Publications
DOANet: a deep dilated convolutional neural network approach for search and rescue with drone-embedded sound source localization
EURASIP Journal on Audio, Speech, and Music Processing 2020 (1), 1-18
Res-SE-ConvNet: A Deep Neural Network for Hypoxemia Severity Prediction for Hospital In-patients Using Photoplethysmograph Signal
IEEE Journal of Translational Engineering in Health and Medicine, Volume 10, 2022
Source and Camera Independent Ophthalmic Disease Recognition from Fundus Image Using Neural Network
2019 IEEE International Conference on Signal Processing, Information, Communication & Systems
Direction of Arrival Estimation through Noise Suppression: A Novel Approach using GSC Beamforming and Room Acoustic Simulation
2019 IEEE International Conference on Signal Processing, Information, Communication & Systems
Complete Automation of an E-commerce System with Internet of Things
2019 IEEE International Conference on Robotics, Automation, Artificial-intelligence and Internet-of-Things
COVID-19 mRNA Vaccine Degradation Prediction using Regularized LSTM Model
2020 IEEE International Women in Engineering Conference on Electrical and Computer Engineering
Detection of Tuberculosis from Chest X-Ray Images Based on Modified Inception Deep Neural Network Model
2020 IEEE International Women in Engineering Conference on Electrical and Computer Engineering
Achievements
2nd at IEEE VIP Cup 2019 - IEEE ICIP 2019, Taipei
Challenge: Activity Recognition from Body Cameras
Presentations: Video (our segment: 1hr04min-1hr24min)
Judges’ analysis: Link
4th at IEEE SP Cup 2020 – IEEE ICASSP 2020
5th at IEEE VIP Cup 2020 – IEEE ICIP 2020
Education

Worcester Polytechnic Institute
Ph.D. (August 2023 - Present)
Department: Electrical and Computer Engineering
Courses: Deep Learning (Fabricio Murai), Machine Learning for Engineering Applications (Ziming Zhang), Digital Image Processing (Ziming Zhang), Natural Language Processing (Xiaozhong Liu)
Research Projects:
- Sensor and Vision Aware Large Language Model Driven Multimodal Agent
- Uncertainty-Minimizing Early Exit for On-Device Deep Learning
Graduate Advisor: Dr. Bashima Islam
CGPA: 4.00/4.00
Bangladesh University of Engineering and Technology
B.Sc. in Engineering (February 2016 - February 2021)
Department: Electrical and Electronic Engineering
Major: Signal Processing and Communication
Thesis: Audiovisual emotion recognition using multi-head attention based neural network with spectral and facial features
Thesis Supervisor: Professor Dr. Celia Shahnaz
CGPA: 3.40/4.00

Projects
For details, please check the downloadable document.
Summaries are presented below.
Search & Rescue with Drone-Embedded Sound Source Localization (2018/11-2019/3) [IEEE SP Cup 2019]
Our task was to estimate the azimuth and elevation angles of target sound sources from a drone. We solved it with two methods (published at a conference and in a journal, respectively). For both, we used pyroomacoustics to generate additional synthetic data from the TIMIT dataset. The second method, our DOANet model, was a one-dimensional dilated CNN.
Activity Recognition from Body Cameras (2019/4-2019/9) [IEEE VIP Cup 2019]
Our main task was to predict the office activity class from videos. Based on the confusion matrix, we divided the classes into two categories to address inter-class similarity and intra-class variation, and trained a model to determine the category. For the first category, we trained a 10-class MLP; for the second, we used an SVM. For the final round, we directly used an optimized 19-class MLP.
Our second task was privacy protection. Washroom scenes were detected with a binary MLP classifier for scene-wide blurring. We used template matching for monitor screen blurring and multithreading to optimize the speed. The COCO dataset was used for blurring bodies, keyboards, and screens. We also blurred faces.
Real-time distortion classification in laparoscopic videos (2020/6) [IEEE ICIP 2019 Challenge]
Different features were extracted for the 5 distortion classes (defocus blur, uneven illumination, motion blur, etc.) using OpenCV and NumPy. The CNN-based model was built with Keras.
Facial expression recognition using capsule neural network (2020)
I implemented a deep capsule neural network with TensorFlow and Keras for image classification on the FER2013 dataset, with and without affine transformations (using imgaug) and dataset balancing (using imblearn).
Unsupervised abnormality detection by using intelligent and heterogeneous autonomous systems (2019/11-2020/3) [IEEE SP Cup 2020]
We preprocessed multivariate drone sensor data from ROS. Trained only on normal data, the method identified anomalies (abnormal accelerations, rotations, or orientations) in specific time frames of anomalous flights. As a baseline, we modified a multivariate seq2seq semi-supervised anomaly detection method to make it real-time. Our own method used a model based on LSTM, attention layers, convolutional layers, and generative matching estimation.
Real-time vehicle detection and tracking using fisheye camera videos [IEEE VIP Cup 2020]
The challenges were day and night lighting conditions, fisheye distortion, and the varied shapes of vehicles. We used a modified YOLOv4 EfficientNet-B2 based model for this purpose, alongside image augmentations using imgaug.
Audiovisual emotion recognition using multi-head attention based network (2020)
I used librosa for mel-spectrum feature extraction from audio and OpenFace-based facial features from video. I implemented a model based on PyTorch's transformer layer.
Oxygen saturation level prediction from PPG (2020/10-2020/12)
We categorized low oxygen saturation into 3 severity levels. We addressed the class imbalance of the training set by oversampling with ADASYN and undersampling with Tomek links. A regularized CNN model with a suitable learning rate for loss convergence performed better than KNN or RF.
mRNA vaccine degradation prediction (2020)
Biologically inspired features such as the base sequence, base-pairing probabilities (BPPs) computed with EternaFold, loop type predictions, and structure predictions were fed to a regularized LSTM model for this regression problem.
Voice Banking AI for Bank Asia, Bangladesh (2021), Due Reporting and Repayment Telephony Conversational AI for Hitachi, India (2021), and Product Ordering Telephony Conversational AI for Indian MSMEs (2022) [Hishab Ltd.]
We worked on TTS and telephony ASR (Kaldi, NeMo) so that the conversational AI we developed (Rasa) could communicate over the phone. We prepared NER models (FlairNLP, spaCy, CRF) and language models (FastText, spaCy) for the conversational AI.
Car Interior Image Improvement (2021/07-2021/08) [Chowa Giken Co.]
Input images contained glare, haze, reflections, uneven illumination, etc. We used OpenCV-based glare removal, image dehazing, and brightness adjustment methods. For reflections and light flares, I trained ERRNet models, which produced satisfactory results, met the inference time constraint, and removed the need for dehazing. For training, both real data and synthetic (CEILNet) data were used.
Deep Unlearning with Explainable AI and Saliency Maps (2023) [WPI]
I helped my team evaluate and visualize various deep unlearning techniques applied to the CIFAR-10 dataset. The prediction distributions show which classes the model assigns to test images from an unlearned class. With Grad-CAM, HiResCAM, and Eigen-CAM, I showed that while the principal components of the activations are non-zero for unlearned classes, gradients derived from class-guided backpropagation indicate a lack of saliency, supporting the claims of the authors of the Fast Unlearning paper. In addition, the layer weight distances are roughly proportional to those of a model that was never trained on the unlearned class.
Slides and technical report: Link
Code: Link
Zero-glance Single-pass Machine Unlearning for Text Classification (2023) [WPI]
I experimented with unlearning selected classes from DBPedia text classification models using error-maximizing noise vectors, showing that a single epoch is enough to reach zero accuracy on the unlearned classes while keeping near-original accuracy on the retained classes.
Slides, presentation video, and technical report: Link
Code: https://github.com/shouborno/nlp_fast_unlearning
Large Language and Vision Assistant with Reasoning for Distinguishing Real-Fake Images (2024) [WPI]
I prepared a dataset of GPT-generated explanations of why images generated with GAN, diffusion, IF, and other methods are fake, and fine-tuned a LLaVA model on it to explore its ability to tell real and fake images apart.
Code: Link
Report: Link
LLM as a Discriminator for Evaluating Machine-generated News Headlines (2024) [WPI]
I proposed an LLM-assisted evaluation method in which a fine-tuned GPT model tries to differentiate between human-written and synthetically generated titles, motivated by the limitations of BLEU, METEOR, ROUGE, and similar metrics.
Code: Link
Report: Link
LLaSA: Large Multimodal Agent for Human Activity Analysis Through Wearable Sensors (2024) [WPI]
We prepared the SensorCaps and OpenSQA datasets and trained LLMs for sensor data interpretation and question answering.
Code: https://github.com/BASHLab/LLaSA
arXiv: https://arxiv.org/abs/2406.14498
Volunteering Experience
July 2019 - March 2021
IEEE Signal Processing Society BUET SB Chapter — Chairperson
- Arranged the IEEE SPS Winter School 2019 on Multimodal Signal Processing (Report: IEEE vTools link)
- The chapter became the largest IEEE Student Branch Chapter in Bangladesh and received the IEEE SPS Student Branch Chapter Growth Reward.
IEEEXtreme Programming Contest — Ambassador (2017-2019) and Country Lead (2019)
- Arranged programming workshops
- Participated in 2016, 2017, and 2018; became national champion in 2017 and 2018 (22nd worldwide in 2018)