Career Overview

My career bridges academic research, professional experience, and community contributions in AI and machine learning. For my PhD at WPI, I focus on multimodal AI systems, integrating wearable sensors, signal processing, and large language models (LLMs) to address real-world challenges. My prior work as a Machine Learning Engineer involved developing NLP-driven conversational AI for Bengali speakers and contributing to computer vision projects.

I’ve also conducted research in audio, vision, and biomedical signal processing during my undergraduate studies and actively participated in technical competitions and IEEE volunteering initiatives. For a detailed summary, please refer to my CV. You can also explore my projects in the attached document and find my publications in the Publications section or on Google Scholar.

My CV

My Projects

Connect on LinkedIn

Work Experience

July 2023 - Current

Research and Teaching Assistant
Worcester Polytechnic Institute

Served as a TA for the following courses:
- Graduate: CS 525 / DS 595 / ECE 579 On-Device Deep Learning
- Undergraduate: ECE 2312 Discrete-Time Signal And System Analysis, ECE 2029 Introduction To Digital Circuit Design, ECE 2019 Sensors, Circuits, And Systems
Worked as a Summer RA for the Bringing Awareness through Systems for Humans (BASH) Lab on the following projects:
- LLaSA: Large Multimodal Agent for Human Activity Analysis Through Wearable Sensors
- Uncertainty-Minimizing Early Exit for On-Device Deep Learning

February 2021 - July 2023

Machine Learning Engineer
Hishab Technologies Ltd.

Built telephony conversational AI services for Bengali speakers using Python, covering NLP tasks (automatic speech recognition (ASR), conversational AI, NER, text classification, grapheme-to-phoneme (G2P) conversion, language modeling, TTS, etc.) as well as MLOps and DevOps.
Used GCP, AWS, Jira, Bitbucket, Confluence, SonarQube, CML, pdoc3, Kafka, Jenkins, and similar tools to organize development workflows.
Worked with Hishab's partner Chowa Giken on a computer vision project for one month.

September 2020 - December 2020

Machine Learning Engineer (Intern)
Socian Ltd.

Improved NLP and speech processing skills during a four-month internship program.
Tools: NLTK, CRF, FlairNLP, Sequitur, Phonetisaurus, FastText, CMUsphinx, Kaldi, KenLM
Tasks: POS tagger, NER, Sentiment analysis, Topic classification, G2P, ASR

April 2018 - March 2019

CTO
Gyanjam Ltd.

Developed and maintained an e-Commerce & e-Learning site
Instructed undergraduate students of BUET in a C programming course
Designed PCBs

PUBLICATIONS

DOANet: a deep dilated convolutional neural network approach for search and rescue with drone-embedded sound source localization
EURASIP Journal on Audio, Speech, and Music Processing 2020 (1), 1-18

Res-SE-ConvNet: A Deep Neural Network for Hypoxemia Severity Prediction for Hospital In-patients Using Photoplethysmograph Signal
IEEE Journal of Translational Engineering in Health & Medicine (Volume: 10) – 2022

Source and Camera Independent Ophthalmic Disease Recognition from Fundus Image Using Neural Network
2019 IEEE International Conference on Signal Processing, Information, Communication & Systems

Direction of Arrival Estimation through Noise Suppression: A Novel Approach using GSC Beamforming and Room Acoustic Simulation
2019 IEEE International Conference on Signal Processing, Information, Communication & Systems

Complete Automation of an E-commerce System with Internet of Things
2019 IEEE International Conference on Robotics, Automation, Artificial-intelligence and Internet-of-Things

COVID-19 mRNA Vaccine Degradation Prediction using Regularized LSTM Model
2020 IEEE International Women in Engineering Conference on Electrical and Computer Engineering

Detection of Tuberculosis from Chest X-Ray Images Based on Modified Inception Deep Neural Network Model
2020 IEEE International Women in Engineering Conference on Electrical and Computer Engineering

Achievements

2nd at IEEE VIP Cup 2019 - IEEE ICIP 2019, Taipei

Challenge : Activity Recognition from Body Cameras

Presentations : Video (Our segment: 1hr04min-1hr24min)
Judges’ analysis : Link

4th at IEEE SP Cup 2020 – IEEE ICASSP 2020

Challenge : Unsupervised abnormality detection by using intelligent and heterogeneous autonomous systems

5th at IEEE VIP Cup 2020 – IEEE ICIP 2020

Challenge : Real-time vehicle detection and tracking at junction using a fisheye camera

EDUCATION

Worcester Polytechnic Institute

Ph.D. (August 2023 - Present)

Department: Electrical and Computer Engineering

Courses: Deep Learning (Fabricio Murai), Machine Learning for Eng Applications (Ziming Zhang), Digital Image Processing (Ziming Zhang), Natural Language Processing (Xiaozhong Liu)

Research Projects:

Sensor and Vision Aware Large Language Model Driven Multimodal Agent
Uncertainty-minimizing Early Exit for On-Device Deep Learning

Graduate Advisor: Dr. Bashima Islam

CGPA: 4.00/4.00

Transcript

Bangladesh University of Engineering and Technology

B.Sc. in Engineering (February 2016 - February 2021)

Department: Electrical and Electronic Engineering

Major: Signal Processing and Communication

Thesis: Audiovisual emotion recognition using multi-head attention based neural network with spectral and facial features

Thesis Supervisor: Professor Dr. Celia Shahnaz

CGPA: 3.40/4.00

Notre Dame College, Dhaka

Higher Secondary Certificate (2013 - 2015)

GPA: 5.00/5.00

Notable nationwide ranks in university entrance exams (2015): BUET 97th, Dhaka University 14th, KUET 1st, IUT 1st, SUST (Engineering, Biology, Architecture) 1st, BAU 1st

PROJECTS

For details, please check the downloadable document.

Project Documentation

Summaries are presented below.

Search & Rescue with Drone-Embedded Sound Source Localization (2018/11-2019/3)
[IEEE SP Cup 2019]
Our task was to estimate the azimuth and elevation angles of target sound sources from a drone. We solved it using two methods (published in a conference paper and a journal paper, respectively). For both, we used pyroomacoustics to generate additional synthetic data from the TIMIT dataset. The second method, our DOANet model, was a one-dimensional dilated CNN.
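To illustrate what "dilated" means in a one-dimensional CNN, here is a minimal pure-Python sketch. It is not the DOANet architecture: the function names, kernel values, and dilation rates are assumptions chosen only to show how spacing the kernel taps apart widens the receptive field without adding weights.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D convolution (cross-correlation form) with `dilation` samples between taps."""
    span = (len(kernel) - 1) * dilation  # distance from the first tap to the last tap
    return [
        sum(w * signal[start + k * dilation] for k, w in enumerate(kernel))
        for start in range(len(signal) - span)
    ]


def receptive_field(kernel_size, dilations):
    """Input samples seen by one output after stacking stride-1 dilated layers."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    # A 2-tap kernel with dilation 2 adds samples that are two steps apart.
    print(dilated_conv1d([1, 2, 3, 4, 5], [1, 1], dilation=2))  # [4, 6, 8]
    # Three layers of kernel size 3 with dilations 1, 2, 4 (illustrative values).
    print(receptive_field(3, [1, 2, 4]))  # 15
```

With doubling dilation rates, the receptive field grows quickly with depth, which is one common reason to use dilated layers on long audio frames.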

Activity Recognition from Body Cameras (2019/4-2019/9) [IEEE VIP Cup 2019]
Our main task was to predict the office activity class from videos. Based on the confusion matrix, we divided the classes into two categories to address inter-class similarity and intra-class variation. A model was then trained to determine the category. For the first category, a 10-class MLP was trained, and for the second category, an SVM was used. For the final round, we directly used an optimized 19-class MLP.
Our second task was privacy protection. Washroom scenes were detected with a binary MLP classifier for scene-wide blurring. We used template matching for monitor screen blurring and multithreading to optimize speed. The COCO dataset was used for blurring bodies, keyboards, and screens. We also blurred faces.
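The two-stage routing described for the main task (a category model followed by a per-category classifier) can be sketched as below. This is only an illustration of the control flow: the toy rules, feature values, and class names are hypothetical stand-ins, not the trained MLP and SVM models or the actual activity labels.

```python
def predict_two_stage(features, category_model, class_models):
    """Route a sample to the classifier that owns its predicted category."""
    category = category_model(features)      # stage 1: which group of classes?
    return class_models[category](features)  # stage 2: class within that group


if __name__ == "__main__":
    # Toy stand-ins for the trained models (hypothetical rules and labels).
    category_model = lambda x: "A" if x[0] < 0.5 else "B"
    class_models = {
        "A": lambda x: "typing" if x[1] < 0.5 else "writing",
        "B": lambda x: "walking" if x[1] < 0.5 else "standing",
    }
    print(predict_two_stage([0.2, 0.9], category_model, class_models))  # writing
```

Any callable that maps features to a label can be plugged in for either stage, so the two categories are free to use different model families.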

Real-time distortion classification in laparoscopic videos (2020/6) [IEEE ICIP 2019 Challenge]
Different features were extracted for the 5 distortion classes (defocus blur, uneven illumination, motion blur, etc.) using OpenCV and NumPy. The CNN-based model was built with Keras.

Facial expression recognition using capsule neural network (2020)
I implemented a deep capsule neural network in TensorFlow and Keras for image classification on the FER2013 dataset, with and without affine transformations using imgaug and dataset balancing using imblearn.

Unsupervised abnormality detection by using intelligent and heterogeneous autonomous systems (2019/11-2020/3) [IEEE SP Cup 2020]
We preprocessed multivariate drone sensor data from ROS. Trained on normal data, the method would explain anomalies for specific time frames of anomalous flights (abnormal accelerations, rotations, or orientations). As a baseline, we modified a multivariate seq2seq semi-supervised anomaly detection method to make it real-time. Our own method used a model based on LSTM, attention layers, convolutional layers, and generative matching estimation.

Real-time vehicle detection and tracking using fisheye camera videos [IEEE VIP Cup 2020]
The challenges were day and night lighting conditions, fisheye distortion, and the varied shapes of vehicles. We used a modified YOLOv4 model based on EfficientNet-B2 for this purpose, alongside image augmentations using imgaug.

Audiovisual emotion recognition using multi-head attention based network (2020)
I used librosa for mel spectrum feature extraction from audio and OpenFace-based facial features for videos. I implemented a model based on the transformer layer of PyTorch.

Oxygen saturation level prediction from PPG (2020/10-2020/12)
We categorized low oxygen saturation into three severity levels. We addressed the training-set imbalance by oversampling with ADASYN and undersampling with Tomek links. A regularized CNN model with a suitable learning rate for loss convergence performed better than KNN or RF.
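For readers unfamiliar with Tomek links, the following pure-Python sketch shows the idea behind that undersampling step: a Tomek link is a pair of samples from different classes that are each other's nearest neighbour, and the majority-class member of each pair is dropped. The helper names and toy points are assumptions for illustration and this is not the project's code; in practice a library such as imbalanced-learn provides `TomekLinks` and `ADASYN`.

```python
import math


def nearest_neighbor(i, points):
    """Index of the point closest to points[i], excluding itself (needs >= 2 points)."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )


def tomek_links(points, labels):
    """Pairs (i, j), i < j, that are mutual nearest neighbours with different labels."""
    nn = [nearest_neighbor(i, points) for i in range(len(points))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and labels[i] != labels[j]
    ]


def undersample_majority(points, labels, majority_label):
    """Drop the majority-class member of every Tomek link."""
    drop = {
        k
        for pair in tomek_links(points, labels)
        for k in pair
        if labels[k] == majority_label
    }
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


if __name__ == "__main__":
    # Toy 2-D data: label 0 plays the majority class here.
    pts = [(0, 0), (0.1, 0), (1, 1), (1.2, 1), (5, 5), (5.1, 5)]
    lbl = [0, 1, 0, 0, 1, 1]
    print(tomek_links(pts, lbl))  # [(0, 1)]
    print(undersample_majority(pts, lbl, majority_label=0)[1])  # [1, 0, 0, 1, 1]
```

Removing only the majority-class side of each link cleans the class boundary without discarding minority samples, which complements oversampling on the minority side.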

mRNA vaccine degradation prediction (2020)
Biologically inspired features such as base sequence, BPPs using EternaFold, loop type prediction, and structure prediction were used for this regression problem using a regularized LSTM model.

Voice Banking AI for Bank Asia, Bangladesh (2021), Due Reporting and Repayment Telephony Conversation AI for Hitachi, India (2021), and Product Ordering Telephony Conversational AI for Indian MSMEs (2022) [Hishab Ltd.]
We worked on TTS and a telephony ASR (Kaldi, NeMo), so that the conversational AI (Rasa) we developed could communicate over telephones. We prepared NER models (FlairNLP, spaCy, CRF) and language models (FastText, spaCy) for the conversational AI.

Car Interior Image Improvement (2021/07-2021/08) [Chowa Giken Co.]
Input images contained glare, haze, reflections, uneven illumination, etc. We used OpenCV-based glare removal, image dehazing, and brightness adjustment methods. For reflections and light flares, I trained ERRNet models, which produced satisfactory results and met the inference time constraints, removing the need for dehazing. For training, both real data and synthetic (CEILNet) data were used.

Deep Unlearning with Explainable AI and Saliency Maps (2023) [WPI]

I helped my team evaluate and visualize various deep unlearning techniques applied to the CIFAR-10 dataset. The prediction distributions show which classes the model assigns to test images from an unlearned class. With Grad-CAM, HiResCAM, and Eigen-CAM, I showed that while the principal components of the activations are non-zero for unlearned classes, gradients derived from class-guided backpropagation indicate a lack of saliency, verifying the claims of the authors of the Fast Unlearning paper. In addition, layer weight distances are roughly proportional to those of a model that was never trained on the unlearned class.

Slides and technical report: Link

Code: Link

Zero-glance Single-pass Machine Unlearning for Text Classification (2023) [WPI]

I experimented with DBPedia text classification models to unlearn some classes with error-maximizing noise vectors to show that it's possible to get zero unlearned class accuracy and near-original retain class accuracies with just one epoch.

Slides, presentation video, and technical report: Link

Code: https://github.com/shouborno/nlp_fast_unlearning

Large Language and Vision Assistant with Reasoning for Distinguishing Real-Fake Images (2024) [WPI]

I prepared a dataset of GPT-generated explanations of why images generated with GAN, diffusion, IF, and other methods are fake, and fine-tuned a LLaVA model on it to explore its ability to tell real and fake images apart.
Code: Link

Report: Link

LLM as a Discriminator for Evaluating Machine-generated News Headlines (2024) [WPI]

I proposed an LLM-assisted evaluation method in which a fine-tuned GPT model tries to differentiate between human-written and synthetically generated titles, motivated by the limitations of BLEU, METEOR, ROUGE, etc.
Code: Link

Report: Link
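As a rough illustration of how a discriminator can serve as an evaluation signal for generated headlines, the sketch below scores a batch by how close the discriminator's accuracy is to chance. The scoring formula, function names, and labels are assumptions made for this example, and it presumes a balanced mix of human and machine titles; it is not necessarily the metric used in the project.

```python
def discriminator_accuracy(predictions, truths):
    """Fraction of titles whose origin the discriminator identified correctly."""
    correct = sum(p == t for p, t in zip(predictions, truths))
    return correct / len(truths)


def indistinguishability(accuracy):
    """1.0 at chance level (0.5); 0.0 when the discriminator is always right or always wrong."""
    return 1.0 - 2.0 * abs(accuracy - 0.5)


if __name__ == "__main__":
    # Hypothetical discriminator outputs versus the true origins of four titles.
    predicted = ["human", "machine", "machine", "human"]
    actual = ["human", "machine", "human", "machine"]
    acc = discriminator_accuracy(predicted, actual)
    print(acc, indistinguishability(acc))  # 0.5 1.0
```

The intuition is that a generator whose headlines the discriminator cannot separate from human ones pushes accuracy toward 0.5, so a higher score indicates more human-like output.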

LLaSA: Large Multimodal Agent for Human Activity Analysis Through Wearable Sensors (2024) [WPI]

We prepared the SensorCaps and OpenSQA datasets and trained LLMs for sensor data interpretation and question answering.

Code: https://github.com/BASHLab/LLaSA

arXiv: https://arxiv.org/abs/2406.14498

Volunteering Experience

July 2019 - March 2021

IEEE Signal Processing Society BUET SB Chapter — Chairperson

Arranged IEEE SPS Winter School 2019 on Multimodal Signal Processing (Report: IEEE vTools link)
We became the largest IEEE Student Branch Chapter in Bangladesh and received the IEEE SPS Student Branch Chapter Growth Reward.

IEEEXtreme Programming Contest — Ambassador (2017-2019) and Country Lead (2019)

Arranged programming workshops
Participated in 2016, 2017 & 2018; became national champion in 2017 & 2018 (22nd worldwide in 2018)

Graduate Record Examinations (GRE)
October 23, 2021

Overall Score: 331/340
Quantitative Reasoning: 170/170
Verbal Reasoning: 161/170
Analytical Writing: 4.5/6.0

Language Proficiency
International English Language Testing System (IELTS)
Overall: 8.0/9.0 (February 5, 2021)

Listening: 8.5/9.0
Reading: 9.0/9.0
Speaking: 7.5/9.0
Writing: 7.0/9.0