Papers :
First Bengali question answering system, trained on the synthetically translated BanglaSQuAD dataset using multilingual BERT models. The benchmark dataset is released at this link.
Text mining and network analysis applied to a dataset of protest-related personal stories.
Technical Reports :
Bangla machine translation system trained on the SUPARA benchmark Bangla-English parallel corpus with LSTM and transformer models. CSE 495 NLP course project. Dobhashi means translator in Bangla.
Survey paper on trends in NLP for low-resource languages, featuring transfer learning and translation attempts.
Historical data from Yahoo Finance was combined with a dataset of Hacker News articles to analyze trends in stock market closing prices. The final models were based on XGBoost and LSTM.
Projects :
TensorFlow Hub NLP project focusing on classifying the BARD Bangla dataset into 5 classes. Uses the pretrained embedding exporter to export FastText Bangla embeddings as a TF-Hub module. The exported embedding module is then used to classify Bangla articles. Achieves 94% accuracy and precision on a heavily imbalanced dataset.
Mediaviz uses the ForceAtlas2 layout by default and automatically scales the layout for graphs of 100-1000 nodes that have a power-law linking structure. Features for network filtering, coloring, node resizing, label-overlap prevention, and community visualization are also included. Python package installable with pip. Project link & GitHub repository. Blog posts here and here.
- Credit Card Recommendation System
Credit card recommendation system built with scikit-learn and deployed with Django and Google Dialogflow. Recommends from among 120 cards collected from 40+ banks in Bangladesh, based on similarity measures computed against the user's preference input.
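A minimal sketch of similarity-based recommendation, written in plain Python for illustration. The card names, the four feature dimensions, and the choice of cosine similarity are all assumptions made for this example; the project itself uses scikit-learn, and its actual features and similarity measures are not described here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def recommend(cards, preference, top_k=3):
    """Rank cards by similarity to the user's preference vector."""
    scored = [
        (name, cosine_similarity(features, preference))
        for name, features in cards.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical feature order (an assumption, not the project's schema):
# [low annual fee, travel rewards, cashback, lounge access]
CARDS = {
    "Card A": [1.0, 0.0, 1.0, 0.0],
    "Card B": [0.0, 1.0, 0.0, 1.0],
    "Card C": [0.5, 0.5, 0.5, 0.0],
}

if __name__ == "__main__":
    user_preference = [0.0, 1.0, 0.0, 1.0]  # wants travel rewards and lounge access
    for name, score in recommend(CARDS, user_preference, top_k=2):
        print(f"{name}: {score:.2f}")
```

In a chatbot deployment, the preference vector would be filled in from the user's answers before calling `recommend`.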
- Transfer Learning on Multi-Class Fish Image Classification Contest
Transfer learning with the VGG16 neural network architecture, implemented in Keras, on a multi-class fish classification problem with data from the Nature Conservancy Fisheries Monitoring competition on Kaggle. Udacity capstone project using deep learning.
- Network Visualization of Media Coverage of Violence Against Women in Bangladesh
This project explores media coverage in articles about harassment or violence against women. Data obtained through named entity extraction is first filtered by keywords using Python and NetworkX. The network is then visualized with Gephi. Interactive link.
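The keyword filtering and network-building steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the sample articles, the keyword list, and the idea of linking entities that co-occur in the same article are all hypothetical, and plain Python stands in for NetworkX so the example has no dependencies.

```python
from collections import Counter
from itertools import combinations

# Hypothetical filter terms; the project's real keyword list is not given.
KEYWORDS = {"harassment", "violence"}

# Hypothetical articles with already-extracted named entities.
ARTICLES = [
    {"text": "Report on harassment case in Dhaka.",
     "entities": ["Dhaka", "Police"]},
    {"text": "Cricket match results.",
     "entities": ["Dhaka", "Stadium"]},
    {"text": "Violence against women discussed; police respond in Dhaka.",
     "entities": ["Police", "Dhaka", "Court"]},
]


def filter_articles(articles, keywords=KEYWORDS):
    """Keep only articles whose text mentions at least one keyword."""
    return [
        a for a in articles
        if any(k in a["text"].lower() for k in keywords)
    ]


def cooccurrence_edges(articles):
    """Count how often each pair of entities appears in the same article."""
    edges = Counter()
    for article in articles:
        entities = sorted(set(article["entities"]))  # stable, de-duplicated
        for pair in combinations(entities, 2):
            edges[pair] += 1
    return edges


if __name__ == "__main__":
    edges = cooccurrence_edges(filter_articles(ARTICLES))
    for (source, target), weight in edges.most_common():
        print(source, target, weight)
```

Weighted edge lists of this kind can be loaded into a graph library or exported to a format that Gephi reads.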
Take-Home Challenges :
- Classifying Actionable Sentences from Emails
Extracted text from the Enron dataset to obtain sentences, then built a rule-based model with keyword and textacy features. The small positive-only labelled dataset provided by the company was merged with the output of the rule-based model to train the actual actionable sentence classifier, using Naive Bayes, SVM, LSTM, and logistic regression models.
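A rough sketch of what the rule-based labelling stage might look like. The keyword list, the request pattern, and the "sentence starts with an action word" rule are assumptions chosen for illustration; the original model also used textacy features, which are left out here to keep the example dependency-free.

```python
import re

# Hypothetical action keywords; the challenge's real list is not given.
ACTION_KEYWORDS = {"please", "send", "review", "schedule",
                   "call", "forward", "confirm"}

# Polite requests such as "Could you ..." or "Will you ...".
REQUEST_PATTERN = re.compile(r"(can|could|would|will) you\b", re.IGNORECASE)


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def is_actionable(sentence):
    """Rule: a polite request, or a sentence that opens with an action keyword."""
    sentence = sentence.strip()
    if REQUEST_PATTERN.match(sentence):
        return True
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return bool(tokens) and tokens[0] in ACTION_KEYWORDS


def label_sentences(text):
    """Return (sentence, is_actionable) pairs usable as weak labels."""
    return [(s, is_actionable(s)) for s in split_sentences(text)]


if __name__ == "__main__":
    email = ("Please send me the report by Friday. "
             "The meeting went well. "
             "Could you review the contract?")
    for sentence, flag in label_sentences(email):
        print(flag, "|", sentence)
```

Weak labels produced this way can be combined with a small positive-only set to form training data for a statistical classifier, which is the role the rule-based output plays in the description above.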