
Work experience

Data Scientist

2020/12 - 2021/3
NSU HCI DIAL Lab

Project : Alor Akash. Research on the intersection of gender, technology usage, and financial inclusion among Bangladeshi women. Facilitated by the NSU ECE department and funded by the Bill and Melinda Gates Foundation. Worked on text visualization and analytics of interview transcript data on digital financial inclusion.

Google Summer of Code Participant

2019/4 - 2019/8
TensorFlow, Google

Worked with the TensorFlow Hub team on text embedding modules. The work features a prototype ULMFiT implementation, a pretrained embedding exporter, and a Bangla text classification notebook.

Data Scientist

2017/6 - 2017/9
Cramstack

Worked on a government project cleaning five years' worth of electricity time-series data with Python and the pandas library to build an interactive dashboard prototype.

Education

B.Sc in Software Engineering

2013 - 2014
University of Waterloo, Canada

B.Sc in Computer Science

2015 - 2020
North South University, Bangladesh

Relevant Coursework : Data Structures, Computer Architecture, Discrete Math, Digital Logic, Calculus, Linear Algebra, Probability and Stats, Machine Learning, Artificial Intelligence, Natural Language Processing. 

Portfolio

Papers : 

First Bengali question answering system, trained on the synthetic translated dataset BanglaSQuAD using multilingual BERT models. Benchmark dataset released at this link.

Text mining and network analysis applied to a dataset of protest-related personal stories.

Technical Reports : 

Bangla machine translation system trained on the SUPARA benchmark Bangla-English parallel corpus with LSTM and transformer models. CSE 495 NLP course project. Dobhashi means translator in Bangla.

Survey paper on trends in NLP for low-resource languages, featuring transfer learning and translation attempts.

Historical data from Yahoo Finance was combined with a dataset of Hacker News articles to analyze trends in stock market closing prices. Final models were based on XGBoost and LSTM.

Projects : 

TensorFlow Hub NLP project focusing on classifying the BARD Bangla dataset into 5 classes. Uses the pretrained embedding exporter to export FastText Bangla embeddings as a TF-Hub module. The exported embedding module is used to classify Bangla articles. Achieves 94% accuracy and precision on a heavily imbalanced dataset.

Mediaviz uses the ForceAtlas2 layout by default and automatically scales the layout for graphs with 100-1000 nodes that have a power-law linking structure. Features for network filtering, coloring, node resizing, prevention of label overlap, and community visualization are also included. Python package installable via pip. Project link & GitHub repository. Blog post here and here

Take-Home Challenges :

  • Classifying Actionable Sentences from Emails
    Extracted sentences from the Enron dataset and built a rule-based model with keyword and textacy features. Then merged a small, company-provided, positive-only labelled dataset with the output of the rule-based model to build the actual actionable-sentence classifier with NB, SVM, LSTM, and logistic regression models.

Scholarships And Certification

Media
