Summary

A graduate of the National University of Singapore who majored in Applied Mathematics.

She has more than 5 years of experience in big data analysis and modelling.

She likes to wrangle and make sense of data using her analytical skills. Her projects range from full-stack web development to data crawling, analytics, text processing, recommendation systems (Recsys), machine learning and deep learning.

Analysis/Modelling: Collaborative Filtering, Embedding, Matrix Factorisation, Regression Analysis, Linear Programming-Simplex Method.

Machine Learning: Decision Tree, Regression, GBM, XGBoost, GBTRegressor, Random Forest, kMeans, kNN.

Data science: Python, SQL, R, Hive, Spark, Hadoop.

Web: JavaScript, PHP, Python.

Version control: Git.

Work experience

2018 - 2018

Data Scientist

Metronom - Metro Group | Germany
  • Transformed the recommendation system (Recsys) from a Naive Bayes classifier in R scripts into a fully automated, scalable, maintainable and customisable cloud-based application implementing matrix factorisation, embeddings and latent-vector exploration. Tech stack: GCP, Hive, TensorFlow, WALS, PySpark, Airflow, RESTful API.
  • Enhanced the substitutability model from Yule's Q to product embeddings (Collaborative Filtering) and market basket analysis. Tech stack: GCP, Hive, Python, Airflow.
  • Customer 360 setup: business requirements, usage guidelines, pipeline creation and maintenance. Tech stack: GCP, Hive, Python, Airflow.
2016 - 2018

Data Scientist

Lazada Group - Alibaba | Singapore
  • Maintained and automated existing lead-time pipelines that provide estimates for supply chain and operations. Built a new, fully automated prediction model that improved parcel arrival time prediction accuracy to more than 80% and won a Breaking Boundaries Award from Alibaba. Tech stack: Alicloud, Hive, R, PySpark, GBTRegressor, kMeans.
  • Enhanced, extended, maintained, and fully automated Attribute Extraction. The project outperformed previous workflows and other departments' efforts in both accuracy (>95%) and extraction rate (more than 13 times higher). Tech stack: Regular expressions, Python.
  • Built, deployed and maintained the Seller Cancellation model to reduce seller-initiated order cancellations by more than 50%. Tech stack: AWS, Hive, Python, Random Forest, XGBoost, Airflow.
  • Automated the BI process through Attribute Filled Report generation. Tech stack: SQL, PySpark.
  • Automated the data upload process by integrating with the Seller Center API. Tech stack: Python.
2014 - 2015

Analyst

EZFX | Singapore
  • Mainly responsible for FX rate analysis, modelling, and pricing strategy.
  • Built an optimized mathematical model based on Linear Programming (Simplex Method) and combined it with live rates crawled with Node.js to create a multi-currency pricing strategy application, deployed to Heroku. (White Paper)
  • Built a daily rate crawler in Python, created web-based visualizations with Chart.js, and performed statistical analysis on the collected data stored in MySQL.
  • Redesigned the company website from a non-responsive to a responsive, mobile-friendly layout and boosted its ranking on Google using basic SEO techniques.
2014 - 2014

Intern

Enerdata | Singapore
  • Performed data analysis as part of an energy research project; created and designed marketing materials.
  • Managed a mobile apps project, working directly with the director and developers; mainly responsible for project acquisition at the first stage.
  • Used MS Office VBA to organize and design templates and to set up automation that improved management efficiency (saved more than 70% of the time).
  • Supported the Managing Director on managerial tasks.
2013 - 2013

Intern

PropertyGuru
  • Transformed the company's Booking and Project Management processes from manual to automatic by building user-friendly systems in Excel VBA, which eliminated the revenue loss caused by double bookings and reduced the time and human resources required by more than 80%. (Testimonial)
  • Reorganized, updated and managed data in NetSuite and other systems.
  • Created a new workflow and wrote programs for future mass-cleaning processes (reduced time cost from several weeks to 2 hours).

Education

Skills

  • Programming Languages: Python, SQL, R programming, JavaScript, VBA, Matlab, Maple
  • Math & Statistics: Data Analysis, Math Modeling, Linear Programming (Simplex Method)
  • Data Science-Machine Learning: scikit-learn, Spark MLlib + ML, Naive Bayes, SVM, Decision Tree (RFR, GBM, XGBoost), Regression (GBM, XGBoost, LinearRegression), k-NN, Unsupervised Learning (k-Means), Recommendation System - Recsys (Collaborative Filtering, Embeddings, Bayesian Classifier, Matrix Factorisation)
  • Deep Learning: TensorFlow, Keras, PyTorch
  • Data: Python, Hive, SQL, PySpark, R programming, Hadoop & MapReduce
  • Cloud Computing: Google Cloud Platform (GCP), Amazon Web Services (AWS), Alibaba Cloud (Aliyun)
  • Visualisation: Python-ggplot, seaborn (matplotlib), plotly, R-ggplot2
  • Web development: Node.js, JavaScript, HTML & CSS, PHP
  • Libraries & frameworks: jQuery, Bootstrap, Chart.js, D3.js
  • Testing: Selenium, PhantomJS
  • Development tools: Linux, Git, Sublime, VirtualBox
  • Deployment: Airflow, RESTful API, VMs, Docker
  • Languages: Vietnamese (native), English (proficient), Chinese (intermediate)

Self-education

Deep Neural Network for Image Classification: cat vs non-cat

Web Server Log Analysis with Apache Spark

Analysing Udacity forum logs

Building an algorithm to identify Enron Employees who may have committed fraud based on the public Enron financial and email dataset

Wrangling and analysing many different data sets in R (up to 99,003 observations of 18 variables) across a variety of topics: Facebook users (generated from a complex model), Reddit users, HIV Ratio (Gapminder), Diamonds Price and Red Wine Quality.

FX Rates Crawler, Social Network Data Structure.

  • Data Science - Machine Learning

Projects: Rossmann Sales Forecast, Revenue Prediction, Gender Prediction, Click-through rate (CTR) Prediction

  • Web Development

Front-end: HTML & CSS, JavaScript & jQuery

Back-end: Node.js, PHP

Projects: Interactive Resume; Web-based Portfolio; Mini-apps (inspired by http://jenniferdewalt.com/); FX Rates Analysis (PHP); Online foreign exchange rates feed (Node.js)

Co-curricular activities

Amateur Writer for Vietnamese Newspapers

  • Wrote stories, compositions, and movie and book reviews for Vietnamese newspaper publishers (Tuoi Tre, Thanh Nien News; daily circulation > 450,000).

References