

A graduate of the National University of Singapore who majored in Applied Mathematics.

She has more than 4 years of experience in big data analysis and modelling.

She likes to wrangle and make sense of data using her analytical skills. Her projects range from full-stack web development to data crawling, analytics, text processing and machine learning.

Analysis/Modelling: Regression Analysis, Linear Programming (Simplex Method).

Machine Learning: Decision Tree, Regression, GBM, XGBoost, GBTRegressor, Random Forest, kMeans, kNN.

Data science: Python, SQL, R, Spark, Hadoop.

Web: JavaScript, PHP, Python.

Version control: Git.


Work experience

Apr 2016 - Aug 2018

Data Scientist

Lazada Group - Alibaba
  • Maintained and automated existing lead-time pipelines that provided estimates for supply chain and operations. Built a new, fully automated prediction model that improved parcel arrival time prediction accuracy to more than 80% and won a Breaking Boundaries Award from Alibaba. Tech stack: Hive + R + PySpark + GBTRegressor + kMeans.
  • Enhanced, extended, maintained, and fully automated Attribute Extraction. The project outperformed previous workflows and other departments' efforts in both accuracy (>95%) and extraction rate (>13 times higher). Tech stack: Regular expression + Python.
  • Built, deployed and maintained the Seller Cancellation model to reduce seller-initiated order cancellations by more than 50%. Tech stack: Hive, Python, Random Forest, XGBoost, Airflow.
  • Automated BI process using Attribute Filled Report generation. Tech stack: SQL, PySpark.
  • Automated the data upload process by integrating with the Seller Center API. Tech stack: Python.
Sep 2014 - Dec 2015


  • Mainly responsible for FX rate analysis, modeling, and pricing strategy.
  • Built an optimized mathematical model based on Linear Programming (Simplex Method) and combined it with live rates crawled using Node.js to create a multi-currency pricing strategy application, deployed to Heroku. (White Paper)
  • Built a daily rate crawler using Python, created web-based visualizations with Chart.js, and performed statistical analysis on the collected data stored in MySQL.
  • Redesigned the company website from non-responsive to responsive and mobile-friendly, and improved its Google ranking using basic SEO techniques.
Jan 2014 - Mar 2014


  • Performed data analysis as part of an energy research project; created and designed marketing materials.
  • Managed mobile app projects, working directly with the director and developers; mainly responsible for project acquisition in the first stage.
  • Used MS Office-VBA to organize, design templates, and set up automation to enhance management efficiency (saving more than 70% of time).
  • Supported the Managing Director with managerial tasks.
May 2013 - Jul 2013


  • Transformed the company's Booking and Project Management processes from manual to automatic by building user-friendly systems using Excel-VBA, which prevented 100% of the revenue loss caused by double bookings and cut time and human resources by more than 80%. - Testimonial
  • Reorganized, updated and managed data in NetSuite and other systems.
  • Created a new workflow and wrote programs for future mass-cleaning processes (reducing the time cost from several weeks to 2 hours).


  • Programming Languages: Python, SQL, R programming, JavaScript, VBA, Matlab, Maple
  • Math & Statistics: Data Analysis, Math Modeling, Linear Programming (Simplex Method), FX
  • Machine Learning: scikit-learn, Spark MLlib + ML, Naive Bayes, SVM, Decision Tree (RFR, GBM, XGBoost), Regression (GBM, XGBoost, LinearRegression)
  • Data: Python, Hive, SQL, PySpark, R programming, Hadoop & MapReduce
  • Visualisation: Python-ggplot, seaborn (matplotlib), plotly, R-ggplot2
  • Web development: Node.js, JavaScript, HTML & CSS, PHP
  • Libraries & frameworks: jQuery, Bootstrap, Chart.js, D3.js
  • Testing: Selenium, PhantomJS
  • Development tools: Linux, Git, Sublime, VirtualBox
  • Languages: Vietnamese (native), English (proficient), Chinese (intermediate)


Deep Neural Network for Image Classification: cat vs non-cat

Web Server Log Analysis with Apache Spark

Analysing Udacity forum logs

Building an algorithm that uses the public Enron financial and email dataset to identify Enron employees who may have committed fraud

Wrangling and analysing many different data sets in R (up to 99,003 observations of 18 variables) across a variety of topics: Facebook users (generated from a complex model), Reddit users, HIV Ratio (Gapminder), Diamonds Price and Red Wine Quality.

FX Rates Crawler, Social Network Data Structure.

Front-end: HTML & CSS, JavaScript & jQuery

Back-end: Node.js, PHP

Projects: Interactive Resume; Web-based Portfolio; Mini-apps (inspired by; FX Rates Analysis (PHP); Online foreign exchange rates feed (Node.js)

Co-curricular activities

Amateur Writer for Vietnamese Newspapers

  • Wrote stories, compositions, and movie and book reviews for Vietnamese newspapers (Tuoi Tre, Thanh Nien News; daily circulation > 450,000).