
Simon Macarthur

Experienced leader and data scientist working at the intersection of technology and business strategy to enable business decisions. Strategist, Technologist, Leader.


Having worked in Business Intelligence and Analytics for over 15 years, I am well versed in both the technical and business aspects of data. I recently completed a Masters of Information and Data Science (MIDS) at the University of California, Berkeley, and am immersed in the latest data science techniques across the entire data pipeline. With a developer background, I am as comfortable writing code and building solutions as I am developing an analytics strategy and the technology stack to implement it, whilst keeping one eye on the latest developments in industry.

I combine technical and business acumen, bridging the gap between the latest data science techniques and the refined skills of business communication and integration. I am particularly passionate about the integration of machine learning into the decision-making process. 'Artificial Intelligence for the Enterprise' is the next wave of analytics, and I provide the skill set and insight to make this a reality - from strategy to implementation. I am well versed in insurance, but the techniques and solutions I have developed are industry agnostic and can be integrated as required.

My leadership skills have been refined through managing a team of 10+ developers, analysts and architects. I am comfortable with both the technical and softer skills associated with people management, and was selected by the Executive Team to complete leadership training for high achievers.

Work History


Head of Business Intelligence

Avant Mutual Group Limited

My role is to provide leadership and direction to Business Intelligence and Analytics at Avant. This includes leading a team of dedicated staff, designing the enterprise data warehouse, overseeing the ETL and data warehouse framework, and guiding reporting and analytics. More recently, it has involved developing a strategy to move Avant's data assets into a new data pipeline to enable data science techniques.


Business Intelligence Team Leader

Avant Mutual Group Limited

Information Architect

Avant Mutual Group Limited

Analyst / Programmer

Avant Mutual Group Limited

Analyst / Programmer

American Power Conversion



Masters of Information and Data Science (MIDS)

University of California, Berkeley

The program features a multidisciplinary curriculum designed to prepare data science professionals to solve real-world problems using complex and unstructured data.


Bachelor of Science, Computing Science with First Class Honours

University of Technology, Sydney (UTS)



Member Relationships - Graph Database

Avant Mutual Group

Avant has a growing membership base with complex relationships between customers, organisations, and a variety of attributes. Understanding and visualising these relationships is challenging. In this project, I performed data discovery to understand the relevant relationships and translated these into data requirements for loading into a MongoDB database. The relationships were then loaded into a Neo4j graph database and visualised using Linkurious.


Machine Learning - Customer Churn Prediction

Avant Mutual Group

Design and implementation of machine learning models to predict customer churn propensity, implemented in Python using scikit-learn logistic regression and Gradient Boosting Machine (GBM) algorithms. This included operationalising the model output in the relevant business process. Stakeholders included the Avant CEO, the management team and relevant SMEs from sales.
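As a rough illustration of what propensity scoring with logistic regression involves, the sketch below fits a tiny model by plain gradient descent using only the Python standard library. It is not the project's code: the project used scikit-learn, and the feature names (tenure in years, claims in the last year), the helper functions and the toy data are all hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on log loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            p = sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))
            err = p - y
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights

def churn_propensity(weights, x):
    """Estimated probability that a customer with features x churns."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))

# Hypothetical toy data: (tenure_years, claims_last_year) -> churned (1) or stayed (0)
customers = [(1, 3), (2, 2), (1, 2), (8, 0), (10, 1), (7, 0)]
churned = [1, 1, 1, 0, 0, 0]
model = fit_logistic(customers, churned)
```

Calling `churn_propensity(model, (1, 3))` returns a score between 0 and 1 that could be ranked or thresholded before being passed to a business process such as a retention call list.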


Strategic Design and Implementation of New Data Pipeline

Avant Mutual Group

Avant's data pipeline was aging and unable to support a growing and evolving business. This project involved the strategic recommendation of a new data pipeline, design of the pipeline to ensure agility and support of the latest techniques as well as managing a variety of architectural decisions - from open source to vendor management. Technologies included Apache Kafka, Hadoop, Cassandra DB, Attunity Replicate and Control. 


Implementation of Agile Methodology

Avant Mutual Group

The demands on the business intelligence team within Avant are constant and ever-growing. A new method of engagement and delivery was required to facilitate a more agile approach to development and analytical work. I therefore spearheaded the implementation of an agile BI development process and its associated tools. This has increased BI output many times over and gives full visibility into how the team's work fits into the corporate objectives. To further support this visibility, I built a Python Flask website that reads the necessary data from the vendor APIs and provides updates on current work status and relevant analytics. This was rolled out across the organisation.


Hecate - Optimise Your Route!

ROUTE. RECOMMEND. REWARD. It's the Hecate difference.

1. Advise us of your route, preferred departure days and times.
2. Hecate will immediately begin monitoring your route, collecting traffic and weather data.
3. Each week, receive a personalised recommendation for your route, advising of the best times to leave to minimise your commute.
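One simple way such a weekly recommendation could be derived from the monitored data is to average the observed commute duration for each candidate departure slot and suggest the fastest one per day. This is a sketch under that assumption only: the observation format, the `recommend_departure` helper and the sample figures are hypothetical, and weather is ignored.

```python
from collections import defaultdict
from statistics import mean

def recommend_departure(observations):
    """Map each day to (best_departure_time, average_minutes).

    observations: iterable of (day, departure_time, commute_minutes).
    """
    by_slot = defaultdict(list)
    for day, departure, minutes in observations:
        by_slot[(day, departure)].append(minutes)

    best = {}
    for (day, departure), durations in by_slot.items():
        avg = mean(durations)
        # Keep the slot with the lowest average commute for each day.
        if day not in best or avg < best[day][1]:
            best[day] = (departure, avg)
    return best

# Hypothetical week of monitored commutes.
week = [
    ("Mon", "07:30", 42), ("Mon", "07:30", 46),
    ("Mon", "08:00", 55), ("Mon", "08:00", 61),
    ("Tue", "07:30", 50), ("Tue", "08:00", 44),
]
recommendations = recommend_departure(week)
```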

Login details for the Hecate site, to be fully immersed in the project:

Username: samacart

Password: password


Millennium Development Goals Visualisation

Global Health Observatory Millennium Development Goals. The data for the MDGs are spread out across the WHO’s website. Our goal is to create a consolidated set of visualisations to explore the MDGs. The visualisations will enable more interactive scenario exploration. A consolidated progress report has been released; however, there is not a way to explore all of the goals together.


Park or Bird: An XKCD Inspired Distributed Image Processing and Machine Learning Classifier using Spark

In computer science, it can be difficult to explain the difference between the easy and the virtually impossible. We were inspired by an xkcd comic to take the “park or bird” challenge using Spark and MLlib. Our goal is to build a scalable system for image processing: ingesting raw images, converting images to machine learning features, training a classifier, and ultimately building a deployable, scalable prediction engine based on Spark.

All code can be viewed at

Submitted paper can be viewed at View Paper


San Francisco Crime - A Kaggle Project

Participation in Kaggle Competition: Predict the category of crimes that occurred in the city by the bay.


A Review and Comparison of the Australian Privacy Principles

In March 2014, the new Australian Privacy Principles (APPs) came into effect. They replaced the now defunct National Privacy Principles and Information Privacy Principles and apply to all Australian organisations and Australian Government agencies. These were the largest reforms in more than 10 years and had wide-reaching implications for all data collection, storage and usage activities. This paper provides an overview of the Australian Privacy Principles and compares them with similar schemes in the United States and Europe.


UEFA Twitter Soccer Sentiment Mining, Search and Visualisation

How do you capture the feeling of a game? Twitter provides a rich set of user interactions. Our work collects tweets and classifies them, using a soccer-specific Bayes classifier, to given matches in the UEFA soccer games for analysis. Tweets were chunked and indexed in a Solr search system, with relationships between users stored in a Neo4j graph database. Finally, sentiment was tied to matches and visualised in an interactive display for further investigation.
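To illustrate the general kind of Bayes classifier mentioned, here is a minimal multinomial naive Bayes sketch with add-one smoothing, written with only the Python standard library and applied to sentiment labels. It is not the project's classifier: the `NaiveBayes` class, the training tweets and the labels are made up for illustration.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes over whitespace-separated words."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def train(self, text, label):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus smoothed log likelihood of each word.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word] + 1  # add-one smoothing
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical labelled tweets.
clf = NaiveBayes()
clf.train("what a brilliant goal", "positive")
clf.train("great save superb keeper", "positive")
clf.train("terrible defending awful pass", "negative")
clf.train("what a poor awful miss", "negative")
```

A domain-specific version would mainly differ in its training data and tokenisation, for example by handling player names, hashtags and match slang.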