Project directory

Data-driven projects

Alejandro Agustin

alejandro.a.es@ieee.org

Barcelona, Spain

Find me on: GitHub and Kaggle

How much will it cost me? Pre-ride regression

The final cost of a taxi trip is usually a surprise, since it depends on many factors that cannot be foreseen in advance.

Although the most important factors are the time and distance of a route, many others have a more indirect effect, such as the traffic in a given area, the weather, or the time of day.

In this project my objective is to predict the final cost of a trip using only information that is available beforehand. Since this is a learning project, I’m only going to use the data from a single month: May 2016.

Bank Telemarketing Decision Support Systems

The data relates to the direct marketing campaigns (phone calls) of an undisclosed Portuguese bank. The classification goal is to predict whether the client will subscribe to a term deposit (variable y).

In order to understand the dataset and achieve the proposed goal the following steps have been taken:

  • Data clean-up: outliers, imputation of missing data and detection of errors
  • Feature engineering: mainly factor generation
  • Exploratory data analysis: PCA, MCA (HPCA)
  • Linear model generation
  • Threshold selection (ROC)
  • Solution evaluation
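
To make the threshold selection step more concrete, here is a minimal sketch in plain Python. It assumes the threshold is chosen by maximising Youden's J statistic (true positive rate minus false positive rate) along the ROC curve; the project description does not say which criterion was actually used, so treat that choice, the function names and the toy data as illustrative assumptions.

```python
from typing import List, Tuple


def roc_points(y_true: List[int], scores: List[float]) -> List[Tuple[float, float, float]]:
    """Return (threshold, false_positive_rate, true_positive_rate) per distinct score."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = []
    for threshold in sorted(set(scores), reverse=True):
        # A client is predicted "yes" when the score is at or above the threshold.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((threshold, fp / negatives, tp / positives))
    return points


def best_threshold(y_true: List[int], scores: List[float]) -> float:
    """Pick the threshold that maximises Youden's J = TPR - FPR (an assumed criterion)."""
    return max(roc_points(y_true, scores), key=lambda p: p[2] - p[1])[0]


# Toy labels and model scores, invented for illustration only.
y_true = [0, 0, 0, 1, 0, 0, 1, 1, 1]
scores = [0.05, 0.20, 0.30, 0.45, 0.50, 0.60, 0.70, 0.85, 0.95]
threshold = best_threshold(y_true, scores)
```

Other criteria, such as a cost-weighted trade-off between missed subscribers and wasted calls, would plug into the same `max` call by changing the key function.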

Presentation

ML experiments manager

For a few months I’ve been trying to train an agent able to invest in the stock market without losing money (let’s be realistic).

In previous learning projects I would choose a specific technology I was interested in and try to apply it to something. In this case I selected a specific problem and tried to solve it in the best possible way.

This creates the need to run lots of tests with small variations. Over time, that turned into a modular platform.

Each experiment is fully described in a YAML file. When the platform is in daemon mode, it continually looks for new jobs by watching a specific directory. This approach works but is not very handy, so a frontend in the form of a Telegram bot has been implemented. It not only works as a job submitter, but also enables stats queries and error reporting.

A job is then selected by priority and submission time.
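
A minimal sketch of what selecting by priority and submission time could look like is shown here. The `Job` fields, the file names and the convention that a lower number means a more urgent job are all assumptions made for illustration; the platform's real job format lives in its YAML files and is not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Job:
    name: str            # hypothetical: name of the YAML file describing the experiment
    priority: int        # assumption: a lower number means a more urgent job
    submitted_at: float  # submission time as a Unix timestamp


def next_job(queue: List[Job]) -> Optional[Job]:
    """Return the most urgent job; ties go to the earliest submission."""
    if not queue:
        return None
    return min(queue, key=lambda job: (job.priority, job.submitted_at))


# Invented queue contents, for illustration only.
queue = [
    Job("baseline.yaml", priority=2, submitted_at=100.0),
    Job("new_features.yaml", priority=1, submitted_at=300.0),
    Job("longer_window.yaml", priority=1, submitted_at=200.0),
]
selected = next_job(queue)
```

Sorting on the tuple `(priority, submitted_at)` keeps the rule in one place, so changing the tie-break only means changing the key.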

At a high level there are four different stages:

  • Dataset fetch
  • Dataset preprocessing and feature generation
  • Model training. In the specific case of RL this has more modular stages:
    • Environment definition
    • Data preprocessing and feature generation at iteration window
    • Agent training
  • Results generation and reporting
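
One way to picture the four stages above is as a chain of small functions that each take and return a shared context. This is only a sketch under that assumption: the stage names, the context dictionary and the stand-in "model" are hypothetical and not taken from the actual platform.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Stage = Callable[[Context], Context]


def fetch_dataset(ctx: Context) -> Context:
    # Placeholder data; a real stage would download or load market data.
    return {**ctx, "raw": [1.0, 2.0, 4.0, 8.0]}


def preprocess(ctx: Context) -> Context:
    raw = ctx["raw"]
    # Example feature: relative change between consecutive values.
    returns = [(b - a) / a for a, b in zip(raw, raw[1:])]
    return {**ctx, "features": returns}


def train_model(ctx: Context) -> Context:
    features = ctx["features"]
    # Stand-in "model": just the mean of the features.
    return {**ctx, "model": sum(features) / len(features)}


def report(ctx: Context) -> Context:
    return {**ctx, "report": f"mean return: {ctx['model']:.2f}"}


def run_pipeline(stages: List[Stage]) -> Context:
    """Run each stage in order, passing the accumulated context along."""
    ctx: Context = {}
    for stage in stages:
        ctx = stage(ctx)
    return ctx


result = run_pipeline([fetch_dataset, preprocess, train_model, report])
```

Because every stage has the same signature, swapping one out (for example, a different feature generator) only changes the list passed to `run_pipeline`.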

As an example, here is a YAML configuration: YAML Gist