We follow an agile methodology with some modifications for machine learning projects. This methodology enables us to practice Continuous Delivery of our products and projects.
Our typical project runs between one and three months, and we divide it into two-week sprints.
A sprint is a timeboxed effort; that is, it is restricted to a specific duration. Each sprint starts with a sprint planning event that aims to define a sprint backlog, identify the work for the sprint, and make an estimated forecast for the sprint goal. Each sprint ends with a sprint review, which shows progress to stakeholders, and a sprint retrospective, which identifies lessons learned and improvements for the next sprints.
We create milestones in GitHub to represent sprints and to prioritize tasks.
At the beginning of a sprint, the scrum team holds a sprint planning event to:
At the sprint review, the team:
Three main questions are asked in the sprint retrospective:
Our kanban boards have 5 columns:
TO DO: tasks to be done.
In Progress: tasks we are working on right now; no one should be working on more than one task at any given time.
Review in progress: completed tasks that need to be reviewed by other team members.
Review approved: tasks approved and ready to be merged into the project's main branch.
Done: tasks completed.
Our issues or tasks can have different labels depending on their nature:
enhancement: new feature or request.
bug: something that is not working as expected and needs a fix.
duplicated: this task/issue was a duplicate.
question: more information is needed.
wontfix: this task/issue won’t be resolved.
The boards in GitHub are automated, which means we don't need to move issues from one column to another; they move automatically.
We do not version the source code, but we do name every sprint release with an increasing number. When versioning is needed, we use semantic versioning (see http://semver.org/) and tag the commit accordingly.
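As a minimal sketch of what tagging a sprint release could look like (the version string, message, and `tag_release` helper below are illustrative assumptions, not a fixed team convention):

```python
import subprocess

def tag_release(version: str, remote: str = "origin") -> None:
    """Create an annotated semantic-version tag and push it to the remote."""
    # Annotated tags carry a tagger and a message, which suits releases.
    subprocess.run(
        ["git", "tag", "-a", version, "-m", f"Sprint release {version}"],
        check=True,
    )
    # Tags are not pushed by default, so push the new tag explicitly.
    subprocess.run(["git", "push", remote, version], check=True)

if __name__ == "__main__":
    tag_release("v1.2.0")  # hypothetical release number (MAJOR.MINOR.PATCH)
```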
Small datasets are stored under the data/ directory of the project. We use git-lfs for large files.
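A sketch of how large files can be routed through git-lfs (the `*.parquet` pattern is just an example; substitute the file types the project actually produces):

```python
import subprocess

def track_large_files(pattern: str = "*.parquet") -> None:
    """Configure git-lfs so files matching the pattern are stored via LFS."""
    # One-time setup of the git-lfs hooks in the repository.
    subprocess.run(["git", "lfs", "install"], check=True)
    # Record the pattern in .gitattributes so matching files go through LFS.
    subprocess.run(["git", "lfs", "track", pattern], check=True)
    # The updated .gitattributes must be committed alongside the data files.
    subprocess.run(["git", "add", ".gitattributes"], check=True)
```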
Big datasets can be stored in GS (Google Cloud Storage) or AWS. Ideally, we synchronize the data locally to the training machine only when the training is going to be performed.
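A minimal sketch of that synchronization step, using the standard `gsutil` and `aws` CLIs (the bucket paths and helper names below are hypothetical placeholders for the project's actual locations):

```python
import subprocess

# Hypothetical bucket paths; replace with the project's actual locations.
GCS_PATH = "gs://our-project-datasets/training-data"
S3_PATH = "s3://our-project-datasets/training-data"
LOCAL_DIR = "data/"

def sync_from_gcs() -> None:
    """Mirror the GCS dataset into the local data/ directory before training."""
    # -m parallelizes transfers; rsync only copies files that have changed.
    subprocess.run(["gsutil", "-m", "rsync", "-r", GCS_PATH, LOCAL_DIR], check=True)

def sync_from_s3() -> None:
    """Mirror the S3 dataset into the local data/ directory before training."""
    # `aws s3 sync` also skips files that are already up to date locally.
    subprocess.run(["aws", "s3", "sync", S3_PATH, LOCAL_DIR], check=True)

if __name__ == "__main__":
    sync_from_gcs()
```

Both commands are incremental, so re-running them before each training run only transfers what changed since the last sync.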
Some guidelines about data:
Only one person, the single point of contact (SPOC), is in charge of all communication with a client. The other team members must be copied on all communication emails so they stay informed.
The GitHub repositories and the documents in Google Drive are integrated with Slack in a channel named code_bots for notifications.