Serendeepia Playbook

Methodology

We follow an agile methodology, with some modifications for machine learning projects. It enables continuous delivery of our products and projects.

Agile methodology

We follow Scrum, where the Product Owner is usually also the Scrum Master. We also use Kanban boards to track the current status of each sprint.

Our typical project is between one and three months long, and we divide it into two-week sprints.
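As a rough illustration of how a project splits into two-week sprints, here is a minimal sketch. The function name sprint_schedule and the choice to cut the last sprint short when the project ends mid-sprint are assumptions made for this example, not rules from the playbook.

```python
from datetime import date, timedelta

SPRINT_LENGTH = timedelta(weeks=2)  # two-week sprints, as stated above


def sprint_schedule(start: date, end: date) -> list[tuple[int, date, date]]:
    """Return (sprint number, first day, last day) for each sprint.

    Both `start` and `end` are inclusive project dates.
    """
    sprints = []
    number = 1
    sprint_start = start
    while sprint_start <= end:
        # Assumption: a final partial sprint simply ends on the project end date.
        sprint_end = min(sprint_start + SPRINT_LENGTH - timedelta(days=1), end)
        sprints.append((number, sprint_start, sprint_end))
        sprint_start = sprint_end + timedelta(days=1)
        number += 1
    return sprints
```

For example, a project running from 2024-01-01 to 2024-02-25 (eight weeks) yields four sprints, the first covering 2024-01-01 to 2024-01-14.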

Sprints

A sprint is a timeboxed effort; that is, it is restricted to a specific duration. Each sprint starts with a sprint planning event that aims to define a sprint backlog, identify the work for the sprint, and make an estimated forecast for the sprint goal. Each sprint ends with a sprint review and a sprint retrospective, which present progress to stakeholders and identify lessons learned and improvements for the next sprints.

We create milestones in GitHub as a way to represent a sprint and to prioritize the tasks.

Sprint planning

At the beginning of a sprint, the scrum team holds a sprint planning event to:

  • Mutually discuss and agree on the scope of work (SoW) that is intended to be done during that sprint.
  • Select product backlog items that can be completed in one sprint.

Sprint review and retrospective

At the sprint review, the team:

  • reviews the work that was completed and the planned work that was not completed.
  • presents the completed work to the stakeholders (a.k.a. the demo).
  • collaborates with the stakeholders on what to work on next.

Three main questions are asked in the sprint retrospective:

  • What went well during the sprint?
  • What did not go well?
  • What could be improved for better productivity in the next sprint?

Kanban board

Our Kanban boards have five columns:

  • TO DO: tasks to be done.
  • In Progress: tasks we are working on right now. No one should be working on more than one task at any given time.
  • Review in progress: completed tasks that need to be reviewed by other team members.
  • Review approved: tasks approved and ready to be merged into the main code of the project.
  • Done: tasks completed.
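The one-task-at-a-time rule for the In Progress column can be checked mechanically. The sketch below is only an illustration: the task dictionaries with "assignee" and "column" keys are a hypothetical in-memory representation, not the format of any GitHub API response.

```python
from collections import Counter

# Column names as listed above.
COLUMNS = ["TO DO", "In Progress", "Review in progress", "Review approved", "Done"]


def wip_violations(tasks: list[dict]) -> list[str]:
    """Return the assignees who have more than one task in 'In Progress'."""
    in_progress = Counter(
        task["assignee"] for task in tasks if task["column"] == "In Progress"
    )
    return sorted(name for name, count in in_progress.items() if count > 1)
```

An empty result means everyone respects the limit; any returned name has two or more tasks in progress at once.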

Project tracking

We use GitHub issues to define tasks and GitHub project boards to track the status of projects (and sprints).

Our issues or tasks can have different labels depending on their nature:

  • enhancement: new feature or request.
  • bug: something that is not working as expected and needs a fix.
  • duplicated: this task/issue was a duplicate.
  • question: more information is needed.
  • wontfix: this task/issue won’t be resolved.

The boards in GitHub are automated: issues move from one column to another automatically, so we don’t need to move them by hand.

Versioning

We do not version the source code, but we do name every sprint release with an increasing number. When versioning is needed, we use semantic versioning (see http://semver.org/) and tag the corresponding commit.
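For the cases where semantic versioning applies, the following minimal sketch shows how the next version number could be derived. It handles only the core MAJOR.MINOR.PATCH form; pre-release and build metadata are deliberately left out, and the helper name bump is an assumption made for this example.

```python
import re

# Core MAJOR.MINOR.PATCH only, without leading zeros; pre-release and
# build metadata are out of scope for this sketch.
SEMVER_CORE = re.compile(r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$")


def bump(version: str, part: str) -> str:
    """Return `version` with the given part ('major', 'minor' or 'patch') incremented."""
    match = SEMVER_CORE.match(version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(group) for group in match.groups())
    if part == "major":
        return f"{major + 1}.0.0"  # incompatible changes reset minor and patch
    if part == "minor":
        return f"{major}.{minor + 1}.0"  # backward-compatible features reset patch
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"  # backward-compatible fixes
    raise ValueError(f"unknown part: {part!r}")
```

For instance, bump("1.4.2", "minor") returns "1.5.0". The resulting string can then be used as the name of the tag on the release commit.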

Data

Small datasets are under the data/ directory of the project. We use git-lfs for large files.

Big datasets can be stored in GS or AWS. Ideally, we synchronize the data to the machine where the training will run, and only when it is needed.

Some guidelines about data:

  • We shouldn’t delete any data sample.
  • We should use md5 or a unique identifier for the data files when appropriate (for example for images downloaded from the Internet).
  • We should include a description file that describes the data; this file must also be in the GitHub repository.
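The md5 guideline above can be sketched as follows, assuming the goal is to name each file after the MD5 digest of its contents so that identical downloads end up with the same name. The function names and the choice to keep the original file extension are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so large files are never loaded fully into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def content_based_name(path: Path) -> str:
    """Build a file name from the content digest, keeping the (lowercased) extension."""
    return md5_of_file(path) + path.suffix.lower()
```

Here MD5 serves only as an identifier for deduplication, not as a security measure.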

Communication with the client

Only one person, the single point of contact (SPOC), is in charge of all communication with a client. Other team members must be copied on all emails so that they stay informed.

Bots

The GitHub repositories and the documents in Google Drive are integrated with Slack and send notifications to a channel named code_bots.