Sept. 11, 2023

Preventing third party incidents with a data-driven tool

The case of ProRail

BigData Republic

Data Science & Data Engineering

ProRail is a Dutch government organization responsible for managing and maintaining the Dutch railway network, which spans 7,000 kilometers. Its tasks include ensuring safety, expanding the network, and renewing infrastructure. ProRail aims to accommodate the expected growth in passengers by increasing passenger capacity by 30% compared to 2018.

To achieve these goals, ProRail needs to minimize unsafe situations and delays by actively preventing incidents. A significant portion of these incidents is caused by third-party disruptions to the railway infrastructure. Examples include people walking their dogs, people taking shortcuts across the tracks, acts of vandalism, and suicide attempts. Addressing and minimizing these disruptions will improve the overall functioning and reliability of the railway network.

Currently, a team from the Incident Response department goes out daily to address these incidents. This team consists of approximately 200 full-time employees. While the organization has operated mainly reactively in recent years, it is now shifting its focus to prevention. To make the best use of its resources, it needs to work in a data-driven manner.

This brings us to ProRail’s challenge:

“How can we optimize the use of ICB (the Incident Response Team) based on practical knowledge and experience to minimize the number of incidents in a data driven way?”

Team composition

To answer this question, a project team has been established. This team consists of two machine learning engineers from BigData Republic (formerly Vantage AI), a data scientist, a scrum master, a product owner, a software engineer, and multiple stakeholders from ProRail.

Minimizing railway incidents with data

The goal of this project is to support the incident response team in minimizing the number of incidents around the railway. Until now, the team has worked reactively and has made little use of data. Going forward, it should be deployed proactively, guided by data. In short, a new way of working needs to be developed.

To achieve this new data-driven way of working, the team needs to be trained in analyzing and using data. Working with dashboards is a good first step. Additionally, data management needs to be in order. Finally, an accurate predictive model is needed to extract more value from the data and support even better-informed decisions.


So how did we achieve this?

We started by thoroughly understanding the use case and making sure the entire team and all stakeholders agreed on the deliverables. On the organizational side, we spent time with the incident response team to observe and understand their work as well as possible. On one of these days, we came across a training exercise and, fortunately, were able to participate. We even had the opportunity to cut a door off a car. That's not something you experience every day!

We delved into various data sources and familiarized ourselves with ProRail's data infrastructure and tooling. We built fully automated and robust ETL pipelines using Azure Kubernetes, Docker, and SQL databases, and developed multiple insightful Power BI dashboards, which we shared iteratively with stakeholders during feedback sessions held every three weeks. Furthermore, we experimented with several data sources and trained various models, accelerated by the AutoML capabilities of the AzureML platform.
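The article does not describe the pipelines' internals, so the sketch below only illustrates the extract, transform, load pattern in miniature. It assumes a hypothetical CSV export of incident reports (the column names and sample rows are invented) and uses Python's built-in `sqlite3` as a stand-in for the SQL databases mentioned above. In the project itself, the jobs ran in Docker containers on Azure Kubernetes, which this example does not attempt to reproduce.

```python
import csv
import io
import sqlite3

# Hypothetical raw export of incident reports; columns and values are illustrative only.
RAW_CSV = """incident_id,reported_at,location,description
1,2023-03-01 08:15,Utrecht,Person walking a dog along the track
2,2023-03-01 09:40,,Graffiti on a signal box
3,2023-03-02 17:05,Amersfoort,Trespasser taking a shortcut across the track
"""


def extract(raw: str) -> list[dict[str, str]]:
    """Parse the raw CSV export into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows: list[dict[str, str]]) -> list[tuple[int, str, str, str]]:
    """Trim whitespace, cast IDs to int and drop rows without a location."""
    records = []
    for row in rows:
        location = row["location"].strip()
        if not location:
            continue  # incomplete record: cannot be placed on the network
        records.append(
            (
                int(row["incident_id"]),
                row["reported_at"].strip(),
                location,
                row["description"].strip(),
            )
        )
    return records


def load(conn: sqlite3.Connection, records: list[tuple[int, str, str, str]]) -> None:
    """Upsert records so that re-running the pipeline does not create duplicates."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS incidents ("
        "incident_id INTEGER PRIMARY KEY, reported_at TEXT, "
        "location TEXT, description TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO incidents VALUES (?, ?, ?, ?)", records)
    conn.commit()


def run_pipeline(conn: sqlite3.Connection, raw: str) -> int:
    """Run extract, transform and load; return the number of rows loaded."""
    records = transform(extract(raw))
    load(conn, records)
    return len(records)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(run_pipeline(conn, RAW_CSV))
```

The upsert on a primary key makes the load step idempotent. A scheduled job can therefore be re-run after a failure without duplicating rows, which is one common way to make such a pipeline robust.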

To extract more information from incident descriptions, we applied text mining to the descriptions of historical incidents, using a variety of Natural Language Processing (NLP) methods.
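The article does not name the NLP methods that were used. As a deliberately simple illustration of the idea, the sketch below tags free-text incident descriptions with one of the third-party incident types mentioned earlier (dog walking, shortcuts, vandalism) by counting keyword hits. The category names, keyword lists, and English sample texts are all hypothetical assumptions. Real descriptions would likely be in Dutch and call for richer methods.

```python
import re
from collections import Counter

# Hypothetical keyword lists per third-party incident category
# (illustrative only, not ProRail's actual taxonomy).
CATEGORY_KEYWORDS: dict[str, set[str]] = {
    "dog_walking": {"dog", "dogs"},
    "shortcut": {"shortcut", "trespasser", "crossing"},
    "vandalism": {"graffiti", "vandalism", "stones", "damaged"},
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def categorize(description: str) -> str:
    """Return the category with the most keyword hits, or 'other' if none match."""
    tokens = tokenize(description)
    scores = {
        category: sum(token in keywords for token in tokens)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"


def category_counts(descriptions: list[str]) -> Counter:
    """Count how many descriptions fall into each category."""
    return Counter(categorize(d) for d in descriptions)


if __name__ == "__main__":
    print(category_counts([
        "Dog on the track",
        "Trespasser taking a shortcut",
        "Graffiti on a bridge",
    ]))
```

Aggregated counts like these can feed a dashboard or serve as features for a predictive model. Keyword rules are transparent but brittle, and statistical approaches such as TF-IDF or topic models are the usual next step.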

To maintain effective communication with other departments within the organization, we took part in knowledge-sharing sessions with other teams every three weeks.

Using the Scrum method

We incorporated several Scrum routines into the team's workflow:

  • Daily: Stand-up meeting
  • Weekly: Refinement
  • Every 3 weeks: Planning session, review, stakeholder meeting

Challenges & learnings

To sum up, here are some of the challenges we faced and the lessons we learned along the way:

  • At the beginning, it was not clear what the deliverables were. Ask probing questions during the intake. Don't proceed until you understand the answers and everything is clear to you. Make sure everyone is on the same page at the start of the project.
  • Multiple teams from different parts of the organization were involved. These teams had different opinions on which tools should be used and what exactly the project should deliver. It is important to align with the product owner and stakeholders on this, especially to get started quickly.
  • In a large organization, acquiring data takes time and you need assistance from other teams. Make clear agreements on who delivers what and when, and check periodically whether it has been completed.
  • It is important to establish agreements on maintenance of the data product. Inquire about this in a timely manner. Don't be afraid to escalate if necessary.
  • Avoid dependence on a single team member. Share responsibilities, document your work (for example in Atlassian), and perform code reviews.
  • Do not underestimate the process of taking a project to production. There are often dependencies on other teams, each with their own lead times. Allocate enough time for this in the project plan.
  • The challenge of working in a (large) team is that people focus on their own tasks, leading to silos. Code reviews can help mitigate this.
  • You can lose a lot of time on feature engineering. Aim for "done is better than perfect" and avoid over-optimizing features.

Results and Impact

We delivered several insightful Power BI dashboards, which have more than 70 unique users, as well as fully automated ETL processes that feed a high-quality database. The dashboards are used weekly for ICB planning.

We also experimented with machine learning models to predict and analyze incidents.

Everything is documented and the project has been handed over to a new maintenance project team. This team handles ad hoc requests and technical complications for the project.

In the future, another team within ProRail will continue to develop an ML model and bring it into production within ProRail’s infrastructure. This model will also be integrated with ICB’s planning tool to further support the data-driven way of working.

Added value

This project to minimize and prevent third-party incidents on ProRail's Dutch railway network has provided valuable insights and tangible improvements. The dashboards visualizing incident patterns and trends have improved the planning and deployment of incident response personnel. Despite challenges such as establishing clear deliverables and coordinating with multiple teams, the project has had a positive impact. By continuing to iterate and embrace a proactive, data-driven approach, ProRail can enhance the safety and reliability of the railway network.