The largest operator of mid-sized tanker vessels holds vast amounts of data: a tangle of information about seafarers, incidents, inspections, and maintenance activity. Analyzed properly, this data can reveal insights that benefit the business.

Two problems fall squarely on the shoulders of the marine human resources (HR) professional and the fleet manager, and this data can help solve both:

  • Firstly, fleet managers need to foresee where and when a fault could occur that might lead to personal injury or an operational incident. Such a fault could compromise the ship’s performance and, if not resolved swiftly, place the ship in the red (danger) zone.

  • Secondly, the marine HR professional needs to find the right combination of officers and place them on suitable vessels at the right time, so that each vessel stays in the safe zone throughout its voyage and risk is minimized.

These two problems are solved using SEDGE predictive analytics.


The problem could only be solved after many derived measures were added to the existing data and the data was normalized. Although machine learning (ML) performs pattern analysis and recognition on the historical data, the shape of the data fed to the algorithms must be well defined. This is where most of the challenges were observed.

The data was drawn from seafarers’ personal information, audit information, and inspection and incident details. Because this data came from multiple entities, we created over 80 features from it. Some are standalone features, while others capture combined effects, which enabled SEDGE to learn the patterns and predict more accurately.

Creating risk buckets for officers was the most challenging part, because the weight assigned to each incident differed based on location, risk actuals, and risk severity. Preparing the maintenance activity data was another challenge, as it had to record all the officers performing their duties without much deviation.
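To illustrate how incident weights that vary by location and severity could add up to an officer-level figure for bucketing, here is a minimal Python sketch. The weight tables, field names, and the `officer_risk_points` helper are hypothetical; the article does not publish the actual weights, and "risk actuals" is left out because its definition is not given.

```python
from dataclasses import dataclass

# Hypothetical weights, chosen only for illustration.
SEVERITY_WEIGHT = {"low": 1.0, "medium": 3.0, "high": 9.0}
LOCATION_WEIGHT = {"accommodation": 1.0, "deck": 1.2, "engine_room": 1.5}


@dataclass(frozen=True)
class Incident:
    officer_id: str
    location: str
    severity: str


def incident_weight(incident):
    """Weight of one incident: severity weight scaled by location weight."""
    return SEVERITY_WEIGHT[incident.severity] * LOCATION_WEIGHT[incident.location]


def officer_risk_points(incidents):
    """Sum the weighted incidents per officer."""
    totals = {}
    for incident in incidents:
        totals[incident.officer_id] = (
            totals.get(incident.officer_id, 0.0) + incident_weight(incident)
        )
    return totals
```

A multiplicative weight is only one possible design; an additive or lookup-table scheme would fit the description equally well.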

The greater challenge lay in statistically determining a score that represents an officer’s historical work. An ideal, unbiased score characterizes an officer’s activity throughout their tenure in the company. The score was used to predict the officer’s and the vessel’s risk profiles.


Machine learning was used to address these challenges and to define thresholds that assign a risk level of Small/Medium/High to each officer and vessel. The solution was divided into four steps:

  1. Data exploration and wrangling

  2. Data visualization

  3. Feature engineering

  4. Predictive analytics
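Before going through the steps, the idea of thresholding a predicted risk into Small/Medium/High can be sketched in a few lines of Python. The cut-off values below are assumptions made for illustration; the article does not state the thresholds that were actually chosen.

```python
from bisect import bisect_right

# Hypothetical cut-offs on a 0..1 risk probability.
THRESHOLDS = (0.33, 0.66)
LEVELS = ("Small", "Medium", "High")


def risk_level(probability):
    """Map a risk probability to Small/Medium/High using fixed cut-offs."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # bisect_right counts how many cut-offs are <= probability.
    return LEVELS[bisect_right(THRESHOLDS, probability)]
```

With these assumed cut-offs, a probability exactly on a threshold falls into the higher bucket, which is the more cautious choice for a safety application.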

Operational and personal incidents were recorded independently in the data.

Officers were penalized according to the different scores they earned over time, which helped the model learn and predict officers’ profiles efficiently. In addition, many SQL and transformation functions were applied to create the variables needed for the model.

Data Visualization:

We combined the data into a single master table after applying data wrangling techniques because the data was sourced from multiple APIs. Then, we represented the data visually to better understand the patterns and trends using multiple variables. It was observed that the risk events did not follow any trends over time.
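As a rough picture of what combining several sources into one master table involves, the following Python sketch left-joins per-officer counts onto the seafarer records. The record layout, the `officer_id` key, and the `build_master_table` helper are assumptions; the article does not describe the actual schema or the wrangling steps used.

```python
from collections import Counter


def build_master_table(seafarers, incidents, inspections):
    """Return one row per seafarer with incident and inspection counts attached."""
    incident_counts = Counter(row["officer_id"] for row in incidents)
    inspection_counts = Counter(row["officer_id"] for row in inspections)

    master = []
    for seafarer in seafarers:
        row = dict(seafarer)  # copy so the source record is not mutated
        # Counter returns 0 for officers with no matching records.
        row["incident_count"] = incident_counts[seafarer["officer_id"]]
        row["inspection_count"] = inspection_counts[seafarer["officer_id"]]
        master.append(row)
    return master
```

Keeping every seafarer in the result, even those with zero incidents, matters here because low-risk officers are part of what the model has to learn.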

However, we also visually explored some unexplained patterns. For example, we explored whether there was a particular day of the week on which more incidents happened. To do this, we took data spanning eight years and added day-related variables [incident day, month, year, weekday or weekend, and so on] to the machine learning model.
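The day-related variables listed above can be derived from an incident date with the Python standard library alone. This is a minimal sketch: the feature names are hypothetical, and treating Saturday and Sunday as the weekend is an assumption that may not match shipboard practice.

```python
from collections import Counter
from datetime import date

WEEKDAY_NAMES = (
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
)


def day_features(incident_date):
    """Derive day, month, year, weekday name, and a weekend flag from a date."""
    weekday_index = incident_date.weekday()  # Monday is 0, Sunday is 6
    return {
        "incident_day": incident_date.day,
        "incident_month": incident_date.month,
        "incident_year": incident_date.year,
        "incident_weekday": WEEKDAY_NAMES[weekday_index],
        "is_weekend": weekday_index >= 5,
    }


def incidents_per_weekday(incident_dates):
    """Count incidents by weekday name, e.g. to check for a day-of-week effect."""
    return Counter(day_features(d)["incident_weekday"] for d in incident_dates)
```

A count per weekday like the one returned by `incidents_per_weekday` is the kind of summary that would feed a simple bar chart during the visual exploration.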

An incident can be low-risk or high-risk, depending on the kind of failure that took place.

Data visualization showed that high-risk events were in the minority. We then focused on reducing “loss time injury” and “fatalities” by using machine learning models, as this would have a great impact over time. Accordingly, we tuned and balanced the model so that it would not be biased towards the majority class of low-risk events.
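The article does not say which balancing technique was applied. One common option is to weight each class inversely to its frequency, the same heuristic that scikit-learn's `class_weight="balanced"` setting uses. The sketch below computes such weights in plain Python; the label names and the helper are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Example: 90 low-risk events and 10 high-risk events.
labels = ["low"] * 90 + ["high"] * 10
weights = balanced_class_weights(labels)
# The rare high-risk class receives a weight of 5.0, the low-risk class about 0.56.
```

With weights like these, each misclassified high-risk event costs the model roughly nine times as much as a misclassified low-risk one, which counteracts the pull of the majority class.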

Feature Engineering:

Over 80 new features were created from the data. We ran several machine learning models to extract the best features, those that determine whether an officer is a high-risk or low-risk person [officer’s profile]. A feature extraction mechanism was applied, and the variables that played an essential role in predicting an officer’s profile were identified. For example, safety score, experience, age, maintenance score, and compliance score are the top five features that decide an officer’s profile.

Once the first ML model identified the officer’s profile, the subsequent ML model provided the probabilities of the vessel’s risk. This was based on:

  • A combination of officers’ profiles

  • The vessel’s information

  • Day-related parameters
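The three inputs listed above can be pictured as one flat feature row per vessel and day, which the second model would then consume. The following Python sketch assembles such a row. All names are hypothetical, and the max and mean aggregates are only one plausible way to summarize a crew; the article does not say how officer profiles were combined.

```python
def vessel_feature_row(officer_profiles, vessel_info, day_info):
    """Flatten officer profiles, vessel data, and day parameters into one row.

    officer_profiles maps a rank (e.g. "CE", "CO") to a dict that holds the
    first model's output under the assumed key "high_risk_prob".
    """
    if not officer_profiles:
        raise ValueError("at least one officer profile is required")

    row = {}
    for rank, profile in sorted(officer_profiles.items()):
        row[f"{rank}_high_risk_prob"] = profile["high_risk_prob"]

    # Crew-level aggregates (an assumption, not taken from the article).
    probabilities = [p["high_risk_prob"] for p in officer_profiles.values()]
    row["max_officer_risk"] = max(probabilities)
    row["mean_officer_risk"] = sum(probabilities) / len(probabilities)

    row.update(vessel_info)
    row.update(day_info)
    return row
```

For example, `vessel_feature_row({"CE": {"high_risk_prob": 0.2}, "CO": {"high_risk_prob": 0.6}}, {"vessel_age": 12}, {"is_weekend": False})` yields a single dictionary with per-rank probabilities, the two aggregates, and the vessel and day fields.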

Predictive Analytics:

After careful exploration and visualization, predictive models were built and tuned across a wide variety of hyperparameters. The models work on current records as well as on future delta (incremental) records.

Finally, state-of-the-art ML algorithms were used to define each officer’s profile. A combination of such officer profiles, together with vessel data, was then fed to the model to determine the vessel’s profile, or risk probability.

Below are the features that played an influential role in predicting a vessel’s risk probability.

* Note on the above graph: CE = Chief Engineer, CO = Chief Officer, Inc = Incident, Cum = Cumulative


The application assists marine HR teams in deploying the top four ranked officers on board the vessels. In addition, SEDGE helps the fleet manager proactively manage or avert potential risks by providing leading indicators. This analytical solution from SEDGE is an industry-leading risk management solution that enhances crew safety and operational reliability.
