Email Parsing and Extracting Important Information Using Deep Learning for Navig8


Brief about Client


The client, Navig8 Group, is an experienced ship owner and one of the largest independent pool operators and commercial management companies. The client is active in chartering tonnage and offering employment options for ship owners' tonnage. As part of its drive to optimize the value of information flow and improve decision-making, the client embarked on a project to extract data intelligence from the flow of information from various chartering brokers.


The challenge


On a daily basis the client would receive more than 1,200 emails from chartering brokers, providing information about oil quantity, charterer, rates, availability of tonnage, loading port, discharge port, and laycan days. The information arrived in various formats: text in the email body, and data in PDF, Excel, Word, and HTML files, ranging from unstructured to structured. The objective was to extract the key information daily and deliver clean, parsed data in a tabular format with a quality index. The challenge was that the fixture mails sent by brokers followed no standard format and adhered to no data standards.
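To make the extraction target concrete, the fields above can be sketched as a simple record schema with a per-record quality index. This is an illustrative assumption only: the field names, the quality-index formula, and the `FixtureRecord` class are hypothetical, not the client's actual schema.

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class FixtureRecord:
    """Hypothetical parsed-fixture row; names are illustrative, not the client's."""
    broker: str
    charterer: Optional[str] = None
    oil_quantity_mt: Optional[float] = None   # cargo size, metric tonnes
    rate: Optional[str] = None                # e.g. a Worldscale rate or lumpsum
    loading_port: Optional[str] = None
    discharge_port: Optional[str] = None
    laycan_start: Optional[str] = None        # ISO date string
    laycan_end: Optional[str] = None
    quality_index: float = 0.0                # share of fields confidently parsed

    def tabular_row(self) -> dict:
        """Flatten to a dict suitable for one row of the daily table."""
        return asdict(self)

# Example record with the rate field missing, as a real broker mail might yield.
record = FixtureRecord(
    broker="ExampleBroker",
    charterer="ExampleCharterer",
    oil_quantity_mt=80000.0,
    loading_port="Ras Tanura",
    discharge_port="Singapore",
    laycan_start="2020-03-10",
    laycan_end="2020-03-12",
)
# Toy quality index: fraction of the 10 fields that were actually filled.
filled = [v for v in record.tabular_row().values() if v not in (None, 0.0)]
record.quality_index = round(len(filled) / 10, 2)
print(record.quality_index)  # → 0.7
```

A schema like this lets downstream consumers filter on `quality_index` and treat partially parsed mails differently from fully parsed ones.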


The solution


The team at SVM Analytics created an email-reading platform that identified fixture mails and excluded junk mails. The email data had no structure, and the values of each variable had to be learnt per broker, as there was no data standard. The fixture mails were cleaned and run through deep learning algorithms to build a model that identified the key information for each variable. The model matured after training on a large volume of email data and, over a three-month period, reached an extraction accuracy of 95%. The extracted tabular data was dispatched back to the client daily through an API. The main challenge was that the mails arrived in varied, unstructured formats with no common standards, so the deep learning algorithm had to be complemented with heuristics to achieve this accuracy.
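The heuristic layer that complemented the model can be illustrated with a minimal sketch: rule-based patterns recover well-structured fields (a laycan window, a cargo size) that a learnt extractor may miss. The regex patterns, field names, and the `heuristic_fields` function are assumptions for illustration, not the production rules.

```python
import re

# Illustrative patterns for two common fixture-mail fields.
LAYCAN_RE = re.compile(r"laycan[:\s]+(\d{1,2})[-/](\d{1,2})\s+([A-Za-z]{3})", re.I)
QTY_RE = re.compile(r"(\d{2,3})\s*kt\b", re.I)  # cargo size quoted in kilotonnes

def heuristic_fields(body: str) -> dict:
    """Recover laycan window and cargo quantity from a raw fixture line."""
    out = {}
    if m := LAYCAN_RE.search(body):
        out["laycan"] = f"{m.group(1)}-{m.group(2)} {m.group(3).title()}"
    if m := QTY_RE.search(body):
        out["quantity_mt"] = int(m.group(1)) * 1000  # kt -> metric tonnes
    return out

fields = heuristic_fields("80kt fuel oil, laycan: 10-12 mar, Ras Tanura/Singapore")
print(fields)  # → {'laycan': '10-12 Mar', 'quantity_mt': 80000}
```

In a hybrid setup like the one described, rules of this kind would backfill or validate the model's predictions, which is one plausible way heuristics lift overall accuracy on mails the model parses poorly.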