
AI for smart ports, part 1: Limitations of existing data sources for port call prediction

In practice, port call optimization can mean, for example, more efficient planning of the use of port resources such as berths, cargo handling equipment, storage, and human resources. A fundamental requirement for such planning is the ability to predict accurately when ships will arrive at the port, far enough in advance to allow efficient resource allocation. However, accurately predicting vessel schedules, and making efficient automated use of such predictions, is today a challenge for many port logistics actors. In this article we outline some of the reasons for this; in part 2 we will demonstrate how machine learning methods can help solve the related challenges.

At Awake.AI, we are developing a software platform for smart ports, enabling holistic situational understanding of the current and predicted states of port operations, resources, and related logistics. Automated monitoring and prediction of vehicle and cargo movements is necessary for comprehensive port call optimization, which will yield both environmental and economic benefits.

In the following we take a quantitative look at the quality of the data sources typically available to actors in maritime logistics chains for monitoring and predicting vessel schedules. Specifically, we focus on public AIS data and on examples of port call data from official port community information systems.

AIS data overview

Under the SOLAS convention (the International Convention for the Safety of Life at Sea, Chapter V, Regulation 19), transmission of AIS (Automatic Identification System) data over VHF radio frequencies is mandatory for all passenger ships and for cargo ships of 300 gross tonnage or more (with limited exceptions). Under EU rules, fishing vessels of more than 15 meters in length must also carry AIS. In addition to dynamic information on a vessel’s current status, such as position, course over ground, and speed over ground, AIS messages contain voyage-related information such as the vessel’s destination and estimated time of arrival (ETA). Many services collect and distribute AIS messages from vessels around the world at high frequency and with relatively low latency. In principle, therefore, AIS data is an ideal source of information for port call prediction. In practice, however, the voyage-related fields are entered manually onboard. As discussed below, this often leads to poorly structured data of reduced usefulness.
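One practical detail of the AIS ETA field is worth noting. In the static and voyage-related message (message type 5 in ITU-R M.1371), the ETA is sent only as month, day, hour, and minute, with no year. Each sub-field also has a reserved "not available" value. The Python sketch below illustrates how these fields could be turned into a full timestamp. It is not part of the analysis in this article. The function name `resolve_ais_eta` and the year-rollover heuristic are our own assumptions. The heuristic assumes that an ETA lying more than about six months before the message time refers to the next year.

```python
from datetime import datetime, timedelta
from typing import Optional

# Reserved "not available" values of the ETA sub-fields in AIS message
# type 5 (ITU-R M.1371): month 0, day 0, hour 24, minute 60.
ETA_NOT_AVAILABLE = {"month": 0, "day": 0, "hour": 24, "minute": 60}


def resolve_ais_eta(month: int, day: int, hour: int, minute: int,
                    received_at: datetime) -> Optional[datetime]:
    """Turn the year-less AIS ETA fields into a full timestamp.

    Returns None if any sub-field holds its "not available" value or the
    combination is not a valid calendar date. The year is taken from the
    message time (hypothetical heuristic, see below).
    """
    fields = {"month": month, "day": day, "hour": hour, "minute": minute}
    if any(fields[name] == na for name, na in ETA_NOT_AVAILABLE.items()):
        return None
    try:
        eta = datetime(received_at.year, month, day, hour, minute,
                       tzinfo=received_at.tzinfo)
    except ValueError:
        return None  # e.g. 30 February or out-of-range values
    # Assumed heuristic: an ETA more than ~6 months before the message time
    # most likely refers to the following year (December -> January).
    if eta < received_at - timedelta(days=183):
        try:
            eta = eta.replace(year=eta.year + 1)
        except ValueError:
            return None  # 29 February in a non-leap year
    return eta
```

With this sketch, an ETA of 3 January 14:30 received on 30 December 2019 resolves to 2020-01-03 14:30. Returning `None` instead of guessing keeps missing ETAs separate from real ones in any later statistics.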

In the following, we look at European AIS data collected in January–March 2020. For this overview, we focus on two groups of vessels. The first is vessels reporting as cargo carriers, including general cargo vessels and tankers (AIS ship types 70–89). The second is passenger vessels (ship types 60–69). The AIS message sequence of each vessel is divided into segments corresponding to alternating voyages and port calls, which lets us investigate the related AIS data characteristics in detail. The figures below illustrate the distribution of port calls in the dataset, which consists of approximately 900 000 port calls in total for all vessel types.
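The article does not specify how the AIS sequences are split into voyages and port calls. The sketch below is a minimal illustration that assumes a simple speed-threshold rule. It maps AIS ship type codes to the groups used here and splits one vessel's track into alternating segments. The thresholds (below 0.5 knots for at least one hour) and all names are hypothetical. A production pipeline would typically also require each stop to fall within a known port area.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


def ship_category(ais_ship_type: int) -> Optional[str]:
    """Map the AIS 'type of ship' code to the groups used in this overview."""
    if 60 <= ais_ship_type <= 69:
        return "passenger"
    if 70 <= ais_ship_type <= 79:
        return "cargo"
    if 80 <= ais_ship_type <= 89:
        return "tanker"  # counted among cargo carriers in the text
    return None


@dataclass
class Position:
    time: datetime
    sog_knots: float  # speed over ground


@dataclass
class Segment:
    kind: str  # "port_call" or "voyage"
    start: datetime
    end: datetime


def segment_track(track: List[Position],
                  stop_speed: float = 0.5,
                  min_stop: timedelta = timedelta(hours=1)) -> List[Segment]:
    """Split one vessel's time-ordered track into voyages and port calls.

    Hypothetical rule: a run of reports slower than `stop_speed` that lasts
    at least `min_stop` is a port call; everything else is voyage.
    """
    segments: List[Segment] = []
    i = 0
    while i < len(track):
        stopped = track[i].sog_knots < stop_speed
        # Extend the run while the moving/stopped state stays the same.
        j = i
        while j + 1 < len(track) and (track[j + 1].sog_knots < stop_speed) == stopped:
            j += 1
        start, end = track[i].time, track[j].time
        kind = "port_call" if stopped and end - start >= min_stop else "voyage"
        if segments and segments[-1].kind == kind:
            segments[-1].end = end  # merge short stops into the ongoing voyage
        else:
            segments.append(Segment(kind, start, end))
        i = j + 1
    return segments
```

Merging short stops into the surrounding voyage keeps the segments strictly alternating. Brief slowdowns, such as waiting for a pilot, do not then count as port calls. The right stop duration is a tuning choice.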

AIS data quality challenges for port call prediction

Missing destination information

Taking a closer look at the AIS data for cargo and passenger vessel port calls in the dataset visualized above (approximately 640 000 port calls in total), we can see a fundamental challenge for AIS-based monitoring and prediction of vessel activities. We sampled the AIS ‘destination’ field for each vessel on arrival at port. Averaged over all of Europe, no destination is reported in approximately 10 % of cargo vessel port calls and 20 % of passenger vessel port calls. Moreover, the further back in time we look before arrival, the less likely any given ship is to report any destination information through AIS. The availability of AIS destination information also seems to vary between destination countries. The diagrams below show the relative frequency of missing destination information for port calls by country, separately for cargo vessels (including tankers) and passenger vessels.
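Once each port call has a sampled destination string, measuring how often the destination is missing is a simple counting exercise. The sketch below assumes a hypothetical input of (country, destination) pairs, one per port call. The AIS destination field is 20 characters long and padded with '@', so the sketch treats empty strings and padding-only strings as "no destination reported".

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def is_missing_destination(raw: str) -> bool:
    """Treat blank or '@'-padding-only destination strings as missing."""
    return raw.replace("@", "").strip() == ""


def missing_rate_by_country(port_calls: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """port_calls: (country code, destination sampled on arrival) pairs.

    Returns, per country, the share of port calls with no destination.
    """
    totals: Dict[str, int] = defaultdict(int)
    missing: Dict[str, int] = defaultdict(int)
    for country, destination in port_calls:
        totals[country] += 1
        if is_missing_destination(destination):
            missing[country] += 1
    return {country: missing[country] / totals[country] for country in totals}
```

The same function can be applied to destinations sampled at fixed times before arrival. Doing so would show how destination availability decreases further back in time.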

Poor quality of existing destination information

The analysis outlined above shows that, at best, something is reported in the AIS ‘destination’ field for 90 % of European port calls. Unfortunately, there is no guarantee that this information is useful for monitoring and predicting vessels’ destinations. As an example, the graph below shows the AIS destination reports for port calls in Hamburg, Germany. In almost 50 % of port calls the message is the informative string ‘HAMBURG’. In 30 % of cases, however, the destination is missing completely (labeled here as ‘unknown’), and many of the reported destinations are simply incorrect. Only approximately 10 % of messages consist of the official UN/LOCODE ‘DEHAM’. The example also illustrates the long tail of destination strings of varying usefulness. Various algorithms can be applied to parse these messages into more meaningful information. Even so, there are still many cases where a vessel’s actual destination cannot be reliably deduced from a single AIS message.
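One common way to parse such free-text destinations is sketched below under stated assumptions. The string is first normalized and checked against known UN/LOCODEs. It is then looked up in an alias table, with fuzzy matching from Python's standard `difflib` module as a fallback. The alias table is a tiny hypothetical example. Treating the text after a '>' as the destination (as in 'NLRTM>DEHAM') is an assumed convention that not all crews follow. This is not the parsing method behind the statistics in this article.

```python
import difflib
import re
from typing import Dict, Optional

# Hypothetical alias table: a real one might be built from the UN/LOCODE
# code list plus spellings observed in historical AIS data.
ALIASES: Dict[str, str] = {
    "HAMBURG": "DEHAM",
    "ROTTERDAM": "NLRTM",
    "ANTWERP": "BEANR",
}
KNOWN_LOCODES = set(ALIASES.values())


def normalize_destination(raw: str, cutoff: float = 0.8) -> Optional[str]:
    """Map a free-text AIS destination to a UN/LOCODE, or None if unsure."""
    text = raw.replace("@", " ").upper()
    if ">" in text:
        # Assumed "FROM>TO" convention: keep the part after the last '>'.
        text = text.rsplit(">", 1)[1]
    text = re.sub(r"[^A-Z ]", " ", text)    # drop digits and punctuation
    text = re.sub(r"\s+", " ", text).strip()
    if not text:
        return None
    compact = text.replace(" ", "")
    if compact in KNOWN_LOCODES:            # e.g. "DEHAM" or "DE HAM"
        return compact
    if text in ALIASES:
        return ALIASES[text]
    # Fuzzy match to absorb small typos; below the cutoff, return None
    # rather than guess.
    match = difflib.get_close_matches(text, list(ALIASES), n=1, cutoff=cutoff)
    return ALIASES[match[0]] if match else None
```

With this sketch, 'HAMBURG', 'DE HAM', and the misspelling 'HAMBUG' all map to DEHAM, while a string such as 'FOR ORDERS' returns `None`. A high fuzzy-match cutoff trades coverage for fewer wrong guesses. This reflects the point above: some destinations simply cannot be deduced from a single message.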

Poor quality of ETA information

AIS messages also contain a field for the estimated time of arrival (ETA). It seems reasonable to assume that the crew and navigation systems onboard have the best estimate of a ship’s arrival time. AIS is a good channel for sharing such information, which would ultimately benefit many actors in maritime logistics. However, most vessel operators currently seem to ignore this opportunity.

The diagram below illustrates the distribution of error (in hours) in the ETA information of AIS messages in our example dataset, where available. The ETA is again sampled on arrival at port, when the arrival time is already known onboard with high accuracy. The estimates are somewhat more accurate for passenger ships, but even for those only around 10 % of arrival time estimates are within ±1.5 hours of the actual arrival. Note that the distributions have very long tails, i.e. the error in ETA estimates can be very large; the range shown here is limited to ±72 hours. The example also shows that ships arrive later than estimated more often than earlier, as the ETA error distributions are skewed towards negative values. Error is here defined as the time interval from the true arrival time to the ETA, so a negative error means the ship arrived later than its ETA.
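The sign convention used here is easy to get backwards, so the sketch below spells it out: error = ETA − actual arrival, and a negative error means a late arrival. It reduces a hypothetical list of (ETA, actual arrival) pairs to the kinds of figures quoted above. These are the share within ±1.5 hours, the share of late arrivals, and the share falling outside the ±72-hour plotting range. The function names and input format are assumptions.

```python
from datetime import datetime
from statistics import median
from typing import Dict, List, Tuple


def eta_error_hours(eta: datetime, actual_arrival: datetime) -> float:
    """Error as defined in the text: interval from true arrival to ETA.
    Negative values mean the ship arrived later than its ETA."""
    return (eta - actual_arrival).total_seconds() / 3600.0


def summarize_eta_errors(pairs: List[Tuple[datetime, datetime]],
                         tolerance_h: float = 1.5,
                         clip_h: float = 72.0) -> Dict[str, float]:
    """pairs: (ETA reported on arrival, actual arrival) per port call."""
    errors = [eta_error_hours(eta, arrival) for eta, arrival in pairs]
    if not errors:
        raise ValueError("no ETA/arrival pairs given")
    n = len(errors)
    return {
        "share_within_tolerance": sum(abs(e) <= tolerance_h for e in errors) / n,
        "share_late": sum(e < 0 for e in errors) / n,
        # Long tail: share of errors outside the +/-72 h plotting window.
        "share_beyond_clip": sum(abs(e) > clip_h for e in errors) / n,
        "median_error_h": median(errors),
    }
```

The median is reported alongside the shares because the long tails make the mean a poor summary of typical ETA error.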

Official port call data

As defined by the UN Trade Facilitation Implementation Guide, port community systems are maintained by major ports or national authorities for the exchange of information between clients and national Customs and other authorities. Port Community Systems are a form of Single Windows for Trade, and are similar to Airport Community Systems. A Port Community System handles electronic communication in ports between the private transport operators (shipping lines, agents, freight forwarders, stevedores, terminals, depots), the private hinterland (pre- and on-carriage by road, rail and inland waterways), the importers and exporters, the port authorities, Customs and other authorities.

Port community systems should therefore be an ideal source of reliable data for port call optimization and planning. To test this, we look at an example dataset of port call reports submitted to the Finnish Portnet system between July 2019 and February 2020. The figure below shows the frequency of port call ETA updates as a function of the time remaining until arrival. It illustrates one characteristic of the sampled port community system data that is challenging for port call optimization: for non-passenger vessels, most port call announcements are submitted relatively shortly before the vessel arrives at port.
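A lead-time histogram of this kind can be computed from (submission time, actual arrival) pairs for each ETA update. The sketch below assumes such pairs are available. The bin edges are illustrative and not necessarily the ones used for the figure.

```python
from bisect import bisect_right
from collections import Counter
from datetime import datetime
from typing import Dict, List, Tuple

# Illustrative lead-time bins in hours (an assumption, not the article's binning).
BIN_EDGES_H = [0, 6, 24, 72, 120]
BIN_LABELS = ["<6 h", "6-24 h", "1-3 d", "3-5 d", ">=5 d"]


def lead_time_hours(submitted_at: datetime, actual_arrival: datetime) -> float:
    """Time from an ETA update's submission to the actual arrival, in hours."""
    return (actual_arrival - submitted_at).total_seconds() / 3600.0


def lead_time_histogram(updates: List[Tuple[datetime, datetime]]) -> Dict[str, int]:
    """updates: (submission time of an ETA update, actual arrival) pairs.

    Updates submitted after the arrival (negative lead time) are skipped.
    """
    counts: Counter = Counter()
    for submitted, arrived in updates:
        lead = lead_time_hours(submitted, arrived)
        if lead < 0:
            continue
        # bisect_right finds the bin whose left edge is <= lead.
        counts[BIN_LABELS[bisect_right(BIN_EDGES_H, lead) - 1]] += 1
    return {label: counts[label] for label in BIN_LABELS}
```
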

The second problem with official port call notifications for non-passenger vessels is that their ETA estimates are still not very accurate, even though they are submitted relatively close to arrival. The figure below illustrates the accuracy of ETA information provided to official Finnish port call systems five days or less before arrival at port. The distributions of ETA error are again shown separately for passenger vessels and for all other vessels reporting port calls. The port call information is generally accurate for passenger vessels, but for other vessel types the error in the reported time of arrival is frequently several hours.

As vessel traffic characteristics vary significantly between ports, the quality of port call information also varies between ports. The figure below illustrates the error distributions in official port call ETA information for seven major ports in Finland. Port 5 receives a lot of passenger vessels, while port 1 handles a lot of bulk cargo traffic, and the port call ETA reliability differs notably between these two ports. For a cargo port, an uncertainty of several hours in vessels’ arrival times results in sub-optimal planning of resource allocation.
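The per-port comparison can be sketched as a group-by over ETA updates. As in the Finnish port call figures, it uses only updates submitted at most five days before arrival. The `EtaUpdate` record layout and the use of median absolute error as the summary statistic are assumptions for illustration. The article's figures show full error distributions rather than a single number.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Dict, List, Tuple


@dataclass
class EtaUpdate:  # hypothetical record layout for one port call ETA update
    port: str
    passenger: bool
    submitted_at: datetime
    eta: datetime
    actual_arrival: datetime


def median_abs_error_by_port(updates: List[EtaUpdate],
                             window: timedelta = timedelta(days=5)
                             ) -> Dict[Tuple[str, str], float]:
    """Median absolute ETA error (hours) per (port, vessel class), using only
    updates submitted at most `window` before the actual arrival."""
    groups: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for u in updates:
        lead = u.actual_arrival - u.submitted_at
        if not (timedelta(0) <= lead <= window):
            continue  # too early, or submitted after arrival
        key = (u.port, "passenger" if u.passenger else "other")
        error_h = (u.eta - u.actual_arrival).total_seconds() / 3600.0
        groups[key].append(abs(error_h))
    return {key: median(errors) for key, errors in groups.items()}
```

Splitting by vessel class within each port separates the two effects discussed above: differences in traffic mix and differences in reporting quality.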


In this article, we have outlined some of the fundamental problems that currently limit accurate port call prediction and the related planning of port operations. These problems arise from poor data quality. They could be remedied if ship operators and agents made better use of the available technologies and information channels. However, such systematic improvement is unlikely to happen quickly, so achieving improvements today requires solving the related problems through data processing and analysis. In part 2 of this article, we will demonstrate how machine learning methods can be applied to significantly improve the predictability of vessel schedules based on existing data sources.

