Is there such a thing as too much data?

March 11, 2021
Words by Álvaro García Faura

Having a huge amount of data is great. The problem is that there is such a thing as too much data, especially since not all information is equally important. In some cases, to store and process all those data with the available resources, it’s vital to remove some of them and, of course, we would like to do so without the data losing their value. How can we do that?

At XLAB, we are passionate about providing data analytics solutions in various fields, one of them being the maritime domain. Imagine the situation in large ports: hundreds of vessels come in every day, and each of them transmits data every few seconds. Can you even grasp how much data that amounts to?

So, we took on the challenge of eliminating redundant data without losing relevant information, using a concrete example. Explore the results of our case study.


AIS Trajectory Compression: A case study in the Port of Piraeus


⚓ What are AIS data and what are they used for

The Automatic Identification System (AIS) is a tracking system originally meant for vessel collision avoidance. Enforced by the International Maritime Organization (IMO), AIS transceivers must be installed (and operational!) on any ship with a gross tonnage of 300 or more, as well as on all passenger vessels no matter their size.

AIS data are composed of several fields, some of them describing the ship itself and its current voyage (e.g., the name of the vessel, its dimensions, its current destination, …), and some others including dynamic information such as speed and course over ground.

Although collision avoidance was their original purpose, AIS data have proved valuable for many other applications, and the industry around them keeps growing every day. For instance, AIS data can be used to ease search and rescue activities, to enhance maritime security, to monitor fishing along national coasts, to keep track of cargo, or to optimize fleet management and port operations.

⚓ The problem

AIS data are transmitted by vessels every 2 to 10 seconds depending on their speed, and every 3 minutes when they are at anchor. This means a vessel could send up to 1800 AIS messages every hour. Try to imagine the amount of data generated around large ports all over the world, each of them receiving hundreds of vessels every day. Multiply this by all the days in a year, and you definitely end up with a vast quantity of data, which is great!
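To make the arithmetic concrete, here is a small back-of-the-envelope sketch. The reporting intervals come from the text above; the figure of 300 vessels is a purely hypothetical number chosen for illustration, and the daily total is an upper bound that assumes every vessel transmits at the fastest rate all day.

```python
SECONDS_PER_HOUR = 3600


def messages_per_hour(interval_s: float) -> float:
    """Number of AIS messages sent in one hour at a fixed reporting interval."""
    return SECONDS_PER_HOUR / interval_s


fastest = messages_per_hour(2)      # 1800.0, the upper bound quoted above
slowest = messages_per_hour(10)     # 360.0, slowest rate while under way
at_anchor = messages_per_hour(180)  # 20.0, one message every 3 minutes

# Hypothetical port: 300 vessels, all at the fastest rate for 24 hours.
HYPOTHETICAL_VESSELS = 300
daily_upper_bound = HYPOTHETICAL_VESSELS * fastest * 24  # 12,960,000 messages

print(f"Per vessel and hour: {at_anchor:.0f} to {fastest:.0f} messages")
print(f"Hypothetical daily upper bound: {daily_upper_bound:,.0f} messages")
```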

AIS messages (white dots) transmitted in the Saronic Gulf, where the Port of Piraeus is located, during a single day in July 2020.

Of course, it is great to have lots of data, but in order to extract valuable insights from them, they have to be processed and probably stored somewhere. Now, everybody will agree that some of these data points are probably not as valuable as others. For instance, picture a vessel moving in an almost straight line at constant speed for several hours. It is transmitting an AIS message every few seconds, but is it really necessary to store and process all those messages? Wouldn’t it be equivalent to deal with a new message only when the course or the speed changes substantially? Yes, it would! And we would have saved considerable storage and computing resources. Furthermore, the same principle could be applied to a vessel that is anchored or moored and keeps transmitting AIS messages without moving.
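The idea of keeping a message only when something changes can be sketched in a few lines. This is a minimal illustration, not the tooling used in the case study: the `AisMessage` fields, the function names and the default thresholds (0.5 knots, 5 degrees) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class AisMessage:
    """Hypothetical, simplified AIS record used only for this example."""
    t: float    # timestamp in seconds
    lat: float  # latitude in degrees
    lon: float  # longitude in degrees
    sog: float  # speed over ground in knots
    cog: float  # course over ground in degrees


def course_diff(a: float, b: float) -> float:
    """Smallest angle between two courses, handling the 360-degree wrap."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def filter_by_change(msgs, max_speed_diff=0.5, max_course_diff=5.0):
    """Keep the first and last message, plus every message whose speed or
    course differs from the last KEPT message by more than the thresholds."""
    if not msgs:
        return []
    kept = [msgs[0]]
    for m in msgs[1:-1]:
        last = kept[-1]
        if (abs(m.sog - last.sog) > max_speed_diff
                or course_diff(m.cog, last.cog) > max_course_diff):
            kept.append(m)
    if len(msgs) > 1:
        kept.append(msgs[-1])
    return kept


# A vessel holding course 90 for 50 messages, then course 120 for 50 more.
steady = [AisMessage(i * 10.0, 0.0, 0.0, 12.0, 90.0) for i in range(50)]
turned = [AisMessage((50 + i) * 10.0, 0.0, 0.0, 12.0, 120.0) for i in range(50)]
kept = filter_by_change(steady + turned)
print(len(steady + turned), "->", len(kept))
```

Comparing against the last kept message, rather than the previous one, prevents a slow but steady turn from slipping through unnoticed.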

⚓ Our approach

So, as we mentioned, the key point is to remove data without losing information. Let’s dive into our approach by taking a single vessel trajectory as an example. The trajectory that a vessel follows between departure from Port A and arrival at Port B could be simplified to the maximum extent by simply writing down the time when it left Port A and the time when it arrived at Port B. One would draw a straight line between these two ports and that would be our compressed trajectory.

The trajectory of a vessel departing from the port of Piraeus. Blue dots denote the first AIS messages captured, red dots the last ones.

Yes, it is obvious that this would most likely introduce a huge error with respect to the actual path that the vessel followed. What is important to notice, though, is that we are not only losing that information, but also information about other variables such as its speed or heading at different stages of the voyage. We can’t know whether the first few miles of the voyage actually took half of the total travel time, for instance. In most use cases, such as measuring the time needed to enter or leave a port, the location information is not valuable if the temporal information is lost.

With all this in mind, at XLAB we decided to challenge ourselves: how many AIS messages could we discard without losing any relevant information? We gathered AIS data from a large area around the Port of Piraeus for the whole of July 2020. Then, we implemented, evaluated and compared several algorithms for trajectory compression. Keep reading to discover some of our findings.
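This post does not detail which algorithms were compared, so the following is only a sketch of one well-known option: a Douglas-Peucker style simplification that measures error with the synchronized Euclidean distance, so that time is taken into account as well as shape. We assume positions have already been projected to metres and are given as `(t, x, y)` tuples; the function names and the 25 m tolerance in the example are illustrative.

```python
import math


def sed(p, a, b):
    """Synchronized Euclidean distance: distance between point p and the
    position interpolated on segment a-b at p's own timestamp."""
    t, x, y = p
    ta, xa, ya = a
    tb, xb, yb = b
    if tb == ta:
        return math.hypot(x - xa, y - ya)
    r = (t - ta) / (tb - ta)
    return math.hypot(x - (xa + r * (xb - xa)), y - (ya + r * (yb - ya)))


def compress(points, tolerance_m):
    """Top-down simplification: keep the endpoints, then repeatedly keep the
    point with the largest SED until every segment is within tolerance."""
    if len(points) <= 2:
        return list(points)
    keep = [False] * len(points)
    keep[0] = keep[-1] = True
    stack = [(0, len(points) - 1)]
    while stack:
        i, j = stack.pop()
        worst, idx = 0.0, None
        for k in range(i + 1, j):
            d = sed(points[k], points[i], points[j])
            if d > worst:
                worst, idx = d, k
        if idx is not None and worst > tolerance_m:
            keep[idx] = True
            stack.append((i, idx))
            stack.append((idx, j))
    return [p for p, flag in zip(points, keep) if flag]


# A vessel sails in a straight line at 5 m/s for 490 s, then stops.
moving = [(i * 10.0, i * 50.0, 0.0) for i in range(50)]
stopped = [(i * 10.0, 2450.0, 0.0) for i in range(50, 100)]
result = compress(moving + stopped, tolerance_m=25.0)
print(len(moving + stopped), "->", len(result))
```

In this example the path is a perfectly straight line, so a purely spatial simplification would keep only the two endpoints and lose the fact that the vessel stopped halfway through. Using a time-aware distance keeps the point where the speed changed.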

⚓ Results and benefits

Since a picture is worth a thousand words and at this stage you must be wondering what this all actually looks like, let’s have a look! We include below an example for a vessel departing from Piraeus and the trajectory it followed for two days.

Comparison of original (upper) and compressed (bottom) trajectories for a vessel departing from the port of Piraeus.

We can clearly see how the points that were not providing much information were automatically removed, while the spatial and temporal information (even if the latter can’t be seen here) is kept. But that’s not all. If we zoom in on the port, we can see that the vessel was already transmitting AIS data while still moored. Do we need all these messages? Probably not.

Comparison of original (upper) and compressed (bottom) trajectories for the same vessel in the previous example while still moored before departure.

Of course, this is just an example, but it illustrates perfectly what the tools developed by XLAB are capable of. The algorithms can be tweaked to achieve a larger or smaller compression, depending on the specific requirements and constraints of the use case. For instance, in the above case, the average speed error was only 0.05 knots, and we didn’t deviate more than 25 meters (0.015 miles) from the original trajectory!

When carrying out the evaluation using the data for the whole of July 2020 around Piraeus, we managed to achieve compression rates of up to 90%, again without exceeding the deviation limit of 25 m and a maximum speed difference of 0.25 knots. These results translate to real savings in storage and increased processing efficiency, while still keeping all the information needed.
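For readers who want to reproduce this kind of evaluation on their own data, the two headline numbers can be computed roughly as follows. This is a sketch under assumptions: we take the compression rate to mean the fraction of messages removed, points are `(t, x, y)` tuples in seconds and metres sorted by time, and the function names are ours.

```python
import bisect
import math


def compression_rate(n_original: int, n_kept: int) -> float:
    """Fraction of messages removed, e.g. 0.9 means 90% were discarded."""
    return 1.0 - n_kept / n_original


def max_deviation(original, compressed):
    """Largest distance in metres between each original point and the
    compressed trajectory interpolated at the same timestamp.

    Assumes `compressed` has at least two points and contains the first
    and last points of `original`.
    """
    times = [p[0] for p in compressed]
    worst = 0.0
    for t, x, y in original:
        # Find the compressed segment whose time span contains t.
        j = bisect.bisect_right(times, t)
        j = min(max(j, 1), len(compressed) - 1)
        ta, xa, ya = compressed[j - 1]
        tb, xb, yb = compressed[j]
        r = (t - ta) / (tb - ta) if tb != ta else 0.0
        d = math.hypot(x - (xa + r * (xb - xa)), y - (ya + r * (yb - ya)))
        worst = max(worst, d)
    return worst


original = [(0.0, 0.0, 0.0), (10.0, 50.0, 10.0), (20.0, 100.0, 0.0)]
compressed = [original[0], original[-1]]
print(f"rate: {compression_rate(len(original), len(compressed)):.2f}")
print(f"max deviation: {max_deviation(original, compressed):.1f} m")
```

A compressed trajectory would pass the limit quoted above if `max_deviation` stays at or below 25. A speed check could be added in the same way by comparing segment speeds before and after compression.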

⚓ What else can you do with AIS data?

Trajectory compression is just one of several solutions we are working on at XLAB in the maritime domain. We are committed to delivering advanced data analytics solutions that optimize operations and provide valuable insights in areas such as Estimated Time of Arrival (ETA), Estimated Time of Departure (ETD), navigational status correction, port operations, and many others. Want to know more? Don’t hesitate to reach us at


»Part of this work was supported by the European Commission under the project PIXEL leading the digitalisation of the port industry (under grant agreement number 769355).«
