This article is part of our educational series about weather science: What it is, what it does, and why we need it.
What comes to your mind first when you think of "big data"? Finance, maybe? Healthcare? Entertainment? Gaming must be generating huge volumes of data, right?
What about the weather? We tend to take our daily weather forecast for granted – because, well, how difficult can it be to predict rain or shine a couple of days ahead? It doesn't sound like rocket science - but it kind of is.
To grasp the complexity of weather data, we will put it in the context of Big Data, which is generally defined by five characteristics – or the 5 Vs of Big Data: Volume, Velocity, Variety, Veracity, and Value.
Let’s go through each one to see how weather data is the epitome of Big Data.
Volume
The European Centre for Medium-Range Weather Forecasts (ECMWF), one of the key organizations providing weather data, generates 287 terabytes of data every single day. That's almost six times as much as the more than 2 billion gamers worldwide generate (50 TB/day). If we add other major organizations, such as the U.S. National Oceanic and Atmospheric Administration (NOAA), weather data is quickly approaching exascale.
So why exactly is weather data so big? Weather is measured on the surface of the Earth (over both land and water), within the atmosphere, and from space. Modern technology allows for it to be measured across the planet from pole to pole, 24/7, 365 days a year.
These data sources, such as surface sensors, radars, and weather satellites, offer important situational awareness and historical context and are the backbone of weather forecasting technology. Hundreds of terabytes of data are generated daily by these systems and processed in real time.
One of the primary uses of this data is the creation of numerical weather prediction models, or NWPs, which are the foundation of any weather forecast. These models digitally represent the entire Earth-Ocean-Atmosphere (EOA) system, essentially aiming to create a digital twin. They then literally forecast the future using a combination of physics, dynamics, mathematics, and computer science.
Models calculate numerous equations representing the EOA system in 4 dimensions - horizontal (twice!), vertical, and across time into the future. We are talking about billions of calculations creating ENORMOUS datasets.
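To get a feel for that scale, here is a rough back-of-the-envelope sketch in Python. The grid spacing, number of vertical levels, variable count, and number of output steps are illustrative assumptions, not the configuration of any actual model.

```python
EARTH_SURFACE_KM2 = 510_000_000  # approximate surface area of the Earth


def grid_values(spacing_km, levels, variables, time_steps):
    """Estimate how many values one model run writes out.

    All arguments are hypothetical inputs chosen for illustration.
    """
    # Horizontal cells cover two of the four dimensions
    columns = EARTH_SURFACE_KM2 / spacing_km**2
    # Multiply by the vertical dimension, the variables, and time
    return columns * levels * variables * time_steps


def size_terabytes(values, bytes_per_value=4):
    """Convert a value count to terabytes, assuming 4-byte floats."""
    return values * bytes_per_value / 1e12


# Assumed setup: 10 km spacing, 100 levels, 10 variables, 240 hourly steps
values = grid_values(spacing_km=10, levels=100, variables=10, time_steps=240)
print(f"{values:.2e} values, about {size_terabytes(values):.1f} TB uncompressed")
```

With these assumed numbers, a single run already produces on the order of a trillion values. Halving the grid spacing quadruples the horizontal cell count, which is why finer-resolution models get expensive so quickly.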
Velocity
To process such an enormous amount of data, the weather industry uses tremendous computational resources (read: computers). A little-known fact: Weather is one of the original applications of Big Data technologies and the reason many supercomputers exist.
Think about it: Weather literally cannot wait. To make use of the forecast, we need to pull all this data together and run the models very quickly, generating the latest information and making it available for decisions about approaching weather impacts. The slightest delay and the forecast risks becoming useless.
The same is true for the observation systems themselves. And all of this comes before we even involve technologies like machine learning, deep learning, or generative AI. Each of these systems also needs to be run as quickly as possible to be useful.
Apart from the actual process of forecasting, weather data must also be delivered quickly using modern, responsive technologies. In some cases, we need technology like webhooks to tell us when to pay attention. Rapidly developing severe weather threats - like hail, lightning, and tornado potential - are critical examples of data that must be delivered immediately.
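As a minimal sketch of the webhook idea, the Python snippet below parses an incoming alert and decides whether it deserves an immediate notification. The payload fields ("hazard", "location") and the list of urgent hazards are assumptions made for illustration; a real provider defines its own schema.

```python
import json

# Hypothetical set of hazard types that warrant an immediate notification
URGENT_HAZARDS = {"hail", "lightning", "tornado"}


def handle_alert(raw_body):
    """Parse a webhook payload and decide whether to notify right away.

    The field names used here are assumed, not taken from any real API.
    """
    event = json.loads(raw_body)
    hazard = event.get("hazard", "").lower()
    location = event.get("location", "unknown location")
    return {
        "notify_now": hazard in URGENT_HAZARDS,
        "message": f"{hazard or 'unknown'} alert for {location}",
    }


# Example payload, shaped the way a provider might POST it to your endpoint
body = json.dumps({"hazard": "Hail", "location": "Denver, CO"})
result = handle_alert(body)
print(result)
```

The point of the pattern is that the data comes to you: instead of polling for new forecasts, your system is told the moment something urgent appears.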
Variety
As alluded to earlier, weather is measured in a variety of ways, by a variety of measurement devices, in a variety of formats, across a variety of time horizons. Just like the weather itself, its measurement is abundant in variety.
There are two types of weather data measurement based on the data source: in-situ and remote. In-situ data describes the atmosphere right where the sensor (e.g. weather sensor or radiosonde) is located. Remote data - satellites and radar - measures the atmosphere away from the sensor. This creates a variety of data formats, projections, and data types. All of this data is then translated and normalized through a process called data curation.
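To make curation a little more concrete, here is a simplified Python sketch that maps two differently shaped records onto one common schema. The record layouts, field names, and the Observation class are all hypothetical; real curation pipelines also handle projections, quality control, and many more variables.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Common record that every source is normalized into (assumed schema)."""
    source: str           # "in-situ" or "remote"
    temperature_c: float  # temperature in degrees Celsius


def fahrenheit_to_celsius(value_f):
    return (value_f - 32.0) * 5.0 / 9.0


def kelvin_to_celsius(value_k):
    return value_k - 273.15


def normalize(raw):
    """Map a raw record with its own field names and units to Observation."""
    if raw["kind"] == "station":      # in-situ sensor reporting Fahrenheit
        return Observation("in-situ", fahrenheit_to_celsius(raw["temp_f"]))
    if raw["kind"] == "satellite":    # remote retrieval reporting Kelvin
        return Observation("remote", kelvin_to_celsius(raw["surface_temp_k"]))
    raise ValueError(f"unknown source kind: {raw['kind']}")


raw_records = [
    {"kind": "station", "temp_f": 68.0},
    {"kind": "satellite", "surface_temp_k": 293.15},
]
curated = [normalize(r) for r in raw_records]
for obs in curated:
    print(obs.source, round(obs.temperature_c, 2))
```

Every new data type means another branch in a function like normalize, which hints at why curation never really finishes.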
After curation, this data becomes an input for a variety of forecast models, based on scale of motion, range, and/or region (global, regional, or local). The sheer variety and volume of data sources, formats, and types is a data engineer’s dream (or nightmare) - and with new data types emerging all the time, data curation is a moving target.
Veracity
Weather forecasting is an imperfect science. This likely isn’t surprising, but thankfully, we (weather scientists) know where these imperfections lie and continue to work on improvements.
A 5-day forecast today is every bit as accurate as a 3-day forecast was only 10 years ago. And while imperfect, it’s still one of humanity’s greatest achievements. But to keep raising the bar we need better observations, a better understanding of the science, and access to increasingly powerful computational resources.
Statistical post-processing techniques, and more recently, Artificial Intelligence, have allowed us to push weather forecasting's veracity - aka accuracy - further. Such technology allows us to introduce recent and localized data – and this is how we begin to close the gap in forecast error. If you truly want to forecast the conditions in your backyard, you must first measure that local microclimate and then incorporate that information into a modeling system. It may not be easy, but it will be worth it!
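One of the simplest forms of statistical post-processing is mean bias correction, sketched below in Python. It is shown only as an illustration, not as the method any particular forecaster uses, and the temperatures are made up: a model forecast is compared against a hypothetical backyard sensor over recent days, and the average error is subtracted from the next forecast.

```python
from statistics import mean


def mean_bias(forecasts, observations):
    """Average forecast-minus-observation error over recent days."""
    return mean(f - o for f, o in zip(forecasts, observations))


def correct(raw_forecast, bias):
    """Subtract the estimated bias from a new raw model forecast."""
    return raw_forecast - bias


# Made-up temperatures in degrees Celsius: model output vs. a local sensor
recent_forecasts = [21.0, 19.5, 23.0, 18.0]
recent_observations = [19.5, 18.0, 21.5, 16.5]

bias = mean_bias(recent_forecasts, recent_observations)
tomorrow = correct(22.0, bias)
print(f"bias: {bias:+.1f} C, corrected forecast: {tomorrow:.1f} C")
```

In this toy case the model runs consistently warm for the location, so the corrected forecast comes out cooler than the raw one. Operational systems use far richer statistics and machine learning, but the principle of learning from local measurements is the same.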
Value
It should be clear by now that weather forecasting takes a lot of work – is it even worth it? And why do we need more, better data if it takes so much effort? Well, simply put – weather impacts everyone on the planet. Weather is one of the keys to the protection of life and property. It’s also a crucial component of the global economy, supporting trillions of dollars in business across the globe.
A recent publication in the journal Science demonstrates just how impactful weather can be. The authors quantified the economic impact of the two strongest recent El Niño events (’82-’83 and ’97-’98), showing an estimated $4T-$5T in global income losses.
Energy. Agriculture. Supply chain. Transportation. Operations. Insurance. Each of these requires reliable weather data. The current list of applications for weather data is large, and will only keep growing - we literally can't fathom every future weather data use case; the need is so vast.
It's no wonder that weather remains one of the biggest data challenges we face. Weather science is about more than just daily forecasts. It's about building resilient, safe, and sustainable societies globally. The weather business needs a new standard. The stakes are too high for Earth to continue to base crucial decisions on “good enough” data - especially when great data can save energy, resources, and ultimately, lives.
So, where are we heading with all of this data?
At Xweather we believe world-class weather and environmental solutions require world-class data and insight.
We collect data from hundreds of thousands of sources across the globe. We process more than 5 PETABYTES (5 million GB) of data annually. We enhance already-impressive public data with our own data and models for industries and applications such as air quality, transportation, and energy. We pull data from proprietary sensor networks and non-weather data sources (such as connected vehicles, energy production information, and other industry-specific data). It's a lot - so what can we do with it?
Even without considering future needs, this question already has countless answers. From sourcing an optimal location for your next energy farm to creating plans to respond to the ongoing climate crisis - we can't wait to hear yours!
Who creates weather forecasts? What’s the history of weather science? How can we leverage weather data for the good of the planet? Why improve the quality of weather data? Stay tuned for more insightful content as part of our ongoing educational series on weather and weather science.