
Class Project: Data Centres in Europe

Overview

In this project you will (1) build a Europe-wide dataset of data centre locations, (2) compile socio-economic and infrastructure variables at the NUTS3 level, and (3) train models to predict where data centres are located. The goal is to practice end-to-end data science: data collection, cleaning, standardisation, merging, modelling, evaluation, and interpretation.

Key Dates

Milestone | Date
Project start | 1 March
Mid-term deadline (Parts 1–2) | 20 April
Part 3 begins | 20 April
Report deadline | 5 June, or two weeks before the “Primo Appello”

Part 1 (1 March → 20 April): Data Centre Collection

Task

Collect data on as many data centres as possible in your assigned set of nations, and submit your entries to the TA, who will collate them into a Data Centre Master Table. Each team takes the set numbered (team number mod #nations_sets), where #nations_sets is the number of sets listed under “Assigned Sets of States”.

A good starting source of data centre information is Data Center Map. Other sources are also encouraged.

Definition

A “data centre” is a dedicated facility that provides computing/storage/network infrastructure (e.g., hyperscale, colocation, carrier-neutral sites). Generic office server rooms should not be included unless clearly documented as a data centre.

What to submit

Grading (Part 1)

Credit is proportional to the number of correct entries submitted via this form. Incorrect entries will be penalized.

Part 2 (1 March → 20 April): NUTS3 Variables (EU-wide)

Task

For all EU NUTS3 regions, compile a dataset containing:

If the remainder is 0, select type 14. Example: team 16 → 16 mod 14 = 2 → chooses type 2: Wind potential.

Deliverable

A clean table keyed by NUTS3 code, with clear units, time period, sources, and brief notes on processing and coverage. See “NUTS3 Additional Variables” below.
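Before submitting, it can help to run a quick sanity check on the table. The sketch below is only an illustration: the column names (`nuts3_code`, `value`, `unit`, `year`, `source`) are hypothetical and not prescribed by this project, and the code check only tests the general shape of a NUTS3 code (two country letters followed by three alphanumeric characters), not whether the code actually exists in the classification.

```python
import csv
import io
import re

# General shape of a NUTS3 code: two country letters + three alphanumerics.
NUTS3_PATTERN = re.compile(r"^[A-Z]{2}[A-Z0-9]{3}$")

# Hypothetical column names; adapt them to whatever template you are given.
REQUIRED_COLUMNS = ["nuts3_code", "value", "unit", "year", "source"]


def check_table(csv_text):
    """Return a list of human-readable problems found in a NUTS3-keyed CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]
    problems = []
    seen = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        code = (row["nuts3_code"] or "").strip()
        if not NUTS3_PATTERN.match(code):
            problems.append(f"line {line_no}: malformed NUTS3 code {code!r}")
        if code in seen:
            problems.append(f"line {line_no}: duplicate NUTS3 code {code!r}")
        seen.add(code)
        # Units, time period and source must be stated for every row.
        for column in ("unit", "year", "source"):
            if not (row[column] or "").strip():
                problems.append(f"line {line_no}: empty {column}")
    return problems


# Made-up rows, for illustration only.
sample = """nuts3_code,value,unit,year,source
ITC4C,1.0,example unit,2023,example source
ITC4C,2.0,example unit,2023,example source
XX1,3.0,,2023,example source
"""
for problem in check_table(sample):
    print(problem)
```

Empty values in the `value` column are deliberately not flagged here, since coverage gaps are legitimate as long as they are described in the processing notes.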

Part 3 (20 April →): Train Models to Predict Where Data Centres Are Located

NUTS3 Additional Variables (#nuts_vars)

Each team has a number. In Part 2, point 3, select the variable type equal to (team number mod 14); if the remainder is 0, select type 14. For example, team 16 chooses variable type 2 (16 mod 14 = 2): “Wind potential”.
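The selection rule can be written out as a few lines of Python. This is just a restatement of the rule given in Part 2 (remainder 0 maps to type 14); the function name `variable_type` is made up for illustration.

```python
N_TYPES = 14  # number of variable types, as stated in the project text


def variable_type(team_number):
    """Map a team number to a variable type in 1..14 (remainder 0 means type 14)."""
    remainder = team_number % N_TYPES
    return remainder if remainder != 0 else N_TYPES


for team in (2, 14, 16, 28):
    print(team, "->", variable_type(team))
```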

Assigned Sets of States (#nations_sets)

What is NUTS3

NUTS3 (Nomenclature of Territorial Units for Statistics, level 3) is the third level of the hierarchical regional classification developed by Eurostat for the European Union. It provides a standardized system to subdivide countries into comparable small regions for statistical analysis, regional policy, and socio-economic reporting.

In this project, we use the NUTS2024 classification at the NUTS3 level for EU Member States. In addition, we include Statistical Regions (SR) for non-EU countries that are harmonized with the NUTS framework.

To identify the corresponding NUTS3 region for a given geographic location, Eurostat provides an official interactive tool (Statistical Atlas). By entering or navigating to a specific position on the map, the associated NUTS3 region and code can be retrieved.

How to get NUTS3 from coordinates in Python

See this doc.
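Whatever tooling is used, the underlying operation is a point-in-polygon lookup: find the region polygon that contains the (longitude, latitude) pair. The dependency-free sketch below illustrates the idea with a ray-casting test. It is not necessarily the method of the linked doc; the rectangles and the codes `AA001`/`AA002` are made up and are not real NUTS3 boundaries. In practice one would more likely load the official boundary polygons and use a geospatial library's spatial join instead of hand-written geometry.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon ring."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray cast from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Toy rectangles with made-up codes; NOT real NUTS3 boundaries.
TOY_REGIONS = {
    "AA001": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    "AA002": [(2.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0)],
}


def lookup_region(lon, lat, regions=TOY_REGIONS):
    """Return the code of the first region containing the point, or None."""
    for code, ring in regions.items():
        if point_in_polygon(lon, lat, ring):
            return code
    return None


print(lookup_region(1.0, 1.0))
print(lookup_region(3.5, 0.5))
print(lookup_region(9.0, 9.0))
```

Points that fall outside every polygon return `None`; for real data these cases (for example, coordinates slightly offshore) are worth checking by hand against the Statistical Atlas.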

Data Center Mandatory Variables

Data Center Optional Variables

Appendix — NUTS3 datasets (selected sources)

1. Socio-demographic and economic

2. Energy, renewables, environment

Some sources are not directly in NUTS3 but can be aggregated (e.g., raster means within NUTS3 polygons).
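As a rough illustration of what "raster means within NUTS3 polygons" involves, the sketch below averages the raster cells whose centres fall inside a polygon. Everything in it is an assumption made for the example: the tiny grid, the made-up region codes, the use of `None` as a no-data marker, and the centre-in-polygon rule (area-weighted aggregation is also common). Dedicated raster libraries handle projections and large files far better than this.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon ring."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def zonal_mean(grid, x_min, y_max, cell_size, polygon):
    """Mean of the cells whose centres fall inside the polygon; None if no cell does."""
    total, count = 0.0, 0
    for row_idx, row in enumerate(grid):
        for col_idx, value in enumerate(row):
            if value is None:  # None marks "no data" in this toy raster
                continue
            # Row 0 is the top of the raster, so y decreases with row index.
            cx = x_min + (col_idx + 0.5) * cell_size
            cy = y_max - (row_idx + 0.5) * cell_size
            if point_in_polygon(cx, cy, polygon):
                total += value
                count += 1
    return total / count if count else None


# 2 x 4 toy raster covering x in [0, 4], y in [0, 2], with 1 x 1 cells.
grid = [
    [1.0, 3.0, 10.0, 20.0],
    [5.0, None, 30.0, 40.0],
]
# Made-up regions; NOT real NUTS3 polygons.
regions = {
    "AA001": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    "AA002": [(2.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0)],
}
means = {code: zonal_mean(grid, 0.0, 2.0, 1.0, ring) for code, ring in regions.items()}
print(means)
```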

3. Digital infrastructure

4. Climate risk

5. Water security

6. Other useful variables