\n",
"\n",
"NetCDF is a commonly used file format in our field because it stores data together with the metadata that describes it. Every major language in our field can read and write NetCDF files, and Python is no exception!\n",
"\n",
"There are four kinds of components in a NetCDF file:\n",
"\n",
"* Global attributes: attributes (usually strings) that describe the file as a whole, for example a title, who created it, and which conventions it follows.\n",
"* Variables: named arrays that hold the data, together with the dimensions the data are defined on (their dimensionality) and metadata about the data (for example, units).\n",
"* Variable attributes: the metadata attached to an individual variable, such as its units or long name.\n",
"* Dimensions: named axes, each with a length, that define the domain. A dimension is usually paired with a coordinate variable of the same name that stores its values (for example, latitude, longitude, or altitude values).\n",
"\n",
"As an example, you might have a time series of surface temperature on a latitude-longitude grid. The dimensions for that dataset would be lat, lon, and time. The dimension lat only tells you how many points lie along that axis; the variable lat, which shares its name, holds the actual latitude values. The same goes for lon and time. Finally, you might have a 3-D variable containing temperatures called Ts, with dimensions of lat, lon, and time.\n",
"\n",
"There are several packages that can read NetCDF files; today we're going to learn SciPy. It's not necessarily the best option (for instance, `scipy.io.netcdf_file` only handles the classic NetCDF-3 format, not NetCDF-4), but it is one of the easiest to learn, and SciPy is a useful package for many other reasons."
]
},
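To make the four components concrete, here is a minimal sketch that writes and then re-reads a tiny file with `scipy.io.netcdf_file`. The file name, grid, and values are all hypothetical. Two details are assumptions about the format and SciPy rather than anything stated above. First, NetCDF-3 requires the unlimited `time` dimension to be the first dimension of `Ts`, so the sketch orders it (time, lat, lon). Second, SciPy's writer only accepts an unlimited dimension if it is created first.

```python
import os
import tempfile

import numpy as np
import scipy.io as S

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_ts.nc")  # hypothetical toy file

    f = S.netcdf_file(path, mode="w")
    f.title = "Toy surface temperature"          # global attribute
    f.createDimension("time", None)              # unlimited dimension, created first
    f.createDimension("lat", 3)                  # a dimension is just a name and a length
    f.createDimension("lon", 4)

    lat = f.createVariable("lat", "d", ("lat",)) # coordinate variable holding lat values
    lat[:] = [-30.0, 0.0, 30.0]
    lat.units = "degrees_north"                  # variable attribute
    lon = f.createVariable("lon", "d", ("lon",))
    lon[:] = [0.0, 90.0, 180.0, 270.0]
    lon.units = "degrees_east"

    ts = f.createVariable("Ts", "f", ("time", "lat", "lon"))  # 3-D data variable
    ts[:] = np.full((2, 3, 4), 288.0)            # two time steps of made-up data
    ts.units = "K"
    f.close()                                    # closing is what writes the file

    # mmap=False copies data into memory so the file can be closed cleanly.
    f = S.netcdf_file(path, mode="r", mmap=False)
    dims = dict(f.dimensions)                    # lengths only; unlimited time is None
    lat_values = f.variables["lat"][:].copy()    # the coordinate variable carries values
    ts_shape = f.variables["Ts"].shape
    ts_units = f.variables["Ts"].units           # SciPy returns strings as bytes
    title = f.title
    f.close()

print(dims)
print(lat_values)
print(ts_shape, ts_units, title)
```

Reading the file back, the `dimensions` dictionary reports only lengths, while the `lat` variable holds the latitude values themselves.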
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 10.2 Reading a NetCDF File\n",
"\n",
"For NetCDF files, we're interested in SciPy's I/O functionality, so we'll import the `scipy.io` submodule under a short alias."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# Import NumPy and SciPy's I/O submodule.\n",
"import numpy as np\n",
"import scipy.io as S\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Creating a NetCDF file object works much like opening a regular file in our I/O lecture: we pass a filename and a mode."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
}
],
"source": [
"# Read in the provided NetCDF file in read-only mode.\n",
"fileobj = S.netcdf_file(\"../datasets/air.mon.mean.nc\", mode=\"r\")\n",
"print(fileobj)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"NetCDF file objects have several attributes that we can access (more on attributes when we explore object-oriented programming). One of those attributes is `variables`, which is a dictionary. The keys of the dictionary are strings corresponding to the names of the variables, and the values are special objects called variable objects, which contain the variable's values as well as any metadata (units, etc.).\n",
"\n",
"Another NetCDF file object attribute is `dimensions`, which is another dictionary. The keys of this dictionary are strings that are the names of the dimensions, and the values are the lengths of the dimensions (`None` for an unlimited dimension).\n",
"\n",
"Let's see an example from our .nc file: a grid of monthly-mean temperature values."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"b'Monthly mean air temperature NCEP Reanalysis'\n"
]
}
],
"source": [
"# Create a file object. (We already opened this file above; re-opening it\n",
"# here just keeps this cell self-contained.)\n",
"fileobj = S.netcdf_file(\"../datasets/air.mon.mean.nc\", mode=\"r\")\n",
"\n",
"# First, let's find out what information's in the title.\n",
"print(fileobj.title)\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'lon': 144, 'lat': 73, 'time': None}\n"
]
}
],
"source": [
"# Now let's explore the dimensions dictionary.\n",
"print(fileobj.dimensions)\n",
"\n",
"# Time is set to 'None' because that's this file's\n",
"# 'unlimited' dimension; you can keep adding new times to it\n",
"# and it will use the same lat/lon grid."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'lat': , 'lon': , 'time': , 'air': }\n"
]
}
],
"source": [
"# And let's see what kinds of variables are inside.\n",
"print(fileobj.variables)\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
}
],
"source": [
"# Okay, that's all very messy!\n",
"# Now it's time to grab the values of air temperature.\n",
"\n",
"temp = fileobj.variables[\"air\"]\n",
"print(temp)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"b'degC'\n",
"(755, 73, 144)\n",
"[[[-34.926773 -34.926773 -34.926773 ... -34.926773 -34.926773\n",
" -34.926773 ]\n",
" [-35.13935 -35.129673 -35.12742 ... -35.188705 -35.170002\n",
" -35.14935 ]\n",
" [-34.352573 -34.04226 -33.768707 ... -35.333866 -35.002903\n",
" -34.671288 ]\n",
" ...\n",
" [-16.525156 -16.404509 -16.284832 ... -16.795155 -16.737736\n",
" -16.643543 ]\n",
" [-16.190313 -16.202248 -16.21677 ... -16.132574 -16.161928\n",
" -16.178377 ]\n",
" [-17.697733 -17.697733 -17.697733 ... -17.697733 -17.697733\n",
" -17.697733 ]]\n",
"\n",
" [[-33.311375 -33.311375 -33.311375 ... -33.311375 -33.311375\n",
" -33.311375 ]\n",
" [-34.65034 -34.476204 -34.29689 ... -35.18448 -35.009308\n",
" -34.835514 ]\n",
" [-34.1031 -33.619995 -33.161373 ... -35.606552 -35.103443\n",
" -34.602757 ]\n",
" ...\n",
" [-34.338963 -34.21862 -34.08241 ... -34.359997 -34.42724\n",
" -34.418617 ]\n",
" [-33.795513 -33.896553 -33.977238 ... -33.41517 -33.56758\n",
" -33.690342 ]\n",
" [-32.942413 -32.942413 -32.942413 ... -32.942413 -32.942413\n",
" -32.942413 ]]\n",
"\n",
" [[-29.716127 -29.716127 -29.716127 ... -29.716127 -29.716127\n",
" -29.716127 ]\n",
" [-29.4471 -29.499353 -29.551613 ... -29.365162 -29.385166\n",
" -29.41258 ]\n",
" [-28.544516 -28.366776 -28.227749 ... -29.282906 -29.01323\n",
" -28.763546 ]\n",
" ...\n",
" [-51.964516 -52.206455 -52.362263 ... -50.628704 -51.18032\n",
" -51.631298 ]\n",
" [-52.846123 -53.07613 -53.290974 ... -52.069355 -52.344517\n",
" -52.60097 ]\n",
" [-54.835476 -54.835476 -54.835476 ... -54.835476 -54.835476\n",
" -54.835476 ]]\n",
"\n",
" ...\n",
"\n",
" [[ -6.84033 -6.84033 -6.84033 ... -6.84033 -6.84033\n",
" -6.84033 ]\n",
" [ -8.222664 -8.076328 -7.9316626 ... -8.658329 -8.517663\n",
" -8.366997 ]\n",
" [ -8.124661 -7.610333 -7.1173277 ... -9.681327 -9.180997\n",
" -8.653992 ]\n",
" ...\n",
" [-54.956 -55.16667 -55.29166 ... -53.67867 -54.22433\n",
" -54.649338 ]\n",
" [-55.466324 -55.73534 -55.975994 ... -54.453335 -54.82867\n",
" -55.16467 ]\n",
" [-53.225002 -53.225002 -53.225002 ... -53.225002 -53.225002\n",
" -53.225002 ]]\n",
"\n",
" [[-16.640314 -16.640314 -16.640314 ... -16.640314 -16.640314\n",
" -16.640314 ]\n",
" [-20.58 -20.478704 -20.383224 ... -20.846767 -20.765804\n",
" -20.679348 ]\n",
" [-21.077736 -20.571283 -20.056448 ... -22.434835 -22.024187\n",
" -21.567738 ]\n",
" ...\n",
" [-44.52612 -44.629353 -44.69258 ... -43.85613 -44.15871\n",
" -44.375164 ]\n",
" [-44.219357 -44.39742 -44.56355 ... -43.50774 -43.770008\n",
" -44.004192 ]\n",
" [-41.401936 -41.401936 -41.401936 ... -41.401936 -41.401936\n",
" -41.401936 ]]\n",
"\n",
" [[-26.217333 -26.217333 -26.217333 ... -26.217333 -26.217333\n",
" -26.217333 ]\n",
" [-30.25533 -30.281328 -30.300997 ... -30.091661 -30.157658\n",
" -30.210995 ]\n",
" [-31.590332 -31.514662 -31.380667 ... -31.483662 -31.579662\n",
" -31.618664 ]\n",
" ...\n",
" [-32.93566 -32.809 -32.665333 ... -33.02433 -33.059994\n",
" -33.024666 ]\n",
" [-34.10533 -34.120663 -34.118332 ... -33.930668 -34.01266\n",
" -34.066994 ]\n",
" [-33.181664 -33.181664 -33.181664 ... -33.181664 -33.181664\n",
" -33.181664 ]]]\n"
]
}
],
"source": [
"# We can now examine this data more carefully.\n",
"print(temp.units)\n",
"print(temp.shape)\n",
"print(temp[:])\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"-34.926773\n"
]
}
],
"source": [
"# Let's be a little more restrained... how about just the first month at the first lat/lon point?\n",
"print(temp[0,0,0])\n"
]
},
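`temp[0,0,0]` selects a single element: the first month, first latitude, and first longitude. The sketch below uses a small, made-up NumPy array with the same (time, lat, lon) layout as `air`. It shows how slicing picks out a single value, a time series at one point, or one month's map. The variable objects returned by `netcdf_file` pass the same indexing through to their underlying array.

```python
import numpy as np

# Made-up array with the same (time, lat, lon) layout as the `air` variable.
ntime, nlat, nlon = 4, 3, 5
temp = np.arange(ntime * nlat * nlon, dtype=float).reshape(ntime, nlat, nlon)

single_value = temp[0, 0, 0]   # first month, first lat, first lon: one number
series = temp[:, 0, 0]         # every month at the first lat/lon point
first_map = temp[0, :, :]      # the full lat-lon grid for the first month

print(single_value, series.shape, first_map.shape)
```

With the real file, `temp[:, 0, 0]` would give all 755 months at the North Pole grid point.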
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"90.0\n",
"0.0\n"
]
}
],
"source": [
"# That seems pretty chilly! What are the lat/lon values?\n",
"lat = fileobj.variables[\"lat\"]\n",
"lon = fileobj.variables[\"lon\"]\n",
"\n",
"print(lat[0])\n",
"print(lon[0])\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Okay, that's reasonable for the North Pole.\n",
"# This grid has 2.5-degree spacing... let's find out what Seattle's weather was like!\n",
"\n",
"# Seattle is at roughly 47.6 N, 122.3 W; the nearest grid point is 47.5 N, 122.5 W.\n",
"\n",
"# This dataset's lon starts at 0 and counts eastward up to 357.5.\n",
"# So 122.5 W corresponds to 360 - 122.5 = 237.5.\n",
"\n",
"# Remember, lat and lon are netcdf_variable objects, not plain NumPy arrays.\n"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ 90. 87.5 85. 82.5 80. 77.5 75. 72.5 70. 67.5 65. 62.5\n",
" 60. 57.5 55. 52.5 50. 47.5 45. 42.5 40. 37.5 35. 32.5\n",
" 30. 27.5 25. 22.5 20. 17.5 15. 12.5 10. 7.5 5. 2.5\n",
" 0. -2.5 -5. -7.5 -10. -12.5 -15. -17.5 -20. -22.5 -25. -27.5\n",
" -30. -32.5 -35. -37.5 -40. -42.5 -45. -47.5 -50. -52.5 -55. -57.5\n",
" -60. -62.5 -65. -67.5 -70. -72.5 -75. -77.5 -80. -82.5 -85. -87.5\n",
" -90. ]\n",
"[ 0. 2.5 5. 7.5 10. 12.5 15. 17.5 20. 22.5 25. 27.5\n",
" 30. 32.5 35. 37.5 40. 42.5 45. 47.5 50. 52.5 55. 57.5\n",
" 60. 62.5 65. 67.5 70. 72.5 75. 77.5 80. 82.5 85. 87.5\n",
" 90. 92.5 95. 97.5 100. 102.5 105. 107.5 110. 112.5 115. 117.5\n",
" 120. 122.5 125. 127.5 130. 132.5 135. 137.5 140. 142.5 145. 147.5\n",
" 150. 152.5 155. 157.5 160. 162.5 165. 167.5 170. 172.5 175. 177.5\n",
" 180. 182.5 185. 187.5 190. 192.5 195. 197.5 200. 202.5 205. 207.5\n",
" 210. 212.5 215. 217.5 220. 222.5 225. 227.5 230. 232.5 235. 237.5\n",
" 240. 242.5 245. 247.5 250. 252.5 255. 257.5 260. 262.5 265. 267.5\n",
" 270. 272.5 275. 277.5 280. 282.5 285. 287.5 290. 292.5 295. 297.5\n",
" 300. 302.5 305. 307.5 310. 312.5 315. 317.5 320. 322.5 325. 327.5\n",
" 330. 332.5 335. 337.5 340. 342.5 345. 347.5 350. 352.5 355. 357.5]\n"
]
}
],
"source": [
"# So let's save their values instead.\n",
"a = lat[:]\n",
"b = lon[:]\n",
"print(a)\n",
"print(b)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(array([17]),)\n",
"(array([95]),)\n"
]
}
],
"source": [
"# Now we can use np.where() to find the locations of the values in the dataset.\n",
"sealat = np.where(a == 47.5)\n",
"sealon = np.where(b == 237.5)\n",
"\n",
"print(sealat)\n",
"print(sealon)"
]
},
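`np.where(a == 47.5)` works here only because 47.5 and 237.5 are exactly representable as floats and lie exactly on the grid. A more forgiving approach is to pick the nearest grid point with `np.argmin`. The sketch below rebuilds the grid from the values printed above instead of reading it from the file, and the Seattle coordinates (about 47.61 N, 122.33 W) are approximate.

```python
import numpy as np

# Rebuild the 2.5-degree grid printed above: lat runs 90 -> -90, lon runs 0 -> 357.5.
lat = np.arange(90.0, -90.1, -2.5)
lon = np.arange(0.0, 360.0, 2.5)


def nearest_index(grid, value):
    """Return the index of the grid point closest to `value` (no exact match needed)."""
    return int(np.argmin(np.abs(grid - value)))


# Approximate Seattle coordinates; a west longitude maps to 0-360 east via modulo.
seattle_lat = 47.61
seattle_lon = -122.33 % 360.0

lat_idx = nearest_index(lat, seattle_lat)
lon_idx = nearest_index(lon, seattle_lon)
print(lat_idx, lon_idx, lat[lat_idx], lon[lon_idx])
```

Unlike `np.where`, this returns plain integers, so `temp[0, lat_idx, lon_idx]` gives a single value rather than a 1x1 array. The longitude distance here is not circular, so a point near 0/360 degrees would need extra handling.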
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[1.5006493]]\n"
]
}
],
"source": [
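"# Let's print the first month's value at Seattle's grid point!\n",
"# (np.where returns a tuple of index arrays, so the result comes back as a 1x1 array.)\n",
"print(temp[0,sealat, sealon])\n"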
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"b'Monthly Mean Air Temperature'\n"
]
}
],
"source": [
"# If you want the full name of a particular variable, long_name is useful!\n",
"print(fileobj.variables[\"air\"].long_name)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 10.3 Writing a NetCDF File\n",
"\n",
"Just as with normal files, we can write our own NetCDF files!\n",
"\n",
"You can create a NetCDF file object in write mode!"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 0. 0.09983342 0.19866933 0.29552021 0.38941834 0.47942554\n",
" 0.56464247 0.64421769 0.71735609 0.78332691 0.84147098 0.89120736\n",
" 0.93203909 0.96355819 0.98544973 0.99749499 0.9995736 0.99166481\n",
" 0.97384763 0.94630009]\n",
" [ 0.90929743 0.86320937 0.8084964 0.74570521 0.67546318 0.59847214\n",
" 0.51550137 0.42737988 0.33498815 0.23924933 0.14112001 0.04158066\n",
" -0.05837414 -0.15774569 -0.2555411 -0.35078323 -0.44252044 -0.52983614\n",
" -0.61185789 -0.68776616]\n",
" [-0.7568025 -0.81827711 -0.87157577 -0.91616594 -0.95160207 -0.97753012\n",
" -0.993691 -0.99992326 -0.99616461 -0.98245261 -0.95892427 -0.92581468\n",
" -0.88345466 -0.83226744 -0.77276449 -0.70554033 -0.63126664 -0.55068554\n",
" -0.46460218 -0.37387666]\n",
" [-0.2794155 -0.1821625 -0.0830894 0.0168139 0.1165492 0.21511999\n",
" 0.31154136 0.40484992 0.49411335 0.57843976 0.6569866 0.72896904\n",
" 0.79366786 0.85043662 0.8987081 0.93799998 0.96791967 0.98816823\n",
" 0.99854335 0.99894134]\n",
" [ 0.98935825 0.96988981 0.94073056 0.90217183 0.85459891 0.79848711\n",
" 0.7343971 0.66296923 0.58491719 0.50102086 0.41211849 0.31909836\n",
" 0.22288991 0.12445442 0.02477543 -0.07515112 -0.17432678 -0.27176063\n",
" -0.36647913 -0.45753589]\n",
" [-0.54402111 -0.62507065 -0.69987469 -0.76768581 -0.82782647 -0.87969576\n",
" -0.92277542 -0.95663502 -0.98093623 -0.99543625 -0.99999021 -0.99455259\n",
" -0.97917773 -0.95401925 -0.91932853 -0.87545217 -0.82282859 -0.76198358\n",
" -0.69352508 -0.61813711]\n",
" [-0.53657292 -0.44964746 -0.35822928 -0.26323179 -0.16560418 -0.0663219\n",
" 0.03362305 0.13323204 0.23150983 0.32747444 0.42016704 0.50866146\n",
" 0.59207351 0.66956976 0.74037589 0.80378443 0.85916181 0.90595474\n",
" 0.94369567 0.9720075 ]\n",
" [ 0.99060736 0.99930939 0.99802665 0.98677196 0.96565778 0.93489506\n",
" 0.89479117 0.84574683 0.78825207 0.72288135 0.65028784 0.57119687\n",
" 0.48639869 0.39674057 0.30311836 0.20646748 0.10775365 0.00796318\n",
" -0.09190685 -0.19085858]\n",
" [-0.28790332 -0.38207142 -0.47242199 -0.55805227 -0.63810668 -0.71178534\n",
" -0.77835208 -0.83714178 -0.88756703 -0.92912401 -0.96139749 -0.98406501\n",
" -0.99690007 -0.99977443 -0.99265938 -0.97562601 -0.9488445 -0.91258245\n",
" -0.86720218 -0.81315711]\n",
" [-0.75098725 -0.68131377 -0.60483282 -0.52230859 -0.43456562 -0.34248062\n",
" -0.24697366 -0.14899903 -0.04953564 0.05042269 0.14987721 0.24783421\n",
" 0.34331493 0.43536536 0.52306577 0.60553987 0.68196362 0.75157342\n",
" 0.81367374 0.8676441 ]]\n",
"42.0\n"
]
}
],
"source": [
"newfile = S.netcdf_file(\"new.nc\",mode=\"w\")\n",
"\n",
"# Let's start by putting in 10 latitude and 20 longitude values.\n",
"lat = np.arange(10)\n",
"lon = np.arange(20)\n",
"\n",
"# And maybe two different sets of data, one array and one scalar.\n",
"data1 = np.reshape(np.sin(np.arange(200)*0.1),(10,20))\n",
"data2 = 42.0\n",
"\n",
"print(data1)\n",
"print(data2)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"# So far so good! Let's create the actual dimension information.\n",
"newfile.createDimension(\"lat\",len(lat))\n",
"newfile.createDimension(\"lon\",len(lon))\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Now the names of our variables!\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# And now we assign the actual values to our variables!\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# And assign some units!\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Add a title to finish up!\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Okay, so having done all that (don't worry if all the details are unclear; this is a fairly advanced topic that will take some practice!), let's try reading our values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
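The empty cells above leave the steps for variables, values, units, title, and read-back open. One possible way to fill in the whole write-then-read sequence is sketched below. A few details are placeholders, not values prescribed by the lecture: it writes to a temporary directory instead of `new.nc`, and the units strings and title are made up. The scalar is written with `[...]` indexing because a 0-d variable cannot be sliced with `[:]`.

```python
import os
import tempfile

import numpy as np
import scipy.io as S

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "new.nc")

    # Same data as in the cells above: 10 lats, 20 lons, one array, one scalar.
    lat = np.arange(10)
    lon = np.arange(20)
    data1 = np.reshape(np.sin(np.arange(200) * 0.1), (10, 20))
    data2 = 42.0

    newfile = S.netcdf_file(path, mode="w")
    newfile.createDimension("lat", len(lat))
    newfile.createDimension("lon", len(lon))

    # createVariable(name, type code, dimension names); "d" is a 64-bit float.
    lat_var = newfile.createVariable("lat", "d", ("lat",))
    lon_var = newfile.createVariable("lon", "d", ("lon",))
    data1_var = newfile.createVariable("data1", "d", ("lat", "lon"))
    data2_var = newfile.createVariable("data2", "d", ())   # scalar: no dimensions

    # Assign the actual values.
    lat_var[:] = lat
    lon_var[:] = lon
    data1_var[:] = data1
    data2_var[...] = data2

    # Placeholder units and title.
    lat_var.units = "degrees_north"
    lon_var.units = "degrees_east"
    data1_var.units = "dimensionless"
    newfile.title = "A small test file"
    newfile.close()   # closing a write-mode file is what writes it to disk

    # Read it back; mmap=False copies the data into memory.
    f = S.netcdf_file(path, mode="r", mmap=False)
    print(f.title, f.dimensions)
    read_data1 = f.variables["data1"][:].copy()
    read_data2 = float(f.variables["data2"][...])
    read_units = f.variables["lat"].units
    f.close()

print(read_data1.shape, read_data2, read_units)
```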
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 10.4 NetCDF Example\n",
"\n",
"Let's pull up some monthly mean surface air temperature data from our air.mon.mean.nc data file. These data come from the NCEP/NCAR Reanalysis 1."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"# Let's take a look at the time units.\n"
]
},
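NetCDF time variables typically store offsets from a reference date, with a CF-style units string of the form `hours since <date> ...`. Assuming exactly that form, a small decoder like the one below turns the offsets into NumPy datetimes. The sample offsets and units string are illustrative, not read from `air.mon.mean.nc`, and the helper name `decode_hours_since` is hypothetical.

```python
import numpy as np


def decode_hours_since(values, units):
    """Convert 'hours since Y-M-D ...' offsets to datetime64[h] values.

    Sketch only: other units (days, minutes) and non-standard calendars
    are not handled.
    """
    if isinstance(units, bytes):          # SciPy returns attribute strings as bytes
        units = units.decode("ascii")
    unit, _, origin = units.partition(" since ")
    if unit.strip() != "hours":
        raise ValueError(f"expected 'hours since ...', got {units!r}")
    # Accept both zero-padded and unpadded dates, e.g. 1800-01-01 or 1800-1-1.
    year, month, day = (int(p) for p in origin.split()[0].split("-"))
    epoch = np.datetime64(f"{year:04d}-{month:02d}-{day:02d}", "h")
    offsets = np.rint(np.asarray(values, dtype=float)).astype(np.int64)
    return epoch + offsets.astype("timedelta64[h]")


# Illustrative values only; the real file's time variable may differ.
times = decode_hours_since([1297320.0, 1298064.0], b"hours since 1800-1-1 00:00:0.0")
print(times)
```

NumPy datetimes use the proleptic Gregorian calendar, which matches the standard calendar for dates after 1582.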
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Well, that seems confusing.\n",
"# Let's create a new version of the file where time just starts at 0.0, and change the units string so it just says 'hours'.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# And let's test it!\n",
"\n"
]
},
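One way to approach the rewrite described above is to copy every dimension, variable, and attribute into a new file, shifting `time` so it starts at 0.0 and relabeling its units as `hours`. The sketch below has some limitations and assumptions:

* It runs on a small synthetic stand-in file, not `air.mon.mean.nc`.
* It reads SciPy's private `_attributes` dictionaries to copy metadata, since `netcdf_file` has no public way to list attributes.
* It creates the unlimited dimension first, because SciPy's writer rejects an unlimited dimension created after others.
* It does not handle scalar (0-d) variables.

The helper name `rebase_time` is hypothetical.

```python
import os
import tempfile

import numpy as np
import scipy.io as S


def rebase_time(src_path, dst_path, time_name="time"):
    """Copy a NetCDF-3 file, shifting `time_name` to start at 0.0 with units 'hours'."""
    src = S.netcdf_file(src_path, mode="r", mmap=False)
    dst = S.netcdf_file(dst_path, mode="w")
    try:
        for name, value in src._attributes.items():      # global attributes (private API)
            setattr(dst, name, value)
        # Create the unlimited (None) dimension first; sorted() keeps the rest in order.
        for name, length in sorted(src.dimensions.items(),
                                   key=lambda item: item[1] is not None):
            dst.createDimension(name, length)
        for name, var in src.variables.items():
            new = dst.createVariable(name, var.typecode(), var.dimensions)
            for att, value in var._attributes.items():   # variable attributes
                setattr(new, att, value)
            values = var[:]
            if name == time_name:
                values = values - values[0]              # start the clock at 0.0
                new.units = "hours"
            new[:] = values
    finally:
        src.close()
        dst.close()   # closing the write-mode file is what writes it to disk


with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "toy_air.nc")
    dst_path = os.path.join(tmp, "toy_air_rebased.nc")

    # Hypothetical stand-in for air.mon.mean.nc: 3 months on a 2-point grid.
    f = S.netcdf_file(src_path, mode="w")
    f.title = "Synthetic stand-in for air.mon.mean.nc"
    f.createDimension("time", None)
    f.createDimension("lat", 2)
    t = f.createVariable("time", "d", ("time",))
    t.units = "hours since 1800-1-1 00:00:0.0"
    t[:] = [1297320.0, 1298064.0, 1298760.0]
    air = f.createVariable("air", "f", ("time", "lat"))
    air.units = "degC"
    air[:] = np.zeros((3, 2))
    f.close()

    rebase_time(src_path, dst_path)

    # And let's test it.
    out = S.netcdf_file(dst_path, mode="r", mmap=False)
    new_time = out.variables["time"][:].copy()
    units = out.variables["time"].units
    air_units = out.variables["air"].units
    title = out.title
    dims = dict(out.dimensions)
    out.close()

print(new_time, units)
```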
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 10.5 Take-Home Points\n",
"\n",
"* NetCDF is a powerful file type containing global attributes, variables, variable attributes, and dimensions.\n",
"* We can read from NetCDF files using syntax similar to that for regular files.\n",
"* Using attributes such as `dimensions` and `variables`, we can learn about individual variables in the dataset.\n",
"* We can also write to NetCDF files in a similar way.