\n",
"\n",
"Masked arrays are just like normal arrays, except that they have a \"mask\" attribute to tell you which elements are bad.\n",
"\n",
"Recall how arrays normally work:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[1 2 3]\n",
" [4 5 6]]\n"
]
}
],
"source": [
"# Let's create a 2D array that contains the numbers 1-6.\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"a = np.array([[1,2,3],[4,5,6]])\n",
"print(a)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we have some information that maybe the last two values are suspicious and may consist of bad data, we can create a mask of bad values that will travel with the array. Elements in the array whose mask value corresponds to \"bad\" are treated as if they did not exist, and operations using the array automatically consider that mask of bad values.\n",
"\n",
"This is extremely useful! Sometimes we have a dataset that's read-only, or we want to be aware of precisely which data are suspect, so instead of deleting them, we just keep all information and have a flag on which values are bad.\n",
"\n",
"For this purpose, NumPy has a function called numpy.ma."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[1 2 3]\n",
" [4 -- --]]\n",
"[[False False False]\n",
" [False True True]]\n",
"[[1 2 3]\n",
" [4 5 6]]\n"
]
}
],
"source": [
"import numpy.ma as ma\n",
" # This saves us having to type 'np.' at the start of every instance of numpy.ma.\n",
"\n",
"a = np.array([[1,2,3],[4,5,6]])\n",
"b = ma.masked_greater(a,4)\n",
"\n",
"print(b)\n",
"# Let's set our mask to everything greater than 4.\n",
"print(b.mask)\n",
"print(b.data)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[3 6 9]\n",
" [12 -- --]]\n"
]
}
],
"source": [
"# Now, if we try to do an operation on our masked array:\n",
"print(b*3)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When we have a masked array, any operations applied to elements whose mask value is set to True will create a resulting array that also has the corresponding elements' mask values set to True. Masked arrays thus transparently deal with missing data."
]
},
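{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of what this looks like in practice (it rebuilds the array from above; the names total, mean, n_valid and c are ours, chosen for illustration), reductions skip the masked elements, and arithmetic with an ordinary array keeps the mask on the result:\n",
"\n",
"```python\n",
"import numpy as np\n",
"import numpy.ma as ma\n",
"\n",
"a = np.array([[1, 2, 3], [4, 5, 6]])\n",
"b = ma.masked_greater(a, 4)   # masks 5 and 6\n",
"\n",
"# Reductions ignore masked elements: only 1, 2, 3 and 4 contribute.\n",
"total = b.sum()\n",
"mean = b.mean()\n",
"n_valid = b.count()           # number of unmasked elements\n",
"\n",
"# Adding an ordinary array gives a masked array with the same mask.\n",
"c = b + np.ones_like(a)\n",
"\n",
"print(total, mean, n_valid)\n",
"print(c.mask)\n",
"```\n",
"\n",
"Here count() is handy for checking how many values actually went into a statistic."
]
},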
{
"cell_type": "markdown",
"metadata": {},
"source": [
"
15.2 Constructing and Deconstructing Masked Arrays
\n",
"\n",
"There are several different ways to construct a masked array; we saw one example above, but (as always!) Python provides us with options.\n",
"\n",
"We can explicitly specify a mask!"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[-- -- 3]\n",
"[1 2 3]\n",
"[ True True False]\n"
]
}
],
"source": [
"a = ma.masked_array(data=[1,2,3],mask=[True,True,False])\n",
"print(a)\n",
"print(a.data)\n",
"print(a.mask)"
]
},
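{
"cell_type": "markdown",
"metadata": {},
"source": [
"Going the other way (deconstructing), the sketch below pulls the pieces back out of the same array. It assumes we only want the good values and a full boolean mask; the names good and mask are ours.\n",
"\n",
"```python\n",
"import numpy.ma as ma\n",
"\n",
"a = ma.masked_array(data=[1, 2, 3], mask=[True, True, False])\n",
"\n",
"# compressed() returns the unmasked values as an ordinary 1-D array.\n",
"good = a.compressed()\n",
"\n",
"# getmaskarray() always returns a full boolean array,\n",
"# even for an array in which nothing is masked.\n",
"mask = ma.getmaskarray(a)\n",
"\n",
"print(good)\n",
"print(mask)\n",
"```\n",
"\n",
"Note that compressed() flattens its result, so the original shape of a multi-dimensional array is lost."
]
},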
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A lot of the time, we'll determine whether or not data values should be masked on the basis of some logical test (e.g., whether data values are beyond an acceptable value - like negative rainfall amounts!).\n",
"\n",
"We can make a masked array by masking values based on conditions! This can be done with some specific functions like numpy.ma.masked_greater() and numpy.ma.masked_where()."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1 2 3 -- --]\n"
]
}
],
"source": [
"# Mask all values greater than 3.\n",
"data = np.array([1,2,3,4,5])\n",
"a = ma.masked_greater(data,3)\n",
"print(a)\n"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1 2 -- -- 5]\n"
]
}
],
"source": [
"# Mask all values greater than 2 and less than 5.\n",
"b = ma.masked_where(np.logical_and(data>2,data<5),data)\n",
"print(b)\n"
]
},
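{
"cell_type": "markdown",
"metadata": {},
"source": [
"To tie this back to the negative-rainfall example, here is a small sketch with made-up numbers (the rain array is hypothetical, not real data). It uses two other condition-based constructors, numpy.ma.masked_invalid() and numpy.ma.masked_less(), one after the other:\n",
"\n",
"```python\n",
"import numpy as np\n",
"import numpy.ma as ma\n",
"\n",
"# Hypothetical daily rainfall totals in mm; -5.0 and NaN stand in for bad readings.\n",
"rain = np.array([0.0, 12.5, -5.0, 3.2, np.nan])\n",
"\n",
"no_nan = ma.masked_invalid(rain)      # masks NaN and infinite values\n",
"clean = ma.masked_less(no_nan, 0.0)   # additionally masks negative values\n",
"\n",
"print(clean)\n",
"print(clean.mask)\n",
"```\n",
"\n",
"Each call adds to the existing mask rather than replacing it, so the tests can be applied one at a time."
]
},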
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sometimes we might want to export our results to a file that doesn't support object attributes (for example, a text or comma-separated value file). In those cases, it makes sense to replace masked values with some value that we know is nonsense, which we can do using numpy.ma.filled()."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[-- -- 3.0]\n",
"[-1.e+23 -1.e+23 3.e+00]\n"
]
}
],
"source": [
"c = ma.masked_array(data=[1.,2.,3.],mask=[True,True,False],fill_value=-1e+23)\n",
"print(c)\n",
"\n",
"d = ma.filled(c)\n",
"print(d)"
]
},
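{
"cell_type": "markdown",
"metadata": {},
"source": [
"The sketch below shows one possible round trip, assuming -999.0 is a sentinel we picked ourselves (it is not a NumPy default). The filled() method accepts an explicit value, and numpy.ma.masked_values() can rebuild the mask after the data are read back:\n",
"\n",
"```python\n",
"import numpy.ma as ma\n",
"\n",
"c = ma.masked_array(data=[1., 2., 3.], mask=[True, True, False])\n",
"\n",
"# Passing a value to filled() overrides the array's own fill_value.\n",
"exported = c.filled(-999.0)   # an ordinary ndarray, safe to write to text\n",
"\n",
"# Later, mask every entry equal to the sentinel again.\n",
"restored = ma.masked_values(exported, -999.0)\n",
"\n",
"print(exported)\n",
"print(restored)\n",
"```\n",
"\n",
"The sentinel should be a value that can never occur in valid data, otherwise good values would be masked on the way back in."
]
},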
{
"cell_type": "markdown",
"metadata": {},
"source": [
"