SIP_note
Brief notes for the class Signal and Image Processing, covering most of the course.
Signal and Image Processing notes
Intro
We are dealing with signals; an image can also be considered a kind of signal.
Image and signal basics and terminology
A digital image is an n-dimensional array of quantities; for n = 2 the array elements are called pixels.
The resolution is given by the number of pixels in the image array
Color and spectral measurements
Color can be represented in RGB (Red, Green, Blue) and HSV (Hue, Saturation, Value).
Convert RGB to gray scale: Gray = 0.299R + 0.587G + 0.114B
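The weighted sum above can be sketched in NumPy as follows. This is a minimal illustration that assumes the image is an array whose last axis holds the R, G, B channels; the function name `rgb_to_gray` is made up for this note.

```python
import numpy as np

def rgb_to_gray(rgb):
    """Weighted sum of the R, G, B channels (last axis) using the weights above."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb.astype(np.float64) @ weights

# Two example pixels: pure white and pure red.
pixels = np.array([[[255, 255, 255], [255, 0, 0]]], dtype=np.uint8)
gray = rgb_to_gray(pixels)  # shape (1, 2)
```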
Pixel-wise Operations, Intensity Transformations, Image Formation, and the Convolution Integral
Image basics
JPEG: unsigned 8-bit int(lossy)
PNG: 1,2,4,8,16 bit unsigned int
TIFF: 1-32 bit unsigned int
Basic pixel-wise operations
Adding or subtracting a constant: J(r,c) = I(r,c)+C
Multiplication or division by a constant: J(r,c) = C*I(r,c)
Adding or subtracting images (pixel-wise): J(r,c) = I_A(r,c) + I_B(r,c)
Multiplication or division of images (pixel-wise): J(r,c) = I_A(r,c) * I_B(r,c)
Be careful to handle over- and underflow! Usually, clip the values to the quantization range.
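A small sketch of why clipping matters, assuming 8-bit unsigned images (quantization range [0, 255]). The helper name `add_constant_clipped` is hypothetical.

```python
import numpy as np

def add_constant_clipped(image, c):
    """J = I + C for uint8 images, clipped to [0, 255] instead of wrapping around."""
    # Compute in a wider signed type so the sum cannot overflow, then clip.
    result = image.astype(np.int32) + c
    return np.clip(result, 0, 255).astype(np.uint8)

img = np.array([[10, 200], [250, 0]], dtype=np.uint8)
brighter = add_constant_clipped(img, 100)  # 200 + 100 saturates at 255
darker = add_constant_clipped(img, -50)    # 10 - 50 saturates at 0
```

Without the widening step, uint8 arithmetic wraps modulo 256, so a bright pixel would suddenly turn dark.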
Pixel-wise intensity transforms
The Dynamic range is the actual range of intensity values [I min, I max] present in the image.
We can change the dynamic range of an image by applying functions on intensity values.
Logarithmic transform of intensities:
\(J(x, y)=c \ln \left[1+\left(e^\alpha-1\right) I(x, y)\right], \quad \alpha>0\)
Used to reveal details in the dark parts of an image: it stretches the low intensities while compressing the high intensities.
Exponential transform of intensities:
\(J(x, y)=c\left[(1+\alpha)^{I(x, y)}-1\right], \quad \alpha>0\)
Used to reveal details in the bright parts of an image; the principle is the same, with the roles of dark and bright swapped.
Power-law (Gamma) transform
\(J(x, y)=c[I(x, y)]^\gamma\)
Can be used to adjust the dynamic range.
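The three transforms can be sketched as below. This assumes intensities normalized to [0, 1]; the default choice c = 1/α (which maps 1 to 1 for the logarithmic and exponential transforms) is an assumption made here for illustration, not something fixed by the formulas.

```python
import numpy as np

def log_transform(I, alpha, c=None):
    """J = c * ln(1 + (e^alpha - 1) * I); stretches dark intensities."""
    c = 1.0 / alpha if c is None else c
    return c * np.log1p(np.expm1(alpha) * I)

def exp_transform(I, alpha, c=None):
    """J = c * ((1 + alpha)^I - 1); stretches bright intensities."""
    c = 1.0 / alpha if c is None else c
    return c * ((1.0 + alpha) ** I - 1.0)

def gamma_transform(I, gamma, c=1.0):
    """J = c * I^gamma."""
    return c * I ** gamma

I = np.array([0.0, 0.25, 0.5, 1.0])
J_log = log_transform(I, alpha=3.0)
J_exp = exp_transform(I, alpha=3.0)
J_gamma = gamma_transform(I, gamma=0.5)
```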
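Bit-plane slicing
Decompose every pixel of the image into its 8 bits and use a logical AND to extract each bit, which gives eight binary images (bit planes).
For example, for the value 10010110 each plane takes its corresponding bit: the eighth plane takes the first (leftmost) 1, and so on. The eighth plane is the most significant one because it captures the rough outline of the image; this plane alone already gives a good idea of the shape of the original.
To save space, one can keep only the top few planes and combine them.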
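A minimal sketch of bit-plane slicing for 8-bit images. The function names are made up for this note; plane index 0 is the least significant bit and index 7 the most significant.

```python
import numpy as np

def bit_planes(image):
    """Return a list of 8 binary images; index 0 is the least significant bit."""
    return [(image >> k) & 1 for k in range(8)]

def keep_top_planes(image, n):
    """Keep only the n most significant bit planes (bitwise AND with a mask)."""
    mask = (0xFF << (8 - n)) & 0xFF
    return image & np.uint8(mask)

img = np.array([[0b10010110, 0b01111111]], dtype=np.uint8)
planes = bit_planes(img)
approx = keep_top_planes(img, 3)  # 10010110 -> 10000000, 01111111 -> 01100000
```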
A Mathematical Model of Image Formation (Camera)
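A camera can be modeled as applying a point spread function (PSF) to the incoming light, adding noise, and returning the resulting image.
The PSF acts on the image through convolution.
There are several different ways to construct a PSF.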
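Filtering
Images can be processed by applying many different kinds of filters.
A filter takes an image as input, processes it (for example by convolution), and outputs the processed image; each output pixel is computed from a neighborhood of the input.
Filters are either linear or non-linear.
A linear filter can be written as an explicit kernel; both kinds are shift invariant.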
Linear filtering
Applying a linear filter \(h\) to the image \(f\) is done by convolution:
\[ g(x, y)=\{f * h\}(x, y)=\iint_{-\infty}^{\infty} f\left(x^{\prime}, y^{\prime}\right) h\left(x-x^{\prime}, y-y^{\prime}\right) d x^{\prime} d y^{\prime} \]
In discrete filtering we compute a new value for each target pixel from its neighborhood, so we need to define which kernel element is the filter center that sits on the target pixel. Normally this is the middle element, which exists when the window size is odd.
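A direct (slow but explicit) sketch of the discrete version of this convolution. It assumes an odd-sized kernel centered on the target pixel and zero padding at the border; `convolve2d` is a name chosen here, not a library call.

```python
import numpy as np

def convolve2d(f, h):
    """Discrete 2D convolution, same output size as f, zero padding, odd-sized h."""
    kh, kw = h.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(f.astype(np.float64), ((ph, ph), (pw, pw)), mode="constant")
    flipped = h[::-1, ::-1]  # convolution flips the kernel (correlation does not)
    g = np.zeros(f.shape, dtype=np.float64)
    for i in range(kh):
        for j in range(kw):
            g += flipped[i, j] * padded[i:i + f.shape[0], j:j + f.shape[1]]
    return g

# Convolving a unit impulse with h reproduces h around the impulse position.
impulse = np.zeros((5, 5))
impulse[2, 2] = 1.0
h = np.arange(9, dtype=np.float64).reshape(3, 3)
g = convolve2d(impulse, h)
```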
Some common linear filters
Mean filter
\[ \boldsymbol{h}_N=\frac{1}{N}[1, \cdots, 1] \]
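Border handling
- Pad the border: fill with a constant value, usually 0. The drawback is that the border of the filtered image becomes darker.
- Symmetric mirroring: fill with the image mirrored across the border.
- Periodic boundary: treat the image as periodic.
- Leave border pixels unchanged: copy the border pixels directly from the input.
- Filter only inside the image and crop away the border: compute only the pixels where the whole window fits, which makes the output image smaller.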
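Three of the padding strategies map onto `numpy.pad` modes, and the cropping option corresponds to a "valid" convolution. The 1D row below is only an illustration; the same modes apply to 2D images.

```python
import numpy as np

row = np.array([1, 2, 3, 4])

zero_padded = np.pad(row, 2, mode="constant", constant_values=0)  # pad with 0
mirrored = np.pad(row, 2, mode="symmetric")                       # symmetric mirroring
periodic = np.pad(row, 2, mode="wrap")                            # periodic boundary

# Filter only where the window fits: the output is shorter than the input.
mean3 = np.ones(3) / 3
cropped = np.convolve(row, mean3, mode="valid")
```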
Linear separable filters
A linear filter is linearly separable if it can be decomposed into a linear filter along the x-axis and another along the y-axis:
\[ f * \boldsymbol{h}_{N \times N}=f * \frac{1}{N^2}\left[\begin{array}{ccc} 1 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \ldots & 1 \end{array}\right]=f * \frac{1}{N}[1, \cdots, 1] * \frac{1}{N}\left[\begin{array}{c} 1 \\ \vdots \\ 1 \end{array}\right] \]
That is, the 2D kernel can be written as the product of two 1D kernels (a column vector times a row vector), so the image can be filtered with two 1D passes instead of one 2D pass, which is faster.
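A quick numerical check of separability for the mean filter, assuming "valid" border handling (only positions where the whole window fits). The random test image is arbitrary.

```python
import numpy as np

N = 3
k = np.ones(N) / N                 # 1D mean kernel
h_2d = np.outer(k, k)              # column vector times row vector = N x N mean kernel

rng = np.random.default_rng(0)
f = rng.random((6, 6))

# Two 1D passes: along each row, then along each column.
rows = np.apply_along_axis(np.convolve, 1, f, k, mode="valid")
two_pass = np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")

# One 2D pass (the mean kernel is symmetric, so flipping it changes nothing).
out = f.shape[0] - N + 1
direct = np.array([[np.sum(f[i:i + N, j:j + N] * h_2d) for j in range(out)]
                   for i in range(out)])
```

For an N x N kernel the two-pass version needs about 2N multiplications per pixel instead of N squared.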
Filtering multi-channel images
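Filtering the R, G and B channels separately can cause colors to bleed. Note that the result differs from filtering only the V channel of an HSV representation.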
Discrete Gaussian filter
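A linearly separable filter derived from the Gaussian function; its characteristic is that the weights are largest at the center.
Its parameters are sigma and the kernel size.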
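A minimal sketch of building a discrete Gaussian kernel from the two parameters. Sampling the Gaussian at integer offsets and normalizing the weights to sum to 1 is one common construction and is assumed here.

```python
import numpy as np

def gaussian_kernel_1d(sigma, size):
    """Sample a Gaussian at integer offsets around the center and normalize to sum 1."""
    x = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    return g / g.sum()

g1 = gaussian_kernel_1d(sigma=1.0, size=5)
g2 = np.outer(g1, g1)  # separable: the 2D kernel is the outer product of two 1D kernels
```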
Derivative filter
Differentiate the image. Because the image is discrete we use an approximation (a forward difference):
\(\frac{\partial f}{\partial x} \approx f(x+1, y)-f(x, y)\)
\(\frac{\partial f}{\partial y} \approx f(x, y+1)-f(x, y)\)
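The corresponding convolution kernels are
\(\left[\begin{array}{cc}1 & -1\end{array}\right]\)
\(\left[\begin{array}{c}1 \\ -1\end{array}\right]\)
There are several other derivative kernels as well.
Derivative filters can be used for edge detection.
Noise has a strong effect on derivative filters.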
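The forward differences can be sketched with array slicing. The convention that x runs along columns and y along rows, and the choice to leave the last column/row at zero, are assumptions of this example.

```python
import numpy as np

def forward_differences(f):
    """Approximate df/dx and df/dy by f(x+1, y) - f(x, y) and f(x, y+1) - f(x, y)."""
    f = f.astype(np.float64)
    dx = np.zeros_like(f)
    dy = np.zeros_like(f)
    dx[:, :-1] = f[:, 1:] - f[:, :-1]  # x along columns (assumed convention)
    dy[:-1, :] = f[1:, :] - f[:-1, :]  # y along rows (assumed convention)
    return dx, dy

# A vertical step edge: the gradient magnitude is non-zero only at the edge.
img = np.zeros((4, 6))
img[:, 3:] = 1.0
dx, dy = forward_differences(img)
magnitude = np.hypot(dx, dy)
```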
Non-linear filtering
Median filter
A special case of a rank filter: sort the values in the neighborhood and take the median.
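A straightforward sketch of a median filter. The square window and the replicated-edge padding are assumptions made for this example.

```python
import numpy as np

def median_filter(image, size=3):
    """Replace each pixel by the median of its size x size neighborhood."""
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")  # border handling: replicate edges (assumed)
    out = np.empty_like(image)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            out[r, c] = np.median(padded[r:r + size, c:c + size])
    return out

# A single "salt" pixel is removed completely, which a mean filter would only smear out.
img = np.full((5, 5), 10, dtype=np.uint8)
img[2, 2] = 255
filtered = median_filter(img)
```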
Fourier Transform
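The power spectrum is obtained from the Fourier transform of the sound waveform (its squared magnitude) and describes the sound from a frequency perspective.
An image can also be Fourier transformed, which gives an image of its frequency content. After shifting, the center corresponds to the low frequencies and the high frequencies lie towards the borders.
After the Fourier transform, noise at specific frequencies can be removed by masking the corresponding part of the spectrum (setting it to zero, i.e. putting a black block over it) and transforming back.
The Fourier transform of a Gaussian is again a Gaussian.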
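A small sketch of frequency-domain masking on a synthetic example. The test image (a Gaussian blob) and the sinusoidal noise with 8 cycles across the image are invented for illustration.

```python
import numpy as np

n = 64
y, x = np.mgrid[0:n, 0:n]
clean = np.exp(-((x - n / 2) ** 2 + (y - n / 2) ** 2) / (2 * 10.0 ** 2))
noisy = clean + 0.5 * np.sin(2 * np.pi * 8 * x / n)  # periodic noise along x

# Shifted spectrum: zero frequency sits at index (n // 2, n // 2).
F = np.fft.fftshift(np.fft.fft2(noisy))
power = np.abs(F) ** 2

# Black out the two symmetric peaks of the sinusoid (horizontal frequency +8 and -8).
center = n // 2
mask = np.ones((n, n))
mask[center, center + 8] = 0.0
mask[center, center - 8] = 0.0

restored = np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))
```

In practice the noise peaks are located by inspecting `power` (usually on a log scale) rather than known in advance.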
intensity thresholding and intensity histograms
Intensity Thresholding
Simple thresholding formula: \(J(x, y)= \begin{cases}1, & \text{if } I(x, y)>\tau \\ 0, & \text{otherwise}\end{cases}\)
This yields a binary image.
Thresholding can also be applied to individual color channels.
Intensity Histograms
Intensity histogram: Count the number of pixels having a specific intensity level
\(H_I(v)=\#\{(x, y), \quad I(x, y)=v\}, \quad v \in[0, L-1]\)
The histogram can be used for thresholding, keeping only certain intensity values.
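Both definitions translate almost directly into NumPy. The sketch assumes an 8-bit image (L = 256); the function names are made up for this note.

```python
import numpy as np

def histogram(image, levels=256):
    """H_I(v): number of pixels with intensity v, for v in [0, levels - 1]."""
    return np.bincount(image.ravel(), minlength=levels)

def threshold(image, tau):
    """J = 1 where I > tau, 0 otherwise."""
    return (image > tau).astype(np.uint8)

img = np.array([[0, 0, 50], [200, 255, 200]], dtype=np.uint8)
H = histogram(img)
binary = threshold(img, 100)
```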
Histogram Matching
Matching the histogram of a badly exposed image to that of another image can be used to adjust its brightness.
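One common way to implement histogram matching goes through the cumulative histograms (CDFs) of both images. The sketch below assumes 8-bit images and is not necessarily the exact procedure used in the course.

```python
import numpy as np

def match_histogram(source, reference, levels=256):
    """Map source intensities so that their CDF follows the CDF of the reference."""
    src_cdf = np.cumsum(np.bincount(source.ravel(), minlength=levels)) / source.size
    ref_cdf = np.cumsum(np.bincount(reference.ravel(), minlength=levels)) / reference.size
    # For each source level, pick the first reference level whose CDF reaches the same value.
    lut = np.searchsorted(ref_cdf, src_cdf, side="left").clip(0, levels - 1)
    return lut.astype(np.uint8)[source]

dark = np.repeat(np.array([10, 11, 12, 13], dtype=np.uint8), 4).reshape(4, 4)
well_exposed = np.repeat(np.array([100, 150, 200, 250], dtype=np.uint8), 4).reshape(4, 4)
matched = match_histogram(dark, well_exposed)
```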
Histogram equalization
Makes the histogram of the image (approximately) uniformly distributed.
It can also be applied to color channels.
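A minimal sketch of histogram equalization, using the normalized cumulative histogram as the intensity mapping. This is the textbook construction and assumes an 8-bit image.

```python
import numpy as np

def equalize(image, levels=256):
    """Map each intensity through the normalized cumulative histogram (CDF)."""
    hist = np.bincount(image.ravel(), minlength=levels)
    cdf = np.cumsum(hist) / image.size
    lut = np.round((levels - 1) * cdf).astype(np.uint8)
    return lut[image]

# A low-contrast image that only uses the levels 100..103.
low_contrast = np.repeat(np.array([100, 101, 102, 103], dtype=np.uint8), 4).reshape(4, 4)
equalized = equalize(low_contrast)
```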
Mathematical morphology, distance transform, and connected components
Mathematical morphology
Mathematical morphology of images; the main operations are dilation and erosion.
Dilation: \(X \oplus B=\{p \in \Omega \mid p=x+b,\ x \in X \text{ and } b \in B\}\)
Sweeping B around X enlarges the area of X; this can be used to fill holes inside X.
Erosion: \(X \ominus B=\{p \in \Omega \mid p+b \in X \text{ for all } b \in B\}\)
Sweep B over \(\Omega\) and remove every position where B is not fully contained in X.
Opening: \((X \ominus B) \oplus B\)
Closing: \((X \oplus B) \ominus B\)
Morphological operations can also be used for edge detection.
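The set definitions can be implemented directly for binary images. In this sketch the structuring element B is given as a list of (row, column) offsets, and anything outside the image domain is treated as background; both choices are assumptions of the example.

```python
import numpy as np

def shift(X, dr, dc):
    """Translate the set X by (dr, dc); anything leaving the image domain is dropped."""
    h, w = X.shape
    out = np.zeros((h, w), dtype=bool)
    r0, r1 = max(0, -dr), min(h, h - dr)
    c0, c1 = max(0, -dc), min(w, w - dc)
    out[r0 + dr:r1 + dr, c0 + dc:c1 + dc] = X[r0:r1, c0:c1]
    return out

def dilate(X, B):
    """All p = x + b with x in X and b in B: union of X translated by every b."""
    out = np.zeros(X.shape, dtype=bool)
    for dr, dc in B:
        out |= shift(X, dr, dc)
    return out

def erode(X, B):
    """All p with p + b in X for every b in B: intersection of X translated by -b."""
    out = np.ones(X.shape, dtype=bool)
    for dr, dc in B:
        out &= shift(X, -dr, -dc)
    return out

def opening(X, B):
    return dilate(erode(X, B), B)

def closing(X, B):
    return erode(dilate(X, B), B)

square = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]  # 3 x 3 structuring element

X = np.zeros((7, 7), dtype=bool)
X[2:5, 2:5] = True
X[3, 3] = False                 # a hole in the middle
closed = closing(X, square)     # fills the hole
boundary = closed & ~erode(closed, square)  # one simple morphological edge detector
```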
Image Restoration and Deconvolution
Laplacian image sharpening
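The Laplacian operator responds strongly wherever the image changes rapidly, so such places are easy to detect.
Subtracting the second derivative (the Laplacian) of the image from the original image sharpens it.
The drawback is that if the original image is noisy, this operation increases the noise, because differentiation amplifies noise.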
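A minimal sketch of Laplacian sharpening. The 5-point discrete Laplacian, the replicated border, and the `strength` parameter are assumptions of this example.

```python
import numpy as np

def laplacian(f):
    """5-point discrete Laplacian with replicated borders."""
    p = np.pad(f.astype(np.float64), 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
            - 4.0 * p[1:-1, 1:-1])

def sharpen(f, strength=1.0):
    """Original image minus its (scaled) Laplacian."""
    return f - strength * laplacian(f)

# A blurred step edge becomes steeper (with some overshoot on both sides).
row = np.array([0.0, 0.0, 0.25, 0.75, 1.0, 1.0])
blurred = np.tile(row, (4, 1))
sharp = sharpen(blurred)
```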
Models of image degradation
Deconvolution without and with noise
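Image formation: \(g(x,y) = f(x,y) * h(x,y) + n(x,y)\)
Image restoration: use \(h(x,y)\) to recover an estimate \(\hat f(x,y)\) from \(g(x,y)\).
The Wiener filter can be used to restore an image in the presence of noise.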
\[\frac{1}{H^{\prime}(u, v)}=\frac{1}{H(u, v)} \frac{|H(u, v)|^2}{|H(u, v)|^2+S_n(u, v) / S_f(u, v)}\]
Where \(S_n(u, v)=|N(u, v)|^2, S_f(u, v)=|F(u, v)|^2\) are the power spectra of the noise and the original image.
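A sketch of Wiener deconvolution in the Fourier domain. Since the true spectra \(S_n\) and \(S_f\) are normally unknown, the ratio \(S_n / S_f\) is replaced here by a constant `K`, which is an assumption of this example, as are the circular (FFT-based) convolution and the small test kernel.

```python
import numpy as np

def wiener_deconvolve(g, h, K):
    """Estimate f from g = f * h + n, approximating S_n / S_f by the constant K."""
    H = np.fft.fft2(h, s=g.shape)
    G = np.fft.fft2(g)
    # (1/H) * |H|^2 / (|H|^2 + K) simplifies to conj(H) / (|H|^2 + K),
    # which avoids dividing by H where it is close to zero.
    W = np.conj(H) / (np.abs(H) ** 2 + K)
    return np.real(np.fft.ifft2(W * G))

rng = np.random.default_rng(0)
f = rng.random((16, 16))
h = np.array([[0, 1, 0], [1, 6, 1], [0, 1, 0]]) / 10.0

# Image formation: blur (circular convolution via the FFT) plus additive noise.
blurred = np.real(np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(h, s=f.shape)))
g = blurred + 0.01 * rng.standard_normal(f.shape)

f_hat = wiener_deconvolve(g, h, K=1e-2)
```

With `K = 0` the filter reduces to plain inverse filtering, which is only safe when there is no noise and \(H\) has no zeros.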
transformations
Motivation
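Object shape can be described by a curve that represents the outline of the object boundary.
We obtain all kinds of images and need to transform them to meet our requirements, for example by rotating or stretching them.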
Transformations of points and intensities:
homogeneous coordinates
A 2D point \((x, y)\) has, for any \(w \neq 0\), a homogeneous coordinate representation \([w x, w y, w]^{\top}\).
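A small sketch of why homogeneous coordinates are convenient: rotation and translation combine into a single 3 x 3 matrix. The specific transform (90 degree rotation plus a shift of 2 in x) is just an example chosen here.

```python
import numpy as np

def to_homogeneous(p, w=1.0):
    """(x, y) -> [w*x, w*y, w] for any w != 0."""
    x, y = p
    return np.array([w * x, w * y, w])

def from_homogeneous(ph):
    """[a, b, w] -> (a / w, b / w)."""
    return ph[:2] / ph[2]

theta = np.pi / 2
T = np.array([[np.cos(theta), -np.sin(theta), 2.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])

# The result does not depend on which w is used for the representation.
q1 = from_homogeneous(T @ to_homogeneous((1.0, 0.0), w=1.0))
q3 = from_homogeneous(T @ to_homogeneous((1.0, 0.0), w=3.0))
```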
Image warps (transformation of intensities)
There are several kinds of transformations.
Features
Intensity edges
Use local maxima of the gradient magnitude to detect edges.
Canny edge detector: uses thresholding and connectivity analysis to decide which gradient maxima are edges.
Scale space
Image Pyramids
Discrete scale
Linear scale space theory
Continuous scale
Multi-scale feature detection
segmentation
intensity based
Use a threshold,
or model the image intensities as a superposition of several Gaussian curves.
edge based
Region based
region growing
Energy minimization
Minimize the energy \[ E(f)=\sum_{p \in \mathcal{P}} D_p\left(f_p\right)+\sum_{p, q \in \mathcal{N}} V_{p, q}\left(f_p, f_q\right) \]
Chan-Vese
Minimize the following energy over the contour \(C\), ideally driving it towards 0: \[ \begin{aligned} F_1(C)+F_2(C)= & \int_{\text {inside }(C)}\left|u_0(x, y)-c_1\right|^2 d x d y \\ & +\int_{\text {outside }(C)}\left|u_0(x, y)-c_2\right|^2 d x d y \end{aligned} \]
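A sketch that only evaluates this energy for a given segmentation; it does not perform the minimization. It assumes the contour is represented by a boolean mask `inside`, and that \(c_1\) and \(c_2\) are the mean intensities inside and outside the contour, which is the usual choice but is not stated above.

```python
import numpy as np

def chan_vese_energy(u0, inside):
    """F_1(C) + F_2(C) with c1, c2 taken as the mean inside / outside the contour."""
    c1 = u0[inside].mean()
    c2 = u0[~inside].mean()
    return np.sum((u0[inside] - c1) ** 2) + np.sum((u0[~inside] - c2) ** 2)

# Two-region image: left half 0, right half 1.
u0 = np.zeros((6, 6))
u0[:, 3:] = 1.0

perfect = u0 > 0.5                      # contour exactly on the region boundary
shifted = np.zeros((6, 6), dtype=bool)  # contour misplaced by one column
shifted[:, 2:] = True

E_perfect = chan_vese_energy(u0, perfect)
E_shifted = chan_vese_energy(u0, shifted)
```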
Hough transform
Each line \(\rho_{\circ}=x \cdot \cos \theta_{\circ}+y \cdot \sin \theta_{\circ}\) in the original image is mapped to a point in the \((\theta,\rho)\) coordinate system.
Choose the range and discretization of ρ and θ.
- θ in [0, 180] degrees
- ρ in [-d, d] where d is the length of the edge image’s diagonal.
Create accumulator 2D array (Hough Space) and initialize to zero.
For every edge pixel, loop through θ range, calculate corresponding ρ, and increment the accumulator at these coordinates.
Lines correspond to accumulator values larger than a certain threshold.
The Hough transform can be used to detect straight lines.
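The steps above can be sketched as follows. The 1 degree / 1 pixel discretization is an assumption, and θ runs over 0..179 degrees because 180 degrees describes the same lines as 0 degrees with ρ negated.

```python
import numpy as np

def hough_lines(edges, n_theta=180):
    """Accumulate votes in (rho, theta) space for every edge pixel."""
    h, w = edges.shape
    d = int(np.ceil(np.hypot(h, w)))                 # length of the image diagonal
    thetas = np.deg2rad(np.arange(n_theta))          # 0, 1, ..., 179 degrees
    accumulator = np.zeros((2 * d + 1, n_theta), dtype=np.int64)  # rows: rho = -d..d
    ys, xs = np.nonzero(edges)
    for x, y in zip(xs, ys):
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        accumulator[rhos + d, np.arange(n_theta)] += 1
    return accumulator, thetas, d

# Edge image containing the vertical line x = 5, i.e. theta = 0 and rho = 5.
edges = np.zeros((20, 20), dtype=bool)
edges[:, 5] = True
accumulator, thetas, d = hough_lines(edges)
votes = accumulator[5 + d, 0]
```

Candidate lines are then the accumulator cells above a chosen threshold, for example `np.argwhere(accumulator >= threshold)`.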
Supervised machine learning for image segmentation
Texture
can be used to identify regions containing distinct textures
Use filter banks
Shape models and Procrustes alignment
Procrustes alignment
- Translational alignment
- Scaling alignment
- Rotational alignment
Translational alignment: subtracting the sample mean
Scaling: \(\mathbf{S} = s\mathbf{I}\) with \[ s=\frac{\sum_{i=1}^N \mathbf{x}_i^{\prime T} \mathbf{x}_i}{\sum_{i=1}^N \mathbf{x}_i^T \mathbf{x}_i} \] where \(\mathbf{x}_i^{\prime}\) are the target points.
Rotation: \[ \mathbf{H}=\mathbf{V}^T \mathbf{R} \mathbf{U}=\mathbf{I} \quad \Rightarrow \quad \mathbf{R}=\mathbf{V} \mathbf{U}^T \] where \(\mathbf{U}\) and \(\mathbf{V}\) are obtained from the SVD of \(\mathbf{X} \mathbf{Y}^T\), i.e. from \(\mathbf{X Y}^T=\mathbf{U S V}^T\).
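The three alignment steps can be sketched as below. The sketch assumes 2 x N matrices whose columns are points, with X being aligned to the target Y. It estimates the rotation before the scale (the SVD rotation is unaffected by uniform scaling) and does not guard against reflections, where \(\det \mathbf{R} = -1\).

```python
import numpy as np

def procrustes_align(X, Y):
    """Align the 2 x N point set X to the target Y (translation, rotation, scale)."""
    # Translational alignment: subtract the sample mean of each shape.
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    # Rotational alignment: R = V U^T from the SVD of X Y^T.
    U, S, Vt = np.linalg.svd(Xc @ Yc.T)
    R = Vt.T @ U.T
    Xr = R @ Xc
    # Scaling alignment: s = sum(x'_i^T x_i) / sum(x_i^T x_i), with x' the target.
    s = np.sum(Yc * Xr) / np.sum(Xr * Xr)
    return s * Xr, s, R

# Synthetic check: Y is a rotated, scaled and translated copy of X.
rng = np.random.default_rng(0)
X = rng.random((2, 10))
angle = np.deg2rad(30.0)
R0 = np.array([[np.cos(angle), -np.sin(angle)],
               [np.sin(angle),  np.cos(angle)]])
Y = 2.5 * (R0 @ X) + np.array([[3.0], [-1.0]])

aligned, s, R = procrustes_align(X, Y)
```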