Minor fixes, 8x8 blocks experiment, ICCV camera ready

parent 3cf2c39b
......@@ -44,8 +44,8 @@ for f in range(15):
appx_relu = jpeg_layers.APXReLU(f).to(device)
for b in range(args.batches):
im = torch.rand(args.batch_size, 1, 4, 4).to(device) * 2 - 1
im = torch.einsum('ijuv,ncij->ncuv', [doubling_tensor, im])
im = torch.rand(args.batch_size, 1, 8, 8).to(device) * 2 - 1
#im = torch.einsum('ijuv,ncij->ncuv', [doubling_tensor, im])
im_jpeg = encode(im, device=device)
......
......@@ -293,4 +293,21 @@
author={Ulicny, Matej and Krylov, Vladimir A and Dahyot, Rozenn},
journal={arXiv preprint arXiv:1812.03205},
year={2018}
}
\ No newline at end of file
}
@article{G2018opt,
journal = {Journal of Open Source Software},
doi = {10.21105/joss.00753},
issn = {2475-9066},
number = {26},
publisher = {The Open Journal},
title = {opt\_einsum - A Python package for optimizing contraction order for einsum-like expressions},
url = {http://dx.doi.org/10.21105/joss.00753},
volume = {3},
author = {Smith, Daniel G. A. and Gray, Johnnie},
pages = {753},
year = {2018},
}
......@@ -8,5 +8,5 @@
\maketitle
\begin{abstract}
We introduce a general method of performing Residual Network inference and learning in the JPEG transform domain that allows the network to consume compressed images as input. Our formulation leverages the linearity of the JPEG transform to redefine convolution and batch normalization with a tune-able numerical approximation for ReLu. The result is mathematically equivalent to the spatial domain network up to the ReLu approximation accuracy. A formulation for image classification and a model conversion algorithm for spatial domain networks are given as examples of the method. We show that the sparsity of the JPEG format allows for faster processing of images with little to no penalty in the network accuracy.
We introduce a general method of performing Residual Network inference and learning in the JPEG transform domain that allows the network to consume compressed images as input. Our formulation leverages the linearity of the JPEG transform to redefine convolution and batch normalization with a tunable numerical approximation for ReLu. The result is mathematically equivalent to the spatial domain network up to the ReLu approximation accuracy. A formulation for image classification and a model conversion algorithm for spatial domain networks are given as examples of the method. We show that skipping the costly decompression step allows for faster processing of images with little to no penalty in network accuracy.
\end{abstract}
\ No newline at end of file
......@@ -18,7 +18,7 @@
% Pages are numbered in submission mode, and unnumbered in camera-ready
\ificcvfinal\pagestyle{empty}\fi
\setcounter{page}{1}
%\setcounter{page}{1}
\addbibresource{bibliography.bib}
......
......@@ -48,19 +48,19 @@ I'_{xyk} = J^{hw}_{xyk}I_{hw}
\end{equation}
We say that $I'$ is the representation of $I$ in the JPEG transform domain. The indices $h,w$ give pixel position, $x,y$ give block position, and $k$ gives the offset into the block.
The form of $J$ is constructed from the JPEG compression steps listed in the previous section. Let the linear map $B: H^* \otimes W^* \rightarrow X^* \otimes Y^* \otimes I^* \otimes J^*$ be defined as
The form of $J$ is constructed from the JPEG compression steps listed in the previous section. Let the linear map $B: H^* \otimes W^* \rightarrow X^* \otimes Y^* \otimes M^* \otimes N^*$ be defined as
\begin{equation}
B^{hw}_{xyij} = \left\{ \begin{array}{lr} 1 & \text{$h,w$ belongs in block $x,y$ at offset $i,j$} \\ 0 & \text{otherwise} \end{array} \right.
B^{hw}_{xymn} = \left\{ \begin{array}{lr} 1 & \text{$h,w$ belongs in block $x,y$ at offset $m,n$} \\ 0 & \text{otherwise} \end{array} \right.
\end{equation}
then $B$ can be used to break the image represented by $I$ into blocks of a given size such that the first two indices $x,y$ index the block position and the last two indices $i,j$ index the offset into the block.
then $B$ can be used to break the image represented by $I$ into blocks of a given size such that the first two indices $x,y$ index the block position and the last two indices $m,n$ index the offset into the block.
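Since $B$ only reindexes pixels, contracting with it is equivalent to a reshape and transpose. A minimal sketch (illustrative shapes, not the paper's code), assuming a 32x32 single-channel image split into 8x8 blocks:

```python
import numpy as np

# The blocking map B^{hw}_{xymn} realized as a reshape/transpose:
# axis 0 of the result indexes block row x, axis 1 block column y,
# axes 2 and 3 the offsets m, n into each 8x8 block.
I = np.random.randn(32, 32)
blocks = I.reshape(4, 8, 4, 8).transpose(0, 2, 1, 3)   # shape (x, y, m, n)

# Block (1, 2) is exactly the pixel region rows 8:16, cols 16:24.
assert np.allclose(blocks[1, 2], I[8:16, 16:24])
```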
Next, let the linear map $D: I^* \otimes J^* \rightarrow A^* \otimes B^*$ be defined as
Next, let the linear map $D: M^* \otimes N^* \rightarrow A^* \otimes B^*$ be defined as
\begin{align}
D^{ij}_{\alpha\beta} = \frac{1}{4}A(\alpha)A(\beta)\cos\left(\frac{(2i+1)\alpha\pi}{16}\right)\cos\left(\frac{(2j+1)\beta\pi}{16}\right)
D^{mn}_{\alpha\beta} = \frac{1}{4}V(\alpha)V(\beta)\cos\left(\frac{(2m+1)\alpha\pi}{16}\right)\cos\left(\frac{(2n+1)\beta\pi}{16}\right)
\end{align}
then $D$ represents the 2D discrete forward (and inverse) DCT. Let $Z: A^* \otimes B^* \rightarrow \Gamma^*$ be defined as
where $V(u)$ is the normalizing scale factor with $V(0) = \frac{1}{\sqrt{2}}$ and $V(u) = 1$ for $u > 0$. Then $D$ represents the 2D discrete forward (and inverse) DCT. Let $Z: A^* \otimes B^* \rightarrow \Gamma^*$ be defined as
\begin{equation}
Z^{\alpha\beta}_\gamma = \left\{ \begin{array}{lr} 1 & \text{$\alpha, \beta$ is at $\gamma$ under zigzag ordering} \\ 0 & \text{otherwise} \end{array} \right.
......@@ -75,7 +75,7 @@ where $q_k$ is a quantization coefficient. This scales the vector entries by the
With linear maps for each step of the JPEG transform, we can then create the $J$ tensor described at the beginning of this section
\begin{equation}
J^{hw}_{xyk} = B^{hw}_{xyij}D^{ij}_{\alpha\beta}Z^{\alpha\beta}_{\gamma}S^\gamma_k
J^{hw}_{xyk} = B^{hw}_{xymn}D^{mn}_{\alpha\beta}Z^{\alpha\beta}_{\gamma}S^\gamma_k
\end{equation}
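This composition can be sketched directly with \texttt{einsum}. The following is a hedged illustration for a single 8x8 block (so $B$ is trivial), not the paper's implementation; the quantization table \texttt{q} is a placeholder of ones, under which $S$ and $\widetilde{S}$ cancel and the round trip through $J$ and $\widetilde{J}$ recovers the block exactly:

```python
import numpy as np

# Normalizing scale factor for the DCT.
V = lambda u: 1 / np.sqrt(2) if u == 0 else 1.0

# D^{mn}_{ab}: the 2D type-II DCT as a rank-4 tensor.
D = np.array([[[[0.25 * V(a) * V(b)
                 * np.cos((2 * m + 1) * a * np.pi / 16)
                 * np.cos((2 * n + 1) * b * np.pi / 16)
                 for b in range(8)] for a in range(8)]
               for n in range(8)] for m in range(8)])

# Z^{ab}_g: zigzag permutation of the 64 frequency pairs (even
# anti-diagonals traversed upward, odd ones downward).
order = sorted(((a, b) for a in range(8) for b in range(8)),
               key=lambda p: (p[0] + p[1],
                              p[0] if (p[0] + p[1]) % 2 else -p[0]))
Z = np.zeros((8, 8, 64))
for g, (a, b) in enumerate(order):
    Z[a, b, g] = 1.0

q = np.ones(64)                                   # placeholder quantization table
J = np.einsum('mnab,abg,g->mng', D, Z, 1.0 / q)   # compress:   S^g_k = 1/q
Jt = np.einsum('mnab,abg,g->gmn', D, Z, q)        # decompress: S~^k_g = q

x = np.random.randn(8, 8)
coeffs = np.einsum('mnk,mn->k', J, x)             # I' = J I
x_rec = np.einsum('kmn,k->mn', Jt, coeffs)        # I  = J~ I'
assert np.allclose(x, x_rec)                      # DCT is orthonormal, Z a permutation
```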
The inverse mapping also exists as a tensor $\widetilde{J}$ which can be defined using the same linear maps with the exception of $S$. Let $\widetilde{S}$ be
......@@ -86,9 +86,8 @@ The inverse mapping also exists as a tensor $\widetilde{J}$ which can be defined
Then
\begin{equation}
\widetilde{J}^{xyk}_{hw} = B_{hw}^{xyij}D_{ij}^{\alpha\beta}Z_{\alpha\beta}^{\gamma}\widetilde{S}^k_\gamma
\widetilde{J}^{xyk}_{hw} = B_{hw}^{xymn}D_{mn}^{\alpha\beta}Z_{\alpha\beta}^{\gamma}\widetilde{S}^k_\gamma
\end{equation}
noting that, for all tensors other than $\widetilde{S}$, we have freely raised and lowered indices without the use of a metric tensor since we consider only the standard orthonormal basis, as stated earlier.
Next consider a linear map $C: H^* \otimes W^* \rightarrow H^* \otimes W^*$ which performs an arbitrary pixel manipulation on an image plane $I$. To apply this mapping to a JPEG image $I'$, we first decompress the image, apply $C$ to the result, then compress that result to get the final JPEG. Since compressing is an application of $J$ and decompressing is an application of $\widetilde{J}$, we can form a new linear map $\Xi: X^* \otimes Y^* \otimes K^* \rightarrow X^* \otimes Y^* \otimes K^*$ as
......
......@@ -2,4 +2,4 @@
In this work we showed how to formulate deep residual learning in the JPEG transform domain, and that it provides a notable performance benefit in terms of processing time per image. Our method expresses convolutions as linear maps \cite{smith1994fast} and introduces a novel approximation technique for ReLu. We showed that the approximation can achieve highly performant results with little impact on classification accuracy.
Future work should focus on two main points. The first is efficiency of representation. Our linear maps take up more space than spatial domain convolutions. This makes it hard to scale the networks to datasets with large image sizes. Secondly, library support in commodity deep learning libraries for some of the features required by this algorithm are lacking. As of this writing, true sparse tensor support is missing in all of PyTorch \cite{paszke2017automatic}, TensorFlow \cite{tensorflow2015-whitepaper}, and Caffe \cite{jia2014caffe}, with these tensors being represented as coordinate lists which are known to be highly non-performant. Additionally, the \texttt{einsum} function for evaluating multilinear expressions is not fully optimized in these libraries when compared to the speed of convolutions in libraries like CuDNN \cite{chetlur2014cudnn}.
\ No newline at end of file
Future work should focus on two main points. The first is efficiency of representation. Our linear maps take up more space than spatial domain convolutions. This makes it hard to scale the networks to datasets with large image sizes. Secondly, library support in commodity deep learning libraries for some of the features required by this algorithm are lacking. As of this writing, true sparse tensor support is missing in all of PyTorch \cite{paszke2017automatic}, TensorFlow \cite{tensorflow2015-whitepaper}, and Caffe \cite{jia2014caffe}, with these tensors being represented as coordinate lists which are known to be highly non-performant. Additionally, the \texttt{einsum} function for evaluating multilinear expressions is not fully optimized in these libraries when compared to the speed of convolutions in libraries like CuDNN \cite{chetlur2014cudnn}, though we make use of the \texttt{opt\_einsum} \cite{G2018opt} tool to partially mitigate this.
\ No newline at end of file
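The contraction-order optimization that \texttt{opt\_einsum} performs can be sketched with NumPy alone, which exposes the same path-search machinery through \texttt{einsum\_path}; shapes below are illustrative, not the paper's:

```python
import numpy as np

# Contracting D, Z, S pairwise in a good order is much cheaper than the
# naive simultaneous contraction; einsum_path searches for that order.
D = np.random.randn(8, 8, 8, 8)    # DCT tensor
Z = np.random.randn(8, 8, 64)      # zigzag map
S = np.random.randn(64, 64)        # quantization map

path, info = np.einsum_path('mnab,abg,gk->mnk', D, Z, S,
                            optimize='optimal')
J = np.einsum('mnab,abg,gk->mnk', D, Z, S, optimize=path)
```

The same expression runs unchanged under \texttt{opt\_einsum.contract}, which accepts the identical subscript string.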
......@@ -43,7 +43,7 @@ Next, we examine the impact of the ReLu approximation. We start by examining the
\captionsetup{width=.8\linewidth}
\centering
\includegraphics[width=\textwidth]{plots/relu_blocks.eps}
\caption{ReLu blocks error. Our ASM method consistently gives lower error than the naive approximation method.}
\caption{ReLu blocks error. Our ASM method consistently gives lower error than the naive approximation method. }
\label{fig:rba}
\end{subfigure}%
\begin{subfigure}{0.33\textwidth}
......
......@@ -13,4 +13,4 @@ The contributions of this work are as follows
\item A model conversion algorithm to apply pretrained spatial domain networks to JPEG images
\item Approximated Spatial Masking: the first general technique for application of piecewise linear functions in the transform domain
\end{enumerate}
By skipping the decompression step and by operating on the sparser compressed format, we show a notable increase in speed for training and inference.
\ No newline at end of file
By skipping the decompression step and by operating on the sparser compressed format, we show a notable increase in speed for testing and a marginal speedup for training.
\ No newline at end of file