## piątek, 9 stycznia 2015

### A Better Way To Use Lerp To Smooth Things Out

In video games, and in other interactive applications involving movement, there is often a need to smooth out the motion of some object. As video game programmers we often stumble across an article or production code that tackles this problem in an unsatisfactory way. Many people misuse a linear interpolation (lerp) call for this purpose. Lerp is defined as follows: $$lerp(A, B, t) = A + (B-A)t.$$ The problem goes like this: we want to move an object from its current position to $X$ smoothly. What some people do, e.g. Unity developers in their Mecanim Animation Tutorial, is call the following code every frame: $$newPosition = lerp(oldPosition, X, a \cdot \Delta t),~~~~~~~~~~~~~~~~(1)$$ where $\Delta t$ is the duration of the last frame measured in seconds and $a$ is a constant that states how fast the transition to $X$ should occur.

Note that this scenario allows $X$ to change as time progresses, so that the object follows the $X$ position. Also, even though the code uses a lerp function call, the movement is not linear in any sense of the word.

The thing that caught our attention and made us wonder is that the interpolation parameter (the last argument of the lerp function) can sometimes be greater than one, e.g. when $\Delta t$ is big because the last frame lasted a long time for some reason. We don't want that.

This solution is wrong because movement coded this way is not independent of the frame rate. To see this clearly, let us rewrite equation $1$ by parametrizing it with $\Delta t$ and making it a recursive sequence of positions at the $n$-th frame. $$P^{\Delta t}_n = lerp(P^{\Delta t}_{n-1}, X, a \cdot \Delta t)~~~~~~~~~~~~~~~(2)$$ We assume for a moment, for simplicity, that $\Delta t$ and $X$ do not change during execution of the code. For our purposes, we can also assume that $P^{\Delta t}_0=0$ and compute the explicit form of equation $2$ (using our math skills or Wolfram Alpha). $$P^{\Delta t}_n = X(1 - (1-a\cdot\Delta t)^n)$$ Now let us consider two special cases with $a=1$: one where the code runs at $60$ FPS for $60$ frames and one where it runs at $10$ FPS for $10$ frames. $$P^{1/60}_{60} = X(1 - (1-\frac{1}{60})^{60}) \approx 0.6352 X$$ $$P^{1/10}_{10} = X(1 - (1-\frac{1}{10})^{10}) \approx 0.6513 X$$ Both cases last one second, so it would be desirable for those values to be the same. Thus, by the example above, we see that this method is not frame rate independent.

Generally speaking, we want the following equality to hold $$P^{\Delta t}_{n} = P^{\Delta t \cdot 1/\beta}_{n \cdot \beta}~~~~~~~~~~~~~~~(3)$$ for any value of $\beta$.

Once we have described and analysed the problem, we can come up with a better solution. We propose the following code: $$newPosition = lerp(oldPosition, X, 1 - e^{-a \cdot \Delta t})$$ where $e$ is the base of the natural logarithm and $a$ is a constant that denotes how fast we want to interpolate the movement. Rewriting it in the more concise form used before, we get $$P^{\Delta t}_n = lerp(P^{\Delta t}_{n-1}, X, 1 - e^{-a \cdot \Delta t}).$$ Assuming the initial condition $P^{\Delta t}_0=0$, the explicit version can be computed (we used some math software again). $$P^{\Delta t}_n = X (1 - e^{-a\cdot\Delta t\cdot n})~~~~~~~~~~~~~~~(4)$$

Equation $4$ meets the requirement given by equation $3$: $$P^{\Delta t \cdot 1/\beta}_{n \cdot \beta} = X(1 - e^{-a\cdot\frac{1}{\beta}\cdot\Delta t \cdot n \cdot \beta}) = X (1 - e^{-a\cdot\Delta t\cdot n}) = P^{\Delta t}_n$$ and is therefore independent of the frame rate.

To sum up, we propose using $1-e^{-a\cdot\Delta t}$ instead of $a\cdot\Delta t$ as the interpolation factor. It makes much more sense, and the movement becomes much more predictable when the frame rate varies.

## poniedziałek, 15 września 2014

### Finding Two Max Values in an Array in Shader

One of the problems I was faced with was, given four floats, to find the two highest values and their indices in the array. This had to be done in a shader program, so ideally no loops, array indexing or things like that. Fortunately my sort-of-a-boss, KriS, quickly found the solution that I'm presenting here.

Here comes the code:
void GetTwoMaxValues( in float4 values, out float2 weights, out float2 indices )
{
    float v1 = max( max( max( values.x, values.y ), values.z ), values.w );
    float i1 = 0;
    i1 = CndMask( values.y == v1, 1, i1 );
    i1 = CndMask( values.z == v1, 2, i1 );
    i1 = CndMask( values.w == v1, 3, i1 );

    //

    values.x = CndMask( i1 == 0, 0, values.x );
    values.y = CndMask( i1 == 1, 0, values.y );
    values.z = CndMask( i1 == 2, 0, values.z );
    values.w = CndMask( i1 == 3, 0, values.w );

    float v2 = max( max( max( values.x, values.y ), values.z ), values.w );
    float i2 = 0;
    i2 = CndMask( values.y == v2, 1, i2 );
    i2 = CndMask( values.z == v2, 2, i2 );
    i2 = CndMask( values.w == v2, 3, i2 );

    //

    weights.x = v1;
    weights.y = v2;
    indices.x = i1;
    indices.y = i2;
}
The code finds the first maximum value simply by "iterating" over the elements and "max'ing" them. That was simple. But we also need to know the index of that value in the array. We assume it's index $0$ and compare the found value with the other values from the array using the CndMask statement. If a compared element is equal to the found maximum, we update the index.

To find the second highest value we first zero out the highest element found in the previous pass and then repeat the same procedure to find the second highest element.

The code can actually be understood much more quickly by reading it than by reading an explanation of it :). It doesn't do anything magical in algorithmic terms but shows how to solve this simple two-max-values problem efficiently in a shader program.

## środa, 30 lipca 2014

### Why Are Kids Happy?

Yes. This blog post is way different from what you (like three or four of you?) have become accustomed to seeing here, but I just have the need to share some of my life philosophies/thoughts, not just math/game/graphics-related stuff :).

So, the subject of this post is "why are kids happy?". Today I pondered this loosely with my flatmates (who, by the way, are some of my favorite and closest friends :)). However it may sound, it started with Moomins and Smurfs... Think of these books or TV series or whatever form they assumed. You might have forgotten them but you remember them. Those hippo-shaped humanoids and blue gnomes. What do you think of them? Do you think how high their creators must have been at the time they made their concept art? What stuff did these guys inhale to make a story of humanoid hippos with occasional humans appearing in the form of an always-wandering Snufkin or a girl/lady(?) called Little My? Not to mention the blue Smurf creatures, who were almost always male, with one Smurfette lady who probably had to satisfy all of the other male Smurfs' sexual needs. Is any of this or something similar on your mind right now? If yes, that is not a big deal. That's just what adults do. They analyze a whole lot. And then comes more and more thinking about how the worlds created in Smurfs or Moomins could exist in "real life". And that's the problem. Kids don't do that. They don't analyze. They don't wonder whether a village full of male blue gnomes and a single lady makes sense, or what they do when the lights go off. They just experience this world as it is. Sure, it's easy to manipulate kids this way because they are so trustful, but try from time to time to observe a child in a casual situation and see what fun he or she has from experiencing it, indulging in the moment.

Kids just happen to live in the moment more often than adults, who spend more time living somewhere else, whether it be the past or the future. I want a new car, a new house, a good-looking partner, a thriving business... Don't you spend too much time thinking (and worrying!) about all those things? When was the last time you walked barefoot on wet grass? Or just did something that did not quite fit your daily routine? So next time you catch yourself thinking about what your life is going to be a year from now, take a bike and go on a trip outside the city. There are so many beautiful green areas around! See them. Feel them. Experience them. Like that little kid who indeed does this, even though it might happen for him/her on a more subconscious than conscious level :).

## środa, 9 kwietnia 2014

### Tangent Basis in Compact Form

Recently at Flying Wild Hog my task was to change the vertex format of meshes. Both static and skinned meshes used 12 bytes for storing the normal, tangent, bitangent and color of the vertex. It looked something like this:
UByte4N normal;
UByte4N tangent;
UByte4N bitangent;
Each tangent basis vector is stored explicitly in 24 bits, and the last 8 bits of each vector store one component of the vertex color (red in the normal's alpha, green in the tangent's alpha and blue in the bitangent's alpha).

The UByte4N type internally stores one 32-bit unsigned integer (hence the U prefix) and has some helper functions for packing and unpacking four 8-bit values. Since the tangent basis vectors' components are floats in the normalized $[-1, 1]$ interval (hence the N postfix, meaning normalized), there are also functions which convert them to the integer $[0, 255]$ interval.

This form of the tangent basis with color packed together is actually quite nice. The precision is not bad and all channels are fully utilized. The biggest problem is that in order to get color we must read three vectors. We can do better though.

The first thing is to get rid of the bitangent vector altogether. The whole tangent basis can nicely be packed with just the normal, the tangent and one value that indicates the handedness of the entire basis. To get the handedness of the basis you can compute the determinant of the matrix formed from that basis - it will be either $1$ or $-1$ for an orthonormal basis. This way we save two values (one handedness value as opposed to three defining the bitangent vector) and we can extract the color to a separate vector:
UByte4N normal; // .w unused
UByte4N tangent; // .w stores handedness
UByte4N color;
It looks like now the color has been enriched with the alpha channel. Also, the $w$ component of the normal vector is unused. And hey, do we really need to waste 8 bits for the handedness? One value would suffice. Fortunately, GPUs are equipped with RGB10A2 format, where 10 bits are devoted to RGB channels and 2 bits for the alpha channel. This way we can store the tangent vector with greater precision. So, our new tangent basis and color part of the vertex structure is this:
UX10Y10Z10W2N normal; // .w unused
UX10Y10Z10W2N tangent; // .w stores handedness
UByte4N color;
Now this one looks scary at first glance. The U prefix again means that the stored values are unsigned - the integer $[0, 1023]$ interval for the $x$, $y$ and $z$ components and $[0, 3]$ for the $w$ component. Also, the N postfix means that the input float values to the structure must be normalized, that is, in the $[-1, 1]$ interval.

Finally, some piece of working code:
struct UX10Y10Z10W2N
{
    UInt32 m_value;

    void Set( const float* xyz, int w )
    {
        m_value = 0;

        for ( int i = 0; i < 3; i++ )
        {
            float f = ( xyz[ i ] + 1.0f ) * 511.5f;

            if ( f < 0.0f )
                f = 0.0f;
            else if ( f > 1023.0f )
                f = 1023.0f;

            m_value |= ( ( int )f ) << 10*i;
        }

        m_value |= w << 30;
    }

    void Get( float& x, float& y, float& z ) const
    {
        x = ( float )( ( m_value >> 0  ) & 1023 );
        y = ( float )( ( m_value >> 10 ) & 1023 );
        z = ( float )( ( m_value >> 20 ) & 1023 );

        x = ( x - 511.0f ) / 512.0f;
        y = ( y - 511.0f ) / 512.0f;
        z = ( z - 511.0f ) / 512.0f;
    }

    void Get( float& x, float& y, float& z, int& w ) const
    {
        Get( x, y, z );
        w = ( ( m_value >> 30 ) & 3 );
    }
};
I'm leaving you without much explanation as it should be quite easy to read. One thing worth noting is that the $w$ component is not passed or read as a float but rather as an int. That is because I determined it does not make much sense to deal with a float here, as we have barely 2 bits. I pass $0$ or $3$ (which will be converted to $1$ in the shader) depending on the handedness of the tangent basis.

## sobota, 22 lutego 2014

### Edge Detection from Depths and Normals

Edge detection is an algorithm that comes in handy from time to time in every graphics programmer's life. There are various approaches to this problem and various tasks that we need edge detection for. For instance, we might use edge detection to perform anti-aliasing (like in here http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html) or to draw some stylish outlines. So how can we detect the edges? Some algorithms compare color differences between the pixel we are currently processing and its neighbours (like the anti-aliasing algorithms FXAA and MLAA). Others, like the one I linked a few words back, use the rendered scene's geometry information, like the depth buffer and normals, to do so. In this post I present a variant of this algorithm.

The algorithm I use needs three pixels: the one we're processing and two "different" neighbours. You can take the one to the left along with the one to the top, for instance. Now, we read the normal vectors (either in view or world space) and depths at those pixels. We also recover view or world space positions from the sampled depths (I showed how to do this here http://maxestws.blogspot.com/2013/11/recovering-camera-position-from-depth.html).

So, to sum up at this point, we have three (I assume view space) normals ($N$, $N_{left}$ and $N_{top}$) and positions ($P$, $P_{left}$ and $P_{top}$). Here comes the shader code:
bool IsEdge(
    float3 centerNormal,
    float3 leftNormal,
    float3 topNormal,
    float3 centerPosition,
    float3 leftPosition,
    float3 topPosition,
    float normalsDotThreshold,
    float distanceThreshold)
{
    // normals dot criterion

    float centerLeftNormalsDot = dot(centerNormal, leftNormal);
    float centerTopNormalsDot = dot(centerNormal, topNormal);

    if (centerLeftNormalsDot < normalsDotThreshold ||
        centerTopNormalsDot < normalsDotThreshold)
    {
        return true;
    }

    // distances difference criterion

    float3 v1 = leftPosition - centerPosition;
    float3 v2 = topPosition - centerPosition;

    if (abs(dot(centerNormal, v1)) > distanceThreshold ||
        abs(dot(centerNormal, v2)) > distanceThreshold)
    {
        return true;
    }

    //

    return false;
}
Good starting values are $0.98$ for normalsDotThreshold and $0.01$ for distanceThreshold.

The code uses two criteria to determine whether we are on an edge. The first checks the dot products of the normals. Its purpose is to detect edges that appear on a continuous surface with varying normals, where, for instance, two perpendicular planes meet (think of a side of a box standing on a floor).

The second criterion checks distances. Imagine two planes parallel to each other but at different heights. When viewed in such a way that an edge of the top plane appears in front of the bottom plane, we clearly have, well, an edge to detect. The normals' dot product won't work here because the normals of all pixels are the same (the planes are parallel). So in this case we need to track the vectors from the center pixel to the neighbouring pixels. If they are too far away, then we have an edge. Note that we actually dot those vectors with the center pixel's normal. This is to avoid false positives for pixels that are on the same plane but far away from each other. In that case, obviously, we don't care that they are far away. They lie on the same plane, so there is no edge.

I know that a picture is worth more than a thousand words. That's why I preferred to describe the code with words instead of putting images here. Wait... what? That's right. Your best bet is to just take this simple piece of code and try it in practice to see exactly how it works. I highly recommend making the following modifications and seeing what happens:
1. Leave just the normals dot criterium.
2. Leave just the distances difference criterium.
3. Normalize vectors $v_1$ and $v_2$ in the distances difference criterium.

Have fun :).

### Recovering Camera Position from Depth

Memory bandwidth is a precious resource on current generation GPU hardware. So precious that it is often the case that storing less data and using packing/unpacking algorithms can be much faster than storing fat data in unpacked form. The most common example of this issue that comes to my mind is deferred shading/lighting algorithms.

In deferred shading we usually need to store, in some offscreen buffer, the positions of pixels that we are going to light. The simplest form is to use the coordinates of the positions directly, that is, three floats per pixel. That forces us to use either the RGBA32F format or, if we can get away with lower precision, the RGBA16F format. In either case, that is a lot of memory.

Can we do better? Well, yes. A common solution to this problem is to use, for instance, the depth buffer and recover the pixels' positions from it. We should be able to do that, right? If we could get from camera space to projection space and store values in the depth buffer, then we can do the reverse. To do that we just need to take the value from the depth buffer and find the $x$ and $y$ normalized device coordinates of the pixel we're processing (those are fairly easy to find). Once we have those values, we compose a vector from them (plugging in $w=1$) and multiply it by the inverse projection matrix. That gives us the position of the pixel in camera space.

So are we done? Technically, yes. There are some flaws to this approach, however. The first is that the depth buffer is not always available on all hardware (mobile devices, for instance). That is not so much of a problem, as we can generate a "custom" depth buffer that will most likely also be used for other effects, like soft particles and such. The second is performance. The vector-matrix multiplication must take place in the pixel shader, where every single instruction counts. The vector-matrix mul alone is four dot products. We might also need to normalize the output vector so that its $w=1$, which is one division.

There is, fortunately, a better way to recover the camera space position from depth that will take barely one mul in the pixel shader! Let's have a look at the following picture:

The picture shows a side view of the camera's frustum. Point $(y, z)$ lies on the near plane. Our task is to find $y'$ of some point of interest. The value of $z'$ is known - it is just the linear depth value in camera space of that point. We can get it either by converting the normalized device coordinate (NDC) $z$ from the depth buffer to linear space (a word on this at the end of the post) or by utilizing the separate depth buffer we mentioned above. This separate buffer (which we will have to use when we don't have access to the hardware depth buffer) does not have to store the NDC $z$ values but rather linear camera space $z$ values. This is actually even more desirable for many practical applications.

So, we know $z'$. We want to find $y'$, but first we also need to find the $y$ and $z$ values. This pair represents the coordinates of our point of interest on the camera's near plane. Given the point's position in NDC, we can calculate the coordinates on the near plane like this (in the vertex shader):
vec2 position_NDC_normalized = 0.5 * position_NDC.xy / position_NDC.w;
varying_positionOnNearPlane = vec3(nearPlaneSize.xy * position_NDC_normalized, zNear);

First, we scale the NDC position so that all coordinates are in the $[-0.5, 0.5]$ range. Then we multiply those coordinates by the size (width and height) of the near plane and plug the camera's distance to the near plane, zNear, into the $z$ coordinate. The resulting vector (varying_positionOnNearPlane) can now be interpolated and read in the pixel shader.

As we now know $y$, $z$ and $z'$, evaluating $y'$ is easy - we just employ similar triangles:
\begin{eqnarray}
y' = \frac{y \cdot z'}{z}
\end{eqnarray}
float linearDepth = texture2DProj(gbufferLinearDepthTexture, varying_screenTexCoord).x;
vec3 position_view = varying_positionOnNearPlane * linearDepth / varying_positionOnNearPlane.z;

We're almost there. I promised there would be only one mul, but at the moment there is one mul and one div. It might not be obvious at first glance, but the $z$ coordinate of varying_positionOnNearPlane is constant for all points - it's just the distance from the camera to the near plane - so this division can be moved to the vertex shader.

And that's it. With a few simple observations and math "tricks" we managed to write an optimal pixel shader for camera space position reconstruction from linear depth.

One last thing that remains untold is how to recover linear depth from the hardware depth buffer. As we know, to get from camera space to projection (NDC) space we use some sort of projection matrix. We can use this fact to derive a simple formula that transforms just the $z$ coordinate from projection space back to camera space. So let's say we have a vector $[x, y, z, 1]$ defining a point in camera space and we multiply it by a projection matrix (matrix taken after http://www.opengl.org/sdk/docs/man2/xhtml/gluPerspective.xml; note that we use row-major order and collapsed the entries' formulas) to get the vector in projection space:
$\begin{bmatrix} x & y & z & 1 \end{bmatrix} \begin{bmatrix} A & 0 & 0 & 0 \cr 0 & B & 0 & 0 \cr 0 & 0 & C & -1 \cr 0 & 0 & D & 0 \end{bmatrix} = \begin{bmatrix} xA & yB & zC + D & -z \end{bmatrix} \sim \begin{bmatrix} \frac{xA}{-z} & \frac{yB}{-z} & -(C + \frac{D}{z}) & 1 \end{bmatrix}$
The term that goes into the depth buffer is $-(C + \frac{D}{z})$, where $z$ is the linear depth. Assuming the value stored in the depth buffer that we read is $z'$, we just need to solve the following equation for $z$:
\begin{eqnarray}
-(C + \frac{D}{z}) = z' \cr
z = \frac{-D}{z' + C}
\end{eqnarray}
The constants $C$ and $D$ can be extracted from the projection matrix and passed to the pixel shader.

DEDICATION: I dedicate this note to Karolina P., who is currently struggling a bit with abstract algebra and C++ programming, but I am sure that her persistence and character will let her get through both those classes and the whole semester :)

## środa, 3 października 2012

### Fast Texture Projection onto Geometry on the GPU

While implementing various graphical algorithms, a problem that must be solved from time to time is the projection of a texture onto some geometry. This happens, for instance, when we want to render a water reflection or cast g-buffer textures onto a light's geometry.

The procedure is pretty straightforward. If we multiply a vertex's position by the whole world-view-projection matrix and perform the perspective division, we get the vertex's position in normalized device coordinates, which are all in the $[-1, 1]$ interval. To project a texture onto geometry, we need to do exactly the same thing (but use the world-view-projection matrix of some sort of "projector", not the game's "camera") and remap the vertex's final $x$ and $y$ coordinates from the $[-1, 1]$ interval to the $[0, 1]$ interval, thus generating texture coordinates which can be used to sample a texture.

Let's say that we have a vertex shader like this:
uniform mat4 worldViewProjTransform;
uniform mat4 projectorWorldViewProjTransform;

attribute vec4 attrib_position;

varying vec4 varying_projectorTexCoord;

void main()
{
    gl_Position = attrib_position * worldViewProjTransform;

    varying_projectorTexCoord = attrib_position * projectorWorldViewProjTransform;
}

In this vertex shader we simply multiply the vertex's position by the projector's matrix. The perspective division and the coordinate remapping do not occur here yet. They will in the fragment shader:
uniform sampler2D projectorSampler;

varying vec4 varying_projectorTexCoord;

void main()
{
    vec4 projectorTexCoord = varying_projectorTexCoord;

    projectorTexCoord /= projectorTexCoord.w;
    projectorTexCoord.xy = 0.5 * projectorTexCoord.xy + 0.5;

    gl_FragColor = texture2D(projectorSampler, projectorTexCoord.xy);
}

In this pixel shader we first do the perspective division and then, when we know we are in $[-1, 1]$ interval, we can do the remapping part.

The title of this post contains the word "fast". Is our approach fast? Well, with today's hardware it actually is, but we can do better. The very first thing that should capture your attention is the division in the pixel shader. Can we do something about it? Yup. Basically any real-time shading language has a function like texture2DProj which, before taking a sample from the texture, first divides all components of the second argument by its 4th component, $w$. So, we can try doing this:
uniform sampler2D projectorSampler;

varying vec4 varying_projectorTexCoord;

void main()
{
    vec4 projectorTexCoord = varying_projectorTexCoord;

    projectorTexCoord.xy = 0.5 * projectorTexCoord.xy + 0.5;

    gl_FragColor = texture2DProj(projectorSampler, projectorTexCoord);
}

Will this work? Of course not. That is because we switched the order of operations. In this version of the code the projection happens after the remapping, whereas it should occur before. So, we need to fix the remapping code.

Remember that the projected coordinates $[x, y, z, w]$ that come into the pixel shader are all in the $[-w, w]$ interval. This means that we want to remap the coordinates from $[-w, w]$ to $[0, w]$. Then the division can be performed either manually, followed by a texture2D call, or by calling texture2DProj directly. So, our new pixel shader can look like this:
uniform sampler2D projectorSampler;

varying vec4 varying_projectorTexCoord;

void main()
{
    vec4 projectorTexCoord = varying_projectorTexCoord;

    projectorTexCoord.xy = 0.5 * projectorTexCoord.xy + 0.5 * projectorTexCoord.w;

    gl_FragColor = texture2DProj(projectorSampler, projectorTexCoord);
}

Are we done yet? Well, there is actually one more thing we can do. We can shift the "$0.5$" multiplication and addition to the vertex shader. So the final vertex shader is:
uniform mat4 worldViewProjTransform;
uniform mat4 projectorWorldViewProjTransform;

attribute vec4 attrib_position;

varying vec4 varying_projectorTexCoord;

void main()
{
    gl_Position = attrib_position * worldViewProjTransform;

    varying_projectorTexCoord = attrib_position * projectorWorldViewProjTransform;
    varying_projectorTexCoord.xy = 0.5 * varying_projectorTexCoord.xy + 0.5 * varying_projectorTexCoord.w;
}

and the final pixel shader is:
uniform sampler2D projectorSampler;

varying vec4 varying_projectorTexCoord;

void main()
{
    vec4 projectorTexCoord = varying_projectorTexCoord;

    gl_FragColor = texture2DProj(projectorSampler, projectorTexCoord);
}

Yes. Now we are done.

One might wonder if we could do the perspective division in the vertex shader and call texture2D instead of texture2DProj in the pixel shader. The answer is: we can't. The hardware, during rasterization, interpolates all vertex-to-pixel shader varyings. Assuming that varying_projectorTexCoord is the vector $[x, y, z, w]$, the value that is read in the pixel shader is $[\mbox{int}(x), \mbox{int}(y), \mbox{int}(z), \mbox{int}(w)]$, where $\mbox{int}$ is an interpolating function. If we performed the division in the vertex shader, the vector coming into the pixel shader would be of the form $[\mbox{int}(\frac{x}{w}), \mbox{int}(\frac{y}{w}), \mbox{int}(\frac{z}{w}), 1]$. And obviously $\mbox{int}(\frac{x}{w}) \neq \frac{\mbox{int}(x)}{\mbox{int}(w)}$ (same for the other components). Note that the right hand side of this inequality is what happens when we do the perspective division in the pixel shader.

Instead of doing all this trickery we could include a special remapping matrix in the projector's world-view-projection matrix chain. What this matrix looks like can be found here (this is also a great tutorial about texture projection in general, by the way). The problem with this approach is that you might not always have this matrix injected into the whole chain and for some reason you either don't want to or even can't do it. You could, naturally, create this remapping matrix in the vertex shader, but that would raise the number of instructions. In that case it is better to use the approach discussed throughout this post, as it will make things run much quicker.