Performance Optimal Vector Swizzling in C++

In a shader we can type vec3 v0 = v1.xxy * 2 and any other combination of x, y, z and w depending on the length of the vector. The resulting vector must not have the same same size (in the example v1 could be a vec2) because components can be copied through. This is called swizzling and is really comfortable.
Vectors are everywhere in game projects not only in the shaders. How can we get the same behavior in C++? Can we get it without losing performance? I wanted to understand if and how this can be done. There are two solutions available: The glm library from G-Truc and the CxxSwizzle library. Anyway, I did not test the two libraries for their performance but if you wanna have swizzling you might take one of them instead of the header file I had written. The advantage is that they have implemented more functions so far. But, I did not found explanations about how to solve the problem so I will try to fill that gap.

Before we can start here are the problems to face:

  • (1) Access the elements in arbitrary order and count: v0.xxy + v1.xzy
  • (2) Write to a swizzled vector v1.yxwz = v0;  where doubled elements are explicit forbidden
  • (3) No Memory overhead: a vec3 should have the size of 3 times its base type
  • (4) No Computational overhead: a solution with multiple lines containing equivalent scalar operations should not be faster

First there are two different possibilities to a achieve the syntax v1.yxwz = v0; without brackets: macros and unions. You could also have a nested type but then the expression would not return any address and it is impossible to calculate things on the data of v without its address. In case of macros you can hide functions like yxwz() which do something you want. The problems with functions is that they get complicated on the left-hand-side where we want them to return references to swizzlings. The example (2) should fill the vector v1 in a swizzled order and not compute things on some copy of v1. You might be able to solve that with template meta programming or explicit proxy objects. These are objects of another type containing a reference to the original type. Operators on them will always access the original elements in some type-dependent way. However Returning proxies might be to complicated for a compiler to be optimized away. Further I do not like to have macros like x to pollute all my namespaces!

The union Solution

In a union all members work on the same space. If each member has a different type and if there are operators for each we can do everything we want.

The types must be trivially copyable, otherwise it would not be possible to put them into a union. It is possible but not feasible to write so many types, so we want the compiler to make this job: using templates.

The Swizzle-Proxy Template

The above class shows the basic idea of how to implement the operators for swizzling with exactly two elements: xx, xy, wx, ... . The template arguments A and B can be any index of elements in an underling real vector. For the swizzle wx  A  is 3 and  B  is 0 accessing two elements of a vec4.

Notice: the class itself does not have own members! Instantiating it would cause lots of access violations. Together with the union above the this  pointer becomes a pointer to m_data . That is why we can cast it so ugly without fear.
Unfortunately when compiling the compiler must create a new operator for each combination of swizzle types. This increases compile times heavily which cannot be avoided.

So far we can use the class the following way:

The second line would also compile but behave wired. It would add v2.z and v2.x to v1.x successively. To avoid that we can cause the compiler to fail by the following trick:

Depending on how the indices are chosen the return type is either SwizzleProxy2 as before or struct OperationNotAvailable which is nowhere defined. In the second case the compiler cannot create the function and will give you an error message which will contain "OperationNotAvailable" at some point.

To implement all the different operators for all SwizzleProxyX class I tried to create a template based collection of common operator implementations. The problem was that the compiler failed to optimize everything so we need to do that ourself for each of the (four) proxy templates. So the old CommonVectorOperator class currently contains the array access operator [] only. To still reduce the work a little bit I used macros for code generation. The macro is undefined at the end of the operator section such that from outside there are no unnecessary symbols. Just have a look into the code of the complete SwizzleProxy2 class.

Remark: The scalar-vector operators are implemented as friend . This is a trick in C++ to avoid having such functions in the global namespace. The compiler can still find the function by ADL (argument dependent lookup). For each different template argument setup of the proxy class there is exactly one such operator.

You might have noticed that the template takes a VectorType argument. This is required in the implementation of the non-assigning operators as a simple +. These must return a new copy which is only possible of the real vector type is known.

The Final Vector Class

If the final class would not inherit from the proxy class operations on normal vectors would not succeed. Instead it would be necessary to write additional operators which take vector-swizzle, vecot-vector and swizzle-vector arguments but fortunately inheritance is much easier.

Then the union is filled with all access patterns up to vec4. As you can see these are 30 for a vec2. For a vec4 itself this number grows to 340 because there are four instead of two indices for each element.

Before the last constructor we would not be able to use all the nice swizzling stuff fluently. Calling move(position.zyx) would fail because .zyx is not a vector (assuming move would like to have a vector). The implicit cast generated through this constructor is rounding off the whole implementation.

Full Header: swizzle.7z
Currently the implementation lacks functions like normalization... They might follow later.

2 thoughts on “Performance Optimal Vector Swizzling in C++

  1. John Doe

    b.xy = a.xy won't work since the default assignment operator will be used instead of the templated one and since SwizzleProxy2 has no members, nothing will be copied and b stays unchanged.
    But that's easy to fix by implementing the operator by hand 🙂

    Reply
    1. Johannes Post author

      Thanks for that advice. I expected the default copy-assignment to be deleted. I did not now that this is not the case for templated operators.

      Reply

Leave a Reply

Your email address will not be published. Required fields are marked *