In a shader we can write vec3 v0 = v1.xxy * 2.0; or use any other combination of x, y, z and w, depending on the length of the vector. The resulting vector does not need to have the same size as the source (in the example v1 could be a vec2), because components can be repeated. This is called swizzling and is really convenient.
Vectors are everywhere in game projects, not only in shaders. How can we get the same behavior in C++, and can we get it without losing performance? I wanted to understand whether and how this can be done. There are two existing solutions: the glm library from G-Truc and the CxxSwizzle library. I did not test the performance of either library, but if you want swizzling you might prefer one of them over the header file I wrote. Their advantage is that they implement more functions so far. However, I did not find explanations of how to solve the problem, so I will try to fill that gap.
Before we start, here are the problems to solve:
- (1) Access the elements in arbitrary order and count: v0.xxy + v1.xzy
- (2) Write to a swizzled vector: v1.yxwz = v0; where repeated elements are explicitly forbidden
- (3) No memory overhead: a vec3 should have the size of three times its base type
- (4) No computational overhead: an equivalent solution written as several lines of scalar operations should not be faster
First, there are two ways to achieve the syntax v1.yxwz = v0; without brackets: macros and unions. You could also use a nested type, but then the expression would not yield any address, and it is impossible to compute anything on the data of v without its address. With macros you can hide functions like yxwz() that do whatever you want. The problem with functions is that they get complicated on the left-hand side, where we want them to return references to swizzles. Example (2) should fill the vector v1 in swizzled order and not compute things on some copy of v1. You might be able to solve that with template metaprogramming or explicit proxy objects. These are objects of another type that contain a reference to the original. Operators on them always access the original elements in some type-dependent way. However, returned proxies might be too complicated for a compiler to optimize away. Furthermore, I do not like macros such as x polluting all my namespaces!
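For comparison, here is a minimal sketch of the explicit-proxy alternative. It assumes the proxy simply stores references. RefProxy2, RefVec3, xy() and yx() are hypothetical names and are not taken from the header. Without macros the brackets cannot be avoided, and every swizzle creates a temporary object that the optimizer has to remove to satisfy requirement (4).

```cpp
#include <cassert>

// Hypothetical explicit proxy: it stores references to two elements of the
// original vector, so writes through it reach the original data.
struct RefProxy2 {
    float& a;
    float& b;

    RefProxy2(float& first, float& second) : a(first), b(second) {}
    RefProxy2(const RefProxy2&) = default;

    // Element-wise assignment. Both source values are read before writing,
    // so v.yx() = v.xy() swaps correctly although the two proxies alias.
    RefProxy2& operator=(const RefProxy2& rhs) {
        const float ra = rhs.a;
        const float rb = rhs.b;
        a = ra;
        b = rb;
        return *this;
    }
};

// Hypothetical vector with function-style swizzles: brackets are required,
// and every call creates a temporary proxy holding two references.
struct RefVec3 {
    float m_data[3];

    RefProxy2 xy() { return RefProxy2(m_data[0], m_data[1]); }
    RefProxy2 yx() { return RefProxy2(m_data[1], m_data[0]); }
};

inline void refProxyDemo() {
    RefVec3 v{{1.0f, 2.0f, 3.0f}};
    v.yx() = v.xy();   // swaps x and y through two temporary proxies
    assert(v.m_data[0] == 2.0f && v.m_data[1] == 1.0f && v.m_data[2] == 3.0f);
}
```

The vector itself stays compact. The cost lies in the temporaries, and whether they disappear completely is up to the optimizer.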
The Union Solution
In a union all members share the same memory. If each member has a different type, and each of these types provides the operators, we can do everything we want.
```cpp
union
{
    float m_data[3];
    Txxx xxx;
    Txxy xxy;
    ...
};
```
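The types must be trivial, in particular trivially copyable. Before C++11 a union could not hold members with non-trivial constructors, destructors or copy assignment operators at all. Since C++11 it can, but the corresponding implicit special member functions of the union are then deleted. Writing all these types by hand is possible but not practical, so we let the compiler do this job for us: using templates.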
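The compiler can verify requirement (3) and the triviality condition directly. The sketch below uses empty placeholder types in place of Txxx, Txxy, ... (SwizzleTag3 is a hypothetical name). The size check assumes a common platform where a float array needs no extra padding.

```cpp
#include <type_traits>

// Hypothetical empty swizzle types standing in for Txxx, Txxy, ...
// Without data members they add nothing to the size of the union.
template<int A, int B, int C>
struct SwizzleTag3 {};

struct Vec3Storage {
    union {
        float m_data[3];
        SwizzleTag3<0, 0, 0> xxx;
        SwizzleTag3<0, 0, 1> xxy;
        SwizzleTag3<2, 1, 0> zyx;
    };
};

// Requirement (3): a union is only as large as its largest member.
static_assert(sizeof(Vec3Storage) == 3 * sizeof(float), "no memory overhead");

// Trivial member types keep the implicit copy operations of the union.
static_assert(std::is_trivially_copyable<SwizzleTag3<0, 0, 1>>::value, "trivial swizzle type");
static_assert(std::is_trivially_copyable<Vec3Storage>::value, "trivial vector");
```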
The Swizzle-Proxy Template
```cpp
template<typename VectorType, typename Data, int A, int B>
class SwizzleProxy2
{
public:
    template<class VectorType2, typename Data2, int A2, int B2>
    SwizzleProxy2& operator += (const SwizzleProxy2<VectorType2, Data2, A2, B2>& _rhs)
    {
        // Read both source elements first: _rhs may alias *this (v.xy += v.yx)
        const Data2 rhsA = ((const Data2*)&_rhs)[A2];
        const Data2 rhsB = ((const Data2*)&_rhs)[B2];
        ((Data*)this)[A] += rhsA;
        ((Data*)this)[B] += rhsB;
        return *this;
    }
};
```
The class above shows the basic idea of how to implement the operators for swizzles with exactly two elements: xx, xy, wx, ... The template arguments A and B can be any element index of the underlying real vector. For the swizzle wx, A is 3 and B is 0, so it accesses two elements of a vec4.
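Notice: the class itself does not have any members! As a standalone object, its operators would read and write memory that does not belong to it. Together with the union above, however, the this pointer becomes a pointer to m_data. That is why we can use these ugly casts. Strictly speaking, the C++ standard does not guarantee this access pattern, so the technique relies on the behavior of common compilers.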
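The following self-contained sketch reduces the idea to a few lines so the this-pointer trick can be seen in isolation. It is a simplification and not the header's code: Swz2 and Vec3 are hypothetical names, the element type is fixed to float, and only operator += exists. The source values are read into temporaries first, so a self-aliasing swizzle such as a.xy += a.yx behaves as it does in GLSL.

```cpp
#include <cassert>

// Empty proxy: A and B are indices into the storage of the enclosing union.
template<int A, int B>
struct Swz2 {
    template<int A2, int B2>
    Swz2& operator+=(const Swz2<A2, B2>& rhs) {
        // Inside a vector's union, this has the address of that vector's
        // m_data; the same holds for &rhs and its own vector.
        float* dst = reinterpret_cast<float*>(this);
        const float* src = reinterpret_cast<const float*>(&rhs);
        const float srcA = src[A2];   // read first: rhs may alias *this
        const float srcB = src[B2];
        dst[A] += srcA;
        dst[B] += srcB;
        return *this;
    }
};

struct Vec3 {
    union {
        float m_data[3];
        Swz2<0, 1> xy;
        Swz2<1, 0> yx;
        Swz2<2, 0> zx;
    };
    Vec3(float x, float y, float z) { m_data[0] = x; m_data[1] = y; m_data[2] = z; }
};

inline void swizzleAddDemo() {
    Vec3 a(1.0f, 2.0f, 3.0f);
    Vec3 b(10.0f, 20.0f, 30.0f);
    a.xy += b.zx;   // x += 30, y += 10
    assert(a.m_data[0] == 31.0f && a.m_data[1] == 12.0f && a.m_data[2] == 3.0f);
    a.xy += a.yx;   // both elements become 31 + 12
    assert(a.m_data[0] == 43.0f && a.m_data[1] == 43.0f);
}
```

The sketch also shows a caveat of proxies without data members. An assignment between two proxies of the identical type, such as a.xy = b.xy, selects the implicitly declared copy assignment operator rather than any element-wise template. The elements are therefore not copied by such a template.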
Unfortunately, the compiler must create a new operator for each combination of swizzle types. This increases compile times considerably, and it cannot be avoided.
So far we can use the class the following way:
```cpp
v1.xw += v2.yy;
v1.xx += v2.zx; // Bad O_o
```
The second line would also compile but behave weirdly: it would add v2.z and then v2.x to v1.x. To prevent that, we can make the compiler fail with the following trick:
```cpp
static const bool IsWritable = (A != B);
typedef typename std::conditional<IsWritable, SwizzleProxy2,
                                  struct OperationNotAvailable>::type WriteableThisType;

template<class VectorType2, typename Data2, int A2, int B2>
WriteableThisType& operator += (const SwizzleProxy2<VectorType2, Data2, A2, B2>& _rhs)
```
Depending on the chosen indices, WriteableThisType is either SwizzleProxy2 as before or struct OperationNotAvailable, which is declared but never defined. In the second case, the operator fails to compile as soon as it is used, because return *this; cannot bind the proxy to an OperationNotAvailable&. The error message will mention "OperationNotAvailable" at some point.
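The type selection can be checked in isolation with a reduced sketch. GuardedSwizzle2 is a hypothetical name, and its operator is only a placeholder that does not touch any data. The commented-out lines are expected to fail to compile.

```cpp
#include <type_traits>

// Reduced sketch of the guard: only the type selection and one operator.
template<int A, int B>
struct GuardedSwizzle2 {
    static const bool IsWritable = (A != B);

    // Either the proxy itself or an incomplete type that is never defined.
    typedef typename std::conditional<IsWritable, GuardedSwizzle2,
                                      struct OperationNotAvailable>::type WriteableThisType;

    // Placeholder body; a real proxy would modify its elements here. The body
    // is only instantiated when the operator is used. For A == B,
    // "return *this" cannot bind to OperationNotAvailable&, which stops compilation.
    WriteableThisType& operator+=(float) { return *this; }
};

static_assert(std::is_same<GuardedSwizzle2<0, 1>::WriteableThisType,
                           GuardedSwizzle2<0, 1>>::value, "xy is writable");
static_assert(!std::is_same<GuardedSwizzle2<0, 0>::WriteableThisType,
                            GuardedSwizzle2<0, 0>>::value, "xx is not writable");

inline void guardDemo() {
    GuardedSwizzle2<1, 0> yx;
    yx += 1.0f;   // compiles: the indices differ
    // GuardedSwizzle2<0, 0> xx;
    // xx += 1.0f;   // expected error mentioning OperationNotAvailable
}
```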
To implement all the different operators for all SwizzleProxyX classes, I tried to create a template-based collection of common operator implementations. The problem was that the compiler failed to optimize everything, so we have to do that ourselves for each of the four proxy templates. As a result, the old CommonVectorOperators class currently contains only the array access operator []. To still reduce the work a little, I used macros for code generation. Each macro is undefined at the end of its operator section, so no unnecessary symbols are visible from outside. Just have a look at the code of the complete SwizzleProxy2 class.
```cpp
/// \brief Type for swizzled access to two element vectors.
/// \details All the vector operators are defined on the swizzle types. The
///   derived final vector classes are only unions of swizzle vectors.
template<typename VectorType, typename Data, int A, int B>
class SwizzleProxy2: public CommonVectorOperators<Data>
{
public:
    /// \brief To be a writable proxy all indices must be different
    static const bool IsWritable = (A != B);
    /// \brief Use this type if a function should be created only if the
    ///   current swizzle is writable.
    typedef typename std::conditional<IsWritable, SwizzleProxy2, struct OperationNotAvailable>::type WriteableThisType;

    /// \brief Use a locally defined macro to reduce the vector
    ///   implementation overhead.
#   define CREATE_ASSIGMENT_OPERATOR(Op)                                          \
    template<class VectorType2, typename Data2, int A2, int B2>                   \
    WriteableThisType& operator Op (const SwizzleProxy2<VectorType2, Data2, A2, B2>& _rhs) \
    {                                                                             \
        /* Read both sources first: _rhs may alias *this (v.xy = v.yx) */         \
        const Data2 rhsA = ((const Data2*)&_rhs)[A2];                             \
        const Data2 rhsB = ((const Data2*)&_rhs)[B2];                             \
        ((Data*)this)[A] Op rhsA;                                                 \
        ((Data*)this)[B] Op rhsB;                                                 \
        return *this;                                                             \
    }                                                                             \
                                                                                  \
    /* Scalar operation */                                                        \
    WriteableThisType& operator Op (const Data _rhs)                              \
    {                                                                             \
        ((Data*)this)[A] Op _rhs;                                                 \
        ((Data*)this)[B] Op _rhs;                                                 \
        return *this;                                                             \
    }

    CREATE_ASSIGMENT_OPERATOR( = )
    CREATE_ASSIGMENT_OPERATOR( += )
    CREATE_ASSIGMENT_OPERATOR( -= )
    CREATE_ASSIGMENT_OPERATOR( *= )
    CREATE_ASSIGMENT_OPERATOR( /= )
    // The following operators are only defined for integer types.
    CREATE_ASSIGMENT_OPERATOR( |= )
    CREATE_ASSIGMENT_OPERATOR( &= )
    CREATE_ASSIGMENT_OPERATOR( ^= )
    CREATE_ASSIGMENT_OPERATOR( %= )
    CREATE_ASSIGMENT_OPERATOR( <<= )
    CREATE_ASSIGMENT_OPERATOR( >>= )
#   undef CREATE_ASSIGMENT_OPERATOR

#   define CREATE_ARITHMETIC_OPERATOR(Op)                                         \
    template<class VectorType2, typename Data2, int A2, int B2>                   \
    VectorType operator Op (const SwizzleProxy2<VectorType2, Data2, A2, B2>& _rhs) const \
    {                                                                             \
        VectorType result;                                                        \
        result[0] = ((const Data*)this)[A] Op ((const Data2*)&_rhs)[A2];          \
        result[1] = ((const Data*)this)[B] Op ((const Data2*)&_rhs)[B2];          \
        return result;                                                            \
    }                                                                             \
                                                                                  \
    VectorType operator Op (const Data _rhs) const                                \
    {                                                                             \
        VectorType result;                                                        \
        result[0] = ((const Data*)this)[A] Op _rhs;                               \
        result[1] = ((const Data*)this)[B] Op _rhs;                               \
        return result;                                                            \
    }                                                                             \
                                                                                  \
    friend VectorType operator Op (const Data _lhs, const SwizzleProxy2& _rhs)    \
    {                                                                             \
        VectorType result;                                                        \
        result[0] = _lhs Op ((const Data*)&_rhs)[A];                              \
        result[1] = _lhs Op ((const Data*)&_rhs)[B];                              \
        return result;                                                            \
    }

    CREATE_ARITHMETIC_OPERATOR(+)
    CREATE_ARITHMETIC_OPERATOR(-)
    CREATE_ARITHMETIC_OPERATOR(*)
    CREATE_ARITHMETIC_OPERATOR(/)
    // Integer only operators
    CREATE_ARITHMETIC_OPERATOR(|)
    CREATE_ARITHMETIC_OPERATOR(&)
    CREATE_ARITHMETIC_OPERATOR(^)
    CREATE_ARITHMETIC_OPERATOR(%)
    CREATE_ARITHMETIC_OPERATOR(<<)
    CREATE_ARITHMETIC_OPERATOR(>>)
#   undef CREATE_ARITHMETIC_OPERATOR
};
```
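The CommonVectorOperators base is not shown here. Going by the description above (it only provides operator []), it might look roughly like the following sketch. Its body is an assumption, as are the namespace placement and the demo types Pair and Vec2Demo. The index refers to the underlying storage, not to the swizzle order. The expression _v[A] in the constructor of the final vector class appears to rely on exactly that.

```cpp
#include <cassert>

namespace detail {

// Assumed shape of the shared base: element access only. Like the proxies it
// has no data members, so inside a vector's union its this pointer has the
// address of m_data.
template<typename Data>
class CommonVectorOperators
{
public:
    Data& operator[](int _index) { return reinterpret_cast<Data*>(this)[_index]; }
    const Data& operator[](int _index) const { return reinterpret_cast<const Data*>(this)[_index]; }
};

} // namespace detail

// Hypothetical demo proxy and vector.
template<int A, int B>
struct Pair: public detail::CommonVectorOperators<float> {};

struct Vec2Demo {
    union {
        float m_data[2];
        Pair<0, 1> xy;
        Pair<1, 0> yx;
    };
};

inline void indexDemo() {
    Vec2Demo v;
    v.m_data[0] = 1.0f;
    v.m_data[1] = 2.0f;
    // The index addresses the storage, independent of the swizzle order.
    assert(v.yx[0] == 1.0f && v.yx[1] == 2.0f);
    v.xy[1] = 5.0f;
    assert(v.m_data[1] == 5.0f);
}
```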
Remark: The scalar-vector operators are implemented as friend functions defined inside the class. This C++ trick keeps such functions from being visible by ordinary lookup in the surrounding namespace. The compiler can still find them by ADL (argument-dependent lookup). For each distinct set of template arguments of the proxy class there is exactly one such operator.
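Here is a minimal illustration of this friend trick, independent of the swizzle code. The namespace math and the type Vec2f are hypothetical. The operator is defined inside the class, so neither ordinary nor qualified lookup sees it. An expression such as 2.0f * v still finds it through ADL, because one argument is a Vec2f. In a class template such as SwizzleProxy2, every instantiation defines its own operator in the same way.

```cpp
#include <cassert>

namespace math {

// Hypothetical stand-in for a vector type with a scalar-on-the-left operator.
struct Vec2f {
    float x, y;

    // Hidden friend: defined in the class body, so it is not visible to
    // ordinary lookup in namespace math, yet found through ADL.
    friend Vec2f operator*(float lhs, const Vec2f& rhs) {
        return Vec2f{lhs * rhs.x, lhs * rhs.y};
    }
};

} // namespace math

inline void hiddenFriendDemo() {
    math::Vec2f v{1.0f, 2.0f};
    math::Vec2f w = 2.0f * v;   // found by argument-dependent lookup
    assert(w.x == 2.0f && w.y == 4.0f);
    // math::operator*(2.0f, v) would not compile: qualified lookup
    // does not see the hidden friend.
}
```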
You might have noticed that the template takes a VectorType argument. It is required to implement the non-assigning operators, such as a simple +. These must return a new copy, which is only possible if the real vector type is known.
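The following reduced sketch shows why the proxy needs the full vector type: operator + has to return a new vector by value, and the proxy alone cannot name that type. Swz2Ret, Vec2R and Vec3R are hypothetical. Unlike the header, the sketch builds the result with a two-argument constructor instead of operator []. Vec2R only has to be complete where the operator is actually used.

```cpp
#include <cassert>

// Reduced proxy: VectorType is only needed as the return type of operator +.
template<typename VectorType, int A, int B>
struct Swz2Ret {
    template<typename VectorType2, int A2, int B2>
    VectorType operator+(const Swz2Ret<VectorType2, A2, B2>& rhs) const {
        const float* lhsData = reinterpret_cast<const float*>(this);
        const float* rhsData = reinterpret_cast<const float*>(&rhs);
        return VectorType(lhsData[A] + rhsData[A2], lhsData[B] + rhsData[B2]);
    }
};

struct Vec2R;   // still incomplete while Vec3R declares its proxies

struct Vec3R {
    union {
        float m_data[3];
        Swz2Ret<Vec2R, 0, 1> xy;
        Swz2Ret<Vec2R, 2, 0> zx;
    };
    Vec3R(float x, float y, float z) { m_data[0] = x; m_data[1] = y; m_data[2] = z; }
};

struct Vec2R {
    float m_data[2];
    Vec2R(float x, float y) { m_data[0] = x; m_data[1] = y; }
};

inline void returnTypeDemo() {
    Vec3R a(1.0f, 2.0f, 3.0f);
    Vec3R b(10.0f, 20.0f, 30.0f);
    Vec2R sum = a.xy + b.zx;   // (1 + 30, 2 + 10)
    assert(sum.m_data[0] == 31.0f && sum.m_data[1] == 12.0f);
}
```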
The Final Vector Class
```cpp
/// \brief Implementation of a 2D vector class with swizzling.
template<typename Data>
struct Vec2_Base: public detail::SwizzleProxy2<Vec2_Base<Data>,Data,0,1>
{
    typedef Data DataType;

    /// \brief The data with a multitude of access functions
    union {
        DataType m_data[2];
        detail::SwizzleProxy1<Vec1_Base<Data>,Data,0> x, r;
        detail::SwizzleProxy1<Vec1_Base<Data>,Data,1> y, g;
        detail::SwizzleProxy2<Vec2_Base<Data>,Data,0,0> xx, rr;
        detail::SwizzleProxy2<Vec2_Base<Data>,Data,0,1> xy, rg;
        detail::SwizzleProxy2<Vec2_Base<Data>,Data,1,0> yx, gr;
        detail::SwizzleProxy2<Vec2_Base<Data>,Data,1,1> yy, gg;
        detail::SwizzleProxy3<Vec3_Base<Data>,Data,0,0,0> xxx, rrr;
        detail::SwizzleProxy3<Vec3_Base<Data>,Data,0,0,1> xxy, rrg;
        detail::SwizzleProxy3<Vec3_Base<Data>,Data,0,1,0> xyx, rgr;
        detail::SwizzleProxy3<Vec3_Base<Data>,Data,0,1,1> xyy, rgg;
        detail::SwizzleProxy3<Vec3_Base<Data>,Data,1,0,0> yxx, grr;
        detail::SwizzleProxy3<Vec3_Base<Data>,Data,1,0,1> yxy, grg;
        detail::SwizzleProxy3<Vec3_Base<Data>,Data,1,1,0> yyx, ggr;
        detail::SwizzleProxy3<Vec3_Base<Data>,Data,1,1,1> yyy, ggg;
        detail::SwizzleProxy4<Vec4_Base<Data>,Data,0,0,0,0> xxxx, rrrr;
        detail::SwizzleProxy4<Vec4_Base<Data>,Data,0,0,0,1> xxxy, rrrg;
        detail::SwizzleProxy4<Vec4_Base<Data>,Data,0,0,1,0> xxyx, rrgr;
        detail::SwizzleProxy4<Vec4_Base<Data>,Data,0,0,1,1> xxyy, rrgg;
        detail::SwizzleProxy4<Vec4_Base<Data>,Data,0,1,0,0> xyxx, rgrr;
        detail::SwizzleProxy4<Vec4_Base<Data>,Data,0,1,0,1> xyxy, rgrg;
        detail::SwizzleProxy4<Vec4_Base<Data>,Data,0,1,1,0> xyyx, rggr;
        detail::SwizzleProxy4<Vec4_Base<Data>,Data,0,1,1,1> xyyy, rggg;
        detail::SwizzleProxy4<Vec4_Base<Data>,Data,1,0,0,0> yxxx, grrr;
        detail::SwizzleProxy4<Vec4_Base<Data>,Data,1,0,0,1> yxxy, grrg;
        detail::SwizzleProxy4<Vec4_Base<Data>,Data,1,0,1,0> yxyx, grgr;
        detail::SwizzleProxy4<Vec4_Base<Data>,Data,1,0,1,1> yxyy, grgg;
        detail::SwizzleProxy4<Vec4_Base<Data>,Data,1,1,0,0> yyxx, ggrr;
        detail::SwizzleProxy4<Vec4_Base<Data>,Data,1,1,0,1> yyxy, ggrg;
        detail::SwizzleProxy4<Vec4_Base<Data>,Data,1,1,1,0> yyyx, gggr;
        detail::SwizzleProxy4<Vec4_Base<Data>,Data,1,1,1,1> yyyy, gggg;
    };

    /// \brief Fast default construction without initialization
    Vec2_Base() {}

    /// \brief Construction from scalar
    explicit Vec2_Base(Data _x)
    {
        m_data[0] = _x;
        m_data[1] = _x;
    }

    /// \brief Construction from two elements
    Vec2_Base(Data _x, Data _y)
    {
        m_data[0] = _x;
        m_data[1] = _y;
    }

    /// \brief creation from swizzle type
    template<typename VectorType, typename Data2, int A, int B>
    Vec2_Base( const detail::SwizzleProxy2<VectorType,Data2,A,B>& _v )
    {
        m_data[0] = _v[A];
        m_data[1] = _v[B];
    }

    // Standard copy and assignment operator are defined as well
};
```
If the final class did not inherit from the proxy class, operations on plain vectors would not compile. Instead, it would be necessary to write additional operators that take vector-swizzle, vector-vector and swizzle-vector arguments. Fortunately, inheritance is much easier.
The union is then filled with all access patterns up to vec4. As you can see, there are 30 of them for a vec2 (2 + 4 + 8 + 16). For a vec4 this number grows to 340 (4 + 16 + 64 + 256), because each position of a swizzle can use four instead of two indices.
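Each of the one to four positions of a swizzle can name any of the n elements, so an n-component vector offers n + n^2 + n^3 + n^4 swizzle patterns. The rgba aliases add names but not patterns. A compile-time check with a hypothetical helper swizzleCount:

```cpp
// Number of swizzle patterns of length 1 to 4 for an n-component vector:
// n + n^2 + n^3 + n^4, since every position can use any of the n elements.
constexpr int swizzleCount(int n) {
    return n + n * n + n * n * n + n * n * n * n;
}

static_assert(swizzleCount(2) == 30, "vec2: 2 + 4 + 8 + 16");
static_assert(swizzleCount(3) == 120, "vec3: 3 + 9 + 27 + 81");
static_assert(swizzleCount(4) == 340, "vec4: 4 + 16 + 64 + 256");
```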
Without the last constructor, we could not use all the nice swizzling features fluently. Calling move(position.zyx) would fail because .zyx is not a vector (assuming move expects a vector). The implicit conversion provided by this constructor rounds off the whole implementation.
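Here is a reduced sketch of such a converting constructor for a three-component vector. Swz3, Vec3C and moveObject are hypothetical names. moveObject stands in for the move function of the example and simply adds an offset. Because the constructor template is not explicit, a swizzle can be passed wherever a vector is expected.

```cpp
#include <cassert>

// Empty proxy; inside the union its address is that of m_data.
template<int A, int B, int C>
struct Swz3 {};

struct Vec3C {
    union {
        float m_data[3];
        Swz3<2, 1, 0> zyx;
        Swz3<0, 0, 1> xxy;
    };

    Vec3C(float x, float y, float z) { m_data[0] = x; m_data[1] = y; m_data[2] = z; }

    // Not explicit on purpose: lets a swizzle convert implicitly to a vector.
    template<int A, int B, int C>
    Vec3C(const Swz3<A, B, C>& v) {
        const float* src = reinterpret_cast<const float*>(&v);
        m_data[0] = src[A];
        m_data[1] = src[B];
        m_data[2] = src[C];
    }
};

// Hypothetical stand-in for move(): it expects full vectors.
inline Vec3C moveObject(const Vec3C& position, const Vec3C& offset) {
    return Vec3C(position.m_data[0] + offset.m_data[0],
                 position.m_data[1] + offset.m_data[1],
                 position.m_data[2] + offset.m_data[2]);
}

inline void conversionDemo() {
    Vec3C position(1.0f, 2.0f, 3.0f);
    Vec3C moved = moveObject(position, position.xxy);   // offset (1, 1, 2)
    assert(moved.m_data[0] == 2.0f && moved.m_data[1] == 3.0f && moved.m_data[2] == 5.0f);
}
```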
Full Header: swizzle.7z
Currently the implementation lacks functions such as normalization; they might follow later.