Changed CPU velocity calculation for EMISSION_SHAPE_DIRECTED_POINTS
to follow the same logic as in the GPU version:
mat2 rotm;
rotm[0] = texelFetch(emission_texture_normal, emission_tex_ofs, 0).xy;
rotm[1] = rotm[0].yx * vec2(1.0, -1.0);
VELOCITY.xy = rotm * VELOCITY.xy;
Now both CPUParticles2D & CPUParticles3D (z disabled) show the same results
as their GPU counterparts and take the initial velocity settings into account.