CUDA Warp Divergence
I'm developing with CUDA and have an arithmetic problem that I could implement with or without warp divergence. With warp divergence it would look like:
    float v1;
    float v2;
    // calculate values of v1, v2
    if (v2 != 0)
        v1 += v2 * complicated_math();
    // store v1
The version without warp divergence looks like:
    float v1;
    float v2;
    // calculate values of v1, v2
    v1 += v2 * complicated_math();
    // store v1
The question is, which version is faster?
In other words, how expensive is disabling part of a warp compared to doing the extra calculation and adding 0?
Your question has no single answer. It depends heavily on the amount of calculation, the divergence frequency, the type of hardware, the dimensions, and many more aspects. The best way is to program both versions and use profiling to determine the best solution for your particular case and situation.
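As a starting point for that comparison, here is a minimal sketch that times both versions with CUDA events. Several things in it are assumptions and not part of the question: the body of complicated_math() (a made-up loop that takes an argument so the compiler cannot hoist it), the input data, the hypothetical zero_every parameter that controls how often v2 is zero, and the block size of 256. Error checking is left out to keep it short, and a profiler will give more detail than event timing.

```cuda
#include <cassert>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the question's complicated_math(); the real one is not shown.
__device__ float complicated_math(float x) {
    float acc = x;
    for (int i = 0; i < 64; ++i)
        acc = sinf(acc) * cosf(acc) + 1.0f;
    return acc;
}

// Version with a branch: threads whose v2 is zero skip the expensive call.
__global__ void with_branch(const float* v1_in, const float* v2_in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float v1 = v1_in[i];
    float v2 = v2_in[i];
    if (v2 != 0.0f)
        v1 += v2 * complicated_math(v1);
    out[i] = v1;
}

// Version without the branch: every thread does the math and may add 0.
__global__ void without_branch(const float* v1_in, const float* v2_in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float v1 = v1_in[i];
    float v2 = v2_in[i];
    v1 += v2 * complicated_math(v1);
    out[i] = v1;
}

typedef void (*Kernel)(const float*, const float*, float*, int);

// Times one launch of the given kernel in milliseconds, after a warm-up launch.
static float time_kernel(Kernel kernel, const float* d_v1, const float* d_v2,
                         float* d_out, int n) {
    const int threads = 256;  // assumed block size
    const int blocks = (n + threads - 1) / threads;
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    kernel<<<blocks, threads>>>(d_v1, d_v2, d_out, n);  // warm-up
    cudaEventRecord(start);
    kernel<<<blocks, threads>>>(d_v1, d_v2, d_out, n);  // timed launch
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

// Runs both kernels on the same data. Writes the two elapsed times and returns
// the largest absolute difference between the two outputs.
// Every zero_every-th element gets v2 == 0 (hypothetical knob for divergence).
float compare_versions(int n, int zero_every, float* ms_branch, float* ms_no_branch) {
    assert(zero_every >= 1);
    std::vector<float> h_v1(n), h_v2(n), h_a(n, 0.0f), h_b(n, 0.0f);
    for (int i = 0; i < n; ++i) {
        h_v1[i] = 0.001f * (i % 1000);
        h_v2[i] = (i % zero_every == 0) ? 0.0f : 1.0f;
    }
    const size_t bytes = n * sizeof(float);
    float *d_v1, *d_v2, *d_out;
    cudaMalloc(&d_v1, bytes);
    cudaMalloc(&d_v2, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_v1, h_v1.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_v2, h_v2.data(), bytes, cudaMemcpyHostToDevice);

    *ms_branch = time_kernel(with_branch, d_v1, d_v2, d_out, n);
    cudaMemcpy(h_a.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    *ms_no_branch = time_kernel(without_branch, d_v1, d_v2, d_out, n);
    cudaMemcpy(h_b.data(), d_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_v1);
    cudaFree(d_v2);
    cudaFree(d_out);

    float max_diff = 0.0f;
    for (int i = 0; i < n; ++i)
        max_diff = fmaxf(max_diff, fabsf(h_a[i] - h_b[i]));
    return max_diff;
}
```

Call compare_versions from your own main with a few values of zero_every (for example 2, 32 and a large number) and print the two times; build with nvcc. The numbers will differ between GPUs and compiler versions, so only trust measurements from your own setup. Also note that the two versions are only equivalent while complicated_math() returns a finite value: if it can return infinity or NaN, multiplying by 0 gives NaN instead of 0.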