News feed
May 2014
Here is an interesting course I can recommend:
Introduction to High-Performance Computing
PDC Summer School
KTH Royal Institute of Technology, Stockholm, Sweden
August 18-29, 2014
http://www.pdc.kth.se/education/summer-school
May 2012
Given that:
* A GPU contains multiple SIMD processors.
* Each SIMD processor contains multiple lanes.
* Each SIMD processor is assigned a single thread block (by the thread block scheduler).
The question is which one of these two alternatives is correct:
-Alt1 (parallel execution of threads): Each lane runs a single thread from the thread block -> to execute completely, each thread takes as many clock cycles as there are elements in the vector that it reads from/writes to.
-Alt2 ("sequential-alternating" execution of threads): Each thread occupies all lanes in a single SIMD processor -> each thread takes round_up(<nr_of_elements_in_the_vector>/<nr_of_lanes_per_SIMD_processor>) clock cycles to finish execution (not necessarily consecutive) -> the thread scheduler (in each SIMD processor) alternates between different threads even if a thread has not finished all its cycles. So threads do not execute in parallel.
(PS. Alt1 is what I understood from the GPU class/slides; Alt2 is what I understood from the book)
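To make the difference between the two alternatives concrete, here is a small sketch that only evaluates the per-thread cycle counts as the question states them. It is not a model of real GPU timing. The function names and the lane count of 16 are assumptions chosen for illustration.

```python
import math


def cycles_alt1(n_elements: int) -> int:
    """Alt1 as stated in the question: one thread per lane,
    so a thread needs one clock cycle per vector element."""
    return n_elements


def cycles_alt2(n_elements: int, n_lanes: int) -> int:
    """Alt2 as stated in the question: a thread occupies all lanes,
    so it needs round_up(n_elements / n_lanes) clock cycles."""
    if n_lanes <= 0:
        raise ValueError("n_lanes must be positive")
    return math.ceil(n_elements / n_lanes)


if __name__ == "__main__":
    N_LANES = 16  # hypothetical lane count per SIMD processor
    for n in (16, 32, 40):
        print(n, cycles_alt1(n), cycles_alt2(n, N_LANES))
```

With 40 elements and the assumed 16 lanes, a thread would need 40 cycles under Alt1 and 3 cycles under Alt2. Under Alt1 the lanes run different threads at the same time, while under Alt2 the threads take turns on the whole SIMD processor.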
Alt1 is the correct alternative. I think the book was a bit unclear about that, or maybe I have missed something in it; in any case, the slides are clearer and have more figures.
Thank you Artur for answering the question and for the slides.