Friday, 30 January 2009

Eurographics Symposium on Rendering

The 20th annual Eurographics Symposium on Rendering will take place in Girona, Spain, from June 29th to July 1st, 2009.  (http://iiia.udg.edu/EGSR2009/index.html)

The call for papers has some interesting topics on the list. I think this could be well worth attending to catch up with the latest developments. I am very tempted to write a paper for it, but with the upcoming wedding and a huge workload I don't think it's feasible.

Coalesced write vs Coalesced read counters in Profiler

In my last post I mentioned that I was still investigating why the CUDA profiler was counting 4x as many coalesced writes as coalesced reads. While the profiler's manual does mention that the counters are merely a relative measure of performance and not absolute values, this still did not make sense.

As always, the helpful people at NVIDIA responded to my request almost immediately.

Apparently the GLD counter is incremented by 1 for each 32B/64B/128B load request, whereas the GST counter is incremented by 2 for a 32B, 4 for a 64B and 8 for a 128B store request.

This explains why my GST counters are 4x higher than the GLD counters: with the increments above, that is exactly the ratio for 64B requests. The time differences between uncoalesced reads and writes still seem to remain, which is worth remembering for the reasons mentioned in the last article.
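As a quick sanity check, the counting scheme described above can be written as a tiny Python model. This is only a sketch of the arithmetic: the function name and the workload of 1000 requests are made up for illustration, and it says nothing about how the profiler works internally.

```python
# Counter increments per request size in bytes, as described in the post.
GLD_INCREMENT = {32: 1, 64: 1, 128: 1}   # loads: +1 regardless of size
GST_INCREMENT = {32: 2, 64: 4, 128: 8}   # stores: +2 / +4 / +8

def counter_totals(request_sizes):
    """Return (gld, gst) if each listed request is issued once as a load
    and once as a store. Hypothetical helper, for illustration only."""
    gld = sum(GLD_INCREMENT[size] for size in request_sizes)
    gst = sum(GST_INCREMENT[size] for size in request_sizes)
    return gld, gst

# Hypothetical workload: 1000 loads and 1000 stores, all 64B requests.
gld, gst = counter_totals([64] * 1000)
print(gld, gst, gst / gld)  # 1000 4000 4.0
```

With only 64B requests the store counter comes out at 4x the load counter, which matches what I was seeing.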

Tuesday, 27 January 2009

Investigation of Uncoalesced vs Coalesced Read and Write Speeds in CUDA

While working on my sorting algorithms I came across an interesting scenario. At the moment my still-incomplete sort uses two kernels. At the end of kernel A I have the choice of writing out to global memory in an uncoalesced or a coalesced manner. If kernel A writes out in a coalesced way, then kernel B's initial read of global memory will have to be uncoalesced. Likewise, if kernel A's write is uncoalesced, then kernel B's read will be coalesced.

The CUDA documentation does not say whether uncoalesced writes cost the same as uncoalesced reads, so in order to get peak performance from my sort I needed to run a few tests with different kernels in the CUDA profiler.

I made four kernels, each doing a very simple task: read in from global memory and store in shared memory, then write from shared memory into a different area of global memory. Each kernel reads from or writes to global memory in a different fashion to ensure coalesced / uncoalesced access on my 8800GT. Note that on newer cards the coalescing rules are a bit different (see the documentation for details).
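To give a feel for the difference between the access patterns, here is a rough Python model rather than the actual kernels. It simply counts how many memory segments a half-warp touches for contiguous versus strided indices. The 16-thread half-warp, 4-byte elements, 64-byte segments and the stride of 16 are all assumptions for illustration, and the real hardware rules (alignment, access order) are stricter than this.

```python
from itertools import product

HALF_WARP = 16       # threads whose accesses may be combined (assumed)
WORD_BYTES = 4       # assuming 32-bit elements
SEGMENT_BYTES = 64   # assumed segment size for 16 x 4-byte words

def contiguous(base):
    """Thread t accesses element base + t (the coalescing-friendly pattern)."""
    return [base + t for t in range(HALF_WARP)]

def strided(base, stride):
    """Thread t accesses element base + t * stride (scattered pattern)."""
    return [base + t * stride for t in range(HALF_WARP)]

def segments_touched(indices):
    """Number of distinct segments covered by the accessed elements."""
    return len({(i * WORD_BYTES) // SEGMENT_BYTES for i in indices})

print(segments_touched(contiguous(0)))    # 1
print(segments_touched(strided(0, 16)))   # 16

# The four read/write combinations the test kernels cover.
for read, write in product(("coalesced", "uncoalesced"), repeat=2):
    print(f"{read} read, {write} write")
```

In this simplified model the contiguous pattern fits in a single segment while the strided one needs a separate segment per thread, which is the gap the four kernels are meant to measure.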

Friday, 23 January 2009

Sorting Algorithms

In most programs there comes a time when you need to sort your data into some sort of order. Most recently I have needed to do this in both the "collapsing universe" simulation and my real time ray tracing acceleration. Ideally if you can get away without sorting data you should, but often you need to in order to accelerate a later stage of the algorithm thereby making a nett saving in time.

The popular sorting algorithms such as bubble sort, insertion sort, quicksort, radix sort and heapsort range in worst-case time from O(n log n) to O(n^2), where the order is based on the number of element comparisons. (Radix sort is the exception here: it does not compare elements, so its cost depends on the key length instead.) For memory requirements they range from O(1) to O(log n) (quicksort), although some not quite so popular algorithms need more memory.
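To make the gap between O(n^2) and O(n log n) concrete, the sketch below counts element comparisons on a reversed list. Bubble sort represents the quadratic end; merge sort, which is not in the list above, stands in for the O(n log n) end simply because it is short to write. The function names and the input size of 64 are arbitrary choices for illustration.

```python
def bubble_sort_comparisons(values):
    """Plain bubble sort; returns (sorted list, number of comparisons)."""
    data = list(values)
    comparisons = 0
    for end in range(len(data) - 1, 0, -1):
        for i in range(end):
            comparisons += 1
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
    return data, comparisons

def merge_sort_comparisons(values):
    """Top-down merge sort; returns (sorted list, number of comparisons)."""
    data = list(values)
    if len(data) <= 1:
        return data, 0
    mid = len(data) // 2
    left, left_count = merge_sort_comparisons(data[:mid])
    right, right_count = merge_sort_comparisons(data[mid:])
    merged, comparisons = [], left_count + right_count
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])    # at most one of these two is non-empty
    merged.extend(right[j:])
    return merged, comparisons

reversed_input = list(range(64, 0, -1))
_, bubble_count = bubble_sort_comparisons(reversed_input)
_, merge_count = merge_sort_comparisons(reversed_input)
print(bubble_count, merge_count)
```

Bubble sort always performs n(n-1)/2 comparisons in this form (2016 for n = 64), while the merge sort count stays within n log2 n.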

Thursday, 15 January 2009

CFD and our collapsing galactic centre

While doing some more research on CFD and the Lattice Boltzmann method I came across the following paper (December 19 - 2008) which covers the basics of the method and an efficient GPU (CUDA) implementation.

Unfortunately I have done little in the way of my own implementation as I have been playing with collapsing our galactic centre. So far I have modified some of my code, which usually looks for letters in a scanned document, to find stars in the bitmap instead. The next stage is to assign each point a z coordinate based on its size / whiteness value and an x,y based on the original image coordinates, and then the collapsing can begin :p   Hardly scientific, but particle simulations are always so much fun.

Data Sets

After complaining in my last post about not being able to find freely available real world data sets for tomography I managed to find the following sites:

http://www.volvis.org/

http://www.cg.tuwien.ac.at/xmas/

http://radiology.uiowa.edu/downloads/

I've only downloaded one set so far and haven't had a chance to look at it but from the thumbnails on the sites they look good. The Christmas tree model in particular looks like a lot of fun to work on and as a bonus it is huge.

Wednesday, 7 January 2009

CFD / Seismic Analysis / Tomography

I'm still very busy playing with ray tracing as there are just so many opportunities to optimize the performance. It also involves a lot of converting seemingly simple serial algorithms to parallel versions - sorts / searches etc. Couple that with a full-time job and sometimes I just need a break. A break in my book is defined as "a different project" :p

Not that I'm stopping work on raytracing and rendering - far from it; in fact the alternatives I'm looking at include a lot of similar processes / algorithms. The three topics in the title are what I'm looking at, largely inspired by the great work done by the guys who attended the Tesla Personal Supercomputer launch last month.

Galactic Centre Photograph

I found this on the Hubble site

[Image: Galactic Centre]

They also have much bigger versions available, up to 6149 x 2902 (according to http://hubblesite.org/copyright/ it seems that I can post it here). In itself it is pretty remarkable, but it also provides a rather nice large and complex image for image processing. I have been playing with trying to extract the individual stars and using their intensity / connected region size to specify a Z co-ordinate, with brighter / larger being closer. If this seems odd, it is because I don't have the stars' real positions and it's just for fun anyway :)   The plan is then to take these x,y,z coordinates and run a gravity simulation on them. I know this has absolutely no scientific merit as there is no initial velocity / mass / proper coords / space is probably warped there / etc, but it could look cool...
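The star extraction idea can be sketched in plain Python as a flood fill over bright pixels. This is a toy version, not my actual code: the function name, the threshold, 4-connectivity and the z = 1 / total brightness mapping are all assumptions picked so that bigger / brighter regions end up with a smaller z (closer).

```python
from collections import deque

def find_stars(image, threshold):
    """Group bright pixels into connected regions and return one (x, y, z)
    per region. `image` is a list of rows of brightness values and
    `threshold` must be positive. Hypothetical helper for illustration."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    stars = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or image[y][x] < threshold:
                continue
            # Flood fill (4-connected) to collect this star's pixels.
            queue = deque([(x, y)])
            seen[y][x] = True
            pixels = []
            while queue:
                px, py = queue.popleft()
                pixels.append((px, py))
                for nx, ny in ((px + 1, py), (px - 1, py),
                               (px, py + 1), (px, py - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx]
                            and image[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            size = len(pixels)
            brightness = sum(image[py][px] for px, py in pixels)
            centre_x = sum(px for px, _ in pixels) / size
            centre_y = sum(py for _, py in pixels) / size
            # Assumed mapping: more total light means a smaller z (closer).
            stars.append((centre_x, centre_y, 1.0 / brightness))
    return stars

tiny_image = [
    [0,   0,   0, 0,   0],
    [0, 200, 200, 0,   0],
    [0,   0,   0, 0,   0],
    [0,   0,   0, 0, 100],
]
for star in find_stars(tiny_image, threshold=50):
    print(star)
```

Using the total brightness of a region folds size and intensity into one number; the resulting points would then be the starting positions for the gravity simulation.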

Monday, 5 January 2009

Happy new year!

Happy 2009 to everyone, I hope it's a really good one and you get lots of new hardware to play with :)

This year I've got a lot of ideas to play with, and having been given a whole lot of old physics books (mostly particle physics) I may just get around to making some simulations.

Nothing new on the raytracing front yet, although once again I have been playing with some huge models and working out better ways to build (or rather not build) acceleration structures.

More soon...