My tests were done with the 64 bits icas executable
and script files available
here.
This is a statically linked
binary, the dynamically linked icas inside the 64 bit
giac
debian package
should give similar timings.
Timings on an 8*Xeon(R) CPU X5482 @ 3.20GHz (July 2010).
Source code here
(or spkg)
Groebner basis
Frederic Han reports the following
timings in comparison with Magma. For large problems without symmetries
Giac seems to be faster than Magma.
Multivariate polynomial *
SDMP timings are not present since I don't have a copy of a recent maple version and they would not be fully meaningfull anyway since the immediate
data type used by SDMP is 192 bits integer (no allocation for these
integers).
Therefore
comparison is done with the fastest free (as in beer) software I know,
Trip, which is in competition with SDMP.

Fateman dense test 46376*46376 > 635376 terms 14s (real 3.6s).
Faster than TRIP v1.1.0: 34.18s (real 5.58s) or 26.6s (real 3.8 to 4.34s)
in POLPV mode

MonaganPearce sparse test 6188*6188 > 5821335 terms
 1 thread: 4.4s (half of
the time is used to convert the product with int128 coeffs
into a giac multivariate polynomial with GMP coeffs).
TRIP: 1.52s

8 threads: 4.91s (real 3.06s) = 2.6s (int128 *, real 0.75) + 2.3s (conversion).
Parallelisation does not apply to
int128>GMP conversion (tests show that it would slow down the process
perhaps due to locks when calling malloc) which explains that real
time is high.
Slower than TRIP v1.1.0: 01.78s (real 0.39s)
 Same test with power 16 instead of 12, 1 thread: 36s = 20s to do
the int128 multiplication + 16s for conversion. Trip = 20.4s.
 Same test with power 16 and 8 threads: 45.6s (real 22.2s) = 29.5s for
parallel int128 multiplication (real 6.1)+16.1s for conversion.
Trip = 21.6s (real 4.26s).
Further speed improvements for sparse multiplication in giac would require a
thread lockfree malloc, or/and another format for the main giac
data type (gen) which would containt natively larger integers, but
this would require many changes in the code, and could impact
other operations (if for example sizeof(gen) would double,
we could have direct integers up to around 120 bits, but most
symbolic objects would have a size*2 which would slow operations).
As far as I understand, Trip uses a thread lockfree malloc, but
this has constraints (idle threads wait for tasks from a master thread,
each thread has it's own heap) and it's probably not easy at all
to use refcounted objects with local heaps.
Multivariate polynomial GCD
Timings are about the same as Magma 2.159 for
Fermat GCD test 1 to 4 vars, on Z or Z/nZ (magma is known to be the best in this
category).
Giac timings are followed by magma timings in parenthesis.
 on Z:

1var:
p 0.0008s (<0.001), 51 terms,
q 0.013s (0.009), 496 terms,
r 0.032s (0.024), 996 terms,

2var:
p 0.00085s (0.001), 66 terms,
q 0.0021s (0.003), 230 terms,
r 0.013s (0.02), 1318 terms,
s 0.04s (0.064), 3310 terms

3var
p 0.0034s (0.006), 286 terms,
q 0.026s (0.037), 1760 terms,
r 0.18s (0.137), 5431 terms

4var
p: 0.011s (0.005), 126 terms,
q: 0.028s (0.023), 493 terms,
r: 0.048s (0.052), 997 terms,
s: 0.17s (0.223), 3859 terms
 Mod 43051

1var
p: 0.000085s (<0.001), 51 terms,
q: 0.0018s (0.004), 500 terms,
r: 0.006s (0.011), 1001 terms,

2var
p: 0.00052s (<0.001), 66 terms,
q: 0.0013s (0.005), 231 terms,
r: 0.0055s (0.012), 1326 terms,
s: 0.017s (0.038), 3321 terms,

3var
p: 0.0011s (0.001), 56 terms,
q: 0.0034s (0.004), 286 terms,
r: 0.014s (0.027), 1771 terms,

4var
p: 0.0075s (0.003), 126 terms,
q: 0.016s (0.015), 495 terms,
r: 0.032s (0.033), 1001 terms,
For comparion, see also
Sage wiki.
Combined test (*,/,factor)
Pearce tests,
comparison with
maple
14. I don't know how the hardware compare, it seems that
factorization is faster with giac.
 f1:=(x+y+z+1)^20+1:; p1:=normal(f1*(f1+1)):; 0.09s, 12341 terms,
q1:=quo(p1,f1):; 0.44s, factor(p1): 0.8s
 f2:=(x+y+z+1)^30+1:; p2:=normal(f2*(f2+1)):; 0.28s, 39711 terms,
q2:=quo(p2,f2): 1.92s, factor(p2): 9.8s
 f3:=(x+y+z+t+1)^20+1:; p3:=normal(f3*(f3+1)):; 1.31s, 135751 terms,
q3:=quo(p3,f3): 4.22s, factor(p3): 11.3s