XOR only represents binary addition when no pair of corresponding bits are both 1, because that would require a carry, which XOR doesn't provide. So whether you can substitute an ADD with an XOR depends on what you are doing, e.g.
01010100 +
10101000 =
11111100
which is the same as an XOR, but if any pair of corresponding bits are both set then
010101010 +
001010110 =
100000000
which isn't the same as the XOR of the two (011111100). As for multiplication, a shift in binary is the same as multiplying or dividing by 2, just as shifting a decimal number is the same as multiplying or dividing by 10; this can be handy for multiplying/dividing by powers of two, e.g.
a / 8 == a >> 3.
Multiplication on 80x86 integer values isn't very nice as it uses two registers for output instead of one and takes 10-11 cycles, whereas a
shift only takes 1-4 and uses a single register. Combined with the fact that an
add can take 1-3 cycles, it can be quicker to do something like
(a << 4) + (a << 2) instead of
a * 20 (note the parentheses: in C, + binds tighter than <<, so a << 4 + a << 2 would actually parse as a << (4 + a) << 2). Here are a couple of handy identities to remember to avoid the troublesome multiplication, division and modulus when n is a power of two:
Code: Select all
a * n = a << log2(n)
a / n = a >> log2(n)
a % n = a & (n - 1)
As for your speed issue, it helps to understand the low-level operations your computer performs (i.e. learn assembly language). Your computer (an 80x86 processor, I assume) normally works with units of memory of 1, 2, 4 and maybe 8 bytes (if you use a 64-bit OS), so the process of adding a series of these units is as simple as:
Code: Select all
char myArray[] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
long Sum = 0;
for(int i = 0; i < 16 /* sizeof(myArray) / sizeof(myArray[0])*/; i++){
Sum += myArray[i];
}
whereas when you use individual bits it has to extract each value:
Code: Select all
char myBits[] = {0xaa,0x55};
long Sum = 0;
for(int i = 0; i < 16 /* sizeof(myBits)*8 */; i++){
Sum += (myBits[i/8] >> (7 - (i % 8))) & 1; // we have to extract the bit first
}
As you can see, that has a lot more operations and takes a lot more time to get the sum of 16 units of data. The choice between the two is a time/memory tradeoff: using bits means you only use ⅛ of the memory, but it takes significantly longer to execute. Unless you have to work to memory constraints, it's best to operate on the largest unit of memory your processor can handle, so instead of this:
Code: Select all
char sthToZero[] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
for(int i = 0; i < 16 /*sizeof(sthToZero)*/; i++){
sthToZero[i] ^= sthToZero[i];
}
you would be better off having something like this:
Code: Select all
long *ptrLong = (long*)sthToZero; /* assumes long is 4 bytes here */
for(int i = 0; i < 4 /*sizeof(sthToZero) / sizeof(long)*/; i++){
ptrLong[i] ^= ptrLong[i];
}
The second one takes ¼ the number of operations, and it is actually faster to access a 32-bit value in one go than an 8-bit one on most PCs. Instruction sets like MMX, 3DNow! and SSE provide SIMD (Single Instruction, Multiple Data) operations, which let you speed up operations on vectors/arrays of data and are available through libraries, compiler optimizations (GCC's flags are -mmmx, -m3dnow, -msse, -msse2, etc.) or (inline) assembly. You might want to check what your processor supports so you can try to benefit from them; most 80x86 CPUs nowadays support at least SSE2 (both Intel and AMD).
The process of optimizing code can be a grotty business if you get pedantic about it, so you're probably best off catering to your compiler and its optimizations instead of taking up assembly to do it; it's quite likely you will actually produce slower code in assembly, since the compiler knows about architecture-specific speed issues.
ramble ramble ramble ramble ramble ramble ramble