SPO600 Lab 5 – SIMD and Auto-Vectorization

SIMD instructions and vectorization

Vectorization is a compiler optimization that unrolls a loop and generates SIMD instructions for it. Each SIMD (Single Instruction, Multiple Data) instruction operates on more than one data element at a time, so the loop needs fewer iterations and can run more efficiently. With auto-vectorization, the compiler identifies suitable loops and vectorizes them on its own. AArch64 has 32 vector registers, each 128 bits wide, named V0 to V31, which the SIMD instructions use. You can refer to the ARM manual for more information about SIMD instructions and vector registers.
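To make the idea concrete, here is a minimal sketch in plain C of what vectorizing an element-wise addition amounts to. The function names add_scalar and add_by_fours are hypothetical, chosen only for this illustration. The group of four additions in the second function stands in for what a single 128-bit SIMD add does on four 32-bit integers; a compiler emits SIMD instructions directly rather than C like this.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar version: one addition per loop iteration. */
void add_scalar(const int *a, const int *b, int *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}

/* Hypothetical model of a vectorized loop: each iteration handles four
 * 32-bit elements, the amount that fits in one 128-bit vector register.
 * Assumes n is a multiple of 4, so no leftover loop is needed. */
void add_by_fours(const int *a, const int *b, int *out, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        /* These four additions model ONE SIMD add instruction. */
        out[i]     = a[i]     + b[i];
        out[i + 1] = a[i + 1] + b[i + 1];
        out[i + 2] = a[i + 2] + b[i + 2];
        out[i + 3] = a[i + 3] + b[i + 3];
    }
}
```

With 1000 elements, the scalar loop runs 1000 times while the four-at-a-time loop runs 250 times.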

Writing vectorizable code and enabling auto-vectorization

For this lab, I need to write a program that fills two 1000-element integer arrays with random numbers between -1000 and 1000, adds the two arrays element by element into a third array, then calculates and prints the sum of all elements in the third array. Here is my program that accomplishes these tasks without considering vectorization:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define RANDNUM 1000

int main(void)
{
    // Declare variables
    int array1[RANDNUM], array2[RANDNUM], array3[RANDNUM];
    int i, minNum = -1000, maxNum = 1000, sum = 0;

    // Randomize seed
    srand(time(NULL));

    for (i = 0; i < RANDNUM; i++) {
        // Store random numbers in two arrays
        array1[i] = minNum + rand() % (maxNum + 1 - minNum);
        array2[i] = minNum + rand() % (maxNum + 1 - minNum);

        // Sum array elements into third array
        array3[i] = array1[i] + array2[i];

        // Sum of third array elements
        sum += array3[i];
    }

    // Display sum of third array elements
    printf("Sum of all elements in the third array is: %d\n", sum);
    return 0;
}

I use the command “gcc -O0 lab5.c -o lab5” to compile my program with no optimization using the -O0 option. Here is the disassembly output for the section <main> using the “objdump -d” command:

0000000000400684 <main>:
 400684: d285e010 mov x16, #0x2f00 // #12032
 400688: cb3063ff sub sp, sp, x16
 40068c: a9007bfd stp x29, x30, [sp]
 400690: 910003fd mov x29, sp
 400694: 12807ce0 mov w0, #0xfffffc18 // #-1000
 400698: b92ef7a0 str w0, [x29,#12020]
 40069c: 52807d00 mov w0, #0x3e8 // #1000
 4006a0: b92ef3a0 str w0, [x29,#12016]
 4006a4: b92efbbf str wzr, [x29,#12024]
 4006a8: d2800000 mov x0, #0x0 // #0
 4006ac: 97ffff99 bl 400510 <time@plt>
 4006b0: 97ffffac bl 400560 <srand@plt>
 4006b4: b92effbf str wzr, [x29,#12028]
 4006b8: 14000038 b 400798 <main+0x114>
 4006bc: 97ffff9d bl 400530 <rand@plt>
 4006c0: 2a0003e1 mov w1, w0
 4006c4: b96ef3a0 ldr w0, [x29,#12016]
 4006c8: 11000402 add w2, w0, #0x1
 4006cc: b96ef7a0 ldr w0, [x29,#12020]
 4006d0: 4b000040 sub w0, w2, w0
 4006d4: 1ac00c22 sdiv w2, w1, w0
 4006d8: 1b007c40 mul w0, w2, w0
 4006dc: 4b000021 sub w1, w1, w0
 4006e0: b96ef7a0 ldr w0, [x29,#12020]
 4006e4: 0b000022 add w2, w1, w0
 4006e8: b9aeffa0 ldrsw x0, [x29,#12028]
 4006ec: d37ef400 lsl x0, x0, #2
 4006f0: 914007a1 add x1, x29, #0x1, lsl #12
 4006f4: 913d4021 add x1, x1, #0xf50
 4006f8: b8206822 str w2, [x1,x0]
 4006fc: 97ffff8d bl 400530 <rand@plt>
 400700: 2a0003e1 mov w1, w0
 400704: b96ef3a0 ldr w0, [x29,#12016]
 400708: 11000402 add w2, w0, #0x1
 40070c: b96ef7a0 ldr w0, [x29,#12020]
 400710: 4b000040 sub w0, w2, w0
 400714: 1ac00c22 sdiv w2, w1, w0
 400718: 1b007c40 mul w0, w2, w0
 40071c: 4b000021 sub w1, w1, w0
 400720: b96ef7a0 ldr w0, [x29,#12020]
 400724: 0b000022 add w2, w1, w0
 400728: b9aeffa0 ldrsw x0, [x29,#12028]
 40072c: d37ef400 lsl x0, x0, #2
 400730: 913ec3a1 add x1, x29, #0xfb0
 400734: b8206822 str w2, [x1,x0]
 400738: b9aeffa0 ldrsw x0, [x29,#12028]
 40073c: d37ef400 lsl x0, x0, #2
 400740: 914007a1 add x1, x29, #0x1, lsl #12
 400744: 913d4021 add x1, x1, #0xf50
 400748: b8606821 ldr w1, [x1,x0]
 40074c: b9aeffa0 ldrsw x0, [x29,#12028]
 400750: d37ef400 lsl x0, x0, #2
 400754: 913ec3a2 add x2, x29, #0xfb0
 400758: b8606840 ldr w0, [x2,x0]
 40075c: 0b000022 add w2, w1, w0
 400760: b9aeffa0 ldrsw x0, [x29,#12028]
 400764: d37ef400 lsl x0, x0, #2
 400768: 910043a1 add x1, x29, #0x10
 40076c: b8206822 str w2, [x1,x0]
 400770: b9aeffa0 ldrsw x0, [x29,#12028]
 400774: d37ef400 lsl x0, x0, #2
 400778: 910043a1 add x1, x29, #0x10
 40077c: b8606820 ldr w0, [x1,x0]
 400780: b96efba1 ldr w1, [x29,#12024]
 400784: 0b000020 add w0, w1, w0
 400788: b92efba0 str w0, [x29,#12024]
 40078c: b96effa0 ldr w0, [x29,#12028]
 400790: 11000400 add w0, w0, #0x1
 400794: b92effa0 str w0, [x29,#12028]
 400798: b96effa0 ldr w0, [x29,#12028]
 40079c: 710f9c1f cmp w0, #0x3e7
 4007a0: 54fff8ed b.le 4006bc <main+0x38>
 4007a4: 90000000 adrp x0, 400000 <_init-0x4d8>
 4007a8: 91220000 add x0, x0, #0x880
 4007ac: b96efba1 ldr w1, [x29,#12024]
 4007b0: 97ffff70 bl 400570 <printf@plt>
 4007b4: 52800000 mov w0, #0x0 // #0
 4007b8: a9407bfd ldp x29, x30, [sp]
 4007bc: d285e010 mov x16, #0x2f00 // #12032
 4007c0: 8b3063ff add sp, sp, x16
 4007c4: d65f03c0 ret

The disassembly output above contains 81 lines of instructions.

Now, I use the command “gcc -O3 lab5.c -o lab5a” to compile my program with the -O3 option, which enables a high level of optimization, including auto-vectorization. Here is the disassembly output for the section <main>:

0000000000400580 <main>:
 400580: a9bc7bfd stp x29, x30, [sp,#-64]!
 400584: d2800000 mov x0, #0x0 // #0
 400588: 910003fd mov x29, sp
 40058c: a9025bf5 stp x21, x22, [sp,#32]
 400590: 529a9c75 mov w21, #0xd4e3 // #54499
 400594: a90153f3 stp x19, x20, [sp,#16]
 400598: 72a83015 movk w21, #0x4180, lsl #16
 40059c: f9001bf7 str x23, [sp,#48]
 4005a0: 52807d13 mov w19, #0x3e8 // #1000
 4005a4: 5280fa34 mov w20, #0x7d1 // #2001
 4005a8: 52800017 mov w23, #0x0 // #0
 4005ac: 97ffffd9 bl 400510 <time@plt>
 4005b0: 97ffffec bl 400560 <srand@plt>
 4005b4: 97ffffdf bl 400530 <rand@plt>
 4005b8: 2a0003f6 mov w22, w0
 4005bc: 97ffffdd bl 400530 <rand@plt>
 4005c0: 9b357c03 smull x3, w0, w21
 4005c4: 71000673 subs w19, w19, #0x1
 4005c8: 9b357ec2 smull x2, w22, w21
 4005cc: 9369fc63 asr x3, x3, #41
 4005d0: 4b807c63 sub w3, w3, w0, asr #31
 4005d4: 9369fc42 asr x2, x2, #41
 4005d8: 4b967c42 sub w2, w2, w22, asr #31
 4005dc: 1b148060 msub w0, w3, w20, w0
 4005e0: 1b14d842 msub w2, w2, w20, w22
 4005e4: 0b000040 add w0, w2, w0
 4005e8: 511f4000 sub w0, w0, #0x7d0
 4005ec: 0b0002f7 add w23, w23, w0
 4005f0: 54fffe21 b.ne 4005b4 <main+0x34>
 4005f4: 2a1703e1 mov w1, w23
 4005f8: 90000000 adrp x0, 400000 <_init-0x4d8>
 4005fc: 911f8000 add x0, x0, #0x7e0
 400600: 97ffffdc bl 400570 <printf@plt>
 400604: 52800000 mov w0, #0x0 // #0
 400608: f9401bf7 ldr x23, [sp,#48]
 40060c: a94153f3 ldp x19, x20, [sp,#16]
 400610: a9425bf5 ldp x21, x22, [sp,#32]
 400614: a8c47bfd ldp x29, x30, [sp],#64
 400618: d65f03c0 ret
 40061c: 00000000 .inst 0x00000000 ; undefined

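The disassembly output above contains 40 lines (39 instructions plus one padding word at the end), which is about half the number of instructions in the first case. This is an indication that optimization has occurred: the compiler no longer stores the three arrays at all and simply accumulates the sum in a register (w23) as each pair of random numbers is generated. Auto-vectorization is enabled, but the disassembly output does not contain SIMD instructions, which means that the code is not vectorized.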
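One visible difference between the two listings is how "rand() % 2001" is computed. The -O0 code uses sdiv, mul and sub, while the -O3 code uses smull, asr and msub with the constant 0x4180d4e3. The sketch below models that sequence in C. The function name mod2001_by_multiply is hypothetical, and the sketch assumes a non-negative input (which holds for every value rand() returns), so the sign-correction step in the listing is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Constant taken from the -O3 listing (mov/movk into w21): 0x4180d4e3,
 * which is 2^41 / 2001 rounded up. */
#define MAGIC 0x4180d4e3LL

/* Remainder of x / 2001 without a divide instruction.
 * Assumption: x is non-negative, as every rand() result is, so the
 * sign-correction step (sub ..., asr #31) in the listing is a no-op
 * and is omitted here. */
int32_t mod2001_by_multiply(int32_t x)
{
    assert(x >= 0);
    int64_t product = (int64_t)x * MAGIC;         /* smull */
    int32_t quotient = (int32_t)(product >> 41);  /* asr #41 */
    return x - quotient * 2001;                   /* msub */
}
```

Multiplying by a precomputed reciprocal and shifting is generally cheaper than an integer divide, which is presumably why the compiler makes this substitution once the divisor is a known constant.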

I need to change my code in order for it to become vectorizable. Instead of using one for loop, I will divide it into three for loops. The first loop stores random numbers in the two arrays. The second loop adds the two arrays element by element into a third array. The third loop calculates the sum of all of the elements in the third array. Here is my program with vectorizable code:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define RANDNUM 1000

int main(void)
{
    // Declare variables
    int array1[RANDNUM], array2[RANDNUM], array3[RANDNUM];
    int i, minNum = -1000, maxNum = 1000, sum = 0;

    // Randomize seed
    srand(time(NULL));

    // Store random numbers in two arrays
    for (i = 0; i < RANDNUM; i++) {
        array1[i] = minNum + rand() % (maxNum + 1 - minNum);
        array2[i] = minNum + rand() % (maxNum + 1 - minNum);
    }

    // Sum array elements into third array
    for (i = 0; i < RANDNUM; i++) {
        array3[i] = array1[i] + array2[i];
    }

    // Sum of third array elements
    for (i = 0; i < RANDNUM; i++) {
        sum += array3[i];
    }

    // Display sum of third array elements
    printf("Sum of all elements in the third array is: %d\n", sum);
    return 0;
}

I use the command “gcc -O0 lab5b.c -o lab5b” to compile my program with no optimization using the -O0 option. Here is the disassembly output for the section <main>:

0000000000400684 <main>:
 400684: d285e010 mov x16, #0x2f00 // #12032
 400688: cb3063ff sub sp, sp, x16
 40068c: a9007bfd stp x29, x30, [sp]
 400690: 910003fd mov x29, sp
 400694: 12807ce0 mov w0, #0xfffffc18 // #-1000
 400698: b92ef7a0 str w0, [x29,#12020]
 40069c: 52807d00 mov w0, #0x3e8 // #1000
 4006a0: b92ef3a0 str w0, [x29,#12016]
 4006a4: b92efbbf str wzr, [x29,#12024]
 4006a8: d2800000 mov x0, #0x0 // #0
 4006ac: 97ffff99 bl 400510 <time@plt>
 4006b0: 97ffffac bl 400560 <srand@plt>
 4006b4: b92effbf str wzr, [x29,#12028]
 4006b8: 14000023 b 400744 <main+0xc0>
 4006bc: 97ffff9d bl 400530 <rand@plt>
 4006c0: 2a0003e1 mov w1, w0
 4006c4: b96ef3a0 ldr w0, [x29,#12016]
 4006c8: 11000402 add w2, w0, #0x1
 4006cc: b96ef7a0 ldr w0, [x29,#12020]
 4006d0: 4b000040 sub w0, w2, w0
 4006d4: 1ac00c22 sdiv w2, w1, w0
 4006d8: 1b007c40 mul w0, w2, w0
 4006dc: 4b000021 sub w1, w1, w0
 4006e0: b96ef7a0 ldr w0, [x29,#12020]
 4006e4: 0b000022 add w2, w1, w0
 4006e8: b9aeffa0 ldrsw x0, [x29,#12028]
 4006ec: d37ef400 lsl x0, x0, #2
 4006f0: 914007a1 add x1, x29, #0x1, lsl #12
 4006f4: 913d4021 add x1, x1, #0xf50
 4006f8: b8206822 str w2, [x1,x0]
 4006fc: 97ffff8d bl 400530 <rand@plt>
 400700: 2a0003e1 mov w1, w0
 400704: b96ef3a0 ldr w0, [x29,#12016]
 400708: 11000402 add w2, w0, #0x1
 40070c: b96ef7a0 ldr w0, [x29,#12020]
 400710: 4b000040 sub w0, w2, w0
 400714: 1ac00c22 sdiv w2, w1, w0
 400718: 1b007c40 mul w0, w2, w0
 40071c: 4b000021 sub w1, w1, w0
 400720: b96ef7a0 ldr w0, [x29,#12020]
 400724: 0b000022 add w2, w1, w0
 400728: b9aeffa0 ldrsw x0, [x29,#12028]
 40072c: d37ef400 lsl x0, x0, #2
 400730: 913ec3a1 add x1, x29, #0xfb0
 400734: b8206822 str w2, [x1,x0]
 400738: b96effa0 ldr w0, [x29,#12028]
 40073c: 11000400 add w0, w0, #0x1
 400740: b92effa0 str w0, [x29,#12028]
 400744: b96effa0 ldr w0, [x29,#12028]
 400748: 710f9c1f cmp w0, #0x3e7
 40074c: 54fffb8d b.le 4006bc <main+0x38>
 400750: b92effbf str wzr, [x29,#12028]
 400754: 14000012 b 40079c <main+0x118>
 400758: b9aeffa0 ldrsw x0, [x29,#12028]
 40075c: d37ef400 lsl x0, x0, #2
 400760: 914007a1 add x1, x29, #0x1, lsl #12
 400764: 913d4021 add x1, x1, #0xf50
 400768: b8606821 ldr w1, [x1,x0]
 40076c: b9aeffa0 ldrsw x0, [x29,#12028]
 400770: d37ef400 lsl x0, x0, #2
 400774: 913ec3a2 add x2, x29, #0xfb0
 400778: b8606840 ldr w0, [x2,x0]
 40077c: 0b000022 add w2, w1, w0
 400780: b9aeffa0 ldrsw x0, [x29,#12028]
 400784: d37ef400 lsl x0, x0, #2
 400788: 910043a1 add x1, x29, #0x10
 40078c: b8206822 str w2, [x1,x0]
 400790: b96effa0 ldr w0, [x29,#12028]
 400794: 11000400 add w0, w0, #0x1
 400798: b92effa0 str w0, [x29,#12028]
 40079c: b96effa0 ldr w0, [x29,#12028]
 4007a0: 710f9c1f cmp w0, #0x3e7
 4007a4: 54fffdad b.le 400758 <main+0xd4>
 4007a8: b92effbf str wzr, [x29,#12028]
 4007ac: 1400000b b 4007d8 <main+0x154>
 4007b0: b9aeffa0 ldrsw x0, [x29,#12028]
 4007b4: d37ef400 lsl x0, x0, #2
 4007b8: 910043a1 add x1, x29, #0x10
 4007bc: b8606820 ldr w0, [x1,x0]
 4007c0: b96efba1 ldr w1, [x29,#12024]
 4007c4: 0b000020 add w0, w1, w0
 4007c8: b92efba0 str w0, [x29,#12024]
 4007cc: b96effa0 ldr w0, [x29,#12028]
 4007d0: 11000400 add w0, w0, #0x1
 4007d4: b92effa0 str w0, [x29,#12028]
 4007d8: b96effa0 ldr w0, [x29,#12028]
 4007dc: 710f9c1f cmp w0, #0x3e7
 4007e0: 54fffe8d b.le 4007b0 <main+0x12c>
 4007e4: 90000000 adrp x0, 400000 <_init-0x4d8>
 4007e8: 91230000 add x0, x0, #0x8c0
 4007ec: b96efba1 ldr w1, [x29,#12024]
 4007f0: 97ffff60 bl 400570 <printf@plt>
 4007f4: 52800000 mov w0, #0x0 // #0
 4007f8: a9407bfd ldp x29, x30, [sp]
 4007fc: d285e010 mov x16, #0x2f00 // #12032
 400800: 8b3063ff add sp, sp, x16
 400804: d65f03c0 ret

The disassembly output above contains 97 lines of instructions. We get more instructions than in the first case with one loop, which is as expected since we now have three loops. Also as expected, the disassembly output does not contain SIMD instructions, since auto-vectorization is not enabled at -O0.

Now, I use the command “gcc -O3 lab5b.c -o lab5c” to compile my program with the -O3 option. Here is the disassembly output for the section <main>, with my comments added:

0000000000400580 <main>:
// main() function
 400580: d285e410 mov x16, #0x2f20 // #12064
 400584: cb3063ff sub sp, sp, x16 // stack pointer - x16
 400588: d2800000 mov x0, #0x0 // #0
 40058c: a9007bfd stp x29, x30, [sp] // store x29 and x30 to stack pointer address
 400590: 910003fd mov x29, sp // move stack pointer to x29
 400594: a90153f3 stp x19, x20, [sp,#16] // store x19 and x20 to stack pointer address with offset
 400598: 529a9c74 mov w20, #0xd4e3 // #54499
 40059c: a9025bf5 stp x21, x22, [sp,#32] // store x21 and x22 to stack pointer address with offset
 4005a0: 72a83014 movk w20, #0x4180, lsl #16 // set upper 16 bits of w20, giving 0x4180d4e3
 4005a4: f9001bf7 str x23, [sp,#48] // store x23 to stack pointer address with offset
 4005a8: 910103b6 add x22, x29, #0x40 // x29 + 64 and store in x22
 4005ac: 913f83b5 add x21, x29, #0xfe0 // x29 + 4064 and store in x21
 4005b0: 5280fa33 mov w19, #0x7d1 // #2001
 4005b4: d2800017 mov x23, #0x0 // #0
 4005b8: 97ffffd6 bl 400510 <time@plt> // call time subroutine
 4005bc: 97ffffe9 bl 400560 <srand@plt> // call srand subroutine
// first loop
// array1[i] = minNum + rand() % (maxNum + 1 - minNum)
 4005c0: 97ffffdc bl 400530 <rand@plt> // call rand subroutine
 4005c4: 9b347c01 smull x1, w0, w20 // w0 * w20 and store in x1
 4005c8: 9369fc21 asr x1, x1, #41 // shift x1 value right by 41 bits
 4005cc: 4b807c21 sub w1, w1, w0, asr #31 // subtract shifted register
 4005d0: 1b138020 msub w0, w1, w19, w0 // multiply and subtract
 4005d4: 510fa000 sub w0, w0, #0x3e8 // subtract
 4005d8: b8376ac0 str w0, [x22,x23] // store w0 to an address
// array2[i] = minNum + rand() % (maxNum + 1 - minNum)
 4005dc: 97ffffd5 bl 400530 <rand@plt> // call rand subroutine
 4005e0: 9b347c01 smull x1, w0, w20 // w0 * w20 and store in x1
 4005e4: 9369fc21 asr x1, x1, #41 // shift x1 value right by 41 bits
 4005e8: 4b807c21 sub w1, w1, w0, asr #31 // subtract shifted register
 4005ec: 1b138020 msub w0, w1, w19, w0 // multiply and subtract
 4005f0: 510fa000 sub w0, w0, #0x3e8 // subtract
 4005f4: b8376aa0 str w0, [x21,x23] // store w0 to an address
// loop if i < RANDNUM
 4005f8: 910012f7 add x23, x23, #0x4 // x23 + 4 and store in x23
 4005fc: f13e82ff cmp x23, #0xfa0 // test if x23 = 4000
 400600: 54fffe01 b.ne 4005c0 <main+0x40> // repeat first loop if x23 not equal 4000
 400604: d283f002 mov x2, #0x1f80 // #8064
 400608: 8b0203a1 add x1, x29, x2 // x29 + x2 and store in x1
 40060c: d2800000 mov x0, #0x0 // #0
// second loop
// array3[i] = array1[i] + array2[i];
 400610: 3ce06ac0 ldr q0, [x22,x0] // load 4 elements of array1 into q0
 400614: 3ce06aa1 ldr q1, [x21,x0] // load 4 elements of array2 into q1
 400618: 4ea18400 add v0.4s, v0.4s, v1.4s // SIMD vector instruction: v0.4s + v1.4s and store in v0.4s
 40061c: 3ca06820 str q0, [x1,x0] // store q0 to an address
// loop if i < RANDNUM
 400620: 91004000 add x0, x0, #0x10 // x0 + 16 and store in x0
 400624: f13e801f cmp x0, #0xfa0 // test if x0 = 4000
 400628: 54ffff41 b.ne 400610 <main+0x90> // repeat second loop if x0 not equal 4000
 40062c: 4f000400 movi v0.4s, #0x0 // SIMD vector instruction: move immediate (vector)
 400630: aa0103e0 mov x0, x1 // move x1 to x0
 400634: d285e401 mov x1, #0x2f20 // #12064
 400638: 8b0103a1 add x1, x29, x1 // x29 + x1 and store in x1
// third loop
// sum += array3[i];
 40063c: 3cc10401 ldr q1, [x0],#16 // load 4 elements of array3 into q1, then x0 + 16
 400640: 4ea18400 add v0.4s, v0.4s, v1.4s // SIMD vector instruction: v0.4s + v1.4s and store in v0.4s
 400644: eb01001f cmp x0, x1 // test if x0 = x1
 400648: 54ffffa1 b.ne 40063c <main+0xbc> // repeat third loop if x0 not equal x1
 40064c: 4eb1b800 addv s0, v0.4s // SIMD vector instruction: add across vector
 400650: 90000000 adrp x0, 400000 <_init-0x4d8> // store address in x0
 400654: 91210000 add x0, x0, #0x840 // x0 + 2112 and store in x0
 400658: 0e043c01 mov w1, v0.s[0] // SIMD vector instruction: move v0.s[0] to w1
 40065c: 97ffffc5 bl 400570 <printf@plt> // call printf subroutine
 400660: f9401bf7 ldr x23, [sp,#48] // load register
 400664: a94153f3 ldp x19, x20, [sp,#16] // load pair of registers
 400668: 52800000 mov w0, #0x0 // #0
 40066c: a9425bf5 ldp x21, x22, [sp,#32] // load pair of registers
 400670: d285e410 mov x16, #0x2f20 // #12064
 400674: a9407bfd ldp x29, x30, [sp] // load pair of registers
 400678: 8b3063ff add sp, sp, x16 // stack pointer + x16 and store in stack pointer
 40067c: d65f03c0 ret // return from subroutine

The disassembly output above contains 64 lines of instructions, which is fewer than in the case with no optimization. In this case, the disassembly output contains SIMD instructions, which means that the code is vectorized. Specifically, the disassembly output shows that the second and third loops are vectorized. Each of them contains a few SIMD instructions that use vector registers. For example, the SIMD instruction “add v0.4s, v0.4s, v1.4s” performs 4 additions in a single instruction. In the register name “v0.4s”, “v0” is vector register 0, “4” is the number of data elements (lanes), and “s” is the element size of 32 bits. One instruction uses “v0.s[0]”, which refers to a single element of a vector register, where “[0]” is the element index. Some SIMD instructions share their names with scalar instructions. For example, “add” and “mov” become SIMD instructions when they are used with vector registers.
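The vectorized third loop can be pictured with the following sketch in plain C. The function name sum_with_four_lanes is hypothetical, and the lanes array is only a model of the four 32-bit lanes of v0. The comments point to the instructions in the listing that each step is meant to mirror.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the vectorized third loop: four running totals
 * (the four 32-bit lanes of v0) are kept side by side and only combined
 * at the end. Assumes n is a multiple of 4. */
int sum_with_four_lanes(const int *values, size_t n)
{
    int lanes[4] = {0, 0, 0, 0};          /* movi v0.4s, #0x0 */

    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        /* add v0.4s, v0.4s, v1.4s: four independent additions */
        for (int lane = 0; lane < 4; lane++) {
            lanes[lane] += values[i + lane];
        }
    }

    /* addv s0, v0.4s: add across the lanes to get one scalar */
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

Because integer addition can be regrouped freely, summing four partial totals gives the same result as adding the elements one by one.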

There are a few things to consider when you want to write vectorizable loops. Simple loops are more likely to be vectorizable than complex loops. A loop is unlikely to be vectorized if it contains complex calculations or function calls, such as the rand() calls in the first loop of my program. The same is true if data dependencies exist within the loop, which is when one iteration depends on a value computed or overwritten by an earlier iteration. These conditions help explain why my first program, which has only one big loop, was not vectorized. Writing vectorizable code is not easy because different compilers handle vectorization differently and we are unfamiliar with that process. It will probably take at least a couple of attempts at modifying our code to get it to work. There are some general guidelines that we can follow, but they may not always be helpful. On the other hand, it is not difficult to identify vectorized code in the disassembly output.
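As a small illustration of the data dependency point, here are two loops in C. The function names double_each and running_total are hypothetical. In the first, every iteration is independent, which is the kind of loop a compiler can typically vectorize. In the second, each iteration needs the result of the previous one, so the iterations cannot simply be executed four at a time.

```c
#include <assert.h>
#include <stddef.h>

/* Independent iterations: values[i] depends only on itself, so several
 * elements can be processed at once. */
void double_each(int *values, size_t n)
{
    assert(values != NULL);
    for (size_t i = 0; i < n; i++) {
        values[i] *= 2;
    }
}

/* Loop-carried dependency: values[i] needs the updated values[i - 1]
 * from the previous iteration. */
void running_total(int *values, size_t n)
{
    assert(values != NULL);
    for (size_t i = 1; i < n; i++) {
        values[i] += values[i - 1];
    }
}
```

Checking the disassembly of each function at -O3, as done above for main, is one way to see how a particular compiler treats the two cases.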
