Topic: Really Bare Metal Programming on CubieTruck

Offline Goatfreed

Really Bare Metal Programming on CubieTruck
« on: July 10, 2014, 02:17:43 am »
Hi, and sorry in advance for the long post, but I want to describe my situation in as much detail as possible so that anyone answering gets a feel for it.

I want to program the CubieTruck really bare metal, that is, without any preexisting software (no OS, no bootloader).
I know that the Allwinner A20 has a BROM with boot code in it that I cannot bypass, so I will simply regard it as "metal" ;)
As a first step I want to do the following:
Make an SD card that holds the code of a blinky program. That is, I want to write a binary that I can put onto an SD card and that will be executed when the CubieTruck starts up.
So far I did not succeed.

What I did so far:
Installed emIDE and Win32 Disk Imager on Windows and set up a project for the Cortex-A7 (the core in the Allwinner A20).

I read a bit through the Allwinner A20 specs (http://dl.cubieboard.org/software/ubuntuone/public/cubieboard/docs/A20_user%20manual%20V1.0%2020130322.pdf, http://dl.cubieboard.org/hardware/A20_Cubietruck_HW_V10_130606.pdf).

In the specs I found that Status LED1 (blue) is connected to PH21 and Status LED2 (yellow) is connected to PH20. Furthermore, the PIO base address in the A20 memory map is 0x01C20800. The configuration register for PH20 and PH21 is PH_CFG2 at offset 0x104 (so its absolute address is 0x01C20904).
To enable PH20 and PH21 for output, I have to set bits 16 and 20 of the PH_CFG2 register.
So the value 0x110000 (bits 16 and 20 are 1, the rest is 0) must be written to address 0x01C20904 (PH_CFG2) to enable PIO output.
The PH_DAT register represents the pin states, so there I have to set bits 20 and 21 to 1 in order to make PH20 and PH21 "active" and thereby turn the LEDs on. Bits 20 and 21 are set in the hex number 0x300000, and the PH_DAT register has an offset of 0x10C, which means the value 0x300000 has to be written to address 0x01C2090C in order to switch on the two LEDs.
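
The arithmetic above can be written down as a small compile-time sketch. It assumes the register layout quoted in this post (PIO base, PH_CFG2 and PH_DAT offsets), four configuration bits per pin with the value 1 selecting output, and one data bit per pin. The helper names are made up for illustration.

```cpp
#include <cstdint>

// Figures quoted in the post: PIO base and the two Port H register offsets.
constexpr std::uint32_t PIO_BASE    = 0x01C20800;
constexpr std::uint32_t PH_CFG2_OFF = 0x104;   // configures PH16..PH23, 4 bits per pin
constexpr std::uint32_t PH_DAT_OFF  = 0x10C;   // one bit per pin

constexpr std::uint32_t PIN_MODE_OUTPUT = 0x1; // function select value for "output"

// Bit position of a pin's 4-bit field inside its CFG register (8 pins per register).
constexpr std::uint32_t cfg_shift(unsigned pin) { return (pin % 8) * 4; }

// Value that selects "output" for one pin; all other fields are written as 0.
constexpr std::uint32_t cfg_output(unsigned pin) { return PIN_MODE_OUTPUT << cfg_shift(pin); }

// Bit mask of a pin in the DAT register.
constexpr std::uint32_t dat_bit(unsigned pin) { return std::uint32_t{1} << pin; }

constexpr std::uint32_t PH_CFG2_ADDR  = PIO_BASE + PH_CFG2_OFF;
constexpr std::uint32_t PH_DAT_ADDR   = PIO_BASE + PH_DAT_OFF;
constexpr std::uint32_t LED_CFG_VALUE = cfg_output(20) | cfg_output(21);
constexpr std::uint32_t LED_DAT_VALUE = dat_bit(20) | dat_bit(21);

// The constants derived in the post, checked at compile time.
static_assert(PH_CFG2_ADDR  == 0x01C20904, "PH_CFG2 address");
static_assert(PH_DAT_ADDR   == 0x01C2090C, "PH_DAT address");
static_assert(LED_CFG_VALUE == 0x00110000, "PH20/PH21 as outputs");
static_assert(LED_DAT_VALUE == 0x00300000, "PH20/PH21 high");
```

If a shift or offset is mistyped, the `static_assert` lines fail at compile time instead of producing a silent wrong register write.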

From a disassembly of the BROM on GitHub (https://github.com/hno/Allwinner-Info/blob/master/BROM/ffff4000.s) and a boot0 header description from linux-sunxi.org, I think I have to use a special string for the BROM to recognize my SD card as bootable and start the code.
This string is "eGON.BT0" and has to be placed at offset 0x4, while the first word (offset 0x0) must contain a branch instruction to wherever my code continues. This is because the PC register (which indicates where the next instruction to be processed is) will be set to 0x0 after the SD card has been recognized (at least this is what I think I found out).

So I wrote that in ARM assembler, assembled it to a flat binary (.bin) and raw-wrote it onto a micro SD card (I tried two different cards, with different results). As the last thing in the program I put an endless loop, so the CPU won't do things I can't control.

With one SD card (SanDisk) nothing happened at all (no LED besides the power LED was on). With the other SD card (Hama) the yellow LED was lit, but it was also lit when I cut my code down to just starting and looping endlessly, so this does not seem to be the result of my code.

I am wondering if anybody here can give me a hint, an explanation or some support on this matter, as I don't know how to proceed now (I can't even tell whether my code is not working or is not executed at all).
The questions I am asking myself at the moment are:
- Do I need to manipulate the PH_PULL or PH_MULTI-DRIVING registers as well to turn my LED on?
- Is there any gating or clock enabling that I need to do for the PIO to work at all?
- Do I need more than just the "eGON.BT0" string at 0x4 for the BROM to recognize the card as bootable? (At linux-sunxi.org there is a very short header description which I cannot make much sense of. I also tried creating that header, but in many places I just don't know which values to put. The BROM code also does not really seem to care much about it, or I simply missed that part while reading it.)

Offline lawrence

Re: Really Bare Metal Programming on CubieTruck
« Reply #1 on: July 10, 2014, 01:58:48 pm »
You'll need to initialise the chip / board defaults (e.g. SDRAM) before you can do anything.
The A10/A20 etc. need to be initialised to suitable defaults first.

So I'd suggest you rather look at the U-Boot code that's available, and then write a flat binary that you call from U-Boot.

No real reason to reinvent the wheel unless you're a masochist.

uBoot details here - https://github.com/linux-sunxi/u-boot-sunxi/wiki

Code here - https://github.com/patrickhwood/u-boot/tree/lichee-dev-a20

Look, read, understand, then work from there.

Offline Goatfreed

Re: Really Bare Metal Programming on CubieTruck
« Reply #2 on: July 11, 2014, 01:13:43 am »
Thank you.
I will have a look and hopefully understand it.
If so, it will be enough for me to understand what I want to.

Although you are right that reinventing the wheel is neither necessary nor productive, I really want to do that in this case.
That way I will understand, step by step, what really has to be done on the hardware (it may very well be that later on I will simply use U-Boot, as it is already well established).

Offline Goatfreed

Re: Really Bare Metal Programming on CubieTruck
« Reply #3 on: October 27, 2014, 03:54:07 pm »
So I finally managed what I wanted.
A bare metal Blinky-Program on the CubieTruck.
In case anyone else wants to start bare metal programming on it, here is what you need:
- An IDE that uses a compiler for ARM (I used emIDE with GNU ARM GCC)
- A program to raw-write onto an SD card (I used Win32 Disk Imager, since I use Windows)
- A program to calculate and set the checksum (I wrote my own)

Within the IDE, create a startup.s (or any other assembly file) with a boot0 header.
How the header needs to be structured is described in many places on the internet, but not how exactly to fill it.
First you need a jump instruction to your start code (e.g. " b start ")
Then you need the eGON boot0 magic text (" .byte 'e','G','O','N','.','B','T','0' ")
Afterwards reserve a word for the checksum (" .word 0x00000000 ")
Then give the length of the boot0 section (e.g. " .word 0x00006000 "; for some reason the BROM code at boot time wants a length that is at least 0x6000 and has zeros in the lowest [least significant] 8 bits, otherwise it won't boot)
Reserve a word for the header size (I set it to 0x0; it seems no value is needed here)
Reserve a word for the header version (I set it to 0x0; it seems no value is needed here)
Reserve a word for the boot version (I set it to 0x0; it seems no value is needed here)
Reserve a word for the eGON version (I set it to 0x0; it seems no value is needed here)
Write 8 bytes with whatever content you want (this seems to be intended for platform information, so maybe write your project name or something)
Align the code that follows ( ".align 4"; with the GNU assembler for ARM this pads to a 2^4 = 16 byte boundary )
Set your start label " start: "
The following is the code in its entirety:
Code:
.syntax unified

.text
.global entry // make entry visible for linker
entry:

public_header:
     b start // jump over the header to the start point
.byte 'e','G','O','N','.','B','T','0' //eGON boot0 magic value (eGON.BT0)
.word 0x00000000 //checksum
.word 0x00006000 //length for boot0
.word 0x00000000 //header size of boot0
.word 0x00000000 //header version
.word 0x00000000 //boot version
.word 0x00000000 //eGON version
.byte 'A','N','Y',' ','T','E','X','T' //platform information
.align 4

start:
    mov r0, 0x00110000 //Activate_PIN_20_21 content for PH_CFG2
    mov r1, 0x00000000 //Disable_LED content for PH_DAT
    mov r2, 0x00300000 //Enable_LED content for PH_DAT
    ldr r3, =0x01C20904 //PH_CFG2 Address
    str r0, [r3] //This sets the Port H configuration ready for the two LEDs
    ldr r3, =0x01C2090C //PH_DAT Address
    str r2, [r3] //This writes the data to Port H, indicating to switch the Yellow and Blue LED on
    ldr r6, =0x0000FFFF //this is a "constant" for the upper value of the delay counter
    mov r5, 0x0 //this will be the switch to indicate if we have to switch the LEDs on or off
    mov r4, 0x0 //this will be the delay counter
endless:
    add r4, 0x1 //increase the counter by one
    cmp r4, r6 //compare it to the upper bound constant
    beq switch_led //if they are equal go to "switch_led"
    b endless
switch_led:
    mov r4, 0x0 //reset the counter
    cmp r5, 0x1 //check if we want to switch the LEDs on
    beq switch_on_led
switch_off_led:
    mov r5, 0x1 //set the switch so that the next time the LEDs will be switched on
    str r1, [r3] //write the "disable LEDs" data to the data register of Port H
    b endless
switch_on_led:
    mov r5, 0x0 //set the switch so that the next time the LEDs will be switched off
    str r2, [r3] //write the "enable LEDs" data to the data register of Port H
    b endless
This can simply be assembled into a flat binary, as we don't need anything linked here (but you are free to split things into other files, e.g. C code, of course).
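
For checking an assembled binary on the PC before writing it to a card, the header fields described above can be read back by byte offset. This is only a sketch: the offsets follow the assembly listing, the two length rules are the ones reported above from reading the BROM disassembly, and `header_looks_bootable` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Byte offsets of the header fields as laid out in the assembly listing.
constexpr std::size_t OFF_MAGIC  = 0x04;  // 8 bytes, "eGON.BT0", no terminator
constexpr std::size_t OFF_LENGTH = 0x10;  // follows the 4-byte checksum at 0x0C

// Read a 32-bit little-endian word regardless of the host byte order.
inline std::uint32_t read_le32(const std::uint8_t* p) {
    return std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
           (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
}

// True if the image carries the magic and a length that satisfies the two
// constraints reported in the post (at least 0x6000, lowest 8 bits zero).
inline bool header_looks_bootable(const std::uint8_t* img, std::size_t size) {
    assert(img != nullptr);
    if (size < OFF_LENGTH + 4) return false;
    if (std::memcmp(img + OFF_MAGIC, "eGON.BT0", 8) != 0) return false;
    const std::uint32_t length = read_le32(img + OFF_LENGTH);
    return length >= 0x6000 && (length & 0xFF) == 0;
}
```

The check says nothing about the checksum; it only catches a misplaced magic string or a length word the BROM is reported to reject.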

For the task of preparing the binary to be bootable (e.g. the checksum) I wrote a simple C program, which I won't post, but here is what it does:
First, extend the flat binary to the size of 0x6000 (as given in the boot0 header!) by filling the space up to that address with 0xFF bytes (any other value should also do, I guess).
At the place where we reserved a word for the checksum, set 0x5F0A6C39 (this value has no special meaning, it just needs to be there to ensure the correct checksum; the BROM creates the checksum in the same way).
Then go through the whole file word by word and add up the values.
The resulting word has to be written to the place reserved for the checksum.
My program then adds 0x2000 bytes of space in front of the file, so that my first instruction ( "b start" ) will be at address 0x2000 on the card. On Linux you don't need that if you give the dd command an offset of 8k.
Simply rename the flat binary to have an .img extension and write it to the SD card (the renaming is not needed on Linux, as dd does not care about the extension).
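
The padding and checksum steps above can be sketched as follows. This is not the program described in the post, only a minimal illustration of the same steps; the function names are hypothetical, and `checksum_ok` assumes the BROM verifies an image by putting the stamp value back and re-summing, as the post suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t CHECKSUM_STAMP = 0x5F0A6C39; // placeholder value from the post
constexpr std::size_t   OFF_CHECKSUM   = 0x0C;       // word after the 8-byte magic

inline std::uint32_t read_le32(const std::uint8_t* p) {
    return std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
           (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
}

inline void write_le32(std::uint8_t* p, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) p[i] = std::uint8_t(v >> (8 * i));
}

// Sum of all 32-bit little-endian words; overflow wraps modulo 2^32.
inline std::uint32_t sum_words(const std::vector<std::uint8_t>& img) {
    assert(img.size() % 4 == 0);
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < img.size(); i += 4) sum += read_le32(&img[i]);
    return sum;
}

// Pad to `length` with 0xFF, stamp the checksum field, then store the sum.
inline void finalize_boot0(std::vector<std::uint8_t>& img, std::size_t length) {
    assert(length % 4 == 0 && length >= img.size() && length >= OFF_CHECKSUM + 4);
    img.resize(length, 0xFF);
    write_le32(&img[OFF_CHECKSUM], CHECKSUM_STAMP);
    write_le32(&img[OFF_CHECKSUM], sum_words(img));
}

// Assumed mirror of the BROM check; takes the image by value so it works on a copy.
inline bool checksum_ok(std::vector<std::uint8_t> img) {
    if (img.size() < OFF_CHECKSUM + 4 || img.size() % 4 != 0) return false;
    const std::uint32_t stored = read_le32(&img[OFF_CHECKSUM]);
    write_le32(&img[OFF_CHECKSUM], CHECKSUM_STAMP);
    return sum_words(img) == stored;
}
```

Reading the words byte by byte keeps the result independent of the byte order of the PC that runs the tool.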

That's it. Insert the SD card into the CubieTruck and power it up: the yellow and blue LEDs should blink.
« Last Edit: October 27, 2014, 04:08:13 pm by Goatfreed »

Offline Mike Kaiser

Re: Really Bare Metal Programming on CubieTruck
« Reply #4 on: December 11, 2014, 09:56:10 am »
Hi there, I just wanted to say that this post REALLY helped me out.
Having looked through so many posts about the A20 with little actual information on how to get something working, this post was a breath of fresh air.
Just in case anyone else wants to replicate your success on Windows, here's what I did.

Compile the following C++ code to an exe called ChecksumGen.exe using Visual Studio or any other Windows-based C++ compiler.

Code:
// A20 Checksum Generator for Windows.
// by Mike Kaiser (11 Dec 2014) [no additional copyright from me. It's trivial stuff so take it and do what you like with it]
// This code has been inspired by many of the internet forum posts and GPL / LGPL code out there on various forums.
//
// The source of the header definition is directly from sunxi-tools/bootinfo.c
// (C) Copyright 2012 Henrik Nordstrom <henrik@henriknordstrom.net>
//
// Thanks go to Goatfreed on Cubieforums for posting his test program and instructions on how to get it running.
// Thanks to baremetal for suggesting a boot sector to stop Windows bringing up an 'unformatted' dialog.
// This code is provided without warranty and does not claim to be bug free or fit for any purpose. Use it at your own risk.


#define _CRT_SECURE_NO_WARNINGS

#include <stdio.h>
#include <stdlib.h>

typedef unsigned char  u8;
typedef unsigned short u16;
typedef unsigned int   u32;

typedef signed char  s8;
typedef signed short s16;
typedef signed int   s32;

class Buffer
{
    int bufLen;
    u8 * buf;

public:
    Buffer()
        : bufLen(0)
        , buf( nullptr )
    {
    }

    ~Buffer()
    {
        if (buf)
            free(buf);
    }

    void Read(FILE * fp)
    {
        fseek(fp, 0, SEEK_END);
        bufLen = ftell(fp);
        fseek(fp, 0, SEEK_SET);
        buf = (u8 *)realloc(buf, bufLen);
        fread(buf, 1, bufLen, fp);
    }

    void Write(FILE * fp, int padSize)
    {
        char pad = 0;
        for (int i = 0; i < padSize; ++i)
            fwrite(&pad, 1, 1, fp);
        fwrite(buf, 1, bufLen, fp);
    }

    int WriteFATHeader(FILE * fp)
    {
        // This writes a bare minimum boot sector to fool Windows into thinking it has a valid FAT16 disk
        u8 header[] =
        {
            // Jump
            0xEB, 0x3C, 0x90,

            // OEM ID
            0x4D, 0x53, 0x44, 0x4F, 0x53, 0x35, 0x2E, 0x30,

            // BPB and EBPB
            0x00, 0x02, 0x08, 0x04, 0x00, 0x02, 0x00, 0x02, 0x00, 0x00, 0xF8, 0xEE, 0x00, 0x3F, 0x00, 0xFF,
            0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x70, 0x07, 0x00, 0x80, 0x00, 0x29, 0x80, 0xE1, 0x4B, 0x38,
            0x4E, 0x4F, 0x20, 0x4E, 0x41, 0x4D, 0x45, 0x20, 0x20, 0x20, 0x20, 0x46, 0x41, 0x54, 0x31, 0x36,
            0x20, 0x20, 0x20, 0x33, 0xC9, 0x8E, 0xD1, 0xBC, 0xF0, 0x7B, 0x8E, 0xD9, 0xB8, 0x00, 0x20, 0x8E,
            0xC0, 0xFC, 0xBD, 0x00, 0x7C, 0x38, 0x4E, 0x24, 0x7D
        };

        fwrite(header, 1, sizeof(header), fp);

        for (int i = 0; i < 426; ++i) // this is where the code for the DOS boot sector would live, but we just zero it out
            fputc(0, fp);

        fputc(0x55, fp);
        fputc(0xAA, fp);
        return 512;
    }

    void SetBufSize( int newSize )
    {
        int oldLen = bufLen;
        bufLen = newSize;
        buf = (u8*)realloc(buf, newSize);
        for (int i = oldLen; i < bufLen; ++i)
            buf[i] = 0xFF;
    }

    void * GetBaseAddr() const
    {
        return buf;
    }

    void * GetEndAddr() const
    {
        return buf + bufLen;
    }
};




struct Header
{
    u32 jump_instruction; // one instruction jumping to real code
    char magic[8]; // ="eGON.BT0" or "eGON.BT1", not C-style string.
    u32 check_sum; // generated by PC
    u32 length; // specified in your startup ASM
    u32 pub_head_size; // the size of boot_file_head_t
    u8 pub_head_vsn[4]; // the version of boot_file_head_t
    u8 file_head_vsn[4]; // the version of boot0_file_head_t or boot1_file_head_t
    u8 Boot_vsn[4]; // Boot version
    u8 eGON_vsn[4]; // eGON version
    u8 platform[8]; // platform information
};



u32 ComputeCheckSum(const Buffer & src)
{
    u32 checksum = 0;
    u32 * ptr = (u32 *)(src.GetBaseAddr());
    u32 * end = (u32 *)(src.GetEndAddr());

    while(ptr < end)
    {
        checksum += *(ptr++); // No byte swapping is needed as long as the PC is little-endian like the A20 (true for x86).
    }

    return checksum;
}


int main( int argc, char * argv [] )
{
    // Both file names are required: source binary and destination image.
    if (argc < 3)
    {
        printf( "Usage: ChecksumGen <source.bin> <destination.raw>\n" );
        return 1;
    }

    Buffer src;

    FILE * fp = fopen(argv[1], "rb");
    if (fp != nullptr)
    {
        src.Read(fp);
        fclose(fp);
    }
    else
    {
        printf( "Unable to open source file\n" );
        return 1;
    }

    Header * header = (Header *)(src.GetBaseAddr());
    src.SetBufSize(header->length); // at this point, the buffer is the size of the image on disk, resize the buffer to the length value specified in the image (it could be larger).
    header = (Header *)(src.GetBaseAddr()); // re-assign the header just in case the base address changed during realloc
    header->check_sum = 0x5F0A6C39;
    header->check_sum = ComputeCheckSum(src);


    fp = fopen(argv[2], "wb");
    if (fp != nullptr)
    {
        int size = src.WriteFATHeader(fp);
        src.Write(fp, 0x2000 - size);
        fclose(fp);
    }
    else
    {
        printf( "Unable to open destination file\n" );
        return 1;
    }

    return 0;
}


Then use the following batch file to assemble and process Goatfreed's code.

Code:
arm-none-eabi-as.exe -o out.elf main.s
arm-none-eabi-objcopy.exe -O binary out.elf  out.bin
ChecksumGen.exe out.bin out.raw

I've been using the SUSE Studio Image writer on Windows to actually send the image to the card.
« Last Edit: April 09, 2015, 04:48:23 am by Mike Kaiser »

Offline Goatfreed

Re: Really Bare Metal Programming on CubieTruck
« Reply #5 on: January 26, 2015, 04:13:58 am »
Hi there, I just wanted to say that this post REALLY helped me out.
Having looked through so many posts about the A20 with little actual information on how to get something working, this post was a breath of fresh air.

That's great.
I had to work my way through the !whole! BROM disassembly that I found somewhere on GitHub to get this information. Really annoying.
Also thanks for posting your checksum generator code (I didn't post mine because it was a quick, messy program that I was not willing to make pretty just to post it here ^^).

Two questions for you, if I may:
What are you using to program the A20 (IDE/compiler)?

I saw that the CPUs initially run from the 24 MHz clock on the A20, and I wonder how to change that to PLL1, which seems to be the fastest (if the factors are set accordingly) and is intended to be used for the CPU anyway.
Setting the "enable" bit in the CCU register for PLL1 and selecting PLL1 as the source clock for the CPU in the AHB/CPU... register does not seem to be enough...
Do you have any idea on that one?
I am pretty new to bare metal programming entirely and may miss some obvious (for others) points here.

Edit:
It is enough to enable PLL1 and set the CPU source. Between enabling the PLL and setting the source, the program should "wait" a few clocks (I use 200, as I saw this elsewhere for the A20).
My problem was that I used the "|=" operator to set the highest bit in the CCU PLL1 register.
For some reason this did not work. But when reading the value, manipulating it and writing it back with the "=" operator, it works (maybe some compiler optimization effect...).
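
A common reason for a register update to be dropped or reordered is that the register is accessed through a pointer that is not `volatile`; whether that was the cause here is not known. The following is a minimal sketch of the sequence described in this edit, assuming bit 31 is the PLL1 enable bit (as stated above) and assuming the CPU clock source is a two-bit field at bits 17:16 with the value 2 selecting PLL1 (that field layout is an assumption and should be checked against the A20 manual). The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t PLL1_ENABLE   = std::uint32_t{1} << 31;             // "highest bit" per the post
constexpr std::uint32_t CPU_SRC_SHIFT = 16;                                 // assumption
constexpr std::uint32_t CPU_SRC_MASK  = std::uint32_t{3} << CPU_SRC_SHIFT;  // assumption
constexpr std::uint32_t CPU_SRC_PLL1  = std::uint32_t{2} << CPU_SRC_SHIFT;  // assumption

// Busy-wait; the volatile counter keeps the loop from being optimized away.
inline void delay_loops(std::uint32_t n) {
    for (volatile std::uint32_t i = 0; i < n; i = i + 1) { }
}

// Enable PLL1, wait, then select it as CPU clock source.
// Every register access goes through a volatile pointer, so each read and
// write in the source is performed exactly once and in this order.
inline void switch_cpu_to_pll1(volatile std::uint32_t* pll1_cfg,
                               volatile std::uint32_t* cpu_clk_cfg) {
    assert(pll1_cfg != nullptr && cpu_clk_cfg != nullptr); // host-side sanity check

    std::uint32_t v = *pll1_cfg;   // explicit read
    v |= PLL1_ENABLE;              // modify
    *pll1_cfg = v;                 // explicit write

    delay_loops(200);              // the wait mentioned in the post

    v = *cpu_clk_cfg;
    v = (v & ~CPU_SRC_MASK) | CPU_SRC_PLL1;
    *cpu_clk_cfg = v;
}
```

On the target the two arguments would be the addresses of the PLL1 and CPU clock configuration registers cast to `volatile std::uint32_t*`; on a PC the function can be exercised with ordinary variables.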

P.S.: I am really glad to find someone else, who wants to do real bare metal on the A20, that I can exchange information with.
« Last Edit: February 09, 2015, 07:32:57 am by Goatfreed »

Offline baremetal

Re: Really Bare Metal Programming on CubieTruck
« Reply #6 on: February 09, 2015, 04:45:49 am »
Thanks to both contributors for these posts. They get you started with bare metal on the Allwinner A20. Win32DiscImager clobbers Sector 0 on your SD Card. Rather than re-formatting it every time you want to put a new img file on it, you can add the following code to the top of the "Write" procedure in Mike's Checksumgen code. The code writes DOS Sector 0 data to the img file instead of pad characters in Sector 0. It fools Windows into thinking that the Card is validly formatted.

-----------------------------------
/* Write DOS Sector 0 Data to the Image File, then add the padding and the Boot Code */
int partition_data[] =    {0xEB, 0x3C, 0x90, 0x4D, 0x53 ,0x44, 0x4F, 0x53, 0x35, 0x2E, 0x30, 0x00, 0x02, 0x08, 0x04, 0x00, 0x02, 0x00, 0x02, 0x00, 0x00, 0xF8, 0xEE, 0x00, 0x3F, 0x00, 0xFF, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x70, 0x07, 0x00, 0x80, 0x00, 0x29, 0x80, 0xE1, 0x4B, 0x38, 0x4E, 0x4F, 0x20, 0x4E, 0x41, 0x4D, 0x45, 0x20, 0x20, 0x20, 0x20, 0x46, 0x41, 0x54, 0x31, 0x36, 0x20, 0x20, 0x20, 0x33, 0xC9, 0x8E, 0xD1, 0xBC, 0xF0, 0x7B, 0x8E, 0xD9, 0xB8, 0x00, 0x20, 0x8E, 0xC0, 0xFC, 0xBD, 0x00, 0x7C, 0x38, 0x4E, 0x24, 0x7D, 0x24, 0x8B, 0xC1, 0x99, 0xE8, 0x3C, 0x01, 0x72, 0x1C, 0x83, 0xEB, 0x3A, 0x66, 0xA1, 0x1C, 0x7C, 0x26, 0x66, 0x3B, 0x07, 0x26, 0x8A, 0x57, 0xFC,
0x75, 0x06, 0x80, 0xCA, 0x02, 0x88, 0x56, 0x02, 0x80, 0xC3, 0x10, 0x73, 0xEB, 0x33, 0xC9, 0x8A, 0x46, 0x10, 0x98, 0xF7, 0x66, 0x16, 0x03, 0x46, 0x1C, 0x13, 0x56, 0x1E, 0x03, 0x46, 0x0E, 0x13, 0xD1, 0x8B, 0x76, 0x11, 0x60, 0x89, 0x46, 0xFC, 0x89, 0x56, 0xFE, 0xB8, 0x20, 0x00, 0xF7, 0xE6, 0x8B, 0x5E, 0x0B, 0x03, 0xC3, 0x48, 0xF7, 0xF3, 0x01, 0x46, 0xFC, 0x11, 0x4E, 0xFE, 0x61, 0xBF, 0x00, 0x00, 0xE8, 0xE6, 0x00, 0x72, 0x39, 0x26, 0x38, 0x2D, 0x74, 0x17, 0x60, 0xB1, 0x0B, 0xBE, 0xA1, 0x7D, 0xF3, 0xA6, 0x61, 0x74, 0x32, 0x4E, 0x74, 0x09, 0x83, 0xC7, 0x20, 0x3B, 0xFB, 0x72, 0xE6, 0xEB, 0xDC, 0xA0, 0xFB, 0x7D, 0xB4, 0x7D, 0x8B, 0xF0, 0xAC, 0x98, 0x40, 0x74, 0x0C, 0x48, 0x74, 0x13, 0xB4, 0x0E, 0xBB, 0x07, 0x00, 0xCD, 0x10, 0xEB, 0xEF, 0xA0, 0xFD, 0x7D, 0xEB, 0xE6, 0xA0, 0xFC, 0x7D, 0xEB, 0xE1, 0xCD, 0x16, 0xCD, 0x19, 0x26, 0x8B, 0x55, 0x1A, 0x52, 0xB0, 0x01, 0xBB, 0x00, 0x00, 0xE8, 0x3B, 0x00, 0x72, 0xE8, 0x5B, 0x8A, 0x56, 0x24, 0xBE, 0x0B, 0x7C, 0x8B, 0xFC, 0xC7, 0x46, 0xF0, 0x3D, 0x7D, 0xC7, 0x46, 0xF4, 0x29, 0x7D, 0x8C, 0xD9, 0x89, 0x4E, 0xF2, 0x89, 0x4E, 0xF6, 0xC6, 0x06, 0x96, 0x7D, 0xCB, 0xEA, 0x03, 0x00, 0x00, 0x20, 0x0F, 0xB6, 0xC8, 0x66, 0x8B, 0x46, 0xF8, 0x66, 0x03, 0x46, 0x1C, 0x66, 0x8B, 0xD0, 0x66, 0xC1, 0xEA, 0x10, 0xEB, 0x5E, 0x0F, 0xB6, 0xC8, 0x4A, 0x4A, 0x8A, 0x46, 0x0D, 0x32, 0xE4, 0xF7, 0xE2, 0x03, 0x46, 0xFC, 0x13, 0x56, 0xFE, 0xEB, 0x4A, 0x52, 0x50, 0x06, 0x53, 0x6A, 0x01, 0x6A, 0x10, 0x91, 0x8B, 0x46, 0x18, 0x96, 0x92, 0x33, 0xD2, 0xF7, 0xF6, 0x91, 0xF7, 0xF6, 0x42, 0x87, 0xCA, 0xF7, 0x76, 0x1A, 0x8A, 0xF2, 0x8A, 0xE8, 0xC0, 0xCC, 0x02, 0x0A, 0xCC, 0xB8, 0x01, 0x02, 0x80, 0x7E, 0x02, 0x0E, 0x75, 0x04, 0xB4, 0x42, 0x8B, 0xF4, 0x8A, 0x56, 0x24, 0xCD, 0x13, 0x61, 0x61, 0x72, 0x0B, 0x40, 0x75, 0x01, 0x42, 0x03, 0x5E, 0x0B, 0x49, 0x75, 0x06, 0xF8, 0xC3, 0x41, 0xBB, 0x00, 0x00, 0x60, 0x66, 0x6A, 0x00, 0xEB, 0xB0, 0x42, 0x4F, 0x4F, 0x54, 0x4D, 0x47, 0x52, 0x20, 0x20, 0x20, 0x20, 0x0D, 0x0A, 0x52, 0x65, 0x6D, 0x6F, 0x76, 0x65, 0x20, 0x64, 0x69, 0x73, 0x6B, 
0x73, 0x20, 0x6F, 0x72, 0x20, 0x6F, 0x74, 0x68, 0x65, 0x72, 0x20, 0x6D, 0x65, 0x64, 0x69, 0x61, 0x2E, 0xFF, 0x0D, 0x0A, 0x44, 0x69, 0x73, 0x6B, 0x20, 0x65, 0x72, 0x72, 0x6F, 0x72, 0xFF, 0x0D, 0x0A, 0x50, 0x72, 0x65, 0x73, 0x73, 0x20, 0x61, 0x6E, 0x79, 0x20, 0x6B, 0x65, 0x79, 0x20, 0x74, 0x6F, 0x20, 0x72, 0x65, 0x73, 0x74, 0x61, 0x72, 0x74, 0x0D, 0x0A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xAC, 0xCB, 0xD8, 0x55, 0xAA}; /* define sector 0 data */

for (int i = 0; i < 512; ++i) fwrite(&partition_data[i], 1, 1, fp);   /* write sector 0 data to the output file (low byte of each int, assumes a little-endian PC) */
padSize = padSize - 512; /* reduce padsize by the number of characters that we just wrote */
-----------------------------------


Offline Goatfreed

Re: Really Bare Metal Programming on CubieTruck
« Reply #7 on: March 13, 2015, 02:55:40 am »
I am not sure if I should start a new thread, so I just use this one:
The question is about bare metal DMA usage.
I was able to initialize the DRAM and to write to and read from it.
My SPI (in my case SPI2, which is connected to the extension pins on the Cubietruck) is also working.
As I use the SPI for a display (http://www.exp-tech.de/displays/tft/adafruit-2-8-tft-lcd-with-touchscreen-breakout-board-w-microsd-socket), I need fast data transfer, and for that I wanted to use DMA.

So I thought, N(ormal)DMA should be quite simple... Also a look in the A20 manual suggests that.
I can gate (AHB gate) the DMA and write to its registers. However, it does not start processing.
And I don't know what the error could be.
As a test I also tried to simply copy a few bytes from DRAM to DRAM, which did not work either.
I am rather sure that I set the start/destination addresses correctly (as described in the A20 manual, no remapping took place) and the configuration register correctly (I also did not forget the Burst Count register).
But bit 31 in the config register just stays at 1. That is the "load bit", which afaik should start the process and be cleared automatically when the NDMA has finished.
I tried Channel0 as well as Channel1 (since I noticed that the BROM uses Channel1 and Channel2 for SPI0 (JTAG?), not Channel0).
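
While debugging this, it can help to bound the wait on the load bit, so that a transfer that never completes is reported instead of hanging the program. A minimal sketch, assuming only what is described above (bit 31 of the channel configuration register is set to start a transfer and reads back as 0 when it is done). `wait_until_clear` is a hypothetical helper and does not by itself explain why the transfer does not start.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t DMA_LOAD_BIT = std::uint32_t{1} << 31; // "load bit" per the post

// Polls a register until every bit in `mask` reads back as 0.
// Gives up after `max_polls` reads and returns false in that case.
inline bool wait_until_clear(const volatile std::uint32_t* reg,
                             std::uint32_t mask,
                             std::uint32_t max_polls) {
    assert(reg != nullptr);
    for (std::uint32_t n = 0; n < max_polls; ++n) {
        if ((*reg & mask) == 0) return true;  // hardware cleared the bit(s)
    }
    return false;                             // still set: timed out
}
```

A `false` return with a generous poll count matches the symptom described here, the load bit staying at 1.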

Edit: I tried the D(edicated)DMA as well with the exact same result.

Am I missing a step here?
Do I need to somehow explicitly change the DRAM Controller to some kind of DMA-Mode? Or do I maybe need to set something to enable the Bus for DMA instead of CPU?

Regards,
Thomas

Offline baremetal

Re: Really Bare Metal Programming on CubieTruck
« Reply #8 on: March 16, 2015, 08:43:54 pm »
Hi Thomas

Apologies for not responding earlier. I thought this thread was dead.

I have been doing my bare metal development on a Banana Pi Board as opposed to a Cubie Board, but it has the same A20 chip on it.

I will see if I can assist with your issue and revert if I come up with anything.

Cheers.

Offline baremetal

Re: Really Bare Metal Programming on CubieTruck
« Reply #9 on: March 16, 2015, 10:42:10 pm »
I have set out below some C++ code that can be used instead of Mike's code above.

It is not as cryptic as Mike's code so is more suitable for people who struggle to follow Mike's code (no disrespect to Mike by the way as his code works fine). The trade off on readability is that it is a bit longer.

It has the SD Card Sector Write logic in it so that you don't have to keep re-formatting your SD Card when you write the image to it.

It allows you to have startup code as well as Main Code. The Main Code starts at 0x8000 on the SD Card. (You will need to write code in your startup code to read the Main Code off the Card and put it wherever you want in memory.) I have added this functionality because, if you use the startup code method set out in these posts, you are limited to a code size of 4K (as that is all that the A20 Bootloader copies from the SD Card).

You run the compiled exe from the command line supplying 3 arguments (name of startup binary file, name of main binary file and name of destination img file). For example: CreateCard.exe startup.bin main.bin cardfile.img
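
The card layout this tool produces can be summarised in a few constants. The sketch below only restates the numbers used in this post and its code (512 bytes of sector 0 data, boot code at 0x2000 padded to 0x6000 bytes, main code at 0x8000); the names are made up for illustration.

```cpp
#include <cstddef>

constexpr std::size_t SECTOR0_SIZE = 512;     // DOS sector 0 data at the start of the card
constexpr std::size_t BOOT_OFFSET  = 0x2000;  // boot code starts 8 KiB into the card
constexpr std::size_t BOOT_LENGTH  = 0x6000;  // boot block padded to the header length
constexpr std::size_t MAIN_OFFSET  = BOOT_OFFSET + BOOT_LENGTH;

// Zero padding between sector 0 and the boot code.
constexpr std::size_t pad_after_sector0() { return BOOT_OFFSET - SECTOR0_SIZE; }

// 0xFF padding between the end of the boot code and the main code.
constexpr std::size_t pad_after_boot(std::size_t boot_code_length) {
    return BOOT_LENGTH - boot_code_length;
}

// Total size of the image file for a given main code size.
constexpr std::size_t image_size(std::size_t main_code_length) {
    return MAIN_OFFSET + main_code_length;
}

static_assert(MAIN_OFFSET == 0x8000, "main code offset on the card");
static_assert(pad_after_sector0() == 7680, "pad count used when writing the image");
```

The startup code has to read from the same `MAIN_OFFSET` on the card, so keeping the number in one place avoids the two sides drifting apart.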

-----------------------------------------------------------------

#include "stdafx.h"
#include <stdio.h>   // FILE, fopen_s, fread, fwrite, printf
#include <stdlib.h>  // malloc

// declare global variables
unsigned int boot_code_length;
unsigned int buffer_length;
unsigned int main_code_length;
unsigned char * boot_code_buffer;
unsigned char * main_code_buffer;

// declare functions
void read_boot_binary(FILE * fp);
void read_main_binary(FILE * fp);
void write_card_image(FILE * pTargetFile);
   
int main( int argc, char * argv [] )
{
   // Open the Source Files
   FILE * pFile;
   if (fopen_s(&pFile, argv[1], "rb") != 0)
   {printf( "Boot File Open Error\n" ); return -1;}
   read_boot_binary(pFile);
   fclose(pFile);

   if (fopen_s(&pFile, argv[2], "rb") != 0)
   {printf( "Main File Open Error\n" ); return -2;}
   read_main_binary(pFile);
   fclose(pFile);

   // Insert initial checksum value (0x5F0A6C39) into Boot Code Buffer
   boot_code_buffer[12] = 0x39; boot_code_buffer[13] = 0x6C;
   boot_code_buffer[14] = 0x0A; boot_code_buffer[15] = 0x5F;
   // Calculate Checksum
   unsigned int i = 0, word_value = 0, checksum = 0;
   while (i < buffer_length)
   {
      // read 4 bytes and create the Word value that the 4 bytes represent (little-endian)
      word_value = boot_code_buffer[i] + (boot_code_buffer[i+1]<<8) + (boot_code_buffer[i+2]<<16) + ((unsigned int)boot_code_buffer[i+3]<<24);
      // add the Word value to the checksum
      checksum = checksum + word_value;
      // increment one Word (ie 4 bytes)
      i = i + 4;   
   }
   // Insert calculated checksum value into Boot Code Buffer
   boot_code_buffer[12] = (checksum << 24) >> 24; boot_code_buffer[13] = (checksum << 16) >> 24;
   boot_code_buffer[14] = (checksum << 8) >> 24; boot_code_buffer[15] = (checksum >> 24);

   // Write the Card Image File
   if (fopen_s(&pFile, argv[3], "wb") != 0)
   {printf( "Target File Open Error\n" ); return -3;}
   write_card_image(pFile);
   // Close the File
   fclose(pFile);

   printf("Card Image File Written\n");
   
   return 0;
}
void read_boot_binary(FILE * fp)
{
   size_t result;
   // Calculate the file size
   fseek (fp , 0 , SEEK_END);
   boot_code_length = ftell (fp);
   buffer_length = boot_code_length + (24576 - boot_code_length);
   rewind (fp);
   // Allocate a Memory Buffer to hold the contents of the file plus additional buffer
   // characters to get to a length of 0x6000 (24576 decimal)
   boot_code_buffer = (unsigned char*) malloc (sizeof(unsigned char)*buffer_length);
   if (boot_code_buffer == NULL) {printf( "Boot File Memory Allocation Error\n" ); return;}
   // Copy the file into the Memory Buffer
   result = fread (boot_code_buffer,1,boot_code_length,fp);
   if (result != boot_code_length) {printf( "Boot File Read Error\n" );}
   // Add buffer characters from end of code to end of buffer (use 0xFF as buffer character)
   for (unsigned int i = boot_code_length; i < buffer_length; i++) boot_code_buffer[i] = 0xFF;
}

void read_main_binary(FILE * fp)
{
   size_t result;
   // Calculate the file size
   fseek (fp , 0 , SEEK_END);
   main_code_length = ftell (fp);
   rewind (fp);
   // Allocate a Memory Buffer to hold the contents of the file
   main_code_buffer = (unsigned char*) malloc (sizeof(unsigned char)*main_code_length);
   if (main_code_buffer == NULL) {printf( "Main File Memory Allocation Error\n" ); return;} // the caller closes the file
   // Copy the file into the Memory Buffer
   result = fread (main_code_buffer,1,main_code_length,fp);
   if (result != main_code_length) {printf( "Main File Read Error\n" );}
}   

void write_card_image(FILE * pTargetFile)
{
   /* Write DOS Partition Data so that Card appears formatted */
   int partition_data[] =
   {0xEB, 0x3C, 0x90, 0x4D, 0x53 ,0x44, 0x4F, 0x53, 0x35, 0x2E, 0x30, 0x00, 0x02, 0x08, 0x04, 0x00, 0x02, 0x00, 0x02, 0x00, 0x00, 0xF8, 0xEE, 0x00, 0x3F, 0x00, 0xFF, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x70, 0x07, 0x00,
   0x80, 0x00, 0x29, 0x80, 0xE1, 0x4B, 0x38, 0x4E, 0x4F, 0x20, 0x4E, 0x41, 0x4D, 0x45, 0x20, 0x20, 0x20, 0x20, 0x46, 0x41, 0x54, 0x31, 0x36, 0x20, 0x20, 0x20, 0x33, 0xC9, 0x8E, 0xD1, 0xBC, 0xF0, 0x7B, 0x8E, 0xD9, 0xB8,
   0x00, 0x20, 0x8E, 0xC0, 0xFC, 0xBD, 0x00, 0x7C, 0x38, 0x4E, 0x24, 0x7D, 0x24, 0x8B, 0xC1, 0x99, 0xE8, 0x3C, 0x01, 0x72, 0x1C, 0x83, 0xEB, 0x3A, 0x66, 0xA1, 0x1C, 0x7C, 0x26, 0x66, 0x3B, 0x07, 0x26, 0x8A, 0x57, 0xFC,
   0x75, 0x06, 0x80, 0xCA, 0x02, 0x88, 0x56, 0x02, 0x80, 0xC3, 0x10, 0x73, 0xEB, 0x33, 0xC9, 0x8A, 0x46, 0x10, 0x98, 0xF7, 0x66, 0x16, 0x03, 0x46, 0x1C, 0x13, 0x56, 0x1E, 0x03, 0x46, 0x0E, 0x13, 0xD1, 0x8B, 0x76, 0x11,
   0x60, 0x89, 0x46, 0xFC, 0x89, 0x56, 0xFE, 0xB8, 0x20, 0x00, 0xF7, 0xE6, 0x8B, 0x5E, 0x0B, 0x03, 0xC3, 0x48, 0xF7, 0xF3, 0x01, 0x46, 0xFC, 0x11, 0x4E, 0xFE, 0x61, 0xBF, 0x00, 0x00, 0xE8, 0xE6, 0x00, 0x72, 0x39, 0x26,
   0x38, 0x2D, 0x74, 0x17, 0x60, 0xB1, 0x0B, 0xBE, 0xA1, 0x7D, 0xF3, 0xA6, 0x61, 0x74, 0x32, 0x4E, 0x74, 0x09, 0x83, 0xC7, 0x20, 0x3B, 0xFB, 0x72, 0xE6, 0xEB, 0xDC, 0xA0, 0xFB,
   0x7D, 0xB4, 0x7D, 0x8B, 0xF0, 0xAC, 0x98, 0x40, 0x74, 0x0C, 0x48, 0x74, 0x13, 0xB4, 0x0E, 0xBB, 0x07, 0x00, 0xCD, 0x10, 0xEB, 0xEF, 0xA0, 0xFD, 0x7D, 0xEB, 0xE6, 0xA0, 0xFC, 0x7D, 0xEB, 0xE1, 0xCD, 0x16, 0xCD, 0x19,
   0x26, 0x8B, 0x55, 0x1A, 0x52, 0xB0, 0x01, 0xBB, 0x00, 0x00, 0xE8, 0x3B, 0x00, 0x72, 0xE8, 0x5B, 0x8A, 0x56, 0x24, 0xBE, 0x0B, 0x7C, 0x8B, 0xFC, 0xC7, 0x46, 0xF0, 0x3D, 0x7D, 0xC7, 0x46, 0xF4, 0x29, 0x7D, 0x8C, 0xD9,
   0x89, 0x4E, 0xF2, 0x89, 0x4E, 0xF6, 0xC6, 0x06, 0x96, 0x7D, 0xCB, 0xEA, 0x03, 0x00, 0x00, 0x20, 0x0F, 0xB6, 0xC8, 0x66, 0x8B, 0x46, 0xF8, 0x66, 0x03, 0x46, 0x1C, 0x66, 0x8B, 0xD0, 0x66, 0xC1, 0xEA, 0x10, 0xEB, 0x5E,
   0x0F, 0xB6, 0xC8, 0x4A, 0x4A, 0x8A, 0x46, 0x0D, 0x32, 0xE4, 0xF7, 0xE2, 0x03, 0x46, 0xFC, 0x13, 0x56, 0xFE, 0xEB, 0x4A, 0x52, 0x50, 0x06, 0x53, 0x6A, 0x01, 0x6A, 0x10, 0x91, 0x8B, 0x46, 0x18, 0x96, 0x92, 0x33, 0xD2,
   0xF7, 0xF6, 0x91, 0xF7, 0xF6, 0x42, 0x87, 0xCA, 0xF7, 0x76, 0x1A, 0x8A, 0xF2, 0x8A, 0xE8, 0xC0, 0xCC, 0x02, 0x0A, 0xCC, 0xB8, 0x01, 0x02, 0x80, 0x7E, 0x02, 0x0E, 0x75, 0x04, 0xB4, 0x42, 0x8B, 0xF4, 0x8A, 0x56, 0x24,
   0xCD, 0x13, 0x61, 0x61, 0x72, 0x0B, 0x40, 0x75, 0x01, 0x42, 0x03, 0x5E, 0x0B, 0x49, 0x75, 0x06, 0xF8, 0xC3, 0x41, 0xBB, 0x00, 0x00, 0x60, 0x66, 0x6A, 0x00, 0xEB, 0xB0, 0x42, 0x4F, 0x4F, 0x54, 0x4D, 0x47, 0x52, 0x20,
   0x20, 0x20, 0x20, 0x0D, 0x0A, 0x52, 0x65, 0x6D, 0x6F, 0x76, 0x65, 0x20, 0x64, 0x69, 0x73, 0x6B, 0x73, 0x20, 0x6F, 0x72, 0x20, 0x6F, 0x74, 0x68, 0x65, 0x72, 0x20, 0x6D, 0x65, 0x64, 0x69, 0x61, 0x2E, 0xFF, 0x0D, 0x0A,
   0x44, 0x69, 0x73, 0x6B, 0x20, 0x65, 0x72, 0x72, 0x6F, 0x72, 0xFF, 0x0D, 0x0A, 0x50, 0x72, 0x65, 0x73, 0x73, 0x20, 0x61, 0x6E, 0x79, 0x20, 0x6B, 0x65, 0x79, 0x20, 0x74, 0x6F, 0x20, 0x72, 0x65, 0x73, 0x74, 0x61, 0x72,
   0x74, 0x0D, 0x0A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xAC, 0xCB, 0xD8, 0x55, 0xAA};
   for (int i = 0; i < 512; ++i) fwrite(&partition_data[i], 1, 1, pTargetFile);
   
   unsigned char pad = 0;                                          
   
   /* Write 7680 pad characters (8192 minus 512 DOS Partition Data characters) so that boot code starts at 0x2000 */
   for (int i = 0; i < 7680; ++i) fwrite(&pad, 1, 1, pTargetFile);   
   /* Write the boot code to the output file. */
   fwrite(boot_code_buffer, 1, boot_code_length, pTargetFile);         

   pad = 0xFF;
   /* Write pad characters so that main code starts at 0x8000 */
   int gap_value = 24576 - boot_code_length;
   for (int i = 0; i < gap_value; ++i) fwrite(&pad, 1, 1, pTargetFile);
   
   /* Write the main code to the output file. */
   fwrite(main_code_buffer, 1, main_code_length, pTargetFile);
}
-----------------------------------------------------------------

Offline Goatfreed

  • Newbie
  • *
  • Posts: 18
  • Karma: +0/-0
    • View Profile
Re: Really Bare Metal Programming on CubieTruck
« Reply #10 on: March 17, 2015, 02:18:37 am »
Thanks baremetal.

But I now think that I did not make my question clear enough.
I do not have problems with booting from an SD card.
That works fine for me.
I also understood that the bootloader only takes a few kB, so I need to copy my "main code" from SD to DRAM myself.
The problem could be seen as the copying (although copying from SD to DRAM is not even my problem at the moment, the solution would be much the same).

My problem is that I wanted to use the DMA for data transfer.
I did the above-mentioned steps to set up the DMA controller:
I can gate (AHB gate) the DMA and write to its registers. However, it does not start processing.
And I don't know what the error could be.
As a test I also tried to simply copy a few bytes from DRAM to DRAM, which did not work either.
I am rather sure that I set the source/destination addresses correctly (as described in the A20 manual, no remapping took place) and the configuration register correctly (I also did not forget the Burst Count register).
But bit 31 in the config register just stays at 1. That is the "load bit", which afaik should start the transfer and be cleared automatically when the NDMA has finished.
I tried Channel0 as well as Channel1 (since I noticed that BROM uses Channel1 and Channel2 for SPI0 (JTAG?), not Channel0).

Edit: I tried the D(edicated)DMA as well with the exact same result.

But the DMA did nothing.
For copying from SD to DRAM the DMA would be very useful, but it is not strictly necessary for me. For my display interface, however, I want to hold a buffer in DRAM, which I transfer via SPI to the display.
When I do that using the CPU (and hardware SPI, SPI2 in the case of the Cubietruck) it works, but too slowly.

I hope this makes my problem a bit clearer and, even more important, that you can help me out on this one as well :)

Regards,
Thomas

Offline baremetal

  • Newbie
  • *
  • Posts: 12
  • Karma: +0/-0
    • View Profile
Re: Really Bare Metal Programming on CubieTruck
« Reply #11 on: March 18, 2015, 07:17:22 pm »
Hi Thomas

The code that I posted did not relate to your DMA issue. I posted it for anyone who wants startup code as well as main code, and to help them better understand how to create useful SD card images.

On your DMA issue I will take a look at it this weekend.

Could you please let me know whether you are trying to use Standard DMA or Dedicated DMA?

Note that if you are trying to use Dedicated DMA, then you cannot transfer from DRAM to DRAM.

Offline Goatfreed

  • Newbie
  • *
  • Posts: 18
  • Karma: +0/-0
    • View Profile
Re: Really Bare Metal Programming on CubieTruck
« Reply #12 on: March 19, 2015, 03:33:19 am »
Hi baremetal,

Sorry then, I misinterpreted that.
I read in the manual that the DDMA can only transfer between DRAM and other modules. I tried to use it for a DRAM->SPI transfer.
I also tried to use NDMA for DRAM->SPI. As this did not work, I tried DRAM->DRAM using NDMA.
In principle I don't care which one I use. My guess would be that DDMA is more, well, dedicated for such tasks ^^, although I don't know the technical differences (yet! I will read up on them).

Thank you in advance for your efforts to help me. I could not ask for more.

Offline baremetal

  • Newbie
  • *
  • Posts: 12
  • Karma: +0/-0
    • View Profile
Re: Really Bare Metal Programming on CubieTruck
« Reply #13 on: March 21, 2015, 04:53:38 pm »
Hi Thomas

I set up a single-channel, single-byte transfer on my Banana Pi board this morning and it worked.

The code reads the byte at 0x00000000 (SRAM) and writes it to 0x00000010 (SRAM).

NOTE: I did not put a timeout check in the transfer complete code so if your transfer fails, the system will hang.

Let me know how you go.

/* Turn AHB Gating for DMA on */
ldr r1, =0x01C20060
ldr r0, [r1]
orr r0, r0, #0x40
str r0, [r1]

/* Disable DMA IRQ Interrupts */
mov r0, #0x00
ldr r1, =0x01C02000
str r0, [r1]

/* Disable NDMA Auto Gating */
mov r0, #0x00
ldr r1, =0x01C02008
str r0, [r1]

/* Set r1 to the Base Address for Channel 0 */
ldr r1, =0x01C02100

/* Set Source Address */
ldr r0, =0x00000000
str r0, [r1, #0x04]

/* Set Destination Address */
ldr r0, =0x00000010
str r0, [r1, #0x08]

/* Set Byte Count */
mov r0, #0x01
str r0, [r1, #0x0C]

/* Configure and Start DMA Transfer */
ldr r0, =0x80750075
str r0, [r1]

/* Wait until Transfer Completes (Bit 31 is zero) */
check_DMA_busy:
    ldr r0, [r1]
    ands r0, r0, #0x80000000
    bne check_DMA_busy

[if you get to here, the transfer should have completed]




Offline Goatfreed

  • Newbie
  • *
  • Posts: 18
  • Karma: +0/-0
    • View Profile
Re: Really Bare Metal Programming on CubieTruck
« Reply #14 on: March 23, 2015, 04:15:09 pm »
Hi baremetal,

Your example worked. It copied 1 byte from SRAM to SRAM.
However, when I change it to copy from SDRAM to SDRAM, it does not do anything.
I only modified the source address, destination address and configuration:
Source Address = 0x40000000
Destination Address = 0x40000010
Configuration = 0x80760076

However long I wait, the load bit remains set in the configuration register.
So the DMA either did nothing or failed.
Maybe I would need to set up the DRAM controller differently, but I do not see anything there (linux-sunxi wiki page for the DRAMC) that looks as if it would have an effect here.
I also did not manage to get a transfer from SRAM to SPI2 working with:
Source Address = 0x00000000
Destination Address = 0x01C17004 (SPI2 TX)
Configuration = 0x807A0075
Byte Count = 32 (also tried with 1, no change)

Meanwhile I can write "manually" through SPI2 (I have an LCD connected and it does what it should when I am using SPI directly).
So I am most definitely missing something...