When you take a look at the memblock allocation API for the first time, you might feel a bit overwhelmed. You see a bunch of functions with “memblock_alloc” or “memblock_phys_alloc” prefixes, coupled together with all possible combinations of start/end addresses and NUMA node requirements that you can think of. But this is not a surprising view, given the fact that memblock has to accommodate the needs of ~20 different architectures, from Motorola 68000 through PowerPC to x86 and ARM.
It’s possible to look into the abyss of allocs and understand what they are about. As we will see later on, both groups – memblock_alloc* and memblock_phys_alloc* – use the same logic to find a free memory block. Although this logic is a bit complicated, it makes sense and can be broken down into smaller, easy-to-digest parts. Once we learn how it works, we can apply this knowledge to all the functions – we just need to remember what memory constraints we can ask for.
There’s a lot to take in, so we need to approach this topic with caution. Descending deep down into macros like
__for_each_mem_range_rev straight away won’t teach us much about the flow of the allocation process. We have to be able to zoom in and out
when needed. Because of this, I decided to break this analysis down into two parts. In the first one, I’d like to show you how the memblock_alloc* and memblock_phys_alloc* functions converge and discuss the general flow of this
common part. Once we understand how it works, we can move on to the second part. There, we’ll look at the implementation
of the core allocation function and see what it does to grant memory at (almost) all costs. We’ll dissect its internal
parts, delve into details, and get to see quite a lot of helper functions.
Let’s start our analysis by taking a look at two functions – memblock_alloc and memblock_phys_alloc_range. As you may recall from the introductory post on memblock, the first one represents the group of functions that return virtual addresses to the allocated memory regions, whereas the second one operates on physical addresses.
memblock_alloc is one of the most widely used boot time memory allocation functions. It is used across different architectures to allocate memory for:
- OS data structures like MMU context managers, PCI controllers and resources
- String literals, like resource names
- Error info logs
- and more
A quick glance at its implementation reveals quite a short function – it’s a wrapper around a call to memblock_alloc_try_nid:
The third and fourth parameters instruct memblock to use the whole address range covered by the entries of the memblock.memory array. Also, memblock_alloc doesn’t care which NUMA node will be used here. To signal this, the NUMA_NO_NODE value is passed in.
Going one step deeper doesn’t reveal much of a mystery either. memblock_alloc_try_nid is yet another wrapper – it calls memblock_alloc_internal and clears the memory pointed to by ptr if one was allocated:
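This wrapper-plus-zeroing layering can be sketched as a runnable toy model. Everything below – the heap array, the sizes, the function bodies – is invented for illustration; the names merely echo the kernel ones and this is not actual kernel code:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint8_t toy_heap[4096];   /* stand-in for boot-time physical memory */
static size_t toy_used;

/* stand-in for the core allocator (memblock_alloc_range_nid in the kernel) */
static void *toy_alloc_internal(size_t size)
{
    if (toy_used + size > sizeof(toy_heap))
        return NULL;                     /* no room left: report failure */
    void *ptr = &toy_heap[toy_used];
    toy_used += size;
    return ptr;
}

/* like memblock_alloc_try_nid: forward the call, then clear the block */
static void *toy_alloc_try_nid(size_t size)
{
    void *ptr = toy_alloc_internal(size);
    if (ptr)
        memset(ptr, 0, size);            /* callers receive zeroed memory */
    return ptr;
}

/* like memblock_alloc: the user-facing wrapper with default constraints */
static void *toy_alloc(size_t size)
{
    return toy_alloc_try_nid(size);
}
```

The point of the sketch is the division of labour: the innermost layer finds memory, the middle layer zeroes it, and the outermost layer just fills in default parameters.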
Maybe the third time’s a charm? Well, memblock_alloc_internal turns out to be yet another wrapper. But it’s a more elaborate one:
In ①, we check if
memblock_alloc and similar wrappers were called when the slab allocator was available. If this is the case, the function uses slab allocation instead of the memblock one and returns.
Moving on, ② checks if the requested maximum address is above memblock’s memory limit, and caps it if needed. With all of this out of the way, memblock_alloc_range_nid, the core allocation function, is called. This function returns the base physical address of a memory block on success. In the next step ③, the function checks if the allocation was successful. If it wasn’t, and a minimal address was specified (e.g. passed via the memblock_alloc_from helper), it calls memblock_alloc_range_nid again without the lower address requirement in ④. The check in ⑤ makes sure that we got a valid memory location before trying to map it to a virtual address (i.e. a void pointer) with phys_to_virt. Given a successful allocation, memblock_alloc_internal returns a virtual address to the allocated memory chunk, which is then returned by memblock_alloc at the end of the call chain.
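To make steps ① through ⑤ concrete, here is a hypothetical, self-contained model of that flow. The names resemble the kernel’s, but the bodies, addresses, and limits are all made up for the example:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

static bool slab_is_available;              /* ①: set once slab is up     */
static phys_addr_t current_limit = 0x4000;  /* ②: memblock's memory limit */

/* stand-in for the core allocator; pretends the only free block
 * starts at 0x1000, and returns 0 on failure */
static phys_addr_t core_alloc(size_t size, phys_addr_t min_addr,
                              phys_addr_t max_addr)
{
    phys_addr_t base = 0x1000;

    if (base < min_addr || base + size > max_addr)
        return 0;
    return base;
}

static void *toy_phys_to_virt(phys_addr_t pa)
{
    return (void *)(uintptr_t)pa;   /* identity mapping, for the example */
}

static void *toy_alloc_internal(size_t size, phys_addr_t min_addr,
                                phys_addr_t max_addr)
{
    phys_addr_t alloc;

    if (slab_is_available)          /* ①: too late for memblock;         */
        return NULL;                /*    the kernel would use slab here */

    if (max_addr > current_limit)   /* ②: cap the upper bound */
        max_addr = current_limit;

    alloc = core_alloc(size, min_addr, max_addr);    /* ③ */

    if (!alloc && min_addr)         /* ④: retry without the lower bound */
        alloc = core_alloc(size, 0, max_addr);

    if (!alloc)                     /* ⑤: a 0 address means failure */
        return NULL;

    return toy_phys_to_virt(alloc);
}
```

With the toy’s single free block at 0x1000, a request with min_addr above it fails on the first try in ③ and succeeds only after ④ drops the lower bound – the same give-it-another-chance shape the real function has.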
You might have expected to see memblock_phys_alloc discussed here. Its name contrasts well with the memblock_alloc name, the function itself has a simple signature, and it looks convenient to use. It turns out, however, that it’s the least popular function in the phys_alloc group. It’s not that interesting anyway – it’s just a wrapper for memblock_phys_alloc_range.
Actually, memblock_phys_alloc_range is the most widely used function when working with physical addresses. The function is used to allocate memory for:
As you can see, x86 makes extensive use of the memblock_phys_alloc* functions. Interestingly enough, word has it that this family of functions is getting deprecated (source).
Coming back to our function, this is what memblock_phys_alloc_range looks like:
That’s correct – we’ve discovered yet another wrapper. But wait a second, we saw the memblock_alloc_range_nid function before, didn’t we? Indeed, we saw it just a minute ago in memblock_alloc_internal. This is the common part I mentioned at the beginning of this post. In fact, the allocation functions that return physical addresses just wrap a call to memblock_alloc_range_nid with specific parameters. Here, we have no preference for a NUMA node, and pass the same value as we did in memblock_alloc – NUMA_NO_NODE.
To sum this up in a visual way, here are the call chains we’ve followed:
The same pattern emerges for other allocation functions like
memblock_alloc_low. It turns out that
memblock_alloc_range_nid is called, both directly and indirectly, by a group of functions, where some of them do additional
checks or clean-ups. You don’t need to take my word for it – feel free to take a look at the memblock source code to convince
yourself that this is the case.
The big picture
Now, we can agree that all roads lead to
memblock_alloc_range_nid. One more thing you need to know about it is that this
function is very determined to grant memory. It is ready to drop memory requirements passed by the programmer and retry multiple
times to allocate memory before giving up.
So, what does the general flow of this function look like? In essence, memblock_alloc_range_nid does two things – it first looks for an available memory region, and then tries to reserve it. Searching for a free memory block is broken down into multiple helpers and macros, each of them implemented for both memory allocation directions – top-down and bottom-up. The former starts allocating from high addresses (that is, from the end of the memory), whereas the latter looks at the low addresses first. Although bottom-up feels more intuitive to think about, most architectures use the top-down direction.
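The two directions can be illustrated with a toy first-fit search over a hard-coded list of free ranges. The ranges, names, and policy here are all invented for the example; real memblock iterates memblock.memory minus memblock.reserved with dedicated macros:

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* Two made-up free ranges, sorted by address. */
struct range { phys_addr_t start, end; };

static const struct range free_ranges[] = {
    { 0x1000, 0x2000 },
    { 0x8000, 0x9000 },
};
#define NR_RANGES (sizeof(free_ranges) / sizeof(free_ranges[0]))

/* bottom-up: scan from the lowest addresses, place at the range start */
static phys_addr_t find_bottom_up(phys_addr_t size)
{
    for (size_t i = 0; i < NR_RANGES; i++)
        if (free_ranges[i].end - free_ranges[i].start >= size)
            return free_ranges[i].start;
    return 0;
}

/* top-down: scan from the highest addresses, place at the range end */
static phys_addr_t find_top_down(phys_addr_t size)
{
    for (size_t i = NR_RANGES; i-- > 0; )
        if (free_ranges[i].end - free_ranges[i].start >= size)
            return free_ranges[i].end - size;
    return 0;
}
```

With these ranges, a 0x800-byte request lands at 0x1000 when searching bottom-up, but at 0x8800 when searching top-down – the tail of the highest range that fits.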
Finding an available memory spot is aided by a low-level helper, which returns the base address of the region on success. If everything goes well, memblock_alloc_range_nid reserves the region and a new entry in memblock.reserved is created. If not, the function checks if it can loosen up the memory requirements, like a preference for a specific NUMA node, and tries to allocate memory again. This approach works in most cases, so a free region is found and reserved on the next attempt. If we’re out of luck, no memory gets allocated, and the function returns the 0 address to indicate the failure. This process can be pictured as the following diagram:
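The find–relax–reserve policy described above can also be condensed into a small hypothetical sketch. The node layout, addresses, and helper bodies are pure fiction; only the retry shape mirrors what the text describes:

```c
#include <stdint.h>

typedef uint64_t phys_addr_t;

#define TOY_NO_NODE (-1)   /* like NUMA_NO_NODE: no node preference */

/* Pretend node 0 has no free memory, while a block at 0x8000 is
 * available elsewhere. Purely invented numbers. */
static phys_addr_t toy_find_free(int nid, phys_addr_t size)
{
    (void)size;
    if (nid == 1 || nid == TOY_NO_NODE)
        return 0x8000;
    return 0;
}

/* success path: record the block in a "reserved" list (elided here) */
static phys_addr_t toy_reserve(phys_addr_t base)
{
    return base;
}

static phys_addr_t toy_alloc_range_nid(phys_addr_t size, int nid)
{
    phys_addr_t found = toy_find_free(nid, size);

    /* loosen the requirements: drop the node preference and retry */
    if (!found && nid != TOY_NO_NODE)
        found = toy_find_free(TOY_NO_NODE, size);

    if (!found)
        return 0;              /* out of luck: 0 signals the failure */

    return toy_reserve(found);
}
```

A request pinned to the full node fails on the first lookup, succeeds once the preference is dropped, and only then does the reservation step run – failure is reported as the 0 address, just as in the flow above.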
That’s it for the high-level tour of the memblock allocs. Now you can see how all the functions we thought were doing different things are actually pretty close to one another. We also discussed the flow of the core allocation function, memblock_alloc_range_nid, but many questions remain unanswered. What exact memory constraints can be scrapped when memblock retries an allocation? How are memory regions iterated in the top-down direction? And do we care about memory mirroring at all? To answer these, we have to take a close look at the implementation of memblock_alloc_range_nid and analyse what bits it uses along the way. As it’s a lot of work, let’s save it for part 2 of our investigation.
 - it's a catch-all name for the functions in one group; read it as a regular expression ⤴
 - many architectures free the memblock data structures in mm_init, which is also where the slab allocator gets initialized. This check protects a programmer from using a memblock allocator that no longer exists at that point ⤴