How C array sizes become part of the binary interface of a library

May 6, 2019
Florian Weimer
Related topics:
C, C#, C++


    Most C compilers allow access to an array that is declared extern without bounds, like this:

    extern int external_array[];
    
    int
    array_get (long int index)
    {
      return external_array[index];
    }
    

    The definition of external_array could reside in a different translation unit and look like this:

    int external_array[3] = { 1, 2, 3 };
    

    The question is what happens if this separate definition is changed to this:

    int external_array[4] = { 1, 2, 3, 4 };
    

    Or this:

    int external_array[2] = { 1, 2 };
    

    Does either change preserve the binary interface (assuming that there is a mechanism that allows the application to determine the size of the array at run time)?
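
    The question assumes that some mechanism reports the array size at run time. One possible shape for such a mechanism is sketched below. The names external_array_length and array_get_checked are hypothetical and not part of the original example, and a real library would keep the client function in a separate translation unit.

```c
#include <stddef.h>

/* Library side: the exported array. */
int external_array[3] = { 1, 2, 3 };

/* Hypothetical helper: reports the element count of the definition
   that this library was built with. */
size_t
external_array_length (void)
{
  return sizeof (external_array) / sizeof (external_array[0]);
}

/* Client side: consult the run-time length before indexing.
   Returns 1 and stores the element in *value on success, 0 otherwise. */
int
array_get_checked (size_t index, int *value)
{
  if (index >= external_array_length ())
    return 0;
  *value = external_array[index];
  return 1;
}
```

    A bounds check of this kind protects the client from reading past the end, but it does not by itself make a size change safe, as the rest of the article shows.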

    Curiously, the answer is that on many architectures, increasing the array size breaks application binary interface (ABI) compatibility. Decreasing the array size may also cause compatibility problems. We'll look more closely at ABI compatibility in this article and explain how to avoid problems.

    How the data section of an executable is linked

    To understand how the array size becomes part of the binary interface, we first need to examine how the data section of an executable is linked. The details are of course architecture-specific, and here we focus on the x86-64 architecture.

    In the small code model that compilers use by default for non-PIC code on x86-64, the address of a global variable fits into a sign-extended 32-bit displacement. This means that access to a global array variable, as in the array_get function shown previously, can be compiled to a single movl instruction:

    array_get:
    	movl	external_array(,%rdi,4), %eax
    	ret
    

    From that, the assembler produces an object file in which the instruction is marked with an R_X86_64_32S relocation.

    0000000000000000 <array_get>:
       0:	mov    0x0(,%rdi,4),%eax
    			3: R_X86_64_32S	external_array
       7:	retq   
    

    This relocation tells the link editor (ld) to fill in the appropriate location of the external_array variable at link time, when producing an executable.

    This has two important consequences.

    • Because the variable offset is determined at link time, there is no run-time overhead for determining it. The only cost is the memory access itself.
    • To determine the offset, the sizes of all data variables need to be known. Otherwise, it would not be possible to compute the layout of the data section at link time.

    For C implementations targeting the Executable and Linkable Format (ELF), as used on GNU/Linux, references to extern variables do not carry object sizes. For the array_get example, the size of the object is not even known to the compiler. In fact, the entire assembler file looks like this (compiled with -fno-asynchronous-unwind-tables to omit only the unwind information, which is technically required for psABI compliance):

    	.file	"get.c"
    	.text
    	.p2align 4,,15
    	.globl	array_get
    	.type	array_get, @function
    array_get:
    	movl	external_array(,%rdi,4), %eax
    	ret
    	.size	array_get, .-array_get
    	.ident	"GCC: (GNU) 8.3.1 20190223 (Red Hat 8.3.1-2)"
    	.section	.note.GNU-stack,"",@progbits
    

    There is no size information for external_array at all in this assembler file: the only reference to the symbol is on the line with the movl instruction, and the only numeric data in the instruction is the array element size (implied by movl and the scaling factor 4).

    If ELF required symbol sizes for undefined variables, it would not even be possible to compile the array_get function.
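
    The same limitation is visible at the C level. In the sketch below, the array has an incomplete type inside array_get, so only the element size is available to the compiler. The definition at the end would normally live in a different translation unit. It is placed in the same file here only so that the sketch links on its own.

```c
/* At this point the type of external_array is incomplete: the
   element type is known, the number of elements is not. */
extern int external_array[];

int
array_get (long int index)
{
  /* Writing sizeof (external_array) here would be rejected, because
     the compiler has no object size to work with.  Only the element
     size, sizeof (external_array[0]), is available. */
  return external_array[index];
}

/* Stand-in for the definition from the other translation unit. */
int external_array[3] = { 1, 2, 3 };
```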

    How does the link editor obtain the actual symbol size? It looks at the symbol definition and uses the size information it finds there. This allows the link editor to compute the data section layout and fill out the data relocations with the appropriate offsets.

    Introducing ELF shared objects

    C implementations for ELF do not require the programmer to add source code markup to indicate if a function or variable is located in the current object (which can be a library or the main executable) or in a different object. The link editor and the dynamic loader are expected to take care of that transparently, without help from the programmer.

    At the same time, there was a desire not to reduce performance by changing the compilation model for executables. This means that when compiling source code for a main program (i.e., without -fPIC, and in this particular case, without -fPIE as well), the array_get function is compiled to the exact same instruction sequence as before the introduction of dynamic shared objects. Furthermore, it does not matter whether the external_array variable is defined in the main executable itself or in some shared object that is loaded separately at run time. The instructions produced by the compiler are the same in both cases.

    How is this possible? After all, ELF shared objects are position-independent. They are loaded at unpredictable, randomized addresses at run time. Yet the compiler generates a machine code sequence that requires that these variables are placed at a fixed offset computed at link time, long before the program even runs.

    The answer is related to the fact that only one loaded object (the main executable) uses these fixed offsets. All other objects (the dynamic loader itself, the C run-time library, and any other library the program uses) are compiled and linked as fully position-independent (PIC) objects. For such objects, the compiler introduces an additional indirection, loading the actual address of each variable from the global offset table (GOT). We can see this indirection if we compile the array_get example with -fPIC, leading to this assembler code:

    array_get:
    	movq	external_array@GOTPCREL(%rip), %rax
    	movl	(%rax,%rdi,4), %eax
    	ret
    

    As a result, the address of the external_array variable is no longer hard-coded and can be changed at run time by initializing its GOT entry accordingly. This means that at run time, the definition of external_array can be contained in the same shared object, a different shared object, or the main program. The dynamic loader will find the appropriate definition based on the ELF symbol lookup rules and bind the undefined symbol reference to its definition, by updating the GOT entry to its actual address.
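
    The indirection can be modeled in plain C as an extra pointer load. This is only an illustration of the idea, not what the compiler or the dynamic loader literally does. The names got_external_array, rebind_external_array, definition_a, and definition_b are hypothetical.

```c
/* Two candidate definitions that the symbol could be bound to. */
int definition_a[3] = { 1, 2, 3 };
int definition_b[3] = { 10, 20, 30 };

/* Stands in for the GOT slot that the dynamic loader fills in
   before the code runs. */
static int *got_external_array = definition_a;

int
array_get_pic (long int index)
{
  /* Corresponds to: movq external_array@GOTPCREL(%rip), %rax */
  int *base = got_external_array;
  /* Corresponds to: movl (%rax,%rdi,4), %eax */
  return base[index];
}

/* Models the loader binding the symbol to a different definition. */
void
rebind_external_array (int *new_definition)
{
  got_external_array = new_definition;
}
```

    Because array_get_pic never hard-codes an address, changing the slot is enough to make it read from a different definition.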

    Let's go back to the original example, where the array_get function is located in the main program, so there is no indirection for the variable address. The key idea implemented in the link editor is that the main program will provide a definition of the external_array variable even if it is actually defined in a shared object at run time. At run time, instead of pointing all shared objects to the original definition of the variable in the shared object containing it, the dynamic loader will pick a copy of the variable in the data section of the executable.

    This has two important consequences. First of all, recall that the definition of external_array looks like this:

    int external_array[3] = { 1, 2, 3 };
    

    The definition has an initializer, and this initializer has to be applied to the definition in the main executable. To facilitate this, the main executable contains a copy relocation for the symbol. The readelf -rW command displays it as an R_X86_64_COPY relocation:

    Relocation section '.rela.dyn' at offset 0x408 contains 3 entries:
        Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
    0000000000403ff0  0000000100000006 R_X86_64_GLOB_DAT      0000000000000000 __libc_start_main@GLIBC_2.2.5 + 0
    0000000000403ff8  0000000200000006 R_X86_64_GLOB_DAT      0000000000000000 __gmon_start__ + 0
    0000000000404020  0000000300000005 R_X86_64_COPY          0000000000404020 external_array + 0
    

    Like other relocations, a copy relocation is processed by the dynamic loader. It involves a simple, bit-wise copy operation. The target of the copy is determined by the relocation offset (0000000000404020 in the example). The source is determined at run time, based on the symbol name (external_array) and its resolution (its value). When making the copy, the dynamic loader will also look at the size of the symbol, to obtain the number of bytes that need to be copied. To make all this possible, the external_array symbol is automatically exported from the executable, as a defined symbol, so that it is visible at run time to the dynamic loader. The dynamic symbol table (.dynsym) reflects this, as shown by the readelf -sW command:

    Symbol table '.dynsym' contains 4 entries:
       Num:    Value          Size Type    Bind   Vis      Ndx Name
         0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
         1: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.2.5 (2)
         2: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
         3: 0000000000404020    12 OBJECT  GLOBAL DEFAULT   22 external_array
    

    Where does the size information come from (12 bytes, in this example, for three 4-byte int elements)? The link editor opens all shared objects on the link command line, searches for the definition, and uses the size information it finds there. As before, this allows the link editor to compute the layout of the data section, so that fixed offsets can be used. Also as before, the size of the definition in the main executable is fixed and cannot change at run time.

    The dynamic loader also redirects symbol references in shared objects to the target of the copy relocation, in the main executable. This ensures that only a single copy of the variable exists in the entire program, as required by the C language semantics. Otherwise, if the variable is modified after initialization, updates from the main executable would not be visible to the dynamic shared objects, and vice versa.
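
    The combined effect of the copy and the redirection can be modeled in a single C file. This is a simplified sketch, not the dynamic loader's actual code, and every name in it is hypothetical. The GOT slot of the shared object is represented by an ordinary pointer.

```c
#include <string.h>

/* Definition inside the shared object, with its initializer. */
static int library_definition[3] = { 1, 2, 3 };

/* Space reserved by the link editor in the executable's data section. */
static int executable_copy[3];

/* GOT slot used by the shared object's own code. */
static int *library_got_slot = library_definition;

/* Model of the loader handling the copy relocation at startup:
   copy the initializer, then redirect the library's references. */
void
apply_copy_relocation (void)
{
  memcpy (executable_copy, library_definition, sizeof (executable_copy));
  library_got_slot = executable_copy;
}

/* Code in the main program: fixed address, no indirection. */
void
main_program_store (long int index, int value)
{
  executable_copy[index] = value;
}

/* Code in the shared object: goes through its GOT slot. */
int
library_load (long int index)
{
  return library_got_slot[index];
}
```

    After apply_copy_relocation has run, a store made by the main program is visible to the library code, while the library's original definition is no longer consulted.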

    The impact on binary compatibility

    What happens if we change the definition of external_array in the shared object, without relinking (or recompiling) the main program? First, let us consider the addition of an array element.

    int external_array[4] = { 1, 2, 3, 4 };
    

    This triggers a warning at run time, from the dynamic loader:

    main-program: Symbol `external_array' has different size in shared object, consider re-linking
    

    The main program still contains a definition of external_array, which only provides space for 12 bytes. This means that the copy is incomplete: only the first three array elements are copied. Access to the array element external_array[3] is undefined as a result. This affects all code in the process, not just the main program, because all references to external_array have been redirected to the definition in the main program. This includes the shared object that provides the definition of external_array, which is probably not prepared to deal with the situation that an array element in its own definition is gone.

    What about the change in the opposite direction, removing an element, like this?

    int external_array[2] = { 1, 2 };
    

    If the program avoids accessing the array element external_array[2] because it detects that the array length is only two by some mechanism, then this will work. There is a bit of unused memory after the array, but this will not break the program.

    This means that we end up with the following rule:

    Adding elements to a global array variable breaks binary compatibility.
    Removing elements may break compatibility, unless there is a mechanism
    that avoids access to the removed elements.
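
    Both cases can be reproduced with a small model of the copy step. The sketch assumes that the loader copies the smaller of the two symbol sizes and warns only when the definition in the shared object is larger, which matches the behavior described above. The struct and function names are hypothetical, and this is not the dynamic loader's real code.

```c
#include <stddef.h>
#include <string.h>

/* Outcome of one modeled copy relocation. */
struct copy_result
{
  size_t bytes_copied;
  int warned;   /* nonzero if the size-mismatch warning would appear */
};

/* Copies at most the number of bytes reserved in the executable,
   and never more than the shared object actually defines. */
struct copy_result
model_copy_relocation (void *exe_copy, size_t exe_size,
                       const void *lib_definition, size_t lib_size)
{
  struct copy_result result;
  result.bytes_copied = exe_size < lib_size ? exe_size : lib_size;
  result.warned = lib_size > exe_size;
  memcpy (exe_copy, lib_definition, result.bytes_copied);
  return result;
}
```

    With a 12-byte slot in the executable, a four-element definition yields 12 copied bytes plus the warning, so the fourth element is lost. A two-element definition yields 8 copied bytes and no warning, leaving the third slot at whatever value it had before.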

    Unfortunately, the dynamic loader warning looks more harmless than it actually is, and for removed elements, there is no warning at all.

    How to avoid this situation

    Detecting the ABI change is rather easy with tools such as libabigail.

    The easiest way to avoid this situation is to provide a function that returns the address of the array, using code like this:

    static int local_array[3] = { 1, 2, 3 };
    
    int *
    get_external_array (void)
    {
      return local_array;
    }
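
    Callers of such an accessor usually need the element count as well. A possible extension is sketched here. The functions get_external_array_length and sum_external_array are hypothetical additions for illustration and are not part of the interface shown above.

```c
#include <stddef.h>

static int local_array[3] = { 1, 2, 3 };

int *
get_external_array (void)
{
  return local_array;
}

/* Hypothetical companion accessor.  The count is computed inside the
   library, so it always matches the array the library was built with. */
size_t
get_external_array_length (void)
{
  return sizeof (local_array) / sizeof (local_array[0]);
}

/* Example client code: it never bakes the array size into its own
   binary and asks the library at run time instead. */
long int
sum_external_array (void)
{
  const int *array = get_external_array ();
  size_t length = get_external_array_length ();
  long int sum = 0;
  for (size_t i = 0; i < length; ++i)
    sum += array[i];
  return sum;
}
```

    Since neither the array nor its size is exported as a data symbol, no copy relocation is created, and the library can grow the array without relinking its clients.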
    

    If the array definition cannot be made static because of the way it is used in the library, we can give it hidden visibility instead, which also prevents its export and thereby avoids the truncation issue:

    int local_array[3] __attribute__ ((visibility ("hidden"))) =
      { 1, 2, 3 };
    

    Things are considerably more complicated if the array variable needs to be exported for reasons of backward compatibility. Because the array is truncated underneath the library if an old main program with a shorter array definition is used, the accessor function will not provide access to the full array for newer client code if it is used with the same global array. Instead, the accessor function could use a separate (static or hidden) array, or perhaps a separate array for the newly added elements at the end. The downside is that it is impossible to keep everything in a contiguous array if the array variable is exported for backward compatibility. The design of the additional interface needs to reflect that.
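
    One way to lay out the "separate array for the newly added elements" variant is sketched below. It assumes that the exported array stays frozen at three elements and that two elements were added later. The names extension_array, get_element_count, and get_element are hypothetical.

```c
#include <stddef.h>

/* Exported for old binaries; its size must never change again. */
int external_array[3] = { 1, 2, 3 };

/* Elements added in later library versions.  Never exported, so no
   copy relocation can truncate it. */
static int extension_array[2] = { 4, 5 };

#define EXPORTED_COUNT \
  (sizeof (external_array) / sizeof (external_array[0]))
#define EXTENSION_COUNT \
  (sizeof (extension_array) / sizeof (extension_array[0]))

size_t
get_element_count (void)
{
  return EXPORTED_COUNT + EXTENSION_COUNT;
}

/* Returns a pointer to the element, or NULL if index is out of range.
   The interface is per element because the storage is not contiguous. */
int *
get_element (size_t index)
{
  if (index < EXPORTED_COUNT)
    return &external_array[index];
  if (index - EXPORTED_COUNT < EXTENSION_COUNT)
    return &extension_array[index - EXPORTED_COUNT];
  return NULL;
}
```

    Old programs keep using external_array directly and see three elements, while new code goes through get_element and sees all five.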

    With symbol versioning, it is possible to export multiple versions with different sizes, never changing the size associated with a specific version. Using this model, newly linked programs will always use the latest version, presumably with the largest size. Because symbol version and size are fixed by the link editor at the same time, they are always consistent. The GNU C Library uses this approach for the historic sys_errlist and sys_siglist variables. However, this still does not provide a single, contiguous array.

    All things considered, an accessor function (e.g., the get_external_array function above) is the best approach for avoiding this ABI compatibility problem.

    Last updated: February 22, 2024
