Use source-level annotations to help GCC detect buffer overflows

June 25, 2021
Martin Sebor
Related topics:
C, C#, C++, Linux, Security
Related products:
Red Hat Enterprise Linux


    Out-of-bounds memory accesses such as buffer overflow bugs remain among the most dangerous software weaknesses in 2021 (see 2020 CWE Top 25 Most Dangerous Software Weaknesses). In fact, out-of-bounds write (CWE-787) jumped from the twelfth position in 2019 to second in 2020, while out-of-bounds read (CWE-125) moved from the fifth to the fourth position.

    Recognizing the importance of detecting coding bugs early in the development cycle, recent GNU Compiler Collection (GCC) releases have significantly improved the compiler's ability to diagnose these dangerous bugs by using warnings such as -Warray-bounds, -Wformat-overflow, -Wstringop-overflow, and (most recently in GCC 11) -Wstringop-overread. However, a common limitation shared by all these warnings is that they can only analyze code in a single function at a time. With the exception of calls to a small set of intrinsic functions like memcpy() built into the compiler, the warnings stop at the function call boundary. That means that when a buffer allocated in one function overflows in a function called from it, the problem is not detected unless the called function is inlined into the caller.

    This article describes three kinds of simple source-level annotations that programs can use to help GCC detect out-of-bounds accesses across function call boundaries, even if the functions are defined in different source files:

    • Attribute access (first introduced in GCC 10, available in both C and C++)
    • Variable-length array (VLA) function parameters (new in GCC 11, available in C only)
    • Array function parameters (new in GCC 11, available in C only)

    Attribute access

    The access function attribute is useful for functions that take a pointer to a buffer as one argument and its size as another. An example might be the POSIX read() and write() pair of functions. Besides letting the programmer associate the two parameters, the attribute also specifies how the function accesses the contents of the buffer. The attribute applies to function declarations and is used both at call sites and when analyzing the definition of a function to detect invalid accesses.

    The attribute has the following syntax:

    • access (access-mode, ref-index)
    • access (access-mode, ref-index, size-index)

    ref-index and size-index denote positional arguments and give the 1-based argument numbers of the buffer and its size, respectively. The buffer argument ref-index can be declared as an ordinary object pointer, including void*, or using the array form (such as T[] or T[N]). It need not point to a complete type. The optional size-index must refer to an integer argument that specifies the number of elements of the array the function might access. For buffers of incomplete type such as void*, the size argument is taken to give the number of bytes. When size-index is not specified the buffer is assumed to have one element.

    access-mode describes how the function accesses the buffer. In GCC 11, four modes are recognized:

    • The read_only mode indicates that the function reads data from the provided buffer but doesn't write into it. The buffer is expected to be initialized by the caller. The read_only mode implies a stronger guarantee than the const qualifier on the buffer because the qualifier can be cast away and the buffer modified in a well-defined program, provided the buffer object itself isn't const. The parameter to which the read_only mode is applied may (but need not) be const-qualified. Declaring a parameter read_only has the same meaning as declaring one both const and restrict in C99 (although GCC 11 doesn't recognize the two as equivalent).
    • The write_only mode indicates that the function writes data into the provided buffer but doesn't read from it. The buffer need not be initialized. Attempting to apply the write_only mode to a const-qualified parameter causes a warning and the attribute is ignored. This is effectively the default mode for parameters with no associated attribute access.
    • The read_write mode indicates that the function both reads and writes data into the buffer. The buffer is expected to be initialized. Attempting to apply the read_write mode to a const-qualified parameter causes a warning and the attribute is ignored.
    • The none mode means the function doesn't access the buffer at all. The buffer need not be initialized. This mode is new in GCC 11 and is provided for functions that perform argument validation without accessing the data in the buffer.

    The following example shows how to use the attribute to annotate the POSIX read() and write() functions:

    __attribute__ ((access (write_only, 2, 3))) ssize_t
    read (int fd, void *buf, size_t nbytes); 
    __attribute__ ((access (read_only, 2, 3))) ssize_t
    write (int fd, const void *buf, size_t nbytes);

    Because the read() function stores data in the provided buffer the attribute access mode is write_only. Similarly, because write() reads the data from the buffer the access mode is read_only.
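
    The same annotations can be applied to a project's own functions. The following is a minimal sketch, not code taken from GCC or glibc: sum_ints(), fill_bytes(), and the ACCESS() wrapper macro are hypothetical names introduced here for illustration. It shows a size given in elements (an int buffer) next to a size given in bytes (a void* buffer), with the size argument following the pointer in both cases.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: expands to attribute access where the compiler
   reports support for it and to nothing elsewhere.  */
#if defined (__has_attribute)
#  if __has_attribute (access)
#    define ACCESS(mode, ...) __attribute__ ((access (mode, __VA_ARGS__)))
#  endif
#endif
#ifndef ACCESS
#  define ACCESS(mode, ...)
#endif

/* Reads N elements (not bytes) of A: the buffer has a complete type.  */
ACCESS (read_only, 1, 2)
long sum_ints (const int *a, size_t n);

/* Writes NBYTES bytes to BUF: for void* the size is a byte count.  */
ACCESS (write_only, 1, 2)
void fill_bytes (void *buf, size_t nbytes, unsigned char value);

long sum_ints (const int *a, size_t n)
{
  long sum = 0;
  for (size_t i = 0; i < n; ++i)
    sum += a[i];
  return sum;
}

void fill_bytes (void *buf, size_t nbytes, unsigned char value)
{
  memset (buf, value, nbytes);
}
```

    With these declarations in scope, a call such as sum_ints (a, sizeof a) for an array declared int a[8] passes a byte count where an element count is expected, which is the kind of mismatch the out-of-bounds warnings are meant to report.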

    The access attribute serves a similar purpose to declaring a function parameter using the variable-length array notation, except it's more flexible. Besides specifying the access mode, the attribute can, through its size-index argument, associate a pointer with a size that comes after it in the function argument list, as is often the case. We'll discuss the VLA notation in the next section.

    VLA function parameters

    In C (but in GCC 11, not in C++), a function parameter declared using the array notation can use nonconstant expressions, including prior parameters of the same function, as its bounds. When a bound refers to another function parameter, that parameter's declaration must precede that of the VLA (GCC provides an extension to get around that language limitation; see Arrays of Variable Length in the GCC manual). When only the most significant bound is such an expression, the parameter decays to an ordinary pointer just like any other array parameter. Otherwise, it is a VLA. Since the distinction between the two kinds of arrays in this context is rather subtle, GCC diagnostics refer to both as VLAs. We will follow this simplifying convention in the rest of the article. For example:

    void init_array (int n, int a[n]);

    The function takes an ordinary array (or, more precisely, a pointer) as its second argument, whose number of elements is given by the first argument. Although the language doesn't strictly require it, passing the function an array with fewer elements than the first argument indicates is almost certainly a bug. GCC checks calls to such functions and issues warnings when it determines that the array is smaller than expected. For instance, for the vla_init program, GCC issues the following warning:

    #include <stdlib.h>

    #define N 32

    void init_array (int n, int a[n]);

    int* f (void)
    {
      int *a = (int *)malloc (N);   /* N bytes, not N ints */
      init_array (N, a);
      return a;
    }

    In function 'f':
    warning: 'init_array' accessing 128 bytes in a region of size 32 [-Wstringop-overflow=]
       10 |     init_array (N, a);
          |     ^~~~~~~~~~~~~~~~~
    note: referencing argument 2 of type 'int *'
    note: in a call to function 'init_array'
        5 | void init_array (int n, int a[n]);
          |      ^~~~~~~~~~
    The warning detects the (likely) bug of passing an array to init_array that's smaller than the first argument indicates.
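
    One way to resolve the warning, sketched below, is to size the allocation in elements rather than bytes. The definition of init_array() and the name make_array() are assumptions made for this illustration; the article shows only the declaration of init_array().

```c
#include <stdlib.h>

#define N 32

/* Assumed definition: stores 0 .. n-1 in the first N elements of A.  */
void init_array (int n, int a[n])
{
  for (int i = 0; i < n; ++i)
    a[i] = i;
}

/* Allocates room for N ints (N * sizeof (int) bytes), matching what
   the bound in init_array's declaration says the function may access.  */
int *make_array (void)
{
  int *a = malloc (N * sizeof *a);
  if (a)
    init_array (N, a);
  return a;
}
```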

    As already mentioned, a declaration of an array where only the most significant bound is variable doesn't actually declare a VLA, but an ordinary array. Only the less significant bounds matter. What that means is that the declarations in the following example are all valid and equivalent:

    void init_vla (int n, int[n]);
    void init_vla (int, int[32]);
    void init_vla (int, int*);
    void init_vla (int n, int[n + 1]);

    That, however, presents a problem: Which of the declarations should be used for the out-of-bounds access warning? The solution implemented in GCC 11 is to trust the first declaration and issue a separate warning, -Wvla-parameter, for any subsequent redeclarations that suggest a different number of elements in the array. The four declarations in the preceding example then cause the following warnings:

    warning: argument 2 of type 'int[32]' declared as an ordinary array [-Wvla-parameter]
        2 | void init_vla (int, int[32]);
          |                     ^~~~~~~
    warning: argument 2 of type 'int *' declared as a pointer [-Wvla-parameter]
        3 | void init_vla (int, int*);
          |                     ^~~~
    warning: argument 2 of type 'int[n + 1]' declared with mismatched bound [-Wvla-parameter]
        4 | void init_vla (int n, int[n + 1]);
          |                       ^~~~~~~~~
    note: previously declared as a variable length array 'int[n]'
        1 | void init_vla (int n, int[n]);
          |                       ^~~~~~
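
    By contrast, when a bound other than the most significant one refers to a parameter, the parameter is a genuine VLA (strictly, a pointer to a VLA). The following is a small sketch using the hypothetical function name fill_matrix(), with every declaration spelling the bounds the same way so that the redeclarations agree with one another:

```c
/* Redeclarations that repeat the same bounds are consistent.  */
void fill_matrix (int rows, int cols, int m[rows][cols], int value);
void fill_matrix (int rows, int cols, int m[rows][cols], int value);

/* COLS is a less significant bound, so M has the variably modified
   type 'int (*)[cols]'; only the ROWS bound decays away.  */
void fill_matrix (int rows, int cols, int m[rows][cols], int value)
{
  for (int r = 0; r < rows; ++r)
    for (int c = 0; c < cols; ++c)
      m[r][c] = value;
}
```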

    Array function parameters

    Because of concerns about unbounded stack allocation, VLAs tend to be underused in modern C code, even in contexts like function declarations where they are not only safe but also help improve the ability to analyze code. In the absence of VLAs, some projects use a simpler convention: functions that expect callers to provide access to some constant minimum number of elements, say N, declare the parameter using the ordinary array notation T[N]. For example, the C standard function tmpnam() expects its argument to point to an array with at least L_tmpnam elements. To make that explicit, GNU libc 2.34 declares it as:

    char *tmpnam (char[L_tmpnam]);

    GCC 11 recognizes this convention, and when it determines that a call to the function provides a smaller array, it issues a warning. For example, on Linux, where L_tmpnam is defined as 20, GCC issues the following warning for the function shown next:

    void g (void)
    {
      char a[16];
      if (tmpnam (a))
        puts (a);
    }

    In function 'g':
    warning: 'tmpnam' accessing 20 bytes in a region of size 16 [-Wstringop-overflow=]
     10 | if (tmpnam (a))
        |     ^~~~~~~~~~
    note: referencing argument 1 of type 'char *'
    note: in a call to function 'tmpnam'
      3 | extern char* tmpnam (char[L_tmpnam]);
        |              ^~~~~~

    In addition to function calls, GCC 11 also checks the definitions of functions declared with array parameters and issues warnings for accesses that are out of bounds given the constant bounds. For instance, this definition of the init_array() function triggers a -Warray-bounds warning as shown in the Compiler Explorer example:

    void init_array (int, int a[32])
    { 
      a[32] = 0;
    }

    In function 'init_array':
    warning: array subscript 32 is outside array bounds of 'int[32]' [-Warray-bounds]
        3 |   a[32] = 0;
          |   ~^~~~
    note: while referencing 'a'
        1 | void init_array (int, int a[32])
          |                       ~~~~^~~~~

    Similarly to the function redeclarations involving VLA parameters, GCC also checks those involving the array forms of parameters and issues a -Warray-parameter warning for mismatches as shown in the following example:

    void init_array (int, int[32]);
    void init_array (int, int[16]);
    void init_array (int n, int[]);
    void init_array (int n, int*);

    warning: argument 2 of type 'int[16]' with mismatched bound [-Warray-parameter=]
        2 | void init_array (int, int[16]);
          |                       ^~~~~~~
    warning: argument 2 of type 'int[]' with mismatched bound [-Warray-parameter=]
        3 | void init_array (int n, int[]);
          |                         ^~~~~
    warning: argument 2 of type 'int *' declared as a pointer [-Warray-parameter=]
        4 | void init_array (int n, int*);
          |                         ^~~~
    note: previously declared as an array 'int[32]'
        1 | void init_array (int, int[32]);
          |                       ^~~~~~~

    Caveats and limitations

    The features discussed here are unique in one interesting respect: They involve both simple lexical analysis and more involved, flow-sensitive analysis. In theory, lexical warnings can be both sound and complete (that is, they suffer from neither false positives nor false negatives). Because they are handled during lexical analysis, the -Warray-parameter and -Wvla-parameter warnings are virtually free of such problems. Flow-based warnings, on the other hand, are inherently neither sound nor complete; rather, they are unavoidably prone to both false positives and negatives.

    False negatives

    To use the access attributes and detect out-of-bounds accesses, the functions to which they apply must not be inlined. Once a function is inlined into its caller, most of its attributes are usually lost. That can prevent GCC from detecting bugs if the out-of-bounds access cannot easily be determined from the inlined function body. For example, the genfname() function in the following code listing uses getpid() to generate a temporary file name in the /tmp directory. Because on most systems the POSIX getpid() function returns a 32-bit int, the longest name the function can generate is 26 characters (10 digits for INT_MAX plus 16 for the /tmp/tmpfile.txt string), which takes 27 bytes including the terminating nul character. When the genfname(name) call in main() is not inlined, GCC issues the warning shown after the listing, as expected. But when the call is inlined, the warning disappears. (The diagnostic appears to have been captured from a shortened variant of the listing in which the function is named f and the array a.) You can see the two scenarios side by side here.

    #include <stdio.h>
    #include <unistd.h>
      
    inline void genfname (char name[27])
    {
      snprintf (name, 27, "/tmp/tmpfile%u.txt", getpid ());
    }
    
    int main (void)
    {
      char name[16];
      genfname (name);
      puts (name);
    }
    
    In function 'main':
    warning: 'f' accessing 27 bytes in a region of size 16 [-Wstringop-overflow=]
       11 |   f (a);
          |   ^~~~~
    note: referencing argument 1 of type 'char *'
    note: in a call to function 'f'
        3 | inline void f (char a[27])
          |             ^

    As an aside, if you're wondering why the snprintf() call isn't diagnosed by -Wformat-truncation, it's because the warning is unable to determine anything about the getpid() result.
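
    If a particular function's annotations matter more than the benefit of inlining it, one option is to ask GCC not to inline it. The sketch below assumes that the noinline function attribute is acceptable for the function in question; FNAME_SIZE and print_fname() are names introduced here, and whether a diagnostic is then issued for an undersized buffer still depends on the GCC version and options in use.

```c
#include <stdio.h>
#include <unistd.h>

enum { FNAME_SIZE = 27 };   /* 26 characters plus the terminating nul */

/* noinline asks GCC to keep calls to this function as real calls, so
   the declared bound on NAME is still associated with each call.  */
__attribute__ ((noinline))
void genfname (char name[FNAME_SIZE])
{
  snprintf (name, FNAME_SIZE, "/tmp/tmpfile%d.txt", (int) getpid ());
}

/* A caller that provides the full FNAME_SIZE bytes.  */
void print_fname (void)
{
  char name[FNAME_SIZE];
  genfname (name);
  puts (name);
}
```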

    False positives

    Generally, the detection of out-of-bounds accesses based on the annotations discussed here is subject to the same limitations and shortcomings as all flow-sensitive warnings in GCC. For a detailed discussion of these, see Understanding GCC warnings, Part 2. A couple of commonly reported issues specific to the function annotation mechanisms might be worth going over.

    As mentioned previously, some projects use the array parameter notation with a constant bound to provide a visual clue that the caller should supply an array with at least as many elements. But sometimes the convention is fuzzy, meaning that the function only uses the array when another parameter has this or that value. Since there is nothing to communicate this "quirk" of the convention to GCC, a warning might end up issued even when the use is safe. We suggest avoiding using the convention in those cases.
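
    As a sketch of that suggestion, a function that touches the buffer only conditionally can declare the parameter as a plain pointer, which makes no promise about a minimum element count. The function describe() and the constant MSG_SIZE are hypothetical.

```c
#include <stdio.h>

enum { MSG_SIZE = 32 };

/* MSG is used only when VERBOSE is nonzero, so it is declared as a
   pointer rather than as 'char msg[MSG_SIZE]'.  Returns CODE.  */
int describe (int code, int verbose, char *msg)
{
  if (verbose)
    snprintf (msg, MSG_SIZE, "status code %d", code);
  return code;
}
```

    A caller that passes verbose as 0 can then pass a null pointer without the declaration implying that a MSG_SIZE-element array is required.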

    Future work

    In GCC 11, you can use the access attribute to detect the following:

    • Out-of-bounds accesses: -Warray-bounds, -Wformat-overflow, -Wstringop-overflow, and -Wstringop-overread
    • Overlapping accesses: -Wrestrict
    • Uninitialized accesses: -Wuninitialized
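
    For the uninitialized case, here is a minimal sketch with the hypothetical functions count_nonzero() and count_sample(): because the buffer is declared read_only, the caller is expected to initialize it before the call.

```c
#include <stddef.h>

/* Reads N elements of A without modifying them.  */
__attribute__ ((access (read_only, 1, 2)))
size_t count_nonzero (const int *a, size_t n)
{
  size_t count = 0;
  for (size_t i = 0; i < n; ++i)
    count += a[i] != 0;
  return count;
}

size_t count_sample (void)
{
  /* Initialized before the call; leaving the initializer out would
     pass indeterminate data to a read_only parameter.  */
  int data[4] = { 0, 5, 0, 9 };
  return count_nonzero (data, 4);
}
```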

    In the future, we would like to use the attribute to also detect variables that are only written to but never read from (-Wunused-but-set-parameter and -Wunused-but-set-variable).

    We are also considering extending the access attribute in some form to function return values as well as to variables. Annotating function return values will let GCC detect attempts to modify immutable objects via pointers returned from functions like getenv() or localeconv(). Similarly, annotating global variables will make it possible to detect accidentally modifying the contents of objects such as the environment pointer array environ.

    Last updated: August 26, 2022
