programming
Dec 17, 2024

Flattening Legacy C++ APIs for Rust Integration

Recipe for using an existing C++ API in Rust

Jan Sulmont
Developer

Introduction

This blog post presents a solution for integrating Rust with legacy C++ APIs, focusing on safe and efficient callback management. While developing a near-realtime data streaming system, I encountered several challenges in reconciling Rust’s safety guarantees with cross-language interactions. The approach described here, though developed for a specific use case, provides patterns applicable to many C++-to-Rust integration scenarios.

Overview

Managing callbacks between C++ and Rust presents unique challenges. During a callback, Rust code must work with memory owned by the C++ library - data whose structure and alignment are opaque to Rust’s type system. This requires careful handling of unsafe code, which can undermine Rust’s safety guarantees.

My initial solution used a singleton pattern, limiting each process to one connection to the upstream service (in this case, an MQTT broker, but the same limitation would apply when connecting to other systems like financial exchanges or data feeds). While this worked technically, it had several major drawbacks:

  1. The singleton pattern meant only one connection to the upstream service could exist per process
  2. State had to be managed through global static variables (via something like lazy_static!)
  3. Async code became particularly difficult as the global state needed thread-safe access patterns
  4. Testing was cumbersome as the global state had to be carefully managed between test cases

A simplified sketch of that singleton-style approach:

use dashmap::DashMap;
use lazy_static::lazy_static;

#[derive(Debug, Clone)]
enum ClientState { New, Active, Error }

// Forced to use global state for all clients
lazy_static! {
   static ref CLIENT_STATES: DashMap<i64, ClientState> = DashMap::new();
}

pub unsafe extern "C" fn message_event_cb(event_ptr: *const MessageEvent) {
   let event = match event_ptr.as_ref() {
       Some(e) => e,
       None => {
           eprintln!("Null pointer in callback");
           return;
       }
   };

   let tag = event.get_tag();

   // Hard to test/maintain state machine through global state
   let next_state = match CLIENT_STATES.get(&tag).map(|s| s.clone()) {
       None => ClientState::New,
       Some(ClientState::New) if event.is_valid() => ClientState::Active,
       Some(ClientState::Active) => ClientState::Active,
       _ => ClientState::Error,
   };

   CLIENT_STATES.insert(tag, next_state);
}

The improved approach presented here encapsulates API callbacks within a Rust object stored alongside the client. Because each extern "C" callback function receives a pointer to this per-client context, no global state is needed, and we can maintain Rust’s safety guarantees while providing efficient zero-copy access to callback data when needed.
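
As a rough illustration of the idea, the state machine from the global version can live in a per-client object that the C side hands back through a `void*` context pointer. This is only a sketch: `ClientContext` and `event_trampoline` are hypothetical names, the event is reduced to a single validity flag, and a client is assumed to start in the `New` state.

```rust
use std::ffi::c_void;
use std::sync::Mutex;

#[derive(Debug, Clone, Copy, PartialEq)]
enum ClientState {
    New,
    Active,
    Error,
}

/// Per-client context (hypothetical name); replaces the global CLIENT_STATES map.
struct ClientContext {
    state: Mutex<ClientState>,
}

impl ClientContext {
    fn new() -> Self {
        ClientContext { state: Mutex::new(ClientState::New) }
    }

    /// Transitions modeled on the global version, applied to this client only.
    fn on_event(&self, valid: bool) {
        let mut state = self.state.lock().unwrap();
        *state = match (*state, valid) {
            (ClientState::New, true) => ClientState::Active,
            (ClientState::Active, _) => ClientState::Active,
            _ => ClientState::Error,
        };
    }

    fn state(&self) -> ClientState {
        *self.state.lock().unwrap()
    }
}

/// Hypothetical C-compatible entry point: the library passes back the
/// `user_context` pointer that was registered for this particular client.
unsafe extern "C" fn event_trampoline(valid: i32, user_context: *mut c_void) {
    if user_context.is_null() {
        return;
    }
    // Safety: the caller guarantees the pointer refers to a live ClientContext.
    let ctx = unsafe { &*(user_context as *const ClientContext) };
    ctx.on_event(valid != 0);
}
```

Two clients now have two independent state machines, and a test can create a fresh `ClientContext` without touching any process-wide state.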

Bridging C++ and Rust: A Practical Guide to API Flattening

When working on legacy modernization projects, we often need to integrate Rust with existing C++ codebases. This presents unique challenges due to fundamental differences in how these languages handle object models, memory management, and binary interfaces.

Today, we’ll explore a technique called “API flattening” using PolarMqtt, a toy C++ library I created specifically for this demonstration. PolarMqtt is inspired by the Eclipse Paho C++ API and is implemented on top of the Paho C API, but it exists purely to illustrate these concepts. For production MQTT needs, use the official paho-mqtt crate.

The complete source code for this demonstration is available in the PolarMqtt repository. All code examples in this post correspond to this version, which has been archived for reference.

C++ Object Model Challenges in FFI

Legacy C++ APIs often present several integration challenges for Rust:

  • Incompatible vtable implementations (Virtual Method Tables)
  • Complex inheritance hierarchies
  • RAII patterns that need careful handling
  • Thread safety guarantees that must be preserved
  • Exception handling that must be converted to Rust’s Result type

Here’s a typical C++ API we might encounter:

// Original C++ API
class Session {
public:
    virtual void onMessage(const Message& msg) = 0;
    virtual void onStateChange(SessionState state) = 0;
    // ... more virtual methods
};

// Usage requires inheritance
class MySession : public Session {
    void onMessage(const Message& msg) override {
        // Handle message
    }
};

The bindgen Challenge

While bindgen works excellently for C APIs, it often struggles with C++ codebases. As of 2024, there’s no robust, general-purpose solution for automatically generating Rust bindings to C++ APIs. Common issues include:

  • Inability to properly handle C++ templates
  • Broken callback bindings due to vtable incompatibilities
  • Complications with STL containers and RAII types
  • Name mangling mismatches
  • Template instantiation problems
  • Complex inheritance hierarchies that don’t map to Rust’s trait system

This is why we need to “flatten” C++ APIs to C first, creating a simpler intermediate layer that bindgen can reliably handle. While this requires more upfront work, it results in more maintainable and reliable bindings.

The Solution: API Flattening

API flattening introduces a C-compatible intermediate layer that bridges these incompatibilities:

┌────────────┐     ┌────────────┐     ┌────────────┐
│   C++ API  │ --> │    C API   │ --> │  Rust API  │
│ (Original) │     │(Flattened) │     │ (Binding)  │
└────────────┘     └────────────┘     └────────────┘

1. Creating the C Bridge

First, we flatten the C++ interface into C-compatible types:

// C bridge layer
typedef struct mqtt_session_t* mqtt_session_handle_t;

typedef void (*mqtt_message_callback_t)(
    const mqtt_message_data_t* message,
    void* user_context
);

mqtt_session_handle_t mqtt_create_session(
    const char* client_id,
    mqtt_message_callback_t message_cb,
    void* user_context
);
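
On the Rust side, the declarations for this header might look roughly like the hand-written sketch below (bindgen would normally generate them). Several things here are assumptions: the two struct types are modeled as opaque, `create_session` and `noop_callback` are hypothetical helpers, and `mqtt_create_session` is a stand-in implemented in Rust so the snippet runs without linking the C library. In a real binding it would be an import inside an `extern "C"` block.

```rust
use std::ffi::{c_char, c_void, CStr, CString};
use std::ptr::NonNull;

/// Opaque handle target: Rust only ever passes pointers to it around.
#[allow(non_camel_case_types)]
#[repr(C)]
pub struct mqtt_session_t {
    _private: [u8; 0],
}

/// Treated as opaque in this sketch; the real struct has fields.
#[allow(non_camel_case_types)]
#[repr(C)]
pub struct mqtt_message_data_t {
    _private: [u8; 0],
}

/// Nullable C function pointer, hence the Option.
#[allow(non_camel_case_types)]
pub type mqtt_message_callback_t = Option<
    unsafe extern "C" fn(message: *const mqtt_message_data_t, user_context: *mut c_void),
>;

/// Stand-in for the C function (assumption, see the note above).
pub unsafe extern "C" fn mqtt_create_session(
    client_id: *const c_char,
    message_cb: mqtt_message_callback_t,
    _user_context: *mut c_void,
) -> *mut mqtt_session_t {
    if client_id.is_null() || message_cb.is_none() {
        return std::ptr::null_mut();
    }
    // Safety: checked non-null; the caller passes a NUL-terminated string.
    let id = unsafe { CStr::from_ptr(client_id) };
    if id.to_bytes().is_empty() {
        return std::ptr::null_mut();
    }
    // Dummy non-null handle; it is never dereferenced in this sketch.
    NonNull::<mqtt_session_t>::dangling().as_ptr()
}

/// Safe wrapper: converts the &str and turns a null handle into None.
pub fn create_session(
    client_id: &str,
    message_cb: mqtt_message_callback_t,
    user_context: *mut c_void,
) -> Option<NonNull<mqtt_session_t>> {
    let c_id = CString::new(client_id).ok()?; // rejects interior NUL bytes
    // Safety: c_id is NUL-terminated and outlives the call.
    let handle = unsafe { mqtt_create_session(c_id.as_ptr(), message_cb, user_context) };
    NonNull::new(handle)
}

/// A do-nothing callback, only here to have something to register.
pub unsafe extern "C" fn noop_callback(_m: *const mqtt_message_data_t, _ctx: *mut c_void) {}
```

Wrapping the handle in `Option<NonNull<...>>` pushes the null check to a single place, so the rest of the Rust code never sees a null session pointer.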

2. Designing Safe Rust Types

In Rust, we create two distinct message types to handle different ownership scenarios:

// For publishing: Owned data
#[derive(Debug, Clone)]
pub struct Message {
    topic: String,
    payload: Vec<u8>,
    qos: QoS,
    retained: bool,
}

// For callbacks: Borrowed data
#[derive(Debug)]
pub struct MessageView<'a> {
    topic: &'a str,      // Borrows from C++
    payload: &'a [u8],   // Borrows from C++
    qos: QoS,
    retained: bool,
}

Why MessageView Needs a Lifetime Parameter

The 'a lifetime in MessageView is crucial for safety:

  • During callbacks, the C++ layer owns the message data (topic and payload).
  • MessageView borrows this data only for the duration of the callback.
  • The 'a lifetime ensures these references cannot outlive the callback.
  • After the callback returns, the C++ layer may free or modify the data.

For example, this code would fail to compile:

let stored_message: Option<MessageView> = None;

client.on_message(|msg| {
    stored_message = Some(msg);  // Compile error! Can't store borrowed data
});

MessageView::to_owned()

When receiving messages through a callback, the data arrives as a MessageView reference, which provides zero-copy access to data held by the underlying C++ API. If you need to retain the message data beyond the callback’s scope, you can call to_owned() to create a deep copy, which returns an owned Message:

client.on_message(|msg| {
    let owned_message = msg.to_owned();  // Creates owned copy we can store
    process_later(owned_message);
});

See here for an example.
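
For readers who want to see the copy spelled out, here is a minimal sketch of how `to_owned()` could be written. The `QoS` variants, the accessor methods, and `simulate_callback` are assumptions made for illustration; the buffers inside `simulate_callback` play the role of memory owned by the C++ layer.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QoS {
    AtMostOnce,
    AtLeastOnce,
    ExactlyOnce,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Message {
    topic: String,
    payload: Vec<u8>,
    qos: QoS,
    retained: bool,
}

#[derive(Debug)]
pub struct MessageView<'a> {
    topic: &'a str,
    payload: &'a [u8],
    qos: QoS,
    retained: bool,
}

impl Message {
    pub fn topic(&self) -> &str {
        &self.topic
    }
    pub fn payload(&self) -> &[u8] {
        &self.payload
    }
}

impl<'a> MessageView<'a> {
    /// Deep copy: allocates a new String and Vec so nothing borrows from 'a.
    pub fn to_owned(&self) -> Message {
        Message {
            topic: self.topic.to_owned(),
            payload: self.payload.to_vec(),
            qos: self.qos,
            retained: self.retained,
        }
    }
}

/// Simulates a callback: the buffers exist only inside this function.
pub fn simulate_callback() -> Message {
    let topic_buf = String::from("sensors/temp");
    let payload_buf = b"21.5".to_vec();
    let view = MessageView {
        topic: &topic_buf,
        payload: &payload_buf,
        qos: QoS::AtLeastOnce,
        retained: false,
    };
    // Returning `view` itself would not compile: it borrows the local buffers.
    view.to_owned()
}
```

The copy costs one allocation each for the topic and the payload, which is why the view is the default and the owned message is opt-in.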

3. Implementing Thread-Safe Callbacks

The callback system must be thread-safe as C++ can invoke callbacks from any thread:

pub type MessageCallback = dyn Fn(&MessageView) + Send + Sync;
pub type StateCallback = dyn Fn(ConnectionState) + Send + Sync;
pub type ErrorCallback = dyn Fn(i32, &str) + Send + Sync;

struct CallbackContext {
    message_callback: Box<MessageCallback>,
    state_callback: Box<StateCallback>,
    error_callback: Box<ErrorCallback>,
}
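
To see why the `Send + Sync` bounds matter, the sketch below invokes the same callbacks from several threads at once, the way a C++ library's worker threads might. It is a simplified stand-in rather than the library's code: the message callback is left out, the `ConnectionState` variants are assumed, and `invoke_from_threads` is a hypothetical helper.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

#[derive(Debug, Clone, Copy)]
pub enum ConnectionState {
    Connected,
    Disconnected,
}

pub type StateCallback = dyn Fn(ConnectionState) + Send + Sync;
pub type ErrorCallback = dyn Fn(i32, &str) + Send + Sync;

struct CallbackContext {
    state_callback: Box<StateCallback>,
    error_callback: Box<ErrorCallback>,
}

/// Calls both callbacks from `threads` threads and returns the total number
/// of invocations that were observed.
pub fn invoke_from_threads(threads: usize) -> usize {
    let calls = Arc::new(AtomicUsize::new(0));
    let (c1, c2) = (Arc::clone(&calls), Arc::clone(&calls));

    // The closures capture only thread-safe state, so they satisfy Send + Sync.
    let context = Arc::new(CallbackContext {
        state_callback: Box::new(move |_state| {
            c1.fetch_add(1, Ordering::SeqCst);
        }),
        error_callback: Box::new(move |_code, _msg| {
            c2.fetch_add(1, Ordering::SeqCst);
        }),
    });

    let handles: Vec<_> = (0..threads)
        .map(|i| {
            let ctx = Arc::clone(&context);
            thread::spawn(move || {
                (ctx.state_callback)(ConnectionState::Connected);
                (ctx.error_callback)(i as i32, "simulated error");
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    calls.load(Ordering::SeqCst)
}
```

If a closure captured something like an `Rc` or a `RefCell`, it would not satisfy the bounds and the `CallbackContext` could not be built, which is the compile-time check we want.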

4. Building the Safe Client Interface

The main client interface wraps the unsafe C API in a safe abstraction:

pub struct Client {
    session: *mut bindings::mqtt_session_t,
    _context: Box<CallbackContext>,
}

impl Client {
    pub fn new<F1, F2, F3>(
        client_id: &str,
        on_message: F1,
        on_state_change: F2,
        on_error: F3,
    ) -> Result<Self>
    where
        F1: Fn(&MessageView) + Send + Sync + 'static,
        F2: Fn(ConnectionState) + Send + Sync + 'static,
        F3: Fn(i32, &str) + Send + Sync + 'static,
    {
        let context = Box::new(CallbackContext {
            message_callback: Box::new(on_message),
            state_callback: Box::new(on_state_change),
            error_callback: Box::new(on_error),
        });

        let context_ptr = Box::into_raw(context) as *mut std::ffi::c_void;
        // Implementation...
    }
}

// Safety: All mutable state is protected by Mutex
unsafe impl Send for Client {}
unsafe impl Sync for Client {}
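
The part elided above has to answer one question: who frees the context, and when? One possible arrangement is sketched here as a variant of the struct above, not as the repository's implementation. It stores the context as a raw pointer obtained from `Box::into_raw` and frees it in `Drop`, after the session is gone. `FakeSession` is a stand-in for the C++ session, and `simulate_event` is a hypothetical helper that plays the role of the library firing a callback.

```rust
use std::ffi::c_void;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

struct CallbackContext {
    on_event: Box<dyn Fn() + Send + Sync>,
}

/// Stand-in for the C++ session (assumption): remembers the context pointer
/// and records when it has been destroyed.
struct FakeSession {
    user_context: *mut c_void,
    destroyed: Arc<AtomicBool>,
}

pub struct Client {
    session: *mut FakeSession,
    context: *mut CallbackContext, // owned by Client; freed in Drop
}

impl Client {
    pub fn new(on_event: impl Fn() + Send + Sync + 'static, destroyed: Arc<AtomicBool>) -> Self {
        // Hand ownership to a raw pointer: the address stays stable and nothing
        // frees the context while the session can still call back.
        let context = Box::into_raw(Box::new(CallbackContext {
            on_event: Box::new(on_event),
        }));
        let session = Box::into_raw(Box::new(FakeSession {
            user_context: context as *mut c_void,
            destroyed,
        }));
        Client { session, context }
    }

    /// Simulates the library invoking the registered callback.
    pub fn simulate_event(&self) {
        // Safety: both pointers stay valid until Drop runs.
        let ctx = unsafe { &*((*self.session).user_context as *const CallbackContext) };
        (ctx.on_event)();
    }
}

impl Drop for Client {
    fn drop(&mut self) {
        unsafe {
            // 1. Tear down the session first so no further callbacks can fire.
            let session = Box::from_raw(self.session);
            session.destroyed.store(true, Ordering::SeqCst);
            drop(session);
            // 2. Only then reclaim and free the callback context.
            drop(Box::from_raw(self.context));
        }
    }
}
```

The order inside `drop` is the point of the sketch: if the context were freed first, a callback arriving during session teardown would read freed memory.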

5. Error Handling with thiserror

We use thiserror to define our error types, though any error handling approach would work equally well here. The choice of thiserror is merely for demonstration purposes - you could just as easily use standard error enums, anyhow, snafu or your preferred error handling solution:

use std::ffi::NulError;
use thiserror::Error;

#[derive(Debug, Error)]
pub enum Error {
    #[error("MQTT initialization failed")]
    InitializationError,
    #[error("Invalid broker URL")]
    InvalidBrokerUrl,
    #[error("Connection failed")]
    ConnectionError,
    #[error("String contains null byte: {0}")]
    NulError(#[from] NulError),
}

pub type Result<T> = std::result::Result<T, Error>;

The key point here isn’t the specific error handling crate, but rather ensuring that C++ errors are properly translated into Rust’s error handling paradigm in a way that preserves important error context and maintains type safety.
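
As an illustration of that translation step, the sketch below maps integer status codes from a C bridge onto an error enum. It uses only the standard library, so `Display` and `From` are written by hand where thiserror would derive them. The numeric codes, the `Unknown` variant, and the `check` and `to_c_string` helpers are assumptions, not part of PolarMqtt.

```rust
use std::ffi::{CString, NulError};
use std::fmt;

#[derive(Debug)]
pub enum Error {
    InitializationError,
    InvalidBrokerUrl,
    ConnectionError,
    NulError(NulError),
    Unknown(i32), // assumption: keeps unrecognized codes instead of dropping them
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::InitializationError => write!(f, "MQTT initialization failed"),
            Error::InvalidBrokerUrl => write!(f, "Invalid broker URL"),
            Error::ConnectionError => write!(f, "Connection failed"),
            Error::NulError(e) => write!(f, "String contains null byte: {}", e),
            Error::Unknown(code) => write!(f, "Unknown error code {}", code),
        }
    }
}

impl std::error::Error for Error {}

impl From<NulError> for Error {
    fn from(e: NulError) -> Self {
        Error::NulError(e)
    }
}

/// Translates a status code returned by the C bridge (hypothetical values).
pub fn check(code: i32) -> Result<(), Error> {
    match code {
        0 => Ok(()),
        -1 => Err(Error::InitializationError),
        -2 => Err(Error::InvalidBrokerUrl),
        -3 => Err(Error::ConnectionError),
        other => Err(Error::Unknown(other)),
    }
}

/// Converts a Rust string for the C side; `?` applies the From impl above.
pub fn to_c_string(s: &str) -> Result<CString, Error> {
    Ok(CString::new(s)?)
}
```

With a helper like `check`, each unsafe call into the bridge can be followed by `?`, and callers never see a raw integer.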

Usage Example

Here’s how the flattened API looks in practice:

fn main() -> Result<()> {
    let mut client = Client::new(
        "test-client",
        |msg| println!("Received: {}", msg.topic()),
        |state| println!("State: {:?}", state),
        |code, err| eprintln!("Error {}: {}", code, err),
    )?;

    client.connect("test.mosquitto.org", 1883)?;

    let message = Message::new("test/topic", "Hello MQTT!");
    client.publish(&message)?;

    Ok(())
}

Testing Strategy

Integration testing is crucial for FFI code:

use std::sync::mpsc;
use std::time::Duration;

#[test]
fn test_integration() {
    let (tx, rx) = mpsc::channel();

    let mut client = Client::new(
        "test-client",
        move |msg| {
            tx.send(msg.to_owned()).unwrap();
        },
        |state| println!("State: {:?}", state),
        |code, err| eprintln!("Error {}: {}", code, err),
    ).unwrap();

    client.connect("test.mosquitto.org", 1883).unwrap();

    let message = Message::new("test/topic", "test data");
    client.publish(&message).unwrap();

    let received = rx.recv_timeout(Duration::from_secs(5)).unwrap();
    assert_eq!(received.payload(), message.payload());
}

Understanding Unsafe Extern C Callbacks

The callback system represents a critical part of our FFI layer, requiring careful handling of unsafe code and data lifetimes:

use std::ffi::CStr;

unsafe extern "C" fn message_callback(
    message: *const bindings::mqtt_message_data_t,
    context: *mut std::ffi::c_void,
) {
    // 1. Safety: Validate raw pointers
    if message.is_null() || context.is_null() {
        return;
    }

    // 2. Context recovery
    let context = &*(context as *const CallbackContext);

    // 3. Safe payload handling
    let payload = if (*message).payload.is_null() || (*message).payload_length == 0 {
        &[]
    } else if (*message).payload_length > isize::MAX as usize {
        eprintln!("Payload too large");
        return;
    } else {
        std::slice::from_raw_parts((*message).payload, (*message).payload_length)
    };

    // 4. String conversion (a null topic pointer must not reach CStr::from_ptr)
    if (*message).topic.is_null() {
        return;
    }
    let topic = match CStr::from_ptr((*message).topic).to_str() {
        Ok(s) => s,
        Err(_) => return,
    };

    // 5. Create safe view
    let msg = MessageView {
        topic,
        payload,
        qos: (*message).qos.into(),
        retained: (*message).retained != 0,
    };

    // 6. Execute callback
    (context.message_callback)(&msg);
}
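
A callback like this can also be exercised without a broker by building the C struct by hand and calling the function directly. The sketch below does that with a trimmed-down version of the callback. The `MessageData` layout is an assumption that mirrors the field names used above (the real layout comes from the generated bindings), the Rust callback takes the topic and payload directly instead of a `MessageView`, and `deliver_synthetic` is a hypothetical helper.

```rust
use std::ffi::{c_char, c_int, c_void, CStr, CString};
use std::sync::{Arc, Mutex};

/// Assumed layout; field names follow the callback above.
#[repr(C)]
struct MessageData {
    topic: *const c_char,
    payload: *const u8,
    payload_length: usize,
    qos: c_int,
    retained: c_int,
}

struct CallbackContext {
    message_callback: Box<dyn Fn(&str, &[u8]) + Send + Sync>,
}

unsafe extern "C" fn message_callback(message: *const MessageData, context: *mut c_void) {
    if message.is_null() || context.is_null() {
        return;
    }
    // Safety: both pointers were checked; the caller keeps them valid for the call.
    let (message, context) = unsafe { (&*message, &*(context as *const CallbackContext)) };
    if message.topic.is_null() {
        return;
    }
    let c_topic = unsafe { CStr::from_ptr(message.topic) };
    let topic = match c_topic.to_str() {
        Ok(s) => s,
        Err(_) => return,
    };
    let empty: &[u8] = &[];
    let payload = if message.payload.is_null() || message.payload_length == 0 {
        empty
    } else {
        unsafe { std::slice::from_raw_parts(message.payload, message.payload_length) }
    };
    (context.message_callback)(topic, payload);
}

/// Feeds one synthetic message through the callback and returns everything
/// the Rust closure observed.
pub fn deliver_synthetic(topic: &str, payload: &[u8]) -> Vec<(String, Vec<u8>)> {
    let seen = Arc::new(Mutex::new(Vec::new()));
    let sink = Arc::clone(&seen);
    let context = CallbackContext {
        message_callback: Box::new(move |t: &str, p: &[u8]| {
            sink.lock().unwrap().push((t.to_owned(), p.to_vec()));
        }),
    };
    let c_topic = CString::new(topic).expect("topic must not contain NUL");
    let data = MessageData {
        topic: c_topic.as_ptr(),
        payload: payload.as_ptr(),
        payload_length: payload.len(),
        qos: 1,
        retained: 0,
    };
    // Safety: `data`, `c_topic`, and `context` all outlive this call.
    unsafe { message_callback(&data, &context as *const CallbackContext as *mut c_void) };
    let result = seen.lock().unwrap().clone();
    result
}
```

Tests built this way run offline and can cover edge cases such as an empty payload that are awkward to provoke through a public broker.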

Key Safety Considerations

  1. Context Management: The CallbackContext must outlive all callbacks:

    pub struct Client {
        session: *mut bindings::mqtt_session_t,
        _context: Box<CallbackContext>, // Kept alive by Client
    }
  2. Lifetime Guarantees: MessageView borrows data only for the duration of the callback:

    pub struct MessageView<'a> {
        topic: &'a str,    // Lives for callback duration
        payload: &'a [u8], // Lives for callback duration
        // ...
    }
  3. Thread Safety: Callbacks can occur from any thread, requiring Send + Sync:

    pub type MessageCallback = dyn Fn(&MessageView) + Send + Sync;
    unsafe impl Send for Client {}
    unsafe impl Sync for Client {}
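
One further consideration, not covered in the list above, is what happens when a user-supplied closure panics. A panic must not unwind out of an `extern "C"` function into C++ frames; depending on the Rust version this is undefined behavior or aborts the process. A common guard, sketched here with a simplified context and hypothetical names (`state_trampoline`, `run_with`, the `panics` counter), is to wrap the user callback in `std::panic::catch_unwind`.

```rust
use std::ffi::c_void;
use std::panic::{catch_unwind, AssertUnwindSafe};
use std::sync::atomic::{AtomicUsize, Ordering};

struct CallbackContext {
    state_callback: Box<dyn Fn(i32) + Send + Sync>,
    panics: AtomicUsize, // how many callbacks panicked (illustrative)
}

unsafe extern "C" fn state_trampoline(state: i32, context: *mut c_void) {
    if context.is_null() {
        return;
    }
    // Safety: the caller guarantees the pointer refers to a live CallbackContext.
    let context = unsafe { &*(context as *const CallbackContext) };

    // Stop any panic here instead of letting it cross the FFI boundary.
    let result = catch_unwind(AssertUnwindSafe(|| (context.state_callback)(state)));
    if result.is_err() {
        context.panics.fetch_add(1, Ordering::SeqCst);
    }
}

/// Runs `callback` once through the trampoline and reports how many panics
/// were caught.
pub fn run_with(callback: impl Fn(i32) + Send + Sync + 'static, state: i32) -> usize {
    let context = CallbackContext {
        state_callback: Box::new(callback),
        panics: AtomicUsize::new(0),
    };
    // Safety: `context` outlives the call.
    unsafe { state_trampoline(state, &context as *const CallbackContext as *mut c_void) };
    context.panics.load(Ordering::SeqCst)
}
```

A real binding would more likely forward the failure to the error callback than count it, but the shape of the guard is the same. Note that `catch_unwind` only helps when the crate is built with unwinding panics; with `panic = "abort"` the process terminates regardless.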

Best Practices

A few further practices help produce a Rust library that is safe, effective, and pleasant to use.

  1. Memory Safety

    • Use RAII in both C++ and Rust layers
    • Document ownership transfers explicitly
    • Minimize unsafe code blocks
  2. Error Handling

    • Convert C++ exceptions to error codes at the FFI boundary
    • Use thiserror (or a comparable approach) for error definitions
    • Provide rich error context
  3. Thread Safety

    • Make thread safety requirements explicit
    • Use appropriate synchronization primitives
    • Ensure callbacks are Send + Sync
  4. Documentation

    • Document safety requirements for unsafe functions
    • Explain lifetime guarantees for borrowed data
    • Provide usage examples

Conclusion

While our example uses PolarMqtt, a toy library created for this demonstration, the patterns and techniques shown here apply to real-world C++ to Rust integration scenarios. The key is understanding the incompatibilities between the languages (particularly around ABI, vtables, and RAII) and systematically addressing them through a well-designed intermediary layer.

Publicly available test MQTT brokers

The examples above use test.mosquitto.org, a public MQTT broker maintained by the Eclipse Foundation. While this broker is typically reliable, it may occasionally be unavailable for maintenance. Alternative public test brokers include broker.emqx.io (provided by EMQ) and broker.hivemq.com (provided by HiveMQ) - both accessible on port 1883. Remember that these are public test brokers intended for development and testing purposes only; production deployments should use dedicated, secured MQTT brokers.

Source Code

The complete implementation of the patterns discussed in this post can be found in the PolarMqtt repository. The repository includes:

  • The original C++ API
  • The flattened C interface
  • The complete Rust bindings
  • Integration tests and examples
