Mastering Maps and Structs in Protobuf

Introduction

Protobuf, Google's language-agnostic binary serialization format, has gained traction among developers seeking efficient ways to serialize structured data. Protobuf has several scalar types that can be used to model your APIs, such as:

  • Integers (int32, uint32, int64, uint64, sint32, sint64)
  • Float
  • Double
  • String
  • Bytes
  • Boolean

For more complex use cases, Protobuf can model your data with composite types such as enums, nested messages, and oneofs. At the heart of this are maps and structs, two essential elements that often puzzle newcomers. This blog post takes a deep dive into these constructs, illustrating when and why to use each, along with their unique challenges and benefits.

The Basics

Maps and Google Protobuf Structs serve distinct purposes. At a high level:

  • Maps: A collection of key-value pairs where each key must be unique. They provide efficient look-up to retrieve values associated with a given key.
  • Structs: A flexible data container, reminiscent of JSON, which can house fields of any type, be it scalar values, lists, or other Structs.

Protobuf Maps: Not Your Typical Hash Table

Maps in many programming languages allow for efficient key-value storage and retrieval. In Protobuf, maps look and feel like these familiar structures but come with their own nuances.

message UserData {
  map<string, int32> user_scores = 1;
}

Notable pointers on Protobuf maps:

  • Keys can be any integral type, string, or bool. Floating-point types, bytes, enums, and messages are not allowed as keys.
  • Map fields cannot be repeated.
  • Complexity: Values can be scalar types, enums, or other messages, but not another map. To nest maps, wrap the inner map in a message and use that message as the value type. With that nesting comes some complexity: accessing the data requires a precise understanding of the generated code, especially in languages where keys and values are exposed as separate entities.
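On the wire, a map field is encoded like a repeated field of small key/value entry messages, and when a key appears more than once the last value seen is kept. The plain-Python sketch below models that behavior. It does not use the protobuf library, and `entries_to_map`, `ALLOWED_KEY_TYPES`, and the sample data are hypothetical names chosen for illustration.

```python
from typing import Dict, Iterable, Tuple

# Python stand-ins for the key types Protobuf accepts in maps.
# (Illustrative only: Protobuf itself distinguishes int32, int64, and so on.)
ALLOWED_KEY_TYPES = (int, str, bool)


def entries_to_map(entries: Iterable[Tuple[object, int]]) -> Dict[object, int]:
    """Fold a sequence of (key, value) entries into a map; the last entry wins."""
    result: Dict[object, int] = {}
    for key, value in entries:
        if not isinstance(key, ALLOWED_KEY_TYPES):
            raise TypeError(f"unsupported map key type: {type(key).__name__}")
        result[key] = value  # a repeated key overwrites the earlier value
    return result


# "Alice" appears twice, so only her last score survives.
user_scores = entries_to_map([("Alice", 90), ("Bob", 72), ("Alice", 95)])
print(user_scores)
```

A float key such as `1.5` is rejected by the sketch, mirroring the rule that floating-point types cannot be map keys.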

Google's Protobuf Struct: The JSON Alternative

A simple usage of google.protobuf.Struct in your own Protobuf schema might look like:

import "google/protobuf/struct.proto";

message UserDetail {
  google.protobuf.Struct user_data = 1;
}

The Struct type is meant to capture heterogeneous data. While this offers flexibility, you trade away the strong type guarantees that Protobuf typically provides. Struct is best for unpredictable schemas; for well-defined data, native Protobuf types are usually more suitable.

To understand more deeply what defines the Struct well-known type in Protobuf's core library, we can look at the .proto definitions found on GitHub.

The code below is taken from the open-source Google Protobuf repository:

message Struct {
  // Unordered map of dynamically typed values.
  map<string, Value> fields = 1;
}

From the snippet above we can tell that a Struct is essentially the map<> type we discussed earlier, mapping string keys to Value messages.

Value is another message defined in the same .proto file. Here is what it looks like:

message Value {
  // The kind of value.
  oneof kind {
    // Represents a null value.
    NullValue null_value = 1;
    // Represents a double value.
    double number_value = 2;
    // Represents a string value.
    string string_value = 3;
    // Represents a boolean value.
    bool bool_value = 4;
    // Represents a structured value.
    Struct struct_value = 5;
    // Represents a repeated `Value`.
    ListValue list_value = 6;
  }
}

This gives us a better grip on what a Struct really is. Each Struct has a single property called fields, which is a map from strings to Value. Value is defined with a oneof (we will cover oneof in a future article) that holds the "real" data of the field, so each value is exactly one of the types listed above. This mimics the data model of JSON, with the added complexities of the Protobuf data model. Note that number_value is a double, so every JSON number, integers included, is stored as a floating-point value.
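To make the oneof concrete, here is a minimal plain-Python sketch that wraps a JSON-compatible object in nested dictionaries shaped like Struct and Value. It deliberately avoids the protobuf library; `to_value` and `to_struct` are hypothetical helper names, and the dictionaries only imitate the message layout from struct.proto.

```python
from typing import Any, Dict


def to_value(obj: Any) -> Dict[str, Any]:
    """Wrap a JSON-compatible object in a Value-like dict with exactly one kind set."""
    if obj is None:
        return {"null_value": "NULL_VALUE"}
    if isinstance(obj, bool):  # check bool before numbers: bool is an int subclass
        return {"bool_value": obj}
    if isinstance(obj, (int, float)):
        return {"number_value": float(obj)}  # every number becomes a double
    if isinstance(obj, str):
        return {"string_value": obj}
    if isinstance(obj, dict):
        return {"struct_value": to_struct(obj)}
    if isinstance(obj, (list, tuple)):
        return {"list_value": {"values": [to_value(item) for item in obj]}}
    raise TypeError(f"not representable as a Value: {type(obj).__name__}")


def to_struct(obj: Dict[str, Any]) -> Dict[str, Any]:
    """Build a Struct-like dict: a single `fields` map from string keys to Values."""
    for key in obj:
        if not isinstance(key, str):
            raise TypeError("Struct keys must be strings")
    return {"fields": {key: to_value(val) for key, val in obj.items()}}


print(to_struct({"id": 1, "tags": ["a", None]}))
```

The order of the `isinstance` checks matters: testing `bool` after `int` would turn `True` into the number 1.0.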

We can also look at the last definitions in the struct.proto file to see how a Value can represent null or a list of Value:

// `NullValue` is a singleton enumeration to represent the null value for the
// `Value` type union.
//
// The JSON representation for `NullValue` is JSON `null`.
enum NullValue {
  // Null value.
  NULL_VALUE = 0;
}

// `ListValue` is a wrapper around a repeated field of values.
//
// The JSON representation for `ListValue` is JSON array.
message ListValue {
  // Repeated field of dynamically typed values.
  repeated Value values = 1;
}

Some languages and frameworks expose this internal Protobuf structure directly, without giving the developer a "JSON-like" interface for interacting with their Structs.

For example, if we have the following JSON data:

{
  "id": 1,
  "name": "foo",
  "domain": "bar",
  "isAuthenticated": true
}

the internal Protobuf structure, shown here as pseudo-code, looks like:

Struct {
  fields {
    key: "name"
    value {
      string_value: "foo"
    }
  }
  fields {
    key: "isAuthenticated"
    value {
      bool_value: true
    }
  }
  fields {
    key: "id"
    value {
      number_value: 1
    }
  }
  fields {
    key: "domain"
    value {
      string_value: "bar"
    }
  }
}

Notice that the order of the Struct fields is not guaranteed to match the order of the keys in the JSON. As with JSON objects themselves, the fields form an unordered collection.
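The following plain-Python sketch walks a dictionary shaped like the pseudo-code structure above and turns it back into ordinary Python data. It is an illustration under assumptions rather than the protobuf API: `from_value` and `from_struct` are hypothetical helpers, and the nested dictionaries stand in for real Struct and Value messages.

```python
from typing import Any, Dict


def from_value(value: Dict[str, Any]) -> Any:
    """Unwrap a Value-like dict (exactly one kind set) into a plain Python object."""
    (kind, payload), = value.items()  # fails unless exactly one kind is present
    if kind == "null_value":
        return None
    if kind in ("number_value", "string_value", "bool_value"):
        return payload
    if kind == "struct_value":
        return from_struct(payload)
    if kind == "list_value":
        return [from_value(item) for item in payload["values"]]
    raise ValueError(f"unknown kind: {kind}")


def from_struct(struct: Dict[str, Any]) -> Dict[str, Any]:
    """Turn a Struct-like dict back into a plain dict keyed by field name."""
    return {key: from_value(val) for key, val in struct["fields"].items()}


# Same fields as the pseudo-code structure, in the same shuffled order.
encoded = {"fields": {
    "name": {"string_value": "foo"},
    "isAuthenticated": {"bool_value": True},
    "id": {"number_value": 1.0},
    "domain": {"string_value": "bar"},
}}
original = {"id": 1, "name": "foo", "domain": "bar", "isAuthenticated": True}

decoded = from_struct(encoded)
print(decoded == original)           # True: dict comparison ignores key order
print(type(decoded["id"]).__name__)  # float: numbers come back as doubles
```

The comparison succeeds despite the different key order, but `id` comes back as a float, which is worth remembering when a consumer expects an integer.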

Accessing Data: Diving Deeper

Retrieving your data correctly is crucial. Depending on the language you're working with, the approach to access the data within a Protobuf map can vary. Here's how you might go about it for Java, Go, and Python:

Go

For instance, in Go, accessing data within a simple Protobuf map would look like:

score := data.UserScores["Alice"]

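For nested structures (assuming a hypothetical message whose user_detailed_scores map holds UserData values), prefer the generated getters, which are nil-safe, so a missing key yields a zero value instead of a nil-pointer panic:

aliceMathScore := complexData.GetUserDetailedScores()["Alice"].GetUserScores()["Math"]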

Java

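In Java, the generated code for Protobuf maps provides 'get' methods that can be used to retrieve values. The OrThrow variant throws an exception when the key is missing, while the OrDefault variant returns a fallback value you supply. For a simple map: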

int aliceScore = userData.getUserScoresOrThrow("Alice");

For nested or more complex maps:

int aliceMathScore = complexData.getUserDetailedScoresOrThrow("Alice").getUserScoresOrThrow("Math");

Python

In Python, Protobuf maps can be accessed just like Python dictionaries:

alice_score = data.user_scores["Alice"]

For nested or complex maps:

alice_math_score = complex_data.user_detailed_scores["Alice"].user_scores["Math"]

However, it's essential to realize that the way Protobuf maps manifest in different languages can vary. In some languages or libraries, a map may be exposed as a list of key/value entries rather than as a native dictionary, necessitating a more complex retrieval process. It's always good practice to refer to the documentation or inspect the generated code for nuances specific to your language of choice.
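One such nuance in Python: as we understand the Protobuf Python documentation, reading an undefined key from a map field creates that key with a zero value, much like collections.defaultdict, instead of raising KeyError. The sketch below uses a plain defaultdict as a stand-in for a generated map field (an assumption for illustration, not the protobuf API) to show why a membership check or .get() is the safer way to read.

```python
from collections import defaultdict

# Stand-in for a map<string, int32> field; no generated message is used here.
user_scores = defaultdict(int)
user_scores["Alice"] = 90

# Reading a missing key with [] silently inserts a default entry.
bob_score = user_scores["Bob"]
print(bob_score, "Bob" in user_scores)

# Safer reads: test membership first, or use .get() with a fallback.
carol_score = user_scores.get("Carol", -1)
print(carol_score, "Carol" in user_scores)
```

After the first read, "Bob" exists in the map with a score of 0, whereas the .get() call leaves the map untouched.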

The Map-Struct Dilemma: Use Cases

  • Complex Models with Predictable Structure: Maps are the go-to. Their key-value nature provides flexibility while keeping values strongly typed. But remember the restrictions on key types.
  • Dynamic or Unpredictable Data: The Struct type, with its JSON-like nature, is perfect for data that doesn't have a fixed schema.

Wrapping Up

Crafting efficient data models using maps and structs in Protobuf requires a nuanced understanding. As we at Sylk venture deeper into the realm of data serialization, we believe that the correct application of these tools can lead to more efficient and versatile applications. We hope this guide serves as a primer for your endeavors with Protobuf. Here's to better serialization!