Thursday, February 6, 2020

ProtoBuf: How to reuse FieldDescriptor for new Messages? How to efficiently get field values from dynamic Messages?

I’m currently working on a project in which I want to use Google Protocol Buffers (C++) to serialize and deserialize data. The application has the following requirements:

  1. The standard approach of generating pre-compiled C++ classes from .proto files (using protoc) must NOT be used. Instead, I’m using google::protobuf::compiler::Importer to import my .proto files at runtime and then dynamically create google::protobuf::Message objects using google::protobuf::DynamicMessageFactory. With this approach I can already serialize and deserialize byte arrays/messages without generating the C++ classes beforehand.
  2. Performance: I’m expecting a new byte stream roughly every 50 ms, so parsing a byte array and reading values from the dynamically created message must be quite efficient. For the parsing step I’m just using the standard ParseFromArray(…) method to get my message, and for now I don’t see the need to optimize it. Instead, I’m looking for a way to retrieve the values more efficiently.

I know that 1. and 2. kind of contradict each other because using the DynamicMessageFactory is probably more expensive than generating pre-compiled C++ classes, but unfortunately 1. is a hard requirement and cannot be changed.

To retrieve the values of my parsed message, I’m currently using google::protobuf::Reflection and google::protobuf::Descriptor to walk through the message until the corresponding google::protobuf::FieldDescriptor is found. Since walking through the message is quite inefficient, especially when a message is expected to have ~1000 fields, I thought of caching the found FieldDescriptor in a map and then reusing the cached FieldDescriptors for other messages without walking through each message again (all my messages have the same structure anyway). This approach was proposed here. Unfortunately, I didn’t manage to get it to work (see my simplified example code below). Can you guys help me? Thank you in advance for any suggestions.

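#include <cstdint>
#include <iostream>
#include <limits>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>
#include <google/protobuf/descriptor.h>
#include <google/protobuf/message.h>

using namespace std;
using google::protobuf::FieldDescriptor;
using google::protobuf::Message;

// Note: in a real source file, the MyUtil class shown further down must be defined before main().
int main()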
{
  MyUtil util; // My protobuf utility class

  int size;
  uint8_t* data = util.GenerateByteArray(&size); // Generate sample byte array

  Message* message1 = util.ParseFromArray(data, size);
  util.SetDouble(message1, "position.x", 1.111);

  Message* message2 = util.ParseFromArray(data, size);
  util.SetDouble(message2, "position.x", 2.222);

  cout << util.GetDouble(message1, "position.x") << endl;
  cout << util.GetDouble(message2, "position.x") << endl;

  // This works as expected if my GetDouble method iterates through the whole message for every new message.
  // But if I try to cache my FieldDescriptors I get the wrong output (see below).

  return 0;
}

class MyUtil
{
private:
  // The cached descriptors
  const Message* mTempMessage = nullptr;
  const FieldDescriptor* mTempField = nullptr;

public:
  MyUtil() { /*Initializing and importing .proto files*/ }

  // ... Some other methods

  double GetDouble(const Message* message, const string& path)
  {
    if (mTempField)
    {
      // Problem: this doesn't work if I only pass the actual 'message2', so I also tried caching mTempMessage.
      // But mTempMessage always refers to the inner message of message1, from which mTempField was retrieved.
      // This is why util.GetDouble(message2, "position.x") always returns the value of message1.
      // So I know why my output is wrong, but I don't know what the correct solution should look like.
      return mTempMessage->GetReflection()->GetDouble(*mTempMessage, mTempField);
    }

    vector<string> fieldNames;
    boost::split(fieldNames, path, boost::is_any_of(".")); // "position.x" => ["position", "x"]

    vector<string>::iterator iterator = fieldNames.begin();
    vector<string>::iterator last = fieldNames.end() - 1;

    return GetDouble(message, &iterator, &last);
  }

  double GetDouble(const Message* message, vector<string>::iterator* pathIterator, vector<string>::iterator* pathLast)
  {
    // Get FieldDescriptor using message->GetDescriptor()->FindFieldByName(...)
    const FieldDescriptor* field = GetField(message, **pathIterator);

    if (*pathIterator != *pathLast)
    {
      // Field "position": Is another message, so call recursively for next field
      (*pathIterator)++;

      // Get the inner "position" message using Reflection
      const Message* messageField = GetMessageField(message, field);

      return GetDouble(messageField, pathIterator, pathLast);
    }
    else if (field->type() == FieldDescriptor::TYPE_DOUBLE)
    {
      // Field "x": Actual value can be retrieved

      // Caching the found FieldDescriptor (and also the current "position" message)
      mTempMessage = message;
      mTempField = field;

      return message->GetReflection()->GetDouble(*message, field);
    }
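    else
    {
      // Return an invalid value so that every code path returns something (requires <limits>)
      return numeric_limits<double>::quiet_NaN();
    }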
  }
};



