The new Chat Model 2 introduces Vision Link, our advanced vision feature. It lets each Persona recognize itself in photos, identify the User, and understand any pictures shared during the conversation. Because the AI can process and react to visual content in context, its answers are more relevant and conversations feel more realistic and credible.

How does it work?

Important: these capabilities are available only on Chat Model 2.

Preparing photo usage

For Chat Model 2 to understand photos, DumGum must first analyze the available images. To enable this, use the replyParameters.vision parameter (see the parameter API reference). Through this parameter, you define which images should be analyzed:

userProfilePicture – analyzes the User’s main profile photo
personaProfilePicture – analyzes the Persona’s main profile photo
sharedPictures – analyzes photos shared by the User so the AI can react to their content

We strongly recommend enabling all three capabilities to fully benefit from the vision features.

Why is profile photo analysis useful?

When we analyze profile photos, we examine both the content and the physical characteristics of the person. Most importantly, we memorize the person’s face. This allows the AI to understand whether a photo shared by the User is of themselves, of the Persona, or of someone else. Again, we strongly advise enabling this feature if you moderate your users’ profile pictures. It substantially improves response quality.

How does shared photo analysis work?

Basic principle

When sharedPictures is enabled, our model analyzes the photos shared by Users and can then react to their content in a realistic and coherent way.

Error feedback

For each image, our platform performs a content analysis. If we detect problematic content, your incoming webhook will receive an event of type chat.image.rejected. We recommend handling these events through your moderation service. Possible moderation reasons:

VIOLENT_CONTENT
SELF_HARM
ILLEGAL_CONTENT

When such an image is detected, the Persona will reply that it cannot see the image. See the API reference for details on this event.
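As a rough illustration of routing these events to a moderation service, here is a minimal sketch. Only the event type chat.image.rejected and the three moderation reasons come from this article. The payload keys ("type", "reason", "pictureUrl") and the shape of the returned ticket are assumptions made for the example, so check the API reference for the actual event format.

```python
from typing import Any, Dict, Optional

# Moderation reasons listed in this article.
KNOWN_REASONS = {"VIOLENT_CONTENT", "SELF_HARM", "ILLEGAL_CONTENT"}


def handle_webhook_event(event: Dict[str, Any]) -> Optional[Dict[str, str]]:
    """Return a moderation ticket for a rejected image, or None for other events.

    Assumption: the event is a dict with a "type" key and, for rejections,
    "reason" and "pictureUrl" keys. These key names are hypothetical.
    """
    if event.get("type") != "chat.image.rejected":
        return None  # not an image rejection, nothing to moderate
    reason = str(event.get("reason", "UNKNOWN"))
    return {
        "pictureUrl": str(event.get("pictureUrl", "")),
        # Unrecognized reasons are kept visible instead of being dropped.
        "reason": reason if reason in KNOWN_REASONS else "UNKNOWN",
        "action": "review",
    }
```

Mapping unrecognized reasons to "UNKNOWN" rather than raising keeps the handler tolerant if further reasons are added later.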

How much does it cost?

Basic principles

The base price is €0.015 per analyzed photo. To avoid unnecessary costs, each photo is analyzed only once per unique URL. For example, a Persona’s main profile picture is analyzed only once and then reused across all conversations without additional charges.
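To make the arithmetic concrete, the sketch below estimates the cost of a batch of photos by charging each unique URL once at the base price. The function name and the example URLs are made up for illustration, and the estimate ignores any re-analysis triggered when an image changes after its cache entry expires.

```python
from decimal import Decimal
from typing import Iterable

PRICE_PER_ANALYZED_PHOTO_EUR = Decimal("0.015")  # base price from this article


def estimate_analysis_cost(picture_urls: Iterable[str]) -> Decimal:
    """Estimate the cost in euros, charging each unique URL only once."""
    unique_urls = set(picture_urls)
    return PRICE_PER_ANALYZED_PHOTO_EUR * len(unique_urls)


urls = [
    "https://example.com/persona.jpg",
    "https://example.com/persona.jpg",  # same URL reused, not charged again
    "https://example.com/user.jpg",
    "https://example.com/shared-1.jpg",
]
print(estimate_analysis_cost(urls))  # 0.045
```

Three unique URLs at €0.015 each give €0.045, even though four pictures were referenced.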

Technical details

Images are identified by their URL, and our system respects the HTTP Cache-Control headers to determine whether an image needs to be analyzed again. When a cached image expires and is used again, we compare its fingerprint (SHA-256) with the version stored in our database. If identical, no new analysis is performed and the cache is refreshed. If different, a new analysis is triggered.
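The decision described above can be modeled roughly as follows. This is a simplified sketch of the documented behavior, not the platform's actual implementation: the in-memory dictionary, the function names, and the choice to read only the max-age directive are all assumptions. Unlike a real cache, it also receives the image bytes on every call.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CacheEntry:
    fingerprint: str   # SHA-256 hex digest of the image bytes
    expires_at: float  # time after which the entry must be revalidated


def parse_max_age(cache_control: str) -> Optional[int]:
    """Extract max-age (in seconds) from a Cache-Control header value."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.strip().isdigit():
            return int(value.strip())
    return None


def needs_analysis(cache: Dict[str, CacheEntry], url: str, image_bytes: bytes,
                   cache_control: str, now: float) -> bool:
    """Return True when a new analysis would be triggered for this URL."""
    entry = cache.get(url)
    if entry is not None and now < entry.expires_at:
        return False  # still fresh: reuse the stored analysis
    fingerprint = hashlib.sha256(image_bytes).hexdigest()
    unchanged = entry is not None and entry.fingerprint == fingerprint
    max_age = parse_max_age(cache_control) or 0
    cache[url] = CacheEntry(fingerprint, now + max_age)  # create or refresh
    return not unchanged


cache: Dict[str, CacheEntry] = {}
url = "https://example.com/persona.jpg"
print(needs_analysis(cache, url, b"v1", "max-age=60", now=0))    # True: first time seen
print(needs_analysis(cache, url, b"v1", "max-age=60", now=30))   # False: still cached
print(needs_analysis(cache, url, b"v1", "max-age=60", now=100))  # False: expired, same fingerprint
print(needs_analysis(cache, url, b"v2", "max-age=60", now=200))  # True: expired, content changed
```

In practice, this means that serving stable image URLs with sensible Cache-Control headers helps avoid repeated analysis of unchanged pictures.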

Technical Guide

Step #1: Using the V2 model

To use the V2 model, set the replyParameters.chatModel parameter to chat-2-smart or chat-2-pro, depending on the version you are using.

Step #2: Enabling vision

You must enable vision support through the replyParameters.vision parameter as follows:

{
  "userProfileId": "xxxx",
  "personaProfileId": "yyyy",
  "replyParameters": {
    "chatModel": "chat-2-pro",
    "vision": {
      "userProfilePicture": true, // Enables vision for user profile pictures
      "personaProfilePicture": true, // Enables vision for persona profile pictures
      "sharedPictures": true // Enables vision for pictures shared in the chat
    }
  }
}

As noted above, we recommend enabling all three options.
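If you assemble this request body in code, a small helper such as the one below can guard against selecting a model that does not support vision. The helper name and its default value are hypothetical. The field names and model identifiers are the ones shown in this guide.

```python
import json
from typing import Any, Dict

# Model identifiers named in this guide.
CHAT_MODEL_2_VARIANTS = {"chat-2-smart", "chat-2-pro"}


def build_reply_request(user_profile_id: str, persona_profile_id: str,
                        chat_model: str = "chat-2-pro") -> Dict[str, Any]:
    """Build a request body with all three vision options enabled."""
    if chat_model not in CHAT_MODEL_2_VARIANTS:
        raise ValueError(f"Vision Link requires Chat Model 2, got {chat_model!r}")
    return {
        "userProfileId": user_profile_id,
        "personaProfileId": persona_profile_id,
        "replyParameters": {
            "chatModel": chat_model,
            "vision": {
                "userProfilePicture": True,
                "personaProfilePicture": True,
                "sharedPictures": True,
            },
        },
    }


body = json.dumps(build_reply_request("xxxx", "yyyy"), indent=2)
```

Serializing with json.dumps produces strict JSON. The // comments in the example above are explanatory only and should not be sent, since standard JSON does not allow comments.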

Step #3: Updating chat history

In the “Chat History” API endpoint that you integrate on your side, you must return a new message attribute, pictureUrls, which should contain the URLs of all images associated with the message. The model will automatically analyze these images (provided the sharedPictures option in the vision settings is set to true) and will be able to respond accordingly. Note that a message may contain both images (pictureUrls) and text (text attribute), or only one or more images.
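A message object returned by your endpoint could be built along these lines. Only the text and pictureUrls attributes come from this guide. The helper itself, and the choice to omit an attribute when it is empty, are assumptions. Any other fields your Chat History response requires (sender, timestamp, and so on) are left out of the sketch.

```python
from typing import Any, Dict, List, Optional


def build_history_message(text: Optional[str] = None,
                          picture_urls: Optional[List[str]] = None) -> Dict[str, Any]:
    """Build one chat history message carrying text, pictures, or both."""
    urls = list(picture_urls or [])
    if not text and not urls:
        raise ValueError("a message needs text, at least one picture URL, or both")
    message: Dict[str, Any] = {}
    if text:
        message["text"] = text
    if urls:
        message["pictureUrls"] = urls  # every image attached to this message
    return message


# Text plus two pictures in the same message.
mixed = build_history_message("Look at my new bike!", [
    "https://example.com/bike-front.jpg",
    "https://example.com/bike-side.jpg",
])

# Picture-only message.
picture_only = build_history_message(picture_urls=["https://example.com/cat.jpg"])
```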
