Helix Search configuration

Note
  • You must be an admin user to configure Helix Search.

  • The Helix Search service must be configured before you can view the Service status page or configure services using the Configuration page:

    • Windows installer: Basic Helix Search services are configured as part of the installation process.

    • Linux installation: The Helix Search service must be configured manually by editing the etc/config.properties file. See Configure the Helix Search service.

Tip

Helix Search can also be configured using the Helix Search web page. See Configure Helix Search.

This section describes configuration of the Helix Search, Helix Core Server, index, Elasticsearch, and auto-tagging services.

Configure the Helix Search service

This section details configuration of the Helix Search service, including connection to external services such as Helix Core Server and Elasticsearch.

You can either edit the etc/config.properties file or use the web page.

To view and change the configuration of your Helix Search service, navigate to the configuration page (/settings/configure) using the connection details you set for the Helix Search service during the installation. If credentials are requested, provide the credentials of a Helix Core Server user with admin access or greater.

For example:

http://myhelixsearch.mydomain.com:1601/p4search/settings/configure
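If you edit etc/config.properties by hand rather than through the web page, it can help to inspect the current values first. The following is a minimal sketch, not part of Helix Search: the function names are hypothetical, and it handles only simple key=value lines (no line continuations or ":" separators).

```python
from pathlib import Path


def load_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that have no '=' separator
            props[key.strip()] = value.strip()
    return props


def load_properties_file(path: str = "etc/config.properties") -> dict[str, str]:
    """Read the file from disk; the default path is assumed to be relative to the install directory."""
    return load_properties(Path(path).read_text(encoding="utf-8"))


sample = """
# web service
com.perforce.p4search.service.protocol=http
com.perforce.p4search.service.port=1601
com.perforce.p4search.core.p4trust=
"""
settings = load_properties(sample)
print(settings["com.perforce.p4search.service.port"])
```

Empty values such as p4trust= are kept as empty strings, so a missing key and a blank key can be told apart.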

Configure the product and version

Helix Search has a product name and version configuration which is used in the Helix Core Server log.

Default configuration:

com.perforce.p4search.service.product=p4search (product name visible in the Helix Core Server log)

com.perforce.p4search.service.version=0.1 (product version visible in the Helix Core Server log)

If you have multiple instances of Helix Search, we recommend that you append a suffix to the product name to help distinguish between the log entries. For example:

com.perforce.p4search.service.product=p4search-1

com.perforce.p4search.service.product=p4search-2

Configure the web service

By default, the Helix Search web service is public to machines on the same network.

Default configuration:

com.perforce.p4search.service.protocol=http

com.perforce.p4search.service.host=0.0.0.0

com.perforce.p4search.service.port=1601

com.perforce.p4search.service.keystore= (see Configure SSL security for Helix Search)

com.perforce.p4search.service.keypass= (see Configure SSL security for Helix Search)

Tip

You can configure the Helix Search web service to use HTTPS. See Configure SSL security for Helix Search.

Configure the location and password of the local trust store

Set the location and password of the Helix Search local trust store. The local trust store location defaults to ./etc/truststore.jks and the password is randomly generated.

Default configuration:

com.perforce.p4search.service.truststore=./etc/truststore.jks

com.perforce.p4search.service.trustpass=<PASSWORD>

Configure the external URL

The URL of your Helix Search instance as seen by other applications. Defaults to http://localhost:1601. For example, this external URL can be used by Helix Core Server or Helix DAM to connect to a specific instance of Helix Search.

Default configuration:

com.perforce.p4search.service.external-url=http://localhost:1601

Specify CORS

Specify the origin that browsers should allow to access resources. This type of permission is known as Cross-Origin Resource Sharing (CORS). If the CORS value is left blank, the external URL value, including its protocol and port number, is used as the allowed origin. To secure your site, use the HTTPS protocol. The default value is empty.

Default configuration:

com.perforce.p4search.service.access-control-allow-origin=
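The fallback described above can be sketched as follows. This is an illustration of the rule only, not Helix Search code. The function name is hypothetical, and it assumes the origin is the protocol, host, and port of the external URL without any path.

```python
from urllib.parse import urlsplit


def effective_allowed_origin(cors_value: str, external_url: str) -> str:
    """Return the configured origin, or fall back to the origin of the external URL."""
    if cors_value.strip():
        return cors_value.strip()
    parts = urlsplit(external_url)
    # An origin is scheme://host[:port] with no path component.
    return f"{parts.scheme}://{parts.netloc}"


print(effective_allowed_origin("", "http://localhost:1601"))
```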

Configure the live logs limit

The number of lines of live logging data displayed in the logging window in the Helix Search web UI, defaults to 300 lines.

Default configuration:

com.perforce.p4search.service.live-log-limit=300

Configure the global rate limit

The number of requests per minute for all API endpoints, defaults to 600.

Default configuration:

com.perforce.p4search.service.global-rate-limit=600

Configure the login rate limit

The number of login requests per minute for all API endpoints, defaults to 5.

Default configuration:

com.perforce.p4search.service.login-rate-limit=5

Configure JWT Access token

The duration after which the JWT Access token expires, defaults to 600 seconds.

com.perforce.p4search.service.access-token-ttl=600

Configure JWT Refresh token

The duration after which the JWT Refresh token expires, defaults to 900 seconds.

com.perforce.p4search.service.refresh-token-ttl=900
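As a quick check of what the two durations mean in practice, the sketch below computes the expiry times for tokens issued at the same moment. The issue time is an arbitrary example, and the assumption that both tokens are issued together is ours.

```python
from datetime import datetime, timedelta, timezone

ACCESS_TOKEN_TTL = 600   # seconds, value of access-token-ttl
REFRESH_TOKEN_TTL = 900  # seconds, value of refresh-token-ttl


def expiry(issued_at: datetime, ttl_seconds: int) -> datetime:
    """Return the moment a token issued at issued_at stops being valid."""
    return issued_at + timedelta(seconds=ttl_seconds)


issued = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(expiry(issued, ACCESS_TOKEN_TTL).isoformat())
print(expiry(issued, REFRESH_TOKEN_TTL).isoformat())
```

With the defaults, the access token expires after 10 minutes and the refresh token 5 minutes later.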

Configure the Nonce token

A nonce token is a unique, random number or string that is used only once and then discarded. It acts as a security measure to prevent replay attacks in communication protocols, particularly in API requests.

The nonce token is included in the request header along with the authentication token, and it must be present in all API requests for security purposes. This practice ensures that each request is unique, even when the same authentication details are used.

To enable the nonce token, set com.perforce.p4search.service.nonce to true. Defaults to false.

Default configuration:

com.perforce.p4search.service.nonce=false
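The general idea of a single-use value can be sketched as follows. This illustrates replay protection in general and is not the Helix Search implementation: the class and function names are hypothetical, and the request header that carries the nonce is not shown because the source does not name it.

```python
import secrets


class NonceRegistry:
    """Remembers nonces that were already used so a replayed request is rejected."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def accept(self, nonce: str) -> bool:
        """Return True the first time a nonce is presented, False afterwards."""
        if not nonce or nonce in self._seen:
            return False
        self._seen.add(nonce)
        return True


def new_nonce() -> str:
    """Generate a random, unpredictable value (32 hex characters)."""
    return secrets.token_hex(16)


registry = NonceRegistry()
nonce = new_nonce()
first = registry.accept(nonce)   # first use is accepted
replay = registry.accept(nonce)  # the same value a second time is rejected
print(first, replay)
```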

Configure Helix Core Server

The Helix Core Server connection requires a P4PORT, a Perforce Standard user, and a ticket.

Default configuration:

com.perforce.p4search.service.startup-retry=30000

com.perforce.p4search.core.p4port=localhost:1666

com.perforce.p4search.core.p4trust=

com.perforce.p4search.core.service.p4user=p4search

com.perforce.p4search.core.service.p4ticket=AEEB1208CB06479B022D97C2784EEFDA

com.perforce.p4search.core.index.p4user=p4index

com.perforce.p4search.core.index.p4ticket=AEEB1208CB06479B022D97C2784EEFDA

com.perforce.p4search.core.p4ignore=

com.perforce.p4search.core.extension.auto-update=true

Specify the P4Port setting and the Perforce service user

Important

Specify a Standard or Service user with a minimum of admin access. This user is used to authorize the proxy connection at security level 5 and above. For instructions on creating the Perforce Service user and printing out the ticket, see Create the Perforce Service user.

For example:

com.perforce.p4search.core.p4port=perforce.com:1666
com.perforce.p4search.core.service.p4user=service
com.perforce.p4search.core.service.p4ticket=FFEEDDCCBBAA99887766554433221100

For an SSL-enabled Helix Core Server

If your Helix Core Server is SSL enabled, add the ssl protocol to the p4port field and provide the trusted fingerprint in the p4trust field:

com.perforce.p4search.core.p4port=ssl:perforce.com:1666
com.perforce.p4search.core.p4trust=59:75:62:6C:4F:C9:53:F5:4A:30:90:FF:C9:60:01:10:C7:D0:ED:1F
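Before restarting the service, you may want to check that the two fields agree with each other. The sketch below is a hypothetical helper, not a Helix Search tool. It assumes the fingerprint has the same shape as the example above (20 colon-separated hex bytes) and treats the ssl, ssl4, ssl6, ssl46, and ssl64 P4PORT prefixes as SSL.

```python
import re

SSL_PREFIXES = {"ssl", "ssl4", "ssl6", "ssl46", "ssl64"}
# 20 hex bytes separated by colons, matching the shape of the example fingerprint.
FINGERPRINT_RE = re.compile(r"[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){19}")


def check_ssl_settings(p4port: str, p4trust: str) -> list[str]:
    """Return a list of problems found; an empty list means the pair looks consistent."""
    problems = []
    prefix = p4port.split(":", 1)[0].lower()
    uses_ssl = prefix in SSL_PREFIXES
    if uses_ssl and not FINGERPRINT_RE.fullmatch(p4trust):
        problems.append("p4trust should be 20 colon-separated hex bytes")
    if not uses_ssl and p4trust:
        problems.append("p4trust is set but p4port has no ssl prefix")
    return problems


print(check_ssl_settings(
    "ssl:perforce.com:1666",
    "59:75:62:6C:4F:C9:53:F5:4A:30:90:FF:C9:60:01:10:C7:D0:ED:1F",
))
```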

Specify the Perforce index user

Optional: A Standard user with super access. This user is used to read content from the Helix Core Server, to set file attributes, and to install extensions. For instructions on creating the Perforce Index user and printing out the ticket, see Create the index user. If the index user is not defined, the Perforce Service user is used.

For example:

com.perforce.p4search.core.index.p4user=index
com.perforce.p4search.core.index.p4ticket=AEEB1208CB06479B022D97C2784EEFDA

Specify allowed services

If you are using Helix Search with another service such as Swarm and are using Helix Core Server security level 5 and above, provide a list of trusted IP addresses for the services.

For example:

com.perforce.p4search.core.allowed=swarm.perforce.com,proxy.perforce.com

Set the retry interval

The retry interval when locating Elasticsearch and Perforce Services, defaults to 30000 milliseconds.

com.perforce.p4search.service.startup-retry=30000
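To make the unit concrete, the sketch below shows a generic wait-and-retry loop that uses a millisecond interval. It is an illustration only and not how Helix Search is implemented. The function name, the check callback, and the max_attempts cap are hypothetical.

```python
import time
from typing import Callable


def wait_until_available(check: Callable[[], bool],
                         retry_ms: int = 30000,
                         max_attempts: int = 5,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call check() until it succeeds, pausing retry_ms milliseconds between attempts."""
    for attempt in range(1, max_attempts + 1):
        if check():
            return True
        if attempt < max_attempts:
            sleep(retry_ms / 1000)  # convert milliseconds to seconds
    return False


# Simulate a service that becomes reachable on the third attempt.
# The sleep function is replaced so the example does not actually wait.
calls = iter([False, False, True])
waits: list[float] = []
available = wait_until_available(lambda: next(calls), sleep=waits.append)
print(available, waits)
```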

Set the flag to automatically update extensions

A flag to automatically update extensions if they are out of date during start up, defaults to true.

com.perforce.p4search.core.extension.auto-update=true

Configure the index

Helix Search indexes Helix Core Server data when called with the REST API. See Index endpoints in Swagger Helix Search REST API.

The following configuration options determine the behavior of the indexer.

Set the index token

The index token, defaults to 00000000-0000-0000-0000-000000000000.

com.perforce.p4search.index.token=00000000-0000-0000-0000-000000000000

Set the maximum threads used for indexing

The number of threads Helix Search uses when indexing Helix Core Server changelists, defaults to 8 threads.

com.perforce.p4search.index.threads=8

Set the maximum size of files that are indexed

The maximum size (in bytes) of files whose content Helix Search attempts to index. The default is described as 10MB, although the value shown below, 104857600 bytes, equals 100 MiB.

com.perforce.p4search.tika.maxfilesize=104857600
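Because the value is given in bytes, converting it makes the limit easier to read and to set. A small sketch, assuming binary units (1 MiB = 1024 x 1024 bytes); the function names are hypothetical:

```python
def bytes_to_mib(size_bytes: int) -> float:
    """Convert a byte count to mebibytes."""
    return size_bytes / (1024 * 1024)


def mib_to_bytes(size_mib: int) -> int:
    """Convert mebibytes to the byte count used in the configuration."""
    return size_mib * 1024 * 1024


print(bytes_to_mib(104857600))  # the value shown in the configuration
print(mib_to_bytes(10))         # the byte count for a 10 MiB limit
```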

Set the Helix Search timeout

The timeout used by Helix Search, defaults to 20000 milliseconds.

com.perforce.p4search.tika.timeout=20000

Set the maximum number of changes per single bulk request

The number of changes in a single bulk request to Elasticsearch, defaults to 100 changes.

com.perforce.p4search.index.bulksize=100
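The effect of the bulk size can be pictured as splitting a list of changes into groups. The sketch below is illustrative only. The changelist numbers are made up, and how Helix Search actually groups its requests is not described in this section.

```python
from typing import Iterator, Sequence


def batches(changes: Sequence[int], bulksize: int = 100) -> Iterator[Sequence[int]]:
    """Yield consecutive groups of at most bulksize changes."""
    for start in range(0, len(changes), bulksize):
        yield changes[start:start + bulksize]


changelists = list(range(1, 251))  # 250 hypothetical changelist numbers
sizes = [len(batch) for batch in batches(changelists)]
print(sizes)
```

With 250 changes and the default of 100, this produces three requests, the last one partially filled.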

Set the trusted IP addresses for Helix Search

A comma separated list of IP addresses that are trusted by Helix Search.

com.perforce.p4search.core.allowed=

Set thumbnail/preview generation

Set to true to generate the blur and thumbnail/preview images when indexing is run, defaults to true.

  • Blur images are stored on the Helix Core Server as an attribute of the original image file and indexed in Elasticsearch. They consist of a 4 pixel image stored as a Hash String that loads extremely quickly. Helix DAM displays the blur images for search queries while the thumbnail/preview images are loaded.

  • Thumbnails/previews are stored on the Helix Core Server as an attribute of the original image files. They are up to 240 x 240 pixels depending on the original image's aspect ratio. P4V and Helix DAM use the thumbnails for image preview.

com.perforce.p4search.index.thumbnail=true

Set thumbnail/preview image size

The image thumbnail/preview size in pixels, defaults to 1440 pixels.

com.perforce.p4search.index.preview-size

Set thumbnail/preview image type

The image thumbnail/preview type, defaults to jpg file type.

com.perforce.p4search.index.preview-ext

Set Tesseract OCR

Enables Tesseract Optical Character Recognition (OCR). By default, it is disabled.

com.perforce.p4search.index.tesseract

Set journal path

The location where the live index journals are stored. This location is relative to the Helix Search install location.

com.perforce.p4search.index.journal-path=./jnl

Set journal rebuild threshold

Rebuilds the journal when the completed tasks exceed the threshold, defaults to 1000 tasks.

com.perforce.p4search.index.journal-threshold=1000

Configure filters for search results

User query results can be filtered by Helix Search. The following configuration options determine the behavior of the Helix Search filtering.

Filter search results for restricted changelists

To filter restricted changelists set the restricted field to true (this is the default and is recommended).

com.perforce.p4search.index.restricted=true

Filter search results by Helix Core Server user permissions

To filter user search results based on their Helix Core Server permissions (output of the p4 protect command), set the com.perforce.p4search.elastic.filter field to true. This requires the installation of the Helix Core Server p4search-filter plugin for Elasticsearch. See the Elasticsearch plugin documentation. The default is true.

com.perforce.p4search.elastic.filter=true

Use of the Helix Core Server p4search-filter plugin will introduce a small overhead in the query time. This is mainly determined by the number of protection entries that are applied to the user running the query. In some situations you might want to disable the filter if Helix Search only contains indexed data that is visible to all users.

Configure Elasticsearch

Tip

For more information on the Elasticsearch plugin, see the Elasticsearch plugin documentation.

Default configuration:

com.perforce.p4search.elastic.hosts=http://localhost:9200

com.perforce.p4search.index.name=perforce1

com.perforce.p4search.elastic.user=

com.perforce.p4search.elastic.pass=

com.perforce.p4search.elastic.tracktotalhits=10000

com.perforce.p4search.elastic.insecure=

com.perforce.p4search.index.tags=

com.perforce.p4search.index.bulksize=100

Configure Elasticsearch hosts

  1. Provide the configuration with a comma separated list of Elasticsearch hosts, for example:

    com.perforce.p4search.elastic.hosts=http://localhost:9200,http://localhost:9201

  2. Helix Search needs access to Elasticsearch to create and update the alias index for the Helix Core Server data. Helix Search now creates a partitioned index. For more information about the partitioned index, see Large site deployment.

    The name of the index is configured using:

    com.perforce.p4search.index.name=perforce1

    Note

    If you are using multiple instances of Helix Search ensure that you use a different index name for each instance.
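A comma separated host list like the one above can be checked before it is saved. The following sketch uses a hypothetical helper name and only splits the value and reads the scheme, host, and port of each entry.

```python
from typing import Optional
from urllib.parse import urlsplit


def parse_hosts(value: str) -> list[tuple[str, Optional[str], Optional[int]]]:
    """Split a comma separated host list into (scheme, hostname, port) tuples."""
    hosts = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas and spaces
        parts = urlsplit(item)
        hosts.append((parts.scheme, parts.hostname, parts.port))
    return hosts


hosts = parse_hosts("http://localhost:9200,http://localhost:9201")
print(hosts)
```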

Configure Elasticsearch authentication

If your Elasticsearch instance requires authentication, add the credentials to the following fields in the configuration. By default, the fields are empty.

com.perforce.p4search.elastic.user=
com.perforce.p4search.elastic.pass=

Tune Performance

To tune performance:

  • Set a limit to the number of results processed by a single query, default is 10000.

    com.perforce.p4search.elastic.tracktotalhits=10000

  • Set the maximum Elasticsearch batch size used when indexing data, default is 100.

    com.perforce.p4search.index.bulksize=100

Set the index tags

For use with Helix DAM only: Set the index tags as a comma separated list, by default there are no tags. Any tag that begins with an index tag value in the list is indexed. The index tags are not case sensitive.

com.perforce.p4search.index.tags=

For example:

com.perforce.p4search.index.tags=DAM,FOO

If DAM and FOO are index tags, Helix Search will index any tag that begins with DAM or FOO. In this example, the DAM_TAG_bar, DAM_tag_lorry, and FOO_bar tags would all be indexed but a tag called TAG_DAM would not be indexed.
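The matching rule described above (case-insensitive prefix match against each entry in the list) can be expressed as a short check. This is a sketch of the rule as documented, with a hypothetical function name:

```python
def is_indexed(tag: str, index_tags: str) -> bool:
    """Return True if tag begins with any of the comma separated index tags, ignoring case."""
    prefixes = [p.strip().lower() for p in index_tags.split(",") if p.strip()]
    return any(tag.lower().startswith(prefix) for prefix in prefixes)


for tag in ["DAM_TAG_bar", "DAM_tag_lorry", "FOO_bar", "TAG_DAM"]:
    print(tag, is_indexed(tag, "DAM,FOO"))
```

An empty list matches nothing, which corresponds to the default of no tags.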

Configure self-signed certificate

To allow Helix Search to work with a self-signed Elasticsearch certificate, set this field to true. By default, it is set to true.

com.perforce.p4search.elastic.insecure=true

If you wish to trust a different certificate then set com.perforce.p4search.elastic.insecure=false and use the following command:

/opt/perforce/helix-p4search/jre/bin/keytool -importcert -alias elasticCA -keystore /opt/perforce/helix-p4search/jre/lib/security/cacerts -storepass changeit -file <location of your Elasticsearch certificate> -noprompt

where <location of your Elasticsearch certificate> is the path to the certificate file. On the Elasticsearch host, the certificate is located at /etc/elasticsearch/certs/http_ca.crt

Configure auto-tagging of images

Tip

Supported auto-tagging services for Helix Search are:

  • Azure Tags

  • AWS Rekognition

  • GoogleLabel

  • DeepDetect

For information about installing your auto-tagging service, see your auto-tagging service provider.

Helix Search can be configured to auto-detect tags for your image files.

Default configuration:

Note

The com.perforce.p4search.auto-detect.service configurable is only used for DeepDetect.

By default, auto-tagging of images is disabled.

com.perforce.p4search.auto-detect.model=

com.perforce.p4search.auto-detect.host=

com.perforce.p4search.auto-detect.lang=

com.perforce.p4search.auto-detect.key=

com.perforce.p4search.auto-detect.threshold=

com.perforce.p4search.auto-detect.best=

com.perforce.p4search.auto-detect.service=

Configure the auto-tagging service for Azure tags

  1. Set the auto-tagging model to AzureTagsModel, for example:

    com.perforce.p4search.auto-detect.model=AzureTagsModel

  2. Specify the auto-tagging service hostname, for example:

    com.perforce.p4search.auto-detect.host=https://my.cognitiveservices.azure.com

  3. Specify the language you want the image tags to be generated in. See your auto-tagging service documentation for supported languages. For example:

    com.perforce.p4search.auto-detect.lang=en

  4. Enter the API key for your auto-tagging service, for example:

    com.perforce.p4search.auto-detect.key=0123456789ABCDEF0123456789ABCDEF

  5. Specify the automatic image detection threshold for your auto-tagging service as a floating point percentage. For example, 0.1=10%:

    com.perforce.p4search.auto-detect.threshold=0.1

  6. Specify the automatic best image detection results limit, defaults to 10 results.

    com.perforce.p4search.auto-detect.best=10

Example auto-tagging configuration for Azure tags:

com.perforce.p4search.auto-detect.model=AzureTagsModel
com.perforce.p4search.auto-detect.host=https://my.cognitiveservices.azure.com
com.perforce.p4search.auto-detect.lang=en
com.perforce.p4search.auto-detect.key=0123456789ABCDEF0123456789ABCDEF
com.perforce.p4search.auto-detect.threshold=0.1
com.perforce.p4search.auto-detect.best=10
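One way to picture how the threshold and best settings might work together is shown below. This is an assumption for illustration, not documented behavior: it supposes each candidate tag carries a confidence score between 0 and 1, that scores equal to the threshold are kept, and that the best limit is applied afterwards. The function name and sample scores are hypothetical.

```python
def select_tags(candidates: dict[str, float], threshold: float = 0.1, best: int = 10) -> list[str]:
    """Keep tags scoring at or above threshold, then return the top `best` by score."""
    kept = [(name, score) for name, score in candidates.items() if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in kept[:best]]


scores = {"car": 0.92, "road": 0.40, "tree": 0.08, "sky": 0.15}
print(select_tags(scores, threshold=0.1, best=2))
```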

Configure the auto-tagging service for AWS Rekognition

  1. Set the auto-tagging model to RekognitionLabelsModel, for example:

    com.perforce.p4search.auto-detect.model=RekognitionLabelsModel

  2. Specify the auto-tagging service region as host, for example:

    com.perforce.p4search.auto-detect.host=us-east-2

  3. Specify the language you want the image tags to be generated in. See your auto-tagging service documentation for supported languages. For example:

    com.perforce.p4search.auto-detect.lang=en

  4. Enter the API key for your auto-tagging service. The API key for AWS Rekognition is a combination of <aws_access_key_id> and <aws_secret_access_key>, separated by a colon:

    com.perforce.p4search.auto-detect.key=<aws_access_key_id>:<aws_secret_access_key>

    For example:

    com.perforce.p4search.auto-detect.key=ABCDEFGHIJKL12345678:ab0cd1ef2gh3IJ4aaaBBBcccWWW111rfc1234YERTpp

    where,

    aws_access_key_id=ABCDEFGHIJKL12345678

    aws_secret_access_key=ab0cd1ef2gh3IJ4aaaBBBcccWWW111rfc1234YERTpp

  5. Specify the automatic image detection threshold for your auto-tagging service as a floating point percentage. For example, 0.1=10%:

    com.perforce.p4search.auto-detect.threshold=0.1

  6. Specify the automatic best image detection results limit, defaults to 10 results.

    com.perforce.p4search.auto-detect.best=10

Example auto-tagging configuration for AWS Rekognition:

com.perforce.p4search.auto-detect.model=RekognitionLabelsModel
com.perforce.p4search.auto-detect.host=us-east-2
com.perforce.p4search.auto-detect.lang=en
com.perforce.p4search.auto-detect.key=ABCDEFGHIJKL12345678:ab0cd1ef2gh3IJ4aaaBBBcccWWW111rfc1234YERTpp
com.perforce.p4search.auto-detect.threshold=0.1
com.perforce.p4search.auto-detect.best=10
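The combined key format can be sanity-checked by splitting it back into its two parts. A minimal sketch with a hypothetical function name, assuming the first colon is the separator:

```python
def split_aws_key(key: str) -> tuple[str, str]:
    """Split <aws_access_key_id>:<aws_secret_access_key> into its two parts."""
    access_key_id, sep, secret_access_key = key.partition(":")
    if not sep or not access_key_id or not secret_access_key:
        raise ValueError("expected <aws_access_key_id>:<aws_secret_access_key>")
    return access_key_id, secret_access_key


key_id, secret = split_aws_key(
    "ABCDEFGHIJKL12345678:ab0cd1ef2gh3IJ4aaaBBBcccWWW111rfc1234YERTpp"
)
print(key_id)
```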

Configure the auto-tagging service for GoogleLabel

  1. Set the auto-tagging model to GoogleLabelModel, for example:

    com.perforce.p4search.auto-detect.model=GoogleLabelModel

  2. Specify the auto-tagging service hostname, for example:

    com.perforce.p4search.auto-detect.host=https://vision.googleapis.com

  3. Specify the language you want the image tags to be generated in. See your auto-tagging service documentation for supported languages. For example:

    com.perforce.p4search.auto-detect.lang=en

  4. Enter the API key for your auto-tagging service, for example:

    com.perforce.p4search.auto-detect.key=AbcdEFG12345ZXvfe56210QWErtyui123456789

  5. Specify the automatic image detection threshold for your auto-tagging service as a floating point percentage. For example, 0.1=10%:

    com.perforce.p4search.auto-detect.threshold=0.1

  6. Specify the automatic best image detection results limit, defaults to 10 results.

    com.perforce.p4search.auto-detect.best=10

Example auto-tagging configuration for GoogleLabel:

com.perforce.p4search.auto-detect.model=GoogleLabelModel
com.perforce.p4search.auto-detect.host=https://vision.googleapis.com
com.perforce.p4search.auto-detect.lang=en
com.perforce.p4search.auto-detect.key=AbcdEFG12345ZXvfe56210QWErtyui123456789
com.perforce.p4search.auto-detect.threshold=0.1
com.perforce.p4search.auto-detect.best=10

Configure the auto-tagging service for DeepDetect

  1. Set the auto-tagging model to DeepDetectModel:

    com.perforce.p4search.auto-detect.model=DeepDetectModel

  2. Specify the auto-tagging service hostname, for example:

    com.perforce.p4search.auto-detect.host=http://localhost:8888

  3. Specify the language you want the image tags to be generated in. See your auto-tagging service documentation for supported languages.

    DeepDetect does not support multiple languages, leave empty for DeepDetect:

    com.perforce.p4search.auto-detect.lang=

  4. Enter the API key for your auto-tagging service.

    DeepDetect does not support an API key, leave empty for DeepDetect:

    com.perforce.p4search.auto-detect.key=

  5. Specify the automatic image detection threshold for your auto-tagging service as a floating point percentage. For example, 0.1=10%:

    com.perforce.p4search.auto-detect.threshold=0.1

  6. Specify the automatic best image detection results limit, defaults to 10 results.

    com.perforce.p4search.auto-detect.best=10

  7. Specify the classification service you want to use for image tags. For example, classification_21k if you are using the DeepDetect 21k pre-trained model:

    com.perforce.p4search.auto-detect.service=classification_21k

Example auto-tagging configuration for DeepDetect:

com.perforce.p4search.auto-detect.model=DeepDetectModel
com.perforce.p4search.auto-detect.host=http://localhost:8888
com.perforce.p4search.auto-detect.lang=
com.perforce.p4search.auto-detect.key=
com.perforce.p4search.auto-detect.threshold=0.1
com.perforce.p4search.auto-detect.best=10
com.perforce.p4search.auto-detect.service=classification_21k

Note

For more information about using a pre-trained model, see Getting the pre-trained model section of the Image classifier topic.

Example curl request to load the DeepDetect 21k pre-trained model:

curl -H "Content-Type: application/x-www-form-urlencoded" -X PUT \
 -d '{
 "description": "generic image classification service",
 "model": {
  "repository": "/opt/models/classification_21k",
  "init":"https://deepdetect.com/models/init/desktop/images/classification/classification_21k.tar.gz",
  "create_repository": true
 },
 "mllib": "caffe",
 "type": "supervised",
 "parameters": {
  "input": {
  "connector": "image"
  }
 }
}' http://localhost:8888/services/classification_21k
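If you prefer to script this step, the same request can be prepared with the Python standard library. The sketch below only builds the request from the values in the curl example and does not send it. The function name is hypothetical, and the host and port are the ones assumed in the example above.

```python
import json
import urllib.request


def build_service_request(base_url: str = "http://localhost:8888",
                          service: str = "classification_21k") -> urllib.request.Request:
    """Build the PUT request that creates the DeepDetect classification service."""
    body = {
        "description": "generic image classification service",
        "model": {
            "repository": "/opt/models/classification_21k",
            "init": "https://deepdetect.com/models/init/desktop/images/classification/classification_21k.tar.gz",
            "create_repository": True,
        },
        "mllib": "caffe",
        "type": "supervised",
        "parameters": {"input": {"connector": "image"}},
    }
    return urllib.request.Request(
        f"{base_url}/services/{service}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="PUT",
    )


request = build_service_request()
print(request.get_method(), request.full_url)
# To send the request, pass it to urllib.request.urlopen(request).
```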

Configure image OCR

Tip

Supported image OCR models for Helix Search are:

  • AzureOcrModel

  • RekognitionOcrModel

  • GoogleOcrModel

Helix Search can be configured to enable image Optical Character Recognition (OCR) by specifying the OCR model for your image files.

Default configuration:

By default image OCR is disabled.

com.perforce.p4search.auto-ocr.model=

com.perforce.p4search.auto-detect.host=

com.perforce.p4search.auto-detect.lang=

com.perforce.p4search.auto-detect.key=

Note

The auto-ocr model uses the auto-tagging host, language, and key to configure the auto-ocr service.

Configure the auto-ocr service for AzureOcrModel

  1. Set the auto-ocr model to AzureOcrModel, for example:

    com.perforce.p4search.auto-ocr.model=AzureOcrModel

  2. Specify the auto-tagging service hostname, for example:

    com.perforce.p4search.auto-detect.host=https://my.cognitiveservices.azure.com

  3. Specify the language you want the image tags to be generated in. See your auto-tagging service documentation for supported languages. For example:

    com.perforce.p4search.auto-detect.lang=en

  4. Enter the API key for your auto-tagging service, for example:

    com.perforce.p4search.auto-detect.key=0123456789ABCDEF0123456789ABCDEF

Example auto-ocr configuration for AzureOcrModel:

com.perforce.p4search.auto-ocr.model=AzureOcrModel
com.perforce.p4search.auto-detect.host=https://my.cognitiveservices.azure.com
com.perforce.p4search.auto-detect.lang=en
com.perforce.p4search.auto-detect.key=0123456789ABCDEF0123456789ABCDEF

Configure the auto-ocr service for RekognitionOcrModel

  1. Set the auto-ocr model to RekognitionOcrModel, for example:

    com.perforce.p4search.auto-ocr.model=RekognitionOcrModel

  2. Specify the auto-tagging service region as host, for example:

    com.perforce.p4search.auto-detect.host=us-east-2

  3. Specify the language you want the image tags to be generated in. See your auto-tagging service documentation for supported languages. For example:

    com.perforce.p4search.auto-detect.lang=en

  4. Enter the API key for your auto-tagging service. The API key for the RekognitionOcrModel is a combination of <aws_access_key_id> and <aws_secret_access_key>, separated by a colon:

    com.perforce.p4search.auto-detect.key=<aws_access_key_id>:<aws_secret_access_key>

    For example:

    com.perforce.p4search.auto-detect.key=ABCDEFGHIJKL12345678:ab0cd1ef2gh3IJ4aaaBBBcccWWW111rfc1234YERTpp

    where,

    aws_access_key_id=ABCDEFGHIJKL12345678

    aws_secret_access_key=ab0cd1ef2gh3IJ4aaaBBBcccWWW111rfc1234YERTpp

Example auto-ocr configuration for RekognitionOcrModel:

com.perforce.p4search.auto-ocr.model=RekognitionOcrModel
com.perforce.p4search.auto-detect.host=us-east-2
com.perforce.p4search.auto-detect.lang=en
com.perforce.p4search.auto-detect.key=ABCDEFGHIJKL12345678:ab0cd1ef2gh3IJ4aaaBBBcccWWW111rfc1234YERTpp

Configure the auto-ocr service for GoogleOcrModel

  1. Set the auto-ocr model to GoogleOcrModel, for example:

    com.perforce.p4search.auto-ocr.model=GoogleOcrModel

  2. Specify the auto-tagging service hostname, for example:

    com.perforce.p4search.auto-detect.host=https://vision.googleapis.com

  3. Specify the language you want the image tags to be generated in. See your auto-tagging service documentation for supported languages. For example:

    com.perforce.p4search.auto-detect.lang=en

  4. Enter the API key for your auto-tagging service, for example:

    com.perforce.p4search.auto-detect.key=AbcdEFG12345ZXvfe56210QWErtyui123456789

Example auto-ocr configuration for GoogleOcrModel:

com.perforce.p4search.auto-ocr.model=GoogleOcrModel
com.perforce.p4search.auto-detect.host=https://vision.googleapis.com
com.perforce.p4search.auto-detect.lang=en
com.perforce.p4search.auto-detect.key=AbcdEFG12345ZXvfe56210QWErtyui123456789

Configure speech-to-text recognition

Important
  • Speech-to-text recognition can be used for audio and video files.

  • All audio and video files up to 60 seconds are referred to as short audios.

  • Audio or video files longer than 60 seconds are only supported for the AmazonTranscribeModel.

  • For all audio and video files up to 60 seconds, the audio is transcribed synchronously. Files longer than 60 seconds are skipped unless you are using the AmazonTranscribeModel.
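The rules in the list above can be summarized as a small decision function. This is a sketch of the documented behavior only. The function name and the returned labels are hypothetical, and the 60 second limit is passed in as a parameter.

```python
def transcription_mode(duration_seconds: float, model: str, short_audio_limit: int = 60) -> str:
    """Describe how a file of the given length is handled for the given model."""
    if duration_seconds <= short_audio_limit:
        return "synchronous"  # short audio is transcribed synchronously
    if model == "AmazonTranscribeModel":
        return "long-audio"   # only this model supports files over the limit
    return "skipped"          # other models skip files over the limit


print(transcription_mode(45, "AzureSpeechModel"))
print(transcription_mode(300, "GoogleSpeechModel"))
print(transcription_mode(300, "AmazonTranscribeModel"))
```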

Tip

Supported speech to text models for Helix Search are:

  • AzureSpeechModel

  • AmazonTranscribeModel

  • GoogleSpeechModel

  • WhisperSpeechModel

Helix Search can be configured to enable speech to text transcription by specifying the automatic speech recognition model.

Default configuration:

By default speech to text is disabled.

com.perforce.p4search.auto-speech.model=

com.perforce.p4search.auto-speech.host=

com.perforce.p4search.auto-speech.lang=

com.perforce.p4search.auto-speech.key=

com.perforce.p4search.auto-speech.threshold=

com.perforce.p4search.auto-speech.short-audio=

com.perforce.p4search.auto-speech.bucket= (only available for AmazonTranscribeModel)

com.perforce.p4search.auto-speech.timeout=

Limitations

Speech-to-text recognition has the following limitations:

  • AmazonTranscribeModel does not transcribe MP4 files with a 48000 Hz sample rate.

  • AzureSpeechModel and GoogleSpeechModel can transcribe only up to 60 seconds of audio and video files.

Configure the auto-speech recognition service for AzureSpeechModel

  1. Set the auto-speech model to AzureSpeechModel:

    com.perforce.p4search.auto-speech.model=AzureSpeechModel

  2. Specify the auto-speech recognition service hostname, for example:

    com.perforce.p4search.auto-speech.host=https://eastus.stt.speech.microsoft.com

  3. Specify the language of the audio you want to be transcribed, for example:

    com.perforce.p4search.auto-speech.lang=en-US

  4. Enter the API key for your auto-speech recognition service, for example:

    com.perforce.p4search.auto-speech.key=0123456789ABCDEF0123456789ABCDEF

  5. Set the automatic speech recognition threshold for your auto-speech service as a floating point percentage. The automatic speech recognition threshold is the accuracy returned by your auto-speech service. For example, 0.6=60%.

    In this example, when the auto-speech threshold is set to 0.6, any transcription with 60% accuracy or higher is accepted by the auto-speech service and anything below that is discarded.

    com.perforce.p4search.auto-speech.threshold=0.6

  6. Set the short audio transcribe limit in seconds, for example:

    com.perforce.p4search.auto-speech.short-audio=60

  7. Specify the name of the cloud storage bucket or container where the large speech files will be stored:

    com.perforce.p4search.auto-speech.bucket=cloud_storage_bucket_name

  8. Specify the timeout limit for short audio transcription processing from the cloud services in seconds, for example:

    com.perforce.p4search.auto-speech.timeout=1200

Example auto-speech recognition configuration for AzureSpeechModel:

com.perforce.p4search.auto-speech.model=AzureSpeechModel
com.perforce.p4search.auto-speech.host=https://eastus.stt.speech.microsoft.com
com.perforce.p4search.auto-speech.lang=en-US
com.perforce.p4search.auto-speech.key=0123456789ABCDEF0123456789ABCDEF
com.perforce.p4search.auto-speech.threshold=0.6
com.perforce.p4search.auto-speech.short-audio=60
com.perforce.p4search.auto-speech.bucket=cloud_storage_bucket_name
com.perforce.p4search.auto-speech.timeout=1200

Configure the auto-speech recognition service for AmazonTranscribeModel

  1. Set the auto-speech model to AmazonTranscribeModel:

    com.perforce.p4search.auto-speech.model=AmazonTranscribeModel

  2. Specify the auto-speech recognition service region as host, for example:

    com.perforce.p4search.auto-speech.host=us-east-2

  3. Specify the language of the audio you want to be transcribed, for example:

    com.perforce.p4search.auto-speech.lang=en-US

  4. Enter the API key for your auto-speech recognition service. The API key for the AmazonTranscribeModel is a combination of <aws_access_key_id> and <aws_secret_access_key>, separated by a colon:

    com.perforce.p4search.auto-speech.key=<aws_access_key_id>:<aws_secret_access_key>

    For example:

    com.perforce.p4search.auto-speech.key=ABCDEFGHIJKL12345678:ab0cd1ef2gh3IJ4aaaBBBcccWWW111rfc1234YERTpp

    where,

    aws_access_key_id=ABCDEFGHIJKL12345678

    aws_secret_access_key=ab0cd1ef2gh3IJ4aaaBBBcccWWW111rfc1234YERTpp

  5. Set the automatic speech recognition threshold for your auto-speech service as a floating point percentage. The automatic speech recognition threshold is the accuracy returned by your auto-speech service. For example, 0.6=60%.

    In this example, when the auto-speech threshold is set to 0.6, any transcription with 60% accuracy or higher is accepted by the auto-speech service and anything below that is discarded.

    com.perforce.p4search.auto-speech.threshold=0.6

  6. Set the short audio transcribe limit in seconds, for example:

    com.perforce.p4search.auto-speech.short-audio=60

  7. Specify the name of the cloud storage bucket or container where the large speech files are stored:

    com.perforce.p4search.auto-speech.bucket=cloud_storage_bucket_name

  8. Specify the timeout limit for short audio transcription processing from the cloud services in seconds, for example:

    com.perforce.p4search.auto-speech.timeout=1200

Example auto-speech recognition configuration for AmazonTranscribeModel:

com.perforce.p4search.auto-speech.model=AmazonTranscribeModel
com.perforce.p4search.auto-speech.host=us-east-2
com.perforce.p4search.auto-speech.lang=en-US
com.perforce.p4search.auto-speech.key=ABCDEFGHIJKL12345678:ab0cd1ef2gh3IJ4aaaBBBcccWWW111rfc1234YERTpp
com.perforce.p4search.auto-speech.threshold=0.6
com.perforce.p4search.auto-speech.short-audio=60
com.perforce.p4search.auto-speech.bucket=cloud_storage_bucket_name
com.perforce.p4search.auto-speech.timeout=1200
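
The combined key format can be checked before it is written to etc/config.properties. The following Python sketch is only an illustration of the <aws_access_key_id>:<aws_secret_access_key> layout described above; the split_aws_key helper is hypothetical and is not part of Helix Search.

```python
def split_aws_key(value: str) -> tuple[str, str]:
    """Split '<aws_access_key_id>:<aws_secret_access_key>' at the first colon.

    Hypothetical helper for illustration; not part of Helix Search.
    """
    access_key_id, sep, secret_access_key = value.partition(":")
    if not sep or not access_key_id or not secret_access_key:
        raise ValueError("expected <aws_access_key_id>:<aws_secret_access_key>")
    return access_key_id, secret_access_key


# The sample value from the documentation above (not a real credential).
key_id, secret = split_aws_key(
    "ABCDEFGHIJKL12345678:ab0cd1ef2gh3IJ4aaaBBBcccWWW111rfc1234YERTpp"
)
print(key_id)
```

A value without a colon, or with an empty half, raises ValueError, which catches the most likely copy-and-paste mistakes.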

Configure the auto-speech recognition service for GoogleSpeechModel

  1. Set the auto-speech model to GoogleSpeechModel:

    com.perforce.p4search.auto-speech.model=GoogleSpeechModel

  2. Specify the auto-speech recognition service hostname, for example:

    com.perforce.p4search.auto-speech.host=https://speech.googleapis.com

  3. Specify the language of the audio you want to transcribe, for example:

    com.perforce.p4search.auto-speech.lang=en

  4. Enter the API key for your auto-speech recognition service, for example:

    com.perforce.p4search.auto-speech.key=AbcdEFG12345ZXvfe56210QWErtyui123456789

  5. Set the automatic speech recognition threshold for your auto-speech service as a floating-point value, where 0.6 means 60%. The threshold is the minimum accuracy, as returned by your auto-speech service, that a transcription must reach. With a threshold of 0.6, any transcription with an accuracy of 60% or higher is accepted and anything below that is discarded. For example:

    com.perforce.p4search.auto-speech.threshold=0.6

  6. Set the short audio transcription limit in seconds, for example:

    com.perforce.p4search.auto-speech.short-audio=60

  7. Specify the timeout limit, in seconds, for short audio transcription processing from the cloud services, for example:

    com.perforce.p4search.auto-speech.timeout=1200

Example auto-speech recognition configuration for GoogleSpeechModel:

com.perforce.p4search.auto-speech.model=GoogleSpeechModel
com.perforce.p4search.auto-speech.host=https://speech.googleapis.com
com.perforce.p4search.auto-speech.lang=en
com.perforce.p4search.auto-speech.key=AbcdEFG12345ZXvfe56210QWErtyui123456789
com.perforce.p4search.auto-speech.threshold=0.6
com.perforce.p4search.auto-speech.short-audio=60
com.perforce.p4search.auto-speech.timeout=1200
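
The effect of the threshold setting can be pictured with a short Python sketch. It only illustrates the accept-or-discard rule described in the steps above; the Transcription type, the accepted function, and the sample values are hypothetical, and how Helix Search applies the threshold internally is not shown here.

```python
from dataclasses import dataclass


@dataclass
class Transcription:
    text: str
    accuracy: float  # value reported by the speech service, 0.0 to 1.0


def accepted(items, threshold):
    """Keep transcriptions whose accuracy is at or above the threshold."""
    return [t for t in items if t.accuracy >= threshold]


# Hypothetical sample results, for illustration only.
sample = [
    Transcription("open the main menu", 0.91),
    Transcription("save the scene", 0.60),
    Transcription("(inaudible)", 0.42),
]
kept = accepted(sample, threshold=0.6)
print([t.text for t in kept])
```

With a threshold of 0.6, the first two sample results are kept and the third is discarded. Raising the threshold trades fewer transcriptions for higher accuracy.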

Configure the auto-speech recognition service for WhisperSpeechModel

  1. Run Whisper as a service in a Docker container, for example using this Docker image:

    https://hub.docker.com/r/onerahmet/openai-whisper-asr-webservice

  2. Once the Docker container is ready, set the auto-speech model to WhisperSpeechModel, for example:

    com.perforce.p4search.auto-speech.model=WhisperSpeechModel

  3. Specify the auto-speech recognition service hostname using the Whisper Docker container hostname and port number, for example:

    com.perforce.p4search.auto-speech.host=http://localhost:9000

  4. (Optional) Whisper has built-in language detection, but you can specify the language of your audio and video files, for example:

    com.perforce.p4search.auto-speech.lang=en

  5. Set the short audio transcription limit in seconds, for example:

    com.perforce.p4search.auto-speech.short-audio=60

  6. Specify the name of the cloud storage bucket or container where the large speech files are stored, for example:

    com.perforce.p4search.auto-speech.bucket=cloud_storage_bucket_name

  7. Specify the timeout limit, in seconds, for short audio transcription processing from the cloud services, for example:

    com.perforce.p4search.auto-speech.timeout=1200

Example auto-speech recognition configuration for WhisperSpeechModel:

com.perforce.p4search.auto-speech.model=WhisperSpeechModel
com.perforce.p4search.auto-speech.host=http://localhost:9000
com.perforce.p4search.auto-speech.lang=en
com.perforce.p4search.auto-speech.short-audio=60
com.perforce.p4search.auto-speech.timeout=1200
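
Because the Whisper host value must include both the container hostname and the port number, it can be worth checking before you restart the service. The following Python sketch reads the auto-speech settings from simple key=value text and reports obvious problems with the host value. The read_auto_speech and host_problems helpers are hypothetical, and the parser handles only plain key=value lines, not the full Java properties syntax.

```python
from urllib.parse import urlsplit

EXAMPLE = """\
com.perforce.p4search.auto-speech.model=WhisperSpeechModel
com.perforce.p4search.auto-speech.host=http://localhost:9000
com.perforce.p4search.auto-speech.lang=en
com.perforce.p4search.auto-speech.short-audio=60
com.perforce.p4search.auto-speech.timeout=1200
"""

PREFIX = "com.perforce.p4search.auto-speech."


def read_auto_speech(text):
    """Collect auto-speech settings from simple key=value lines."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without a key=value pair.
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith(PREFIX):
            settings[key[len(PREFIX):]] = value.strip()
    return settings


def host_problems(host):
    """Return a list of problems found in the host value (empty if none)."""
    parts = urlsplit(host)
    problems = []
    if parts.scheme not in ("http", "https"):
        problems.append("host should start with http:// or https://")
    if not parts.hostname:
        problems.append("host has no hostname")
    try:
        port = parts.port
    except ValueError:  # raised when the port is not a valid number
        port = None
    if port is None:
        problems.append("host has no valid port number")
    return problems


settings = read_auto_speech(EXAMPLE)
print(settings["host"], host_problems(settings["host"]))
```

An empty list means the host value has a scheme, a hostname, and a port. It does not confirm that the Whisper container is actually running at that address.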

Rendering configuration

Default configuration:

com.perforce.p4search.render.service=

com.perforce.p4search.render.host=http://localhost:1602

com.perforce.p4search.render.key=

com.perforce.p4search.render.model=

Set the render service

The render service implementation that Helix Search uses to render images. Defaults to empty. For example:

com.perforce.p4search.render.service=LocalRenderService

Set the rendering host

Optional: The address of the rendering service host, for example:

com.perforce.p4search.render.host=http://localhost:1602

Set the rendering API key

The rendering service API security key for use with AWSRenderService. The LocalRenderService uses the X-Auth-Token configured in Set the index token.

com.perforce.p4search.render.key=<rendering service API security key>

Set the 3D model type for the rendering service

The 3D model type for the rendering service. For example, GLB.

com.perforce.p4search.render.model=GLB
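
The relationship between the render service and the render key can be summarized in a small Python sketch. It assumes, based on the descriptions above, that an empty render.service means no render service is selected and that AWSRenderService needs a non-empty render.key. The render_problems helper is hypothetical and is not part of Helix Search.

```python
PREFIX = "com.perforce.p4search.render."


def render_problems(props):
    """Return a list of likely problems in the rendering settings."""
    service = props.get(PREFIX + "service", "")
    key = props.get(PREFIX + "key", "")
    problems = []
    if not service:
        # Assumption: the empty default means no render service is selected.
        problems.append("render.service is empty")
    if service == "AWSRenderService" and not key:
        # Assumption: AWSRenderService needs the API security key.
        problems.append("render.key is empty but AWSRenderService is selected")
    return problems


local = {
    PREFIX + "service": "LocalRenderService",
    PREFIX + "host": "http://localhost:1602",
    PREFIX + "model": "GLB",
}
print(render_problems(local))
```

The LocalRenderService settings pass without a render.key because that service uses the X-Auth-Token instead.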

Check service status

To check the status of your Helix Search service, navigate to the status page using the connection details you set for the Helix Search service during the installation.

For example:

http://myhelixsearch.mydomain.com:1601/p4search/settings/status

Tip

You might need to log in to view the Service status page.

Image of the Helix Search web status page

Service status page fails to open

The Helix Search service requires other services to be available. If any required service is unavailable, the Service status page URL returns a 404 error with the following message:

Endpoint not mapped: GET /p4search/settings/status

To determine the cause of the failure, check the Helix Search service log file. The default log file is:

<search install dir>/log/p4search.log

Correct the cause of the failure. You might need to update the etc/config.properties file and restart the Helix Search service to correct the problem.
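
The status check can also be scripted. The following Python sketch requests the Service status page and maps the HTTP result to a hint based on the behavior described above. The describe_status and check_status_page helpers are hypothetical, and treating 401 or 403 as a login prompt is an assumption; only the 404 behavior is documented here.

```python
import urllib.error
import urllib.request


def describe_status(code):
    """Map an HTTP status code to a troubleshooting hint."""
    if code == 200:
        return "status page is available"
    if code in (401, 403):
        # Assumption: these codes indicate that a login is required.
        return "log in to view the Service status page"
    if code == 404:
        return "a required service may be unavailable; check log/p4search.log"
    return f"unexpected HTTP status {code}"


def check_status_page(url, timeout=10):
    """Request the status page and describe the outcome."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return describe_status(response.status)
    except urllib.error.HTTPError as err:  # must come before URLError
        return describe_status(err.code)
    except urllib.error.URLError as err:
        return f"cannot reach Helix Search: {err.reason}"


# Example call (makes a network request, so it is left commented out):
# print(check_status_page(
#     "http://myhelixsearch.mydomain.com:1601/p4search/settings/status"))
```

A 404 result points to the log file rather than to the URL itself, because the endpoint is not mapped until the required services are available.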
