redact_pii_policies and speech_understanding are incompatible. #146

@HectorPulido


I'm having trouble using both of them together.
PII redaction works fine if I leave out speech_understanding:

import requests
import time

base_url = "https://api.assemblyai.com"

headers = {
    # Replace with your chosen API key, this is the "default" account api key
    "authorization": "<>"
}

# URL of the file to transcribe
FILE_URL = "https://cdn.assemblyai.com/upload/<>"

# You can set additional parameters for the transcription
config = {
  "audio_url": FILE_URL,
  "redact_pii":True,
  "speaker_labels":True,
  "format_text":True,
  "punctuate":True,
  "speech_model":"universal",
  "language_detection":True
}
config["redact_pii_policies"] = [
  "medical_condition",
  "email_address",
  "phone_number",
  "banking_information",
  "credit_card_number",
  "credit_card_cvv",
  "date_of_birth",
  "person_name"
]

url = base_url + "/v2/transcript"
response = requests.post(url, json=config, headers=headers)

transcript_id = response.json()['id']
polling_endpoint = base_url + "/v2/transcript/" + transcript_id

while True:
  transcription_result = requests.get(polling_endpoint, headers=headers).json()
  transcription_text = transcription_result['text']

  if transcription_result['status'] == 'completed':
    print("Transcript Text:", transcription_text)
    break

  elif transcription_result['status'] == 'error':
    raise RuntimeError(f"Transcription failed: {transcription_result['error']}")

  else:
    time.sleep(3)

Image

import requests
import time

base_url = "https://api.assemblyai.com"

headers = {
    # Replace with your chosen API key, this is the "default" account api key
    "authorization": "<>"
}

# URL of the file to transcribe
FILE_URL = "https://cdn.assemblyai.com/upload/<>"

# You can set additional parameters for the transcription
config = {
  "audio_url": FILE_URL,
  "redact_pii":True,
  "speaker_labels":True,
  "format_text":True,
  "punctuate":True,
  "speech_model":"universal",
  "language_detection":True
}
config["speech_understanding"] = {
  "request": {
    "speaker_identification": {
      "speaker_type": "role",
      "known_values": [
        "Agent",
        "Customer"
      ]
    }
  }
}
config["redact_pii_policies"] = [
  "medical_condition",
  "email_address",
  "phone_number",
  "banking_information",
  "credit_card_number",
  "credit_card_cvv",
  "date_of_birth",
  "person_name"
]

url = base_url + "/v2/transcript"
response = requests.post(url, json=config, headers=headers)

transcript_id = response.json()['id']
polling_endpoint = base_url + "/v2/transcript/" + transcript_id

while True:
  transcription_result = requests.get(polling_endpoint, headers=headers).json()
  transcription_text = transcription_result['text']

  if transcription_result['status'] == 'completed':
    print("Transcript Text:", transcription_text)
    break

  elif transcription_result['status'] == 'error':
    raise RuntimeError(f"Transcription failed: {transcription_result['error']}")

  else:
    time.sleep(3)

But when I add speaker identification (the speech_understanding block in the script above), it breaks PII redaction.

Image
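
A rough sketch of how "redaction is broken" could be checked in code rather than by reading a screenshot. It assumes redacted spans appear in the transcript text either as runs of `#` characters or as bracketed entity tags such as `[PERSON_NAME]`; both marker formats are assumptions here and should be compared against the actual API output. The helper name `looks_redacted` is hypothetical.

```python
import re

# Assumed marker formats (not confirmed by this issue):
#   runs of '#' characters, or bracketed upper-case tags like [PERSON_NAME]
_HASH_RUN = re.compile(r"#{2,}")
_ENTITY_TAG = re.compile(r"\[[A-Z_]+\]")


def looks_redacted(text):
    """Heuristic: return True if the text contains something that looks like a redaction marker."""
    return bool(_HASH_RUN.search(text) or _ENTITY_TAG.search(text))
```

Calling `looks_redacted(transcription_text)` on the result of each script would show whether any redaction markers survive once speech_understanding is added. It is only a heuristic: a transcript with no PII in it would also return False.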

(The data is fake, don't worry.)
I apologize for posting the issue here; I didn't know where else to put it.
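
To make the comparison easier to reproduce, here is a minimal sketch that builds both request bodies from one shared base, so the only difference between the working and failing requests is the speech_understanding key. The helper names `build_config` and `differing_keys` are hypothetical; the values are copied from the two scripts above.

```python
import copy

# Options shared by both requests (copied from the scripts above)
BASE_CONFIG = {
    "redact_pii": True,
    "speaker_labels": True,
    "format_text": True,
    "punctuate": True,
    "speech_model": "universal",
    "language_detection": True,
}

PII_POLICIES = [
    "medical_condition", "email_address", "phone_number",
    "banking_information", "credit_card_number", "credit_card_cvv",
    "date_of_birth", "person_name",
]

SPEAKER_IDENTIFICATION = {
    "request": {
        "speaker_identification": {
            "speaker_type": "role",
            "known_values": ["Agent", "Customer"],
        }
    }
}


def build_config(audio_url, with_speaker_identification):
    """Return a request body; speech_understanding is added only when requested."""
    config = copy.deepcopy(BASE_CONFIG)
    config["audio_url"] = audio_url
    config["redact_pii_policies"] = list(PII_POLICIES)
    if with_speaker_identification:
        config["speech_understanding"] = copy.deepcopy(SPEAKER_IDENTIFICATION)
    return config


def differing_keys(a, b):
    """List the top-level keys whose values differ between two configs."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


working = build_config("https://cdn.assemblyai.com/upload/<>", False)
failing = build_config("https://cdn.assemblyai.com/upload/<>", True)
print(differing_keys(working, failing))  # prints ['speech_understanding']
```

Posting `working` and `failing` with the same `requests.post(url, json=..., headers=headers)` call as above should then isolate the behavior to that one key.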
