Implementation Options

Original SSML

You say, pecan. I say, pecan.

Markup:


<speak>
  You say, <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>. 
  I say, <phoneme alphabet="ipa" ph="ˈpi.kæn">pecan</phoneme>.
</speak>

The JSON AST of the SSML when parsed as HTML (just the phonemes)

[
  {
    "type": "element",
    "tagName": "phoneme",
    "attributes": [
      {
        "key": "alphabet",
        "value": "ipa"
      },
      {
        "key": "ph",
        "value": "pɪˈkɑːn"
      }
    ],
    "children": [
      {
        "type": "text",
        "content": "pecan"
      }
    ]
  },
  {
    "type": "element",
    "tagName": "phoneme",
    "attributes": [
      {
        "key": "alphabet",
        "value": "ipa"
      },
      {
        "key": "ph",
        "value": "ˈpi.kæn"
      }
    ],
    "children": [
      {
        "type": "text",
        "content": "pecan"
      }
    ]
  }
]
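
As a rough illustration of how an AST shaped like this might be consumed, the sketch below flattens a node's `attributes` array into a plain object and collects its text. The helper names `attrsToObject` and `describePhoneme` are hypothetical and do not belong to any particular parser.

```javascript
// Hypothetical helper: turn [{key, value}, ...] into {key: value, ...}.
function attrsToObject(attributes) {
  const out = {};
  for (const { key, value } of attributes) {
    out[key] = value;
  }
  return out;
}

// Hypothetical helper: summarize one "phoneme" element node from the AST.
function describePhoneme(node) {
  if (node.type !== 'element' || node.tagName !== 'phoneme') {
    throw new TypeError('expected a phoneme element node');
  }
  // Concatenate the content of all direct text children.
  const text = node.children
    .filter((child) => child.type === 'text')
    .map((child) => child.content)
    .join('');
  return { text, ...attrsToObject(node.attributes) };
}

// First node from the AST above.
const astNode = {
  type: 'element',
  tagName: 'phoneme',
  attributes: [
    { key: 'alphabet', value: 'ipa' },
    { key: 'ph', value: 'pɪˈkɑːn' },
  ],
  children: [{ type: 'text', content: 'pecan' }],
};

console.log(describePhoneme(astNode));
```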

ARIA

I say pecan. You say pecan.

Markup:


I say <span aria-ssml="{&quot;phoneme&quot;:{&quot;ph&quot;:&quot;pɪˈkɑːn&quot;,&quot;alphabet&quot;:&quot;ipa&quot;}}">pecan</span>.
You say <span aria-ssml="{&quot;phoneme&quot;:{&quot;ph&quot;:&quot;ˈpi.kæn&quot;,&quot;alphabet&quot;:&quot;ipa&quot;}}">pecan</span>.

How it's parsed:

JSON.parse(element.getAttribute('aria-ssml'))
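
The one-liner above throws if the attribute is missing or malformed. A slightly more defensive variant might look like the sketch below; `readAriaSsml` and `stubElement` are hypothetical names, and the stub only stands in for a DOM element so the example runs outside a browser. In a browser, `getAttribute` already returns the value with `&quot;` decoded to `"`.

```javascript
// Hypothetical helper: read and parse aria-ssml from an element-like object.
// Returns null when the attribute is absent or is not valid JSON.
function readAriaSsml(element) {
  const raw = element.getAttribute('aria-ssml');
  if (raw === null || raw === undefined) return null;
  try {
    return JSON.parse(raw);
  } catch (err) {
    return null;
  }
}

// Stand-in for a DOM element (assumption: only getAttribute is needed).
function stubElement(attrs) {
  return { getAttribute: (name) => (name in attrs ? attrs[name] : null) };
}

const ariaSpan = stubElement({
  'aria-ssml': '{"phoneme":{"ph":"pɪˈkɑːn","alphabet":"ipa"}}',
});

const parsed = readAriaSsml(ariaSpan);
console.log(parsed.phoneme.ph, parsed.phoneme.alphabet);
```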

Ruby + Microdata

You say, pecan (pɪˈkɑːn). I say, pe (ˈpi) can (kæn).

Markup:


<p>
  You say, 
  <span itemscope="" itemtype="http://example.org/Pronunciation">
    <ruby itemprop="phoneme" content="pecan">
      pecan
      <rt itemprop="ph">pɪˈkɑːn</rt>
      <meta itemprop="alphabet" content="ipa">
    </ruby>.
  </span>
  I say,
  <span itemscope="" itemtype="http://example.org/Pronunciation">
    <ruby itemprop="phoneme" content="pecan">
      pe
      <rt itemprop="ph">ˈpi</rt>
      can
      <rt itemprop="ph">kæn</rt>
      <meta itemprop="alphabet" content="ipa">
    </ruby>.
  </span>
</p>

Ruby + Microdata extracted/parsed to JSON

How it's parsed:

{
  "items": [
    {
      "properties": {
        "phoneme": [
          "pecan"
        ],
        "ph": [
          "pɪˈkɑːn"
        ],
        "alphabet": [
          "ipa"
        ]
      },
      "type": [
        "http://example.org/Pronunciation"
      ]
    },
    {
      "properties": {
        "phoneme": [
          "pecan"
        ],
        "ph": [
          "ˈpi",
          "kæn"
        ],
        "alphabet": [
          "ipa"
        ]
      },
      "type": [
        "http://example.org/Pronunciation"
      ]
    }
  ]
}
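
One possible way to turn an extracted item back into an SSML `<phoneme>` is sketched below. It assumes, purely for illustration, that multiple `ph` values are syllables to be joined with the IPA syllable-break mark `.`; the helper names `escapeXml` and `itemToPhoneme` are hypothetical.

```javascript
// Escape characters that are special in XML attribute values and text.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: build an SSML <phoneme> string from one extracted item.
// Assumption: multiple "ph" values are syllables, joined with ".".
function itemToPhoneme(item) {
  const { phoneme = [], ph = [], alphabet = [] } = item.properties;
  const text = escapeXml(phoneme.join(''));
  const pronunciation = escapeXml(ph.join('.'));
  const alphabetName = escapeXml(alphabet[0] || '');
  return `<phoneme alphabet="${alphabetName}" ph="${pronunciation}">${text}</phoneme>`;
}

// Second item from the extracted JSON above.
const rubyItem = {
  properties: {
    phoneme: ['pecan'],
    ph: ['ˈpi', 'kæn'],
    alphabet: ['ipa'],
  },
  type: ['http://example.org/Pronunciation'],
};

console.log(itemToPhoneme(rubyItem));
```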

aria-details

You say, pecan. I say, pecan.

Markup:


<p>
  You say, <span aria-details="pecan1">pecan</span>.
  I say, <span aria-details="pecan2">pecan</span>.
</p>
<div>
  <speak>
    <phoneme id="pecan1" alphabet="ipa" ph="pɪˈkɑːn"></phoneme> 
    <phoneme id="pecan2" alphabet="ipa" ph="ˈpi.kæn"></phoneme>
  </speak>
</div>

How it's parsed:

document.getElementById(element.getAttribute('aria-details'))
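
The lookup above returns the referenced element, or `null` if the id does not resolve. A minimal sketch of reading the pronunciation from it follows; `resolvePronunciation` is a hypothetical helper, and the stub document and elements exist only so the example runs outside a browser.

```javascript
// Hypothetical helper: follow aria-details to the referenced <phoneme>
// and return its pronunciation data, or null if nothing usable is found.
function resolvePronunciation(doc, element) {
  const id = element.getAttribute('aria-details');
  if (!id) return null;
  const target = doc.getElementById(id);
  if (!target) return null;
  const ph = target.getAttribute('ph');
  if (!ph) return null;
  return { ph, alphabet: target.getAttribute('alphabet') };
}

// Stand-ins for DOM objects (assumption: only these two methods are needed).
const makeStub = (attrs) => ({
  getAttribute: (name) => (name in attrs ? attrs[name] : null),
});
const phonemeStubs = {
  pecan1: makeStub({ id: 'pecan1', alphabet: 'ipa', ph: 'pɪˈkɑːn' }),
  pecan2: makeStub({ id: 'pecan2', alphabet: 'ipa', ph: 'ˈpi.kæn' }),
};
const stubDocument = { getElementById: (id) => phonemeStubs[id] || null };

const detailsSpan = makeStub({ 'aria-details': 'pecan2' });
console.log(resolvePronunciation(stubDocument, detailsSpan));
```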

HTML + Microdata

You say, pecan. I say, pecan.

Markup:


<p>
  You say, <span itemscope="" itemtype="http://example.org/Pronunciation">
    <meta itemprop="alphabet" content="ipa">
    <meta itemprop="ph" content="pɪˈkɑːn">
    <span itemprop="phoneme">pecan</span></span>.
  I say, <span itemscope="" itemtype="http://example.org/Pronunciation">
    <meta itemprop="alphabet" content="ipa">
    <meta itemprop="ph" content="ˈpi.kæn">
    <span itemprop="phoneme">pecan</span></span>.
</p>

HTML + Microdata extracted/parsed to JSON

How it's parsed:

{
  "items": [
    {
      "properties": {
        "alphabet": [
          "ipa"
        ],
        "ph": [
          "pɪˈkɑːn"
        ],
        "phoneme": [
          "pecan"
        ]
      },
      "type": [
        "http://example.org/Pronunciation"
      ]
    },
    {
      "properties": {
        "alphabet": [
          "ipa"
        ],
        "ph": [
          "ˈpi.kæn"
        ],
        "phoneme": [
          "pecan"
        ]
      },
      "type": [
        "http://example.org/Pronunciation"
      ]
    }
  ]
}

Another example, using itemref.

You say, pecan. I say, pecan.

Markup:


<p>
  <span id="pecan"><meta itemprop="alphabet" content="ipa"><meta itemprop="phoneme" content="pecan"></span>
  You say,
    <span itemscope="" itemref="pecan" itemtype="http://example.org/Pronunciation">
      <meta itemprop="ph" content="pɪˈkɑːn">pecan</span>.
  I say,
    <span itemscope="" itemref="pecan" itemtype="http://example.org/Pronunciation">
      <meta itemprop="ph" content="ˈpi.kæn">pecan</span>.
</p>

How it's parsed:

{
  "items": [
    {
      "properties": {
        "ph": [
          "pɪˈkɑːn"
        ],
        "alphabet": [
          "ipa"
        ],
        "phoneme": [
          "pecan"
        ]
      },
      "type": [
        "http://example.org/Pronunciation"
      ]
    },
    {
      "properties": {
        "ph": [
          "ˈpi.kæn"
        ],
        "alphabet": [
          "ipa"
        ],
        "phoneme": [
          "pecan"
        ]
      },
      "type": [
        "http://example.org/Pronunciation"
      ]
    }
  ]
}

JSON-LD + Microdata

You say, pecan. I say, pecan.

Markup:


<!--
  @id is an md5 hash of the original (normalized) SSML:
  easy to cache, generate, differentiate, and predict;
  can be fully externalized.
-->
<script type="application/ld+json">
  {
    "@id": "http://example.org/Pronunciation/en_us/36ce11650d7baceb0e1877515ff33aba",
    "@context": "http://schema.org/",
    "type": "Pronunciation",
    "alphabet": "ipa",
    "phoneme": "pecan",
    "ph": "pɪˈkɑːn"
  }
</script>
<script type="application/ld+json">
  {
    "@id": "http://example.org/Pronunciation/en_us/814f8be5715b0e423242430ee2b2b6ef",
    "@context": "http://schema.org/",
    "type": "Pronunciation",
    "alphabet": "ipa",
    "phoneme": "pecan",
    "ph": "ˈpi.kæn"
  }
</script>
<p>
  You say, <span itemscope="" itemtype="http://example.org/Pronunciation">
    <link itemprop="url" href="http://example.org/Pronunciation/en_us/36ce11650d7baceb0e1877515ff33aba">pecan</span>.
  I say, <span itemscope="" itemtype="http://example.org/Pronunciation">
    <link itemprop="url" href="http://example.org/Pronunciation/en_us/814f8be5715b0e423242430ee2b2b6ef">pecan</span>.
</p>

How it's parsed:

{
  "items": [
    {
      "properties": {
        "url": [
          "http://example.org/Pronunciation/en_us/36ce11650d7baceb0e1877515ff33aba"
        ]
      },
      "type": [
        "http://example.org/Pronunciation"
      ]
    },
    {
      "properties": {
        "url": [
          "http://example.org/Pronunciation/en_us/814f8be5715b0e423242430ee2b2b6ef"
        ]
      },
      "type": [
        "http://example.org/Pronunciation"
      ]
    }
  ]
}
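
The markup describes `@id` as an md5 hash of the normalized SSML. A minimal Node.js sketch of deriving such an identifier is shown below. The normalization rule (trim, then collapse runs of whitespace) and the helper names `normalizeSsml` and `pronunciationId` are assumptions made for illustration, so the output is not expected to reproduce the hashes used above.

```javascript
const { createHash } = require('crypto');

// Assumption: "normalized" means trimmed, with runs of whitespace collapsed.
function normalizeSsml(ssml) {
  return ssml.trim().replace(/\s+/g, ' ');
}

// Hypothetical helper: derive a stable @id from an SSML fragment.
function pronunciationId(ssml, locale = 'en_us') {
  const hash = createHash('md5')
    .update(normalizeSsml(ssml), 'utf8')
    .digest('hex');
  return `http://example.org/Pronunciation/${locale}/${hash}`;
}

console.log(pronunciationId('<phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>'));
```

Because the identifier depends only on the normalized markup, the same pronunciation yields the same `@id` wherever it appears, which is what makes it cacheable and predictable.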

HTML with external namespace SSML (similar to MathML)

You say, pecan. I say, pecan.

Markup:


<speak xmlns="http://www.w3.org/2001/10/synthesis">
  You say, <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>. 
  I say, <phoneme alphabet="ipa" ph="ˈpi.kæn">pecan</phoneme>.
</speak>

(Microdata parsing was done with microdata-node or Google's Structured Data Testing Tool.)
