WireTextTools

Text and markup manipulation tools: HTML-to-text conversion, string truncation, placeholder replacement, diff generation, and more.

Used internally by $sanitizer and throughout the ProcessWire core.

Access via $sanitizer (recommended) or directly:

$tools = $sanitizer->getTextTools(); // preferred — reuses the shared instance
$tools = new WireTextTools();        // or construct directly

$sanitizer also exposes the most common methods as direct shortcuts ($sanitizer->truncate(), $sanitizer->markupToText(), etc.), so you don't need to go through getTextTools() for everyday use.

HTML to text

markupToText($str, $options)

Convert HTML to readable plain text. More useful than strip_tags(): handles paragraph separation, list bullets, headline formatting, and entity conversion.

$text = $tools->markupToText($html);
$text = $tools->markupToText($html, ['linksToMarkdown' => true]);
$text = $tools->markupToText($html, ['keepTags' => ['em', 'strong']]);

Options:

| Option | Default | Description |
| --- | --- | --- |
| keepTags | [] | Tags to preserve |
| clearTags | ['script','style','object'] | Tags whose content is also removed (not just the tag) |
| splitBlocks | "\n\n" | Separator inserted between block elements |
| convertEntities | true | Convert HTML entities to plain text |
| listItemPrefix | '• ' | Prefix for <li> items |
| linksToUrls | true | Convert <a href> to text (url) format |
| linksToMarkdown | false | Convert links to [text](url) Markdown format |
| uppercaseHeadlines | false | UPPERCASE headline text |
| underlineHeadlines | true | Underline headlines with = or - |
| collapseSpaces | true | Collapse redundant whitespace |
| replacements | ['&nbsp;' => ' '] | Additional string replacements to apply |

collapse($str, $options)

Flatten HTML or multiline text into a single-line plain-text string.

$oneLine = $tools->collapse($multiLineText);
$oneLine = $tools->collapse($html, ['collapseLinesWith' => ' | ']);

| Option | Default | Description |
| --- | --- | --- |
| stripTags | true | Strip HTML tags |
| keepTags | [] | Tags to keep when stripping |
| collapseLinesWith | ' ' | What to replace newlines/block breaks with |
| endBlocksWith | '' | Marker inserted before paragraph/header breaks before collapsing |
| linksToUrls | false | Convert links to (url) format |
| convertEntities | true | Convert HTML entities |

Truncation

truncate($str, $maxLength, $options)

Truncate a string without breaking words, with intelligent fallback through truncation types: block → sentence → punctuation → word.

$s = $tools->truncate($str, 150);                          // word boundary
$s = $tools->truncate($str, 300, 'sentence');              // sentence boundary
$s = $tools->truncate($html, 200, ['visible' => true]);    // count visible chars only
$s = $tools->truncate($str, ['type' => 'block', 'maxLength' => 500, 'more' => ' […]']);

The type specifies the preferred truncation point. If a match can't be found within maxLength, it falls back to the next simpler type automatically.
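
The fallback idea can be pictured with a simplified plain-PHP sketch. The function truncateSketch() below is hypothetical and is not the core implementation: it models only two of the four levels (sentence, then word), assumes single-byte text, and ignores options such as visible and keepTags.

```php
<?php
// Hypothetical illustration of "preferred type, then simpler fallback".
// Not ProcessWire's truncate(): it handles only sentence and word boundaries.
function truncateSketch(string $str, int $maxLength, string $more = '…'): string {
    if (strlen($str) <= $maxLength) {
        return $str; // nothing to truncate
    }
    $chunk = substr($str, 0, $maxLength);

    // Preferred: cut after the last sentence-ending character that is
    // followed by whitespace inside the chunk.
    if (preg_match('/^.*[.?!](?=\s)/s', $chunk, $m)) {
        return $m[0];
    }

    // Fallback: cut at the last space, trim trailing separators, and
    // append the "more" marker.
    $pos = strrpos($chunk, ' ');
    if ($pos !== false) {
        return rtrim(substr($chunk, 0, $pos), ',;/ ') . $more;
    }

    // No boundary found at all: hard cut.
    return $chunk . $more;
}

$a = truncateSketch('First sentence here. Second sentence is much longer.', 30);
$b = truncateSketch('one two three four five six', 12);
```

Here $a ends at the sentence boundary ('First sentence here.'), while $b has no sentence end within 12 characters and falls back to the word boundary ('one two…').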

| Option | Default | Description |
| --- | --- | --- |
| type | 'word' | Preferred point: 'word', 'punctuation', 'sentence', 'block' |
| maxLength | 255 | Maximum character length (when options array passed as 2nd arg) |
| visible | false | Count visible characters only; markup and entities don't count |
| maximize | true | Include as much content as possible before the truncation point |
| trim | ',;/ ' | Characters to trim from the truncated end |
| more | '…' | Appended when string is truncated without ending in punctuation |
| keepTags | [] | HTML tags to preserve in the result |
| keepFormatTags | false | Keep inline formatting tags (em, strong, span, etc.) |
| collapseLinesWith | ' … ' | String used to collapse line breaks |
| convertEntities | false | Convert HTML entities |
| noEndSentence | 'Mr. Mrs. ...' | Space-separated list of words that don't end a sentence |

The visible option is particularly useful when truncating HTML — without it, markup characters count toward maxLength even though they're not visible to readers.
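
To see how much markup can inflate the count, the plain-PHP sketch below compares the raw length of an HTML string with an approximation of its visible length. The helper visibleLengthSketch() is a hypothetical stand-in written for this illustration, not the method WireTextTools uses internally.

```php
<?php
// Hypothetical helper, NOT the ProcessWire implementation: approximates
// "visible length" by stripping tags and decoding entities before counting.
function visibleLengthSketch(string $html): int {
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Count characters when mbstring is available, bytes otherwise.
    return function_exists('mb_strlen') ? mb_strlen($text, 'UTF-8') : strlen($text);
}

$html = '<p>Hello <strong>world</strong> &amp; friends</p>';

$rawLength = strlen($html);                  // tags and the entity are counted
$visibleLength = visibleLengthSketch($html); // only "Hello world & friends"
```

The raw string is 49 characters long but only 21 of them are visible, so a maxLength of 30 would cut this markup even though the reader sees text that fits comfortably.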

Placeholder replacement

populatePlaceholders($str, $vars, $options)

Replace {placeholder} tags with values. When $vars is a Page, subfield and OR-tag syntax are supported.

// Array vars
$result = $tools->populatePlaceholders('Hello {first_name}!', ['first_name' => 'Ryan']);

// Page vars — subfields and OR tags work
$result = $tools->populatePlaceholders('{title} by {author.name}', $page);
$result = $tools->populatePlaceholders('{display_name|title|name}', $page); // OR: first non-empty

{field1|field2|field3} with a Page object returns the first non-empty value among the listed fields — useful as a fallback chain.
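
The fallback-chain behavior can be sketched in plain PHP. The functions resolveOrTagSketch() and populateSketch() below are hypothetical, and a plain array stands in for the Page object; this only illustrates "first non-empty value wins" and is not how the core resolves tags.

```php
<?php
// Illustration only: a hypothetical stand-in for OR-tag lookup, using a
// plain array in place of a Page. This is not ProcessWire's implementation.
function resolveOrTagSketch(string $tag, array $values): string {
    foreach (explode('|', $tag) as $name) {
        $value = $values[$name] ?? '';
        if ($value !== '' && $value !== false) {
            return (string) $value; // first non-empty value wins
        }
    }
    return ''; // every candidate was empty or missing
}

function populateSketch(string $str, array $values): string {
    // Replace each {tag} with the value resolved for it.
    return preg_replace_callback('/\{([^{}]+)\}/', function (array $m) use ($values) {
        return resolveOrTagSketch($m[1], $values);
    }, $str);
}

$fields = ['display_name' => '', 'title' => 'About Us', 'name' => 'about-us'];
$result = populateSketch('Page: {display_name|title|name}', $fields);
```

Because display_name is empty, the chain moves on to title and $result becomes 'Page: About Us'.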

| Option | Default | Description |
| --- | --- | --- |
| tagOpen | '{' | Opening tag character(s) |
| tagClose | '}' | Closing tag character(s) |
| recursive | false | If a replacement value itself contains tags, populate those too |
| removeNullTags | true | Remove tags that resolve to null (field not present on object) |
| removeEmptyTags | true | Remove tags that resolve to empty string, false, or null |
| entityEncode | false | Entity-encode replacement values |
| entityDecode | false | Entity-decode replacement values |
| allowMarkup | true | Allow HTML in replacement values (uses getMarkup() on Pages) |

When $vars is a Page and allowMarkup is true, $page->getMarkup($field) is called (formatted output). Use 'allowMarkup' => false to get $page->getText($field).

findPlaceholders($str, $options)

Find all {placeholder} tags in a string.

$tags = $tools->findPlaceholders('Hello {name}, welcome to {site}');
// ['name' => '{name}', 'site' => '{site}']

$has = $tools->findPlaceholders($str, ['has' => true]); // bool

hasPlaceholders($str)

Returns true if the string contains any {placeholder} tags.

Visible length

getVisibleLength($str)

Count visible characters, excluding markup tags and HTML entities.

$len = $tools->getVisibleLength('Hello <strong>world</strong>'); // 11
$len = $tools->getVisibleLength('Price: &pound;10');             // 10

Diff

diffMarkup($old, $new, $options)

Generate an HTML diff showing insertions and deletions between two strings.

$diff = $tools->diffMarkup('The quick brown fox', 'The slow brown fox');
// "The <del>quick</del> <ins>slow</ins> brown fox"
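
For readers curious how a token-level diff like this can be produced, here is a minimal plain-PHP sketch. It splits both strings on whitespace (matching the default split pattern listed in the options) and walks a longest-common-subsequence table. The function diffSketch() is hypothetical, and the LCS approach is an assumption chosen for illustration; the core may use a different algorithm.

```php
<?php
// Hypothetical word-level diff using a longest-common-subsequence table.
// Illustrative only; not the algorithm WireTextTools is known to use.
function diffSketch(string $old, string $new): string {
    $a = preg_split('/\s+/', trim($old));
    $b = preg_split('/\s+/', trim($new));
    $n = count($a);
    $m = count($b);

    // $lcs[$i][$j] = LCS length of $a[$i..] and $b[$j..]
    $lcs = array_fill(0, $n + 1, array_fill(0, $m + 1, 0));
    for ($i = $n - 1; $i >= 0; $i--) {
        for ($j = $m - 1; $j >= 0; $j--) {
            $lcs[$i][$j] = $a[$i] === $b[$j]
                ? $lcs[$i + 1][$j + 1] + 1
                : max($lcs[$i + 1][$j], $lcs[$i][$j + 1]);
        }
    }

    $out = [];
    $i = 0;
    $j = 0;
    while ($i < $n && $j < $m) {
        if ($a[$i] === $b[$j]) {
            $out[] = htmlspecialchars($a[$i]); // unchanged token
            $i++;
            $j++;
        } elseif ($lcs[$i + 1][$j] >= $lcs[$i][$j + 1]) {
            $out[] = '<del>' . htmlspecialchars($a[$i++]) . '</del>';
        } else {
            $out[] = '<ins>' . htmlspecialchars($b[$j++]) . '</ins>';
        }
    }
    // Whatever remains on either side is a pure deletion or insertion.
    while ($i < $n) {
        $out[] = '<del>' . htmlspecialchars($a[$i++]) . '</del>';
    }
    while ($j < $m) {
        $out[] = '<ins>' . htmlspecialchars($b[$j++]) . '</ins>';
    }
    return implode(' ', $out);
}

$diff = diffSketch('The quick brown fox', 'The slow brown fox');
```

For this input the sketch yields the same markup as the example above: The <del>quick</del> <ins>slow</ins> brown fox.
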

| Option | Default | Description |
| --- | --- | --- |
| ins | '<ins>{out}</ins>' | Markup template for inserted text |
| del | '<del>{out}</del>' | Markup template for deleted text |
| entityEncode | true | Entity-encode the surrounding (unchanged) text |
| split | '\s+' | Regex used to split strings into diffable tokens |

Tag fixing

fixUnclosedTags($str, $remove, $options)

Remove or close unclosed HTML tags.

$clean = $tools->fixUnclosedTags($html);          // remove all instances of unclosed tags
$fixed = $tools->fixUnclosedTags($html, false);   // close unclosed tags at end of string

When $remove is true (default), all tags of the unclosed type are stripped. When false, closing tags are appended at the end.

Punctuation

getPunctuationChars($sentence)

Return an array of punctuation characters. Pass true for $sentence to get only the sentence-ending characters.

$all = $tools->getPunctuationChars();        // [',', ':', '.', '?', '!', ...]
$end = $tools->getPunctuationChars(true);    // sentence-ending only: ['.', '?', '!']

Word alternates

getWordAlternates($word, $options)

Get alternate forms of a word (plurals, stems, synonyms). Returns an empty array unless a module hooks WireTextTools::___wordAlternates() to provide an implementation. This is the integration point for search-enhancement modules.

$alternates = $tools->getWordAlternates('running'); // e.g. ['run', 'runs'] via a hook

Escape characters

findReplaceEscapeChars(&$str, $escapeChars, $options)

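Temporarily replace backslash-escaped characters with placeholders so that later processing steps don't misinterpret them as syntax. $str is passed by reference and modified in place. To restore the literal characters afterward, substitute each placeholder in the returned map with its character.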

$str = 'Hello \*world\*';
$placeholders = $tools->findReplaceEscapeChars($str, ['*']);
// ... process $str ...
$str = str_replace(array_keys($placeholders), array_values($placeholders), $str);

The returned map is keyed by generated placeholder and valued by the escaped character. The map is per escaped character in $escapeChars, not necessarily per occurrence, so repeated escaped \* characters can share one placeholder.
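
The round trip is easier to follow with a self-contained plain-PHP sketch. The function escapeSketch() is hypothetical, and the {ESC0} placeholder format is an assumption built from the gluePrefix and glueSuffix defaults; the placeholders the core actually generates may differ.

```php
<?php
// Hypothetical stand-in for the escape/restore round trip.
// Placeholder format "{ESC<n>}" is assumed for illustration only.
function escapeSketch(string &$str, array $escapeChars): array {
    $map = [];
    foreach (array_values($escapeChars) as $n => $char) {
        $escaped = '\\' . $char; // e.g. \*
        if (strpos($str, $escaped) === false) {
            continue; // this character is never escaped in $str
        }
        $placeholder = '{ESC' . $n . '}';
        // One placeholder per character, shared by every occurrence.
        $str = str_replace($escaped, $placeholder, $str);
        $map[$placeholder] = $char;
    }
    return $map;
}

$str = 'Hello \*world\* and *bold*';
$map = escapeSketch($str, ['*']);
$protected = $str; // 'Hello {ESC0}world{ESC0} and *bold*'

// Processing step: only the unescaped asterisks are treated as markup.
$str = preg_replace('/\*(.+?)\*/', '<b>$1</b>', $str);

// Restore the literal characters.
$str = str_replace(array_keys($map), array_values($map), $str);
```

After the round trip $str is 'Hello *world* and <b>bold</b>': the escaped asterisks survive as literal characters while the unescaped pair is converted.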

Useful options:

| Option | Default | Description |
| --- | --- | --- |
| escapePrefix | '\\' | Escape character prefix |
| restoreEscape | false | Restore the escape prefix along with the escaped char |
| gluePrefix | '{ESC' | Placeholder prefix |
| glueSuffix | '}' | Placeholder suffix |
| unescapeUnknown | false | Remove escape prefix from chars not in $escapeChars |
| removeUnknown | false | Remove unknown escaped chars entirely |

Multibyte-safe string wrappers

WireTextTools includes wrappers for common PHP string functions. They use mbstring when available and fall back to PHP's native string functions otherwise.

$len = $tools->strlen('café');       // 4 when mbstring is available
$part = $tools->substr('café', 2);   // fé
$pos = $tools->strpos('café', 'f');  // 2
$text = $tools->strtolower('HELLO'); // hello
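
The "mbstring when available" pattern can be sketched in a few lines of plain PHP. The function strlenSketch() is a hypothetical example of the pattern, not a copy of the wrapper in the core.

```php
<?php
// Hypothetical wrapper showing the "use mbstring when available" pattern.
function strlenSketch(string $str): int {
    return function_exists('mb_strlen')
        ? mb_strlen($str, 'UTF-8') // counts characters
        : strlen($str);            // counts bytes
}

$chars = strlenSketch('café');
$bytes = strlen('café');
```

With mbstring loaded, $chars is 4 while $bytes is 5, because é occupies two bytes in UTF-8. Without mbstring both values are 5, which is why the comment in the example above says "when mbstring is available".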

Available wrappers:

  • substr($str, $start, $length = null)
  • strpos($haystack, $needle, $offset = 0)
  • stripos($haystack, $needle, $offset = 0)
  • strrpos($haystack, $needle, $offset = 0)
  • strripos($haystack, $needle, $offset = 0)
  • strlen($str)
  • strtolower($str)
  • strtoupper($str)
  • substrCount($haystack, $needle)
  • strstr($haystack, $needle, $beforeNeedle = false)
  • stristr($haystack, $needle, $beforeNeedle = false)
  • strrchr($haystack, $needle)
  • trim($str, $chars = '')
  • ltrim($str, $chars = '')
  • rtrim($str, $chars = '')

Notes

  • Source file: wire/core/Tools/WireTextTools/WireTextTools.php
  • $sanitizer wraps the most common methods (truncate(), markupToText(), etc.) as direct shortcuts — use those for everyday calls and only reach for getTextTools() when you need methods not on Sanitizer.
  • markupToText() on WireTextTools is newer and more capable than the Sanitizer version; $sanitizer->markupToText() internally delegates to it.
  • truncate() strips HTML by default. Use keepTags or keepFormatTags to preserve formatting, or 'visible' => true to count only visible characters toward the length.
  • populatePlaceholders() with a Page supports dot-notation subfields ({author.name}) and OR-fallback chains ({display_name|title|name}).
  • getWordAlternates() returns an empty array by default — it only produces results when a module implements ___wordAlternates() via hook.

API reference: methods, hooks

In addition to the methods shown below, the WireTextTools class also inherits all the methods and properties of Wire.


Common

| Name | Return | Summary |
| --- | --- | --- |
| WireTextTools::collapse(string $str) | string | Collapse string to plain text that all exists on a single long line without destroying words/punctuation. |
| WireTextTools::diffMarkup(string $old, string $new) | string | Given two strings ($old and $new) return a diff string in HTML markup |
| WireTextTools::findPlaceholders(string $str) | array, bool | Find and return all {placeholder} tags found in given string |
| WireTextTools::findReplaceEscapeChars($str, array $escapeChars) | array | Find escaped characters in $str, replace them with a placeholder, and return the placeholders |
| WireTextTools::fixUnclosedTags(string $str) | string | Remove (or close) unclosed HTML tags from given string |
| WireTextTools::getPunctuationChars() | array | Get array of punctuation characters |
| WireTextTools::getVisibleLength(string $str) | int | Return visible length of string, which is length not counting markup or entities |
| WireTextTools::getWordAlternates(string $word) | array | Get alternate words for given word |
| WireTextTools::hasPlaceholders(string $str) | bool | Does the string have any {placeholder} tags in it? |
| WireTextTools::markupToText(string $str) | string | Convert HTML markup to readable text |
| WireTextTools::populatePlaceholders(string $str, $vars) | string | Given a string ($str) and values ($vars), populate placeholder “{tags}” in the string with the values |
| WireTextTools::truncate(string $str, $maxLength) | string | Truncate string to given maximum length without breaking words |

For hooks

These methods are only useful for hooking and should not be called directly.

PHP function alternates

| Name | Return | Summary |
| --- | --- | --- |
| WireTextTools::ltrim(string $str) | string | Strip whitespace (or other characters) from the beginning of string only (aka left trim) |
| WireTextTools::rtrim(string $str) | string | Strip whitespace (or other characters) from the end of string only (aka right trim) |
| WireTextTools::stripos(string $haystack, string $needle) | bool, false, int | Find the position of the first occurrence of a case-insensitive substring in a string |
| WireTextTools::stristr(string $haystack, string $needle) | false, string | Find the first occurrence of a string (case insensitive) |
| WireTextTools::strlen(string $str) | int | Get string length |
| WireTextTools::strpos(string $haystack, string $needle) | bool, false, int | Find position of first occurrence of string in a string |
| WireTextTools::strrchr(string $haystack, string $needle) | false, string | Find the last occurrence of a character in a string |
| WireTextTools::strripos(string $haystack, string $needle) | bool, false, int | Find the position of the last occurrence of a case-insensitive substring in a string |
| WireTextTools::strrpos(string $haystack, string $needle) | bool, false, int | Find the position of the last occurrence of a substring in a string |
| WireTextTools::strstr(string $haystack, string $needle) | false, string | Find the first occurrence of a string |
| WireTextTools::strtolower(string $str) | string | Make a string lowercase |
| WireTextTools::strtoupper(string $str) | string | Make a string uppercase |
| WireTextTools::substr(string $str, int $start) | string | Get part of a string |
| WireTextTools::substrCount(string $haystack, string $needle) | int | Count the number of substring occurrences |
| WireTextTools::trim(string $str) | string | Strip whitespace (or other characters) from the beginning and end of a string |

Additional methods and properties

In addition to the methods and properties above, WireTextTools also inherits the methods and properties of the Wire class.

API reference based on ProcessWire core version 3.0.267