Work in Progress: This page is under development. Use the feedback button on the bottom right to help us improve it.

Text & Regex Functions

Functions for string manipulation and pattern matching.

Case Conversion

FunctionDescriptionExample
LOWER(str)Convert to lowercaseLOWER('Hello')'hello'
UPPER(str)Convert to uppercaseUPPER('Hello')'HELLO'
INITCAP(str)Capitalize first letter of each wordINITCAP('hello world')'Hello World'
SELECT
  LOWER(email) as normalized_email,
  UPPER(country_code) as country,
  INITCAP(full_name) as display_name
FROM users

Trimming & Padding

FunctionDescription
TRIM(str)Remove whitespace from both ends
LTRIM(str)Remove whitespace from left
RTRIM(str)Remove whitespace from right
BTRIM(str, chars)Remove specific characters from both ends
LPAD(str, len, pad)Pad left to length
RPAD(str, len, pad)Pad right to length
SELECT
  TRIM('  hello  ') as trimmed,           -- 'hello'
  LTRIM('00123', '0') as no_leading,      -- '123'
  LPAD('42', 5, '0') as padded,           -- '00042'
  RPAD(sku, 10, '-') as fixed_width
FROM products

Substrings

FunctionDescription
SUBSTR(str, start, len)Extract substring
LEFT(str, n)First n characters
RIGHT(str, n)Last n characters
SPLIT_PART(str, delim, n)Split and get nth part
SELECT
  SUBSTR(phone, 1, 3) as area_code,
  LEFT(description, 100) as preview,
  RIGHT(account_number, 4) as last_four,
  SPLIT_PART(email, '@', 2) as domain
FROM customers

Finding & Replacing

FunctionDescription
STRPOS(str, substr)Position of substring (1-indexed)
REPLACE(str, from, to)Replace all occurrences
TRANSLATE(str, from, to)Character-by-character replacement
OVERLAY(str PLACING new FROM pos FOR len)Replace portion of string
SELECT
  STRPOS(url, '?') as query_start,
  REPLACE(phone, '-', '') as digits_only,
  TRANSLATE(input, 'áéíóú', 'aeiou') as normalized,
  OVERLAY(ssn PLACING '***' FROM 1 FOR 3) as masked
FROM data

Concatenation

FunctionDescription
CONCAT(str1, str2, ...)Concatenate strings
CONCAT_WS(sep, str1, str2, ...)Concatenate with separator
|| operatorConcatenate two strings
REPEAT(str, n)Repeat string n times
SELECT
  CONCAT(first_name, ' ', last_name) as full_name,
  CONCAT_WS(', ', city, state, country) as location,
  '[' || status || ']' as bracketed,
  REPEAT('*', rating) as stars
FROM users

Length Functions

FunctionDescription
LENGTH(str)Character count
CHAR_LENGTH(str)Character count (alias)
OCTET_LENGTH(str)Byte count
BIT_LENGTH(str)Bit count
SELECT
  LENGTH(description) as char_count,
  OCTET_LENGTH(payload) as byte_size
FROM messages
WHERE LENGTH(content) > 0

String Generation

FunctionDescription
CHR(code)Character from ASCII/Unicode code
ASCII(str)ASCII code of first character
UUID()Generate unique identifier
REVERSE(str)Reverse character order
SELECT
  UUID() as id,
  ASCII('A') as code,         -- 65
  CHR(65) as letter,          -- 'A'
  REVERSE('hello') as rev     -- 'olleh'
FROM events

Regular Expressions

REGEXP_LIKE

Test if pattern matches:

SELECT * FROM logs
WHERE REGEXP_LIKE(message, 'error|warning', 'i')  -- Case insensitive

REGEXP_MATCH

Extract matching groups:

SELECT
  REGEXP_MATCH(log_line, 'user=(\w+)')[1] as username
FROM logs

REGEXP_REPLACE

Replace pattern matches:

SELECT
  REGEXP_REPLACE(phone, '[^0-9]', '') as digits_only,
  REGEXP_REPLACE(text, '\s+', ' ') as single_spaces,
  REGEXP_REPLACE(email, '@.*', '@***') as masked_email
FROM contacts

Regex Flags

FlagMeaning
iCase insensitive
mMultiline mode
gGlobal (replace all)
SELECT REGEXP_REPLACE(text, 'foo', 'bar', 'gi')  -- All occurrences, case insensitive

Utility Functions

FunctionDescription
LEVENSHTEIN(str1, str2)Edit distance between strings
TO_HEX(num)Integer to hexadecimal
FIND_IN_SET(str, list)Position in comma-separated list
SELECT
  LEVENSHTEIN(input, 'expected') as distance,
  TO_HEX(color_code) as hex_color
FROM data
WHERE LEVENSHTEIN(search_term, product_name) < 3  -- Fuzzy match

Practical Examples

Clean User Input

Dataset

signups:

signup_idemailphonename
s1JOHN@EXAMPLE.COM555-123-4567john doe
s2JANE@TEST.ORG(555) 987-6543jane SMITH
s3BOB@COMPANY.NET555.456.7890BOB wilson

SQL

SELECT
  signup_id,
  LOWER(TRIM(email)) as email,
  REGEXP_REPLACE(phone, '[^0-9+]', '') as phone,
  INITCAP(TRIM(name)) as name
FROM signups

Result

signup_idemailphonename
s1john@example.com5551234567John Doe
s2jane@test.org5559876543Jane Smith
s3bob@company.net5554567890Bob Wilson

Parse URLs

Dataset

page_views:

view_idurl
v1https://example.com/product/12345?ref=email&utm_source=newsletter
v2https://shop.example.com/category/shoes?page=2
v3https://example.com/product/67890?ref=social

SQL

SELECT
  view_id,
  SPLIT_PART(url, '/', 3) as domain,
  REGEXP_MATCH(url, '/product/(\d+)')[1] as product_id,
  SPLIT_PART(SPLIT_PART(url, '?', 2), '&', 1) as first_param
FROM page_views

Result

view_iddomainproduct_idfirst_param
v1example.com12345ref=email
v2shop.example.comNULLpage=2
v3example.com67890ref=social

Mask Sensitive Data

Dataset

customers:

customer_idemailssncredit_card
c1john.doe@example.com123-45-67894111-1111-1111-1234
c2jane.smith@test.org987-65-43215500-0000-0000-5678

SQL

SELECT
  customer_id,
  CONCAT(LEFT(email, 2), '***@', SPLIT_PART(email, '@', 2)) as masked_email,
  CONCAT('***-**-', RIGHT(ssn, 4)) as masked_ssn,
  OVERLAY(credit_card PLACING '****-****-****-' FROM 1 FOR 15) as masked_card
FROM customers

Result

customer_idmasked_emailmasked_ssnmasked_card
c1jo***@example.com*--6789--****-1234
c2ja***@test.org*--4321--****-5678

Extract Log Fields

Dataset

logs:

log_idlog_line
l12024-01-15 10:30:45 INFO user=john123 action=login ip=192.168.1.1
l22024-01-15 10:31:02 ERROR user=jane456 action=purchase error=payment_failed
l32024-01-15 10:31:15 INFO user=bob789 action=logout ip=10.0.0.5

SQL

SELECT
  log_id,
  REGEXP_MATCH(log_line, 'user=(\w+)')[1] as username,
  REGEXP_MATCH(log_line, 'action=(\w+)')[1] as action,
  CASE
    WHEN REGEXP_LIKE(log_line, 'ERROR', 'i') THEN 'error'
    WHEN REGEXP_LIKE(log_line, 'WARN', 'i') THEN 'warning'
    ELSE 'info'
  END as level
FROM logs

Result

log_idusernameactionlevel
l1john123logininfo
l2jane456purchaseerror
l3bob789logoutinfo