Text & Regex Functions
Functions for string manipulation and pattern matching.
Case Conversion
| Function | Description | Example |
|---|---|---|
LOWER(str) | Convert to lowercase | LOWER('Hello') → 'hello' |
UPPER(str) | Convert to uppercase | UPPER('Hello') → 'HELLO' |
INITCAP(str) | Capitalize first letter of each word | INITCAP('hello world') → 'Hello World' |
SELECT
LOWER(email) as normalized_email,
UPPER(country_code) as country,
INITCAP(full_name) as display_name
FROM usersTrimming & Padding
| Function | Description |
|---|---|
TRIM(str) | Remove whitespace from both ends |
LTRIM(str) | Remove whitespace from left |
RTRIM(str) | Remove whitespace from right |
BTRIM(str, chars) | Remove specific characters from both ends |
LPAD(str, len, pad) | Pad left to length |
RPAD(str, len, pad) | Pad right to length |
SELECT
TRIM(' hello ') as trimmed, -- 'hello'
LTRIM('00123', '0') as no_leading, -- '123'
LPAD('42', 5, '0') as padded, -- '00042'
RPAD(sku, 10, '-') as fixed_width
FROM productsSubstrings
| Function | Description |
|---|---|
SUBSTR(str, start, len) | Extract substring |
LEFT(str, n) | First n characters |
RIGHT(str, n) | Last n characters |
SPLIT_PART(str, delim, n) | Split and get nth part |
SELECT
SUBSTR(phone, 1, 3) as area_code,
LEFT(description, 100) as preview,
RIGHT(account_number, 4) as last_four,
SPLIT_PART(email, '@', 2) as domain
FROM customersFinding & Replacing
| Function | Description |
|---|---|
STRPOS(str, substr) | Position of substring (1-indexed) |
REPLACE(str, from, to) | Replace all occurrences |
TRANSLATE(str, from, to) | Character-by-character replacement |
OVERLAY(str PLACING new FROM pos FOR len) | Replace portion of string |
SELECT
STRPOS(url, '?') as query_start,
REPLACE(phone, '-', '') as digits_only,
TRANSLATE(input, 'áéíóú', 'aeiou') as normalized,
OVERLAY(ssn PLACING '***' FROM 1 FOR 3) as masked
FROM dataConcatenation
| Function | Description |
|---|---|
CONCAT(str1, str2, ...) | Concatenate strings |
CONCAT_WS(sep, str1, str2, ...) | Concatenate with separator |
|| operator | Concatenate two strings |
REPEAT(str, n) | Repeat string n times |
SELECT
CONCAT(first_name, ' ', last_name) as full_name,
CONCAT_WS(', ', city, state, country) as location,
'[' || status || ']' as bracketed,
REPEAT('*', rating) as stars
FROM usersLength Functions
| Function | Description |
|---|---|
LENGTH(str) | Character count |
CHAR_LENGTH(str) | Character count (alias) |
OCTET_LENGTH(str) | Byte count |
BIT_LENGTH(str) | Bit count |
SELECT
LENGTH(description) as char_count,
OCTET_LENGTH(payload) as byte_size
FROM messages
WHERE LENGTH(content) > 0String Generation
| Function | Description |
|---|---|
CHR(code) | Character from ASCII/Unicode code |
ASCII(str) | ASCII code of first character |
UUID() | Generate unique identifier |
REVERSE(str) | Reverse character order |
SELECT
UUID() as id,
ASCII('A') as code, -- 65
CHR(65) as letter, -- 'A'
REVERSE('hello') as rev -- 'olleh'
FROM eventsRegular Expressions
REGEXP_LIKE
Test if pattern matches:
SELECT * FROM logs
WHERE REGEXP_LIKE(message, 'error|warning', 'i') -- Case insensitiveREGEXP_MATCH
Extract matching groups:
SELECT
REGEXP_MATCH(log_line, 'user=(\w+)')[1] as username
FROM logsREGEXP_REPLACE
Replace pattern matches:
SELECT
REGEXP_REPLACE(phone, '[^0-9]', '') as digits_only,
REGEXP_REPLACE(text, '\s+', ' ') as single_spaces,
REGEXP_REPLACE(email, '@.*', '@***') as masked_email
FROM contactsRegex Flags
| Flag | Meaning |
|---|---|
i | Case insensitive |
m | Multiline mode |
g | Global (replace all) |
SELECT REGEXP_REPLACE(text, 'foo', 'bar', 'gi') -- All occurrences, case insensitiveUtility Functions
| Function | Description |
|---|---|
LEVENSHTEIN(str1, str2) | Edit distance between strings |
TO_HEX(num) | Integer to hexadecimal |
FIND_IN_SET(str, list) | Position in comma-separated list |
SELECT
LEVENSHTEIN(input, 'expected') as distance,
TO_HEX(color_code) as hex_color
FROM data
WHERE LEVENSHTEIN(search_term, product_name) < 3 -- Fuzzy matchPractical Examples
Clean User Input
Dataset
signups:
| signup_id | phone | name | |
|---|---|---|---|
| s1 | JOHN@EXAMPLE.COM | 555-123-4567 | john doe |
| s2 | JANE@TEST.ORG | (555) 987-6543 | jane SMITH |
| s3 | BOB@COMPANY.NET | 555.456.7890 | BOB wilson |
SQL
SELECT
signup_id,
LOWER(TRIM(email)) as email,
REGEXP_REPLACE(phone, '[^0-9+]', '') as phone,
INITCAP(TRIM(name)) as name
FROM signupsResult
| signup_id | phone | name | |
|---|---|---|---|
| s1 | john@example.com | 5551234567 | John Doe |
| s2 | jane@test.org | 5559876543 | Jane Smith |
| s3 | bob@company.net | 5554567890 | Bob Wilson |
Parse URLs
Dataset
page_views:
| view_id | url |
|---|---|
| v1 | https://example.com/product/12345?ref=email&utm_source=newsletter |
| v2 | https://shop.example.com/category/shoes?page=2 |
| v3 | https://example.com/product/67890?ref=social |
SQL
SELECT
view_id,
SPLIT_PART(url, '/', 3) as domain,
REGEXP_MATCH(url, '/product/(\d+)')[1] as product_id,
SPLIT_PART(SPLIT_PART(url, '?', 2), '&', 1) as first_param
FROM page_viewsResult
| view_id | domain | product_id | first_param |
|---|---|---|---|
| v1 | example.com | 12345 | ref=email |
| v2 | shop.example.com | NULL | page=2 |
| v3 | example.com | 67890 | ref=social |
Mask Sensitive Data
Dataset
customers:
| customer_id | ssn | credit_card | |
|---|---|---|---|
| c1 | john.doe@example.com | 123-45-6789 | 4111-1111-1111-1234 |
| c2 | jane.smith@test.org | 987-65-4321 | 5500-0000-0000-5678 |
SQL
SELECT
customer_id,
CONCAT(LEFT(email, 2), '***@', SPLIT_PART(email, '@', 2)) as masked_email,
CONCAT('***-**-', RIGHT(ssn, 4)) as masked_ssn,
OVERLAY(credit_card PLACING '****-****-****-' FROM 1 FOR 15) as masked_card
FROM customersResult
| customer_id | masked_email | masked_ssn | masked_card |
|---|---|---|---|
| c1 | jo***@example.com | *--6789 | --****-1234 |
| c2 | ja***@test.org | *--4321 | --****-5678 |
Extract Log Fields
Dataset
logs:
| log_id | log_line |
|---|---|
| l1 | 2024-01-15 10:30:45 INFO user=john123 action=login ip=192.168.1.1 |
| l2 | 2024-01-15 10:31:02 ERROR user=jane456 action=purchase error=payment_failed |
| l3 | 2024-01-15 10:31:15 INFO user=bob789 action=logout ip=10.0.0.5 |
SQL
SELECT
log_id,
REGEXP_MATCH(log_line, 'user=(\w+)')[1] as username,
REGEXP_MATCH(log_line, 'action=(\w+)')[1] as action,
CASE
WHEN REGEXP_LIKE(log_line, 'ERROR', 'i') THEN 'error'
WHEN REGEXP_LIKE(log_line, 'WARN', 'i') THEN 'warning'
ELSE 'info'
END as level
FROM logsResult
| log_id | username | action | level |
|---|---|---|---|
| l1 | john123 | login | info |
| l2 | jane456 | purchase | error |
| l3 | bob789 | logout | info |