Data Transformation | Text & String | Intermediate

Split URL

Infoveave Data Automation — Text & String

A full URL column becomes six structured columns — protocol, host, port, path, query, fragment — each ready for its own analysis axis without any string parsing code.

URLs in web logs, referral tracking, API response data, and link analysis datasets contain multiple structured pieces of information compressed into a single string. Filtering by host domain, aggregating traffic by URL path, or extracting query parameters all require breaking the URL into its components first. Split URL handles the complete RFC-standard URL decomposition automatically — you specify which component should go into which named output column and Infoveave parses every row. For query parameter key-value extraction beyond the raw query string, chain Split HTTP Query after Split URL.

Input: Tabular dataset with a column containing full URL strings including protocol, host, and optional port, path, query, and fragment

Output: Tabular dataset with new named columns for each URL component: protocol, host, port, path, query string, and fragment

What Split URL does

Break full URL strings into protocol, host, port, path, query, and fragment columns in Infoveave. Extract web metadata from URL columns for traffic analysis, path-level reporting, and query parameter investigation.

When to use Split URL

  • You have a dataset with a URL column and need to filter, group, or aggregate by the host domain, URL path, or protocol — for example grouping page view events by the website domain or the top-level path
  • You are processing web access logs, referral data, or API call logs where URLs contain all analysis dimensions compressed into a single string
  • You want to separate the query string portion of URLs for further analysis — either keeping it as a raw string or passing it to Split HTTP Query to extract individual key-value parameters
  • You need to identify which protocol, port, or fragment is used across URL records for security analysis, redirect auditing, or canonical URL validation

When to avoid it

  • You need individual key-value parameters from the query string: Split URL on its own only isolates the raw query string, so follow it with Split HTTP Query to expand the parameters into named columns
  • Your column contains email addresses rather than URLs — use Split Email Address for email-specific parsing
  • The URLs in your column are highly inconsistent or malformed — apply Find Text or Find and Replace to normalize the URL format before parsing

Where it fits in your Infoveave automation

Split URL is one step inside a multi-step Infoveave workflow. Chain it with other activities — no code, no manual hand-offs.

  • Connect: Read web logs, link tracking data, API call records, or referral data with a URL column
  • Split URL (you are here): Decompose the URL column into protocol, host, port, path, query, and fragment columns
  • Split HTTP Query: Optionally pass the Query column into Split HTTP Query to extract individual key-value parameters
  • Filter or Aggregate: Filter by host, group by path, count by protocol, or aggregate metrics by URL component
  • Automate: Schedule the workflow to parse URL columns automatically on every log import or data refresh

Build this workflow visually in Infoveave Data Automation — drag, connect, and schedule with no infrastructure setup.

How teams use Split URL

Real scenarios where this transformation saves hours of manual work.

Technology

Analyze Web Application Access Logs by Path and Protocol

A platform engineering team processes web server access logs where each record includes the full URL accessed by the user. Split URL decomposes each URL into protocol, host, port, and path columns. The team can then filter by protocol to audit HTTP-versus-HTTPS usage, group page view counts by top-level path to identify the most accessed service sections, and flag requests to unexpected host values that may indicate misrouting.

Retail

Extract Landing Page Paths from Campaign Tracking URLs

A digital marketing team processes campaign performance data where each impression or click record includes the full landing page URL with UTM parameters. Split URL separates the path into a dedicated column, allowing the team to aggregate campaign performance by landing page path independently of the query string. The raw query column is then passed to Split HTTP Query to extract individual UTM parameter values.

Marketing

Audit Redirect Chains by Protocol and Host Across Tracked Links

A marketing operations team audits a library of tracked links where each link record contains the full destination URL including protocol, host, and path. Split URL extracts each component into named columns. The team identifies URLs still using HTTP instead of HTTPS, links pointing to deprecated hosts, and redirects landing on unexpected path structures — all from column-level filters rather than string pattern matching on the raw URL.

See Split URL in action

The input data is transformed using the configuration below. The resulting output table is ready for dashboards or downstream steps.

URL Column: url
Protocol Column Name: Protocol
Host Column Name: Host
Port Column Name: Port
Path Column Name: Path
Query Column Name: Query
Fragment Column Name: Fragment

Input Data

| Employee ID | Name | url |
| E001 | John Doe | https://company.com:8080/employee?id=E001&name=John+Doe#profile |
| E002 | Marie Dupont | http://marketing.com/employee?id=E002&name=Marie+Dupont |
| E003 | Carlos Gomez | ftp://fileserver.com:21/download?id=E003&name=Carlos+Gomez |

Output Data

| Employee ID | Name | url | Protocol | Host | Port | Path | Query | Fragment |
| E001 | John Doe | https://company.com:8080/employee?id=E001&name=John+Doe#profile | https | company.com | 8080 | /employee | id=E001&name=John+Doe | profile |
| E002 | Marie Dupont | http://marketing.com/employee?id=E002... | http | marketing.com | | /employee | id=E002&name=Marie+Dupont | |
| E003 | Carlos Gomez | ftp://fileserver.com:21/download?id=E003... | ftp | fileserver.com | 21 | /download | id=E003&name=Carlos+Gomez | |
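
To cross-check the output rows above outside Infoveave, the same six components can be approximated with Python's standard `urllib.parse` module. This is a minimal sketch and not a description of how Infoveave parses URLs internally; the function name `split_url` and the choice to represent a missing port as an empty string are assumptions made for illustration.

```python
from urllib.parse import urlsplit

def split_url(url: str) -> dict:
    """Return the six components under the column names used in the example."""
    parts = urlsplit(url)
    return {
        "Protocol": parts.scheme,
        "Host": parts.hostname or "",
        # Assumption: a missing port becomes an empty string, as in the table.
        "Port": "" if parts.port is None else str(parts.port),
        "Path": parts.path,
        "Query": parts.query,        # raw, unparsed query string
        "Fragment": parts.fragment,
    }

urls = [
    "https://company.com:8080/employee?id=E001&name=John+Doe#profile",
    "http://marketing.com/employee?id=E002&name=Marie+Dupont",
    "ftp://fileserver.com:21/download?id=E003&name=Carlos+Gomez",
]

for url in urls:
    print(split_url(url))
```

For the three sample URLs, the dictionaries carry the same values as the Protocol through Fragment columns of the output table, with empty strings where a component is absent.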

Configuration

Key fields to configure in the Infoveave workflow builder. Full reference available in the documentation.

URL Column

Select the column containing the full URL strings to parse. Only one column can be selected. URLs should include the protocol scheme — http, https, ftp — for all components to be extracted correctly. URLs without a protocol may not produce a valid Protocol component.
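
Python's standard-library parser shows a comparable sensitivity to a missing scheme, which helps explain why the protocol matters. How Infoveave itself treats such rows is not specified here, so the sketch below is only an analogy. The `ensure_scheme` helper is hypothetical, and defaulting to `https` is an assumption that may not suit every dataset.

```python
from urllib.parse import urlsplit

def ensure_scheme(url: str, default: str = "https") -> str:
    """Hypothetical helper: prepend a default scheme when none is present."""
    return url if "://" in url else f"{default}://{url}"

bare = urlsplit("company.com/employee?id=E001")
# With no scheme, the standard-library parser finds no protocol or host;
# the host text ends up inside the path instead.
print(repr(bare.scheme), repr(bare.netloc), bare.path)

fixed = urlsplit(ensure_scheme("company.com/employee?id=E001"))
# After normalization, protocol, host, and path separate cleanly.
print(fixed.scheme, fixed.hostname, fixed.path)
```

A normalization step of this kind would run before parsing, in the same spirit as using Find and Replace ahead of Split URL.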

Named component columns

Specify the output column name for each URL component: Protocol, Host, Port, Path, Query, and Fragment. You can omit any component you do not need by leaving that column name blank — no output column will be generated for omitted components. Each named column holds exactly one URL component per row, with empty values for rows where that component is not present in the URL.
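
The blank-name behavior described above can be pictured with a short Python sketch. The `column_names` mapping and the `add_url_columns` function are hypothetical names chosen for illustration; they mirror the configuration fields but are not an Infoveave API.

```python
from urllib.parse import urlsplit

# Hypothetical mapping of URL component -> output column name.
# A blank name means that component is omitted from the output.
column_names = {
    "protocol": "Protocol",
    "host": "Host",
    "port": "",          # left blank, so no Port column is produced
    "path": "Path",
    "query": "Query",
    "fragment": "",      # left blank, so no Fragment column is produced
}

def add_url_columns(row: dict, url_column: str, names: dict) -> dict:
    """Return a copy of row with one new column per named component."""
    parts = urlsplit(row[url_column])
    values = {
        "protocol": parts.scheme,
        "host": parts.hostname or "",
        "port": "" if parts.port is None else str(parts.port),
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }
    out = dict(row)
    for component, name in names.items():
        if name:  # skip components whose column name was left blank
            out[name] = values[component]
    return out

row = {"Employee ID": "E002", "url": "http://marketing.com/employee?id=E002&name=Marie+Dupont"}
result = add_url_columns(row, "url", column_names)
print(result)
```

In this sketch the original columns are kept, four new columns are added, and no Port or Fragment column appears because their names were left blank.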

Query column

The Query output column holds the raw query string — the portion after the question mark — as a single unparsed string like id=E001&name=John+Doe. This column does not split the query into individual key-value pairs. Pass this column into Split HTTP Query in a subsequent step if you need individual query parameters as separate named columns.
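
To make the distinction between the raw query string and individual parameters concrete, here is a minimal Python sketch using the standard library's `parse_qsl`. It illustrates the general idea of key-value expansion only; the exact decoding rules that Split HTTP Query applies are not covered in this section, so the handling of `+` shown here reflects Python's behavior and is an assumption with respect to Infoveave.

```python
from urllib.parse import parse_qsl

# The raw Query value as produced for the first sample row.
raw_query = "id=E001&name=John+Doe"

# parse_qsl splits on "&" and "=", and decodes "+" as a space.
pairs = parse_qsl(raw_query, keep_blank_values=True)
params = dict(pairs)

print(pairs)   # [('id', 'E001'), ('name', 'John Doe')]
print(params)  # {'id': 'E001', 'name': 'John Doe'}
```

The raw string stays a single value until this second step runs, which is why the two activities are chained rather than combined.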

Part of Infoveave Data Automation

80+ transformations. Zero manual steps.

Split URL is one of over 80 transformation activities available inside Infoveave workflows. Chain transformations together — no code, no exports, no waiting for IT.

© 2026 Noesys Software Pvt Ltd

Infoveave® is a product of Noesys

All Rights Reserved