Infoveave Data Automation — Advanced
When no built-in step does exactly what you need, write the logic yourself. The dataset arrives as df. Modify it with pandas and leave the result in df. The pipeline continues.
Built-in transformation steps cover the most common data operations efficiently and without code. But every real-world data pipeline eventually encounters requirements that fall outside standard step coverage — a multi-condition classification that cannot be expressed in available filter steps, a custom weighted rolling calculation, a domain-specific regex pattern applied across multiple columns simultaneously, or an operation that combines three columns with conditional priority logic. Execute Python Script fills this gap by inserting a fully programmable pandas transformation step anywhere in the pipeline. The dataset at that point in the pipeline is passed in as a DataFrame named df, the script modifies it, and the resulting df is passed forward. This gives data engineers the full expressive power of Python and pandas for custom logic without breaking out of the pipeline architecture — all downstream steps continue to receive properly structured data from the script output.
Run arbitrary Python code against the current pipeline dataset in Infoveave. Apply custom transformations, calculations, conditional logic, regex operations, multi-column derivations, or any pandas operation not covered by built-in steps — by writing Python that receives the dataset as a DataFrame (df) and returns the modified DataFrame.
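As one illustration of the regex case mentioned above, the following is a minimal sketch that applies the same pattern to several columns at once. The column names (Phone, AltPhone) and sample values are hypothetical, and the DataFrame is built locally only so the snippet runs on its own; inside the pipeline, df would already be supplied by the previous step.

```python
import pandas as pd

# Local stand-in for the DataFrame the pipeline would supply as df.
# Column names and values are hypothetical.
df = pd.DataFrame({
    "Phone": ["(022) 555-0101", "022 555 0102"],
    "AltPhone": ["+91-22-555-0103", "5550104"],
})

# Apply one regex to both columns: remove every non-digit character.
phone_cols = ["Phone", "AltPhone"]
df[phone_cols] = df[phone_cols].apply(
    lambda col: col.str.replace(r"\D", "", regex=True)
)
```

Listing the target columns once and applying the pattern column by column keeps the regex in a single place, which makes it easier to change later.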
Execute Python Script is one step inside a multi-step Infoveave workflow. Chain it with other activities — no code, no manual hand-offs.
Build this workflow visually in Infoveave Data Automation — drag, connect, and schedule with no infrastructure setup.
Real scenarios where this transformation saves hours of manual work.
A retail analytics team needs to classify each order into a DiscountTier based on a combination of OrderValue, CustomerTier, and SeasonalFlag — a three-way conditional formula that the available classification steps cannot express in one configuration. Execute Python Script implements the custom logic: orders over 500 from Gold customers during seasonal sales receive Tier 1; orders 200-500 from any tier receive Tier 2; others receive Tier 3. The resulting DiscountTier column is used directly in the margin analysis dashboard.
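The tier rules in this scenario could be sketched roughly as follows. This assumes SeasonalFlag is a boolean column, that the 200 to 500 range is inclusive at both ends, and that the tiers are stored as the strings "Tier 1" to "Tier 3"; none of these details are stated above. The sample rows are invented so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Local stand-in for the pipeline's df; the rows are invented for illustration.
df = pd.DataFrame({
    "OrderValue": [750, 750, 350, 120, 500],
    "CustomerTier": ["Gold", "Silver", "Gold", "Gold", "Silver"],
    "SeasonalFlag": [True, True, False, True, False],  # assumed boolean
})

# Tier 1: orders over 500 from Gold customers during seasonal sales.
tier1 = (df["OrderValue"] > 500) & (df["CustomerTier"] == "Gold") & df["SeasonalFlag"]

# Tier 2: orders of 200 to 500 from any customer tier (inclusive bounds are an assumption).
tier2 = df["OrderValue"].between(200, 500)

# np.select checks the conditions in order; rows matching neither fall through to Tier 3.
df["DiscountTier"] = np.select([tier1, tier2], ["Tier 1", "Tier 2"], default="Tier 3")
```

Because np.select evaluates conditions in order, the priority between tiers is set by the order of the list rather than by nested if/else logic.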
A manufacturing analytics team calculates OEE using a domain-specific weighting formula that differs from standard OEE: Availability is weighted at 50%, Performance at 30%, and Quality at 20%. The built-in numeric column operations do not support weighted multi-column formulas in a single step. Execute Python Script implements the formula as df['OEE'] = df['Availability']*0.5 + df['Performance']*0.3 + df['Quality']*0.2, then classifies the result into bands using df['OEEBand'] = pd.cut(df['OEE'], ...). The two derived columns feed the production dashboard.
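A minimal runnable sketch of this scenario is shown below. The weighted formula is the one quoted above; the band edges and labels passed to pd.cut are hypothetical placeholders, since the scenario does not specify them, and the three input columns are assumed to hold fractions between 0 and 1.

```python
import pandas as pd

# Local stand-in for the pipeline's df; values are invented and assumed to be fractions (0 to 1).
df = pd.DataFrame({
    "Availability": [0.90, 0.60, 0.80],
    "Performance": [0.80, 0.50, 0.70],
    "Quality": [1.00, 0.50, 0.90],
})

# Weighted OEE as described in the scenario: 50% / 30% / 20%.
df["OEE"] = df["Availability"] * 0.5 + df["Performance"] * 0.3 + df["Quality"] * 0.2

# Hypothetical band edges and labels; replace them with the team's own thresholds.
df["OEEBand"] = pd.cut(
    df["OEE"],
    bins=[0, 0.6, 0.85, 1.0],
    labels=["Low", "Medium", "High"],
    include_lowest=True,  # so an OEE of exactly 0 still lands in the first band
)
```

Note that pd.cut produces a categorical column; if a downstream step expects plain text, df["OEEBand"].astype(str) converts it.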
A bank's risk team needs a custom CreditRiskScore combining TransactionVelocity, AccountAge, and GeoRiskFlag with conditional priority weighting that varies based on whether the GeoRiskFlag is active. The built-in steps cannot express the priority switching logic. Execute Python Script implements the conditional scoring: high-geo-risk accounts get a boost added to base velocity score, while standard accounts use plain age-adjusted velocity. The resulting CreditRiskScore column is passed to the fraud alert pipeline as the primary risk input.
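The priority switch described here might look something like the sketch below. The actual scoring formula is not given above, so everything numeric is an assumption made for illustration: the GEO_BOOST constant, the age adjustment (velocity divided by one plus account age), and the unit of AccountAge (years). GeoRiskFlag is assumed to be boolean.

```python
import numpy as np
import pandas as pd

# Local stand-in for the pipeline's df; the rows are invented for illustration.
df = pd.DataFrame({
    "TransactionVelocity": [40.0, 40.0, 10.0],
    "AccountAge": [1, 3, 4],               # assumed to be in years
    "GeoRiskFlag": [True, False, False],   # assumed boolean
})

GEO_BOOST = 25.0  # hypothetical boost for high-geo-risk accounts

# Standard accounts: velocity damped by account age (hypothetical adjustment).
age_adjusted = df["TransactionVelocity"] / (1 + df["AccountAge"])

# High-geo-risk accounts: base velocity plus the boost.
boosted = df["TransactionVelocity"] + GEO_BOOST

# Pick the formula per row depending on the flag.
df["CreditRiskScore"] = np.where(df["GeoRiskFlag"], boosted, age_adjusted)
```

Computing both candidate scores as full columns and choosing between them with np.where keeps the logic vectorized, which tends to be faster than a row-wise apply on large datasets.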
The input data is transformed using the script below. The output table is ready for dashboards or downstream steps.
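```python
# Scale the score by 10 (cast to float first so the multiplication is numeric)
df['Score'] = df['Score'].astype(float) * 10
# Overwrite Category with a label based on the scaled score
df['Category'] = df['Score'].apply(lambda x: 'High' if x > 300 else 'Low')
```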
Input Data
| StudentID | Score | Category |
|---|---|---|
| S001 | 92 | Math |
| S002 | 75 | Science |
| S003 | 63 | Math |
| S004 | 88 | Science |
| S005 | 41 | Math |
Output Data
| StudentID | Score | Category |
|---|---|---|
| S001 | 920 | High |
| S002 | 750 | High |
| S003 | 630 | High |
| S004 | 880 | High |
| S005 | 410 | High |
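To try the example outside Infoveave, the sketch below rebuilds the input table as a local DataFrame and runs the same two script lines. Building df by hand is only a stand-in for the pipeline, which supplies df automatically.

```python
import pandas as pd

# Local copy of the input table; in the pipeline, df comes from the previous step.
df = pd.DataFrame({
    "StudentID": ["S001", "S002", "S003", "S004", "S005"],
    "Score": [92, 75, 63, 88, 41],
    "Category": ["Math", "Science", "Math", "Science", "Math"],
})

# The two script lines from the example.
df["Score"] = df["Score"].astype(float) * 10
df["Category"] = df["Score"].apply(lambda x: "High" if x > 300 else "Low")

print(df.to_string(index=False))
```

Every scaled score in this sample is above 300 (the lowest is 410), which is why all five rows end up labeled High. When run locally, Score is a float column, so pandas will typically display it with a decimal point (for example 920.0).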
Key fields to configure in the Infoveave workflow builder. Full reference available in the documentation.
Code
Enter the Python script that transforms the dataset. The DataFrame is available as the variable df at the start of the script — no import of the data is needed. All standard pandas operations are available: column creation, value mapping, apply functions, filtering, aggregation, reshaping, and regex operations. The script must end with the modified df available as df — it is automatically passed as the output to the next pipeline step. Do not print or return df explicitly — just ensure df is the final state of the dataset after your transformations.
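One practical consequence of the "final state of df" rule: many pandas operations, such as filtering and aggregation, return a new object instead of changing the DataFrame in place, so the result needs to be assigned back to the name df. The sketch below illustrates this with hypothetical Region and Sales columns and a locally built DataFrame standing in for the pipeline input.

```python
import pandas as pd

# Local stand-in for the pipeline's df; columns and values are hypothetical.
df = pd.DataFrame({
    "Region": ["North", "North", "South", "South"],
    "Sales": [100, 250, 80, 0],
})

# Filtering returns a new DataFrame, so rebind df to keep the filtered rows.
df = df[df["Sales"] > 0]

# Aggregation also returns a new object; as_index=False keeps Region as a
# regular column so the result is still a flat table.
df = df.groupby("Region", as_index=False)["Sales"].sum()

# No print or return here: df now holds the final state of the dataset.
```

If the assignment back to df is omitted, the filtered or aggregated result is discarded and the original rows would be what remains in df at the end of the script.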
Part of Infoveave Data Automation
Execute Python Script is one of over 80 transformation activities available inside Infoveave workflows. Chain transformations together — no code, no exports, no waiting for IT.