Automation in Feature Engineering with Feature Tools

Harnessing the Power of Feature Tools for Automated Feature Engineering

Ishan | Virginia Tech & IIT Delhi
May 12, 2024
https://featuretools.alteryx.com/en/stable/index.html

Feature engineering is one of the most creative aspects of machine learning, where the goal is to extract relevant information from raw data to improve model performance.

However, manual feature engineering can be time-consuming, often requiring domain expertise and iterative experimentation.

This process becomes even more challenging when dealing with large datasets with numerous features.

To address these challenges, Feature Tools offers a way to automate much of this work.

What is the Feature Tools Python Library?

Feature Tools is a Python library that automates feature engineering by generating new features from raw data.

At its core, Feature Tools employs a technique called “deep feature synthesis” (DFS) to create new features from combinations of existing ones.

The key functionality of Feature Tools revolves around creating and manipulating entities and relationships within an entity set. An entity (called a dataframe in Featuretools 1.0 and later, which is the naming the code in this article uses) represents a set of instances with common properties, such as customers or products, while relationships define how entities are linked to each other, typically through a parent-child key.

Using Feature Tools, data scientists can define entity sets and relationships between them, providing the necessary context for feature generation. Feature Tools then applies DFS to iteratively combine features across entities, creating a rich set of new features that capture complex relationships and interactions within the data.

Deep feature synthesis (DFS) involves recursively stacking aggregation and transformation operations on top of each other to generate new features.

This process explores many combinations of features, bounded by the chosen primitives and a maximum depth, allowing Feature Tools to surface patterns and relationships that may not be apparent through manual feature engineering.
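To make the idea of stacking concrete, here is a minimal hand-rolled sketch in pandas, not Feature Tools itself. The two small tables and their values are made up purely for illustration. It first sums transaction amounts per session, then averages those session totals per customer, which corresponds to the kind of depth-2 feature that DFS would name MEAN(sessions.SUM(transactions.amount)).

```python
import pandas as pd

# Hypothetical toy tables, invented for this illustration only.
sessions = pd.DataFrame(
    {"session_id": [1, 2, 3], "customer_id": [10, 10, 20]}
)
transactions = pd.DataFrame(
    {
        "transaction_id": [1, 2, 3, 4],
        "session_id": [1, 1, 2, 3],
        "amount": [5.0, 15.0, 30.0, 8.0],
    }
)

# Depth 1: aggregate transactions up to their parent session.
session_total = (
    transactions.groupby("session_id")["amount"]
    .sum()
    .rename("SUM(transactions.amount)")
)
sessions = sessions.join(session_total, on="session_id")

# Depth 2: aggregate the session-level feature up to the customer.
customer_mean = (
    sessions.groupby("customer_id")["SUM(transactions.amount)"]
    .mean()
    .rename("MEAN(sessions.SUM(transactions.amount))")
)

print(customer_mean)
```

DFS automates this pattern: it applies each primitive along each relationship and stacks the results, so the groupby chains do not have to be written by hand.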

Why Use Feature Tools?

  • Saves time by automating the feature engineering process.
  • Reduces the risk of human error in feature creation.
  • Helps uncover complex patterns and relationships in the data.
  • Can improve model performance by providing more relevant features.

When to Use Feature Tools?

  • When dealing with high-dimensional datasets with numerous features.
  • When manual feature engineering is time-consuming.
  • When exploring complex relationships between features.

Let’s walk through the use of Feature Tools with an example (Python code):

The full code is available in the GitHub repo: Link

Step 1: Installing and importing Feature Tools.

!pip install featuretools
import featuretools as ft

Step 2: Loading sample data files.

data = ft.demo.load_mock_customer()
customers_df = data["customers"]
sessions_df = data["sessions"]
transactions_df = data["transactions"]

sessions_df.head()

transactions_df.head()

customers_df.head()

Output:

Customers Dataframe
Sessions Dataframe
Transaction Dataframe

Step 3: Defining entity sets and relationships.

# Each entry: name -> (DataFrame, index column[, time index column])
dataframes = {
    "customers": (customers_df, "customer_id"),
    "sessions": (sessions_df, "session_id", "session_start"),
    "transactions": (transactions_df, "transaction_id", "transaction_time"),
}

# Relationships and keys:
# (parent dataframe, parent column, child dataframe, child column)
relationships = [
    ("sessions", "session_id", "transactions", "session_id"),
    ("customers", "customer_id", "sessions", "customer_id"),
]

Step 4: Generating features using deep feature synthesis.

feature_matrix_customers, features_defs_cust = ft.dfs(
    dataframes=dataframes,
    relationships=relationships,
    target_dataframe_name="customers",
)

feature_matrix_customers

Output:

Feature matrix for the customers dataframe. It contains 75 columns, including the engineered features, so we now have dozens of new features to describe a customer’s behavior.

# Sessions dataframe

feature_matrix_sessions, features_defs_sess = ft.dfs(
    dataframes=dataframes,
    relationships=relationships,
    target_dataframe_name="sessions",
)

feature_matrix_sessions

Output:

We now have dozens of new features to describe a session’s behavior.

# Transactions dataframe
feature_matrix_transactions, features_defs_txn = ft.dfs(
    dataframes=dataframes,
    relationships=relationships,
    target_dataframe_name="transactions",
)

feature_matrix_transactions.head()

Output:

We now have dozens of new features to describe a transaction’s behavior.

Step 5: Inspecting and selecting relevant features.

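# Pick one generated feature definition (index 18 is an arbitrary choice)
feature = features_defs_txn[18]
# Draw the feature's lineage graph (requires the graphviz package)
ft.graph_feature(feature)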

Output: a lineage graph showing how the selected feature is built from the raw columns.

Finally, integrate the engineered features into machine learning models.

