Cobra Forum

Plesk Panel => Web Application => Topic started by: mahesh on Mar 19, 2024, 08:24 AM

Title: How to Use Feast Feature Store with Vultr Managed Database for Redis
Post by: mahesh on Mar 19, 2024, 08:24 AM
Introduction
Feast is an open-source feature store that enables efficient management and serving of machine learning (ML) features for real-time applications. It provides a unified interface for storing, discovering, and accessing features, which are the individual measurable properties or characteristics of the data used for ML modeling. Feast follows a distributed architecture that consists of several components working together. These include the Feast Registry, Stream Processor, Batch Materialization Engine, and Stores.

Feast supports offline and online stores. While an offline store works with historical time-series feature values that are stored in data sources, Feast uses online stores to serve features at low latency. Feature values are loaded from data sources into the online store through materialization, which can be triggered through the materialize command.
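The idea behind materialization can be sketched in a few lines of plain Python. This is a conceptual illustration only, not Feast's implementation: the offline rows, the `materialize` helper, and the timestamps are all hypothetical. It shows how a time series of historical values is reduced to the latest value per entity, which is what the online store serves.

```python
from datetime import datetime

# Hypothetical offline rows: (driver_id, event_timestamp, conv_rate)
offline_rows = [
    (1001, datetime(2024, 3, 18, 9, 0), 0.21),
    (1001, datetime(2024, 3, 18, 10, 0), 0.35),
    (1002, datetime(2024, 3, 18, 9, 30), 0.50),
    (1002, datetime(2024, 3, 19, 9, 30), 0.75),
]


def materialize(rows, end_time):
    """Keep only the newest value per driver_id at or before end_time."""
    online = {}
    for driver_id, ts, conv_rate in rows:
        if ts > end_time:
            continue  # row is newer than the materialization window
        current = online.get(driver_id)
        if current is None or ts > current["event_timestamp"]:
            online[driver_id] = {"event_timestamp": ts, "conv_rate": conv_rate}
    return online


online_store = materialize(offline_rows, end_time=datetime(2024, 3, 19, 0, 0))
print(online_store[1001]["conv_rate"])  # 0.35
print(online_store[1002]["conv_rate"])  # 0.5
```

Driver 1002 keeps the value 0.5 because its newer row falls after the chosen end time.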

One of the supported online stores in Feast is Redis, which is an open-source, in-memory data structure store. This article explains how to use a Vultr Managed Database for Redis as an online feature store for Feast.

Advantages of Redis as an online feature store
High latency can harm model performance and the overall user experience, so one of the crucial factors in the success of a feature store is its ability to serve features at low latency. Using Redis as an online feature store offers several advantages, such as:

Prerequisites
To follow the instructions in this article, make sure you:

Using Vultr Managed Database for Redis as an online feature store for Feast
Install Dependencies

To connect to a Vultr Managed Database for Redis and run Feast, you need Python, the Redis CLI, and the Feast SDK. Install them as described in this section.

1.Install Python 3.10 on the server.

$ sudo apt-get install python3.10
2.Install the Pip3 Python package manager.

$ sudo apt-get -y install python3-pip
3.Install the Redis CLI tool.

$ sudo apt-get install redis
4.Install the Feast SDK and CLI.

$ pip install feast
5.To use Redis as the online store, install the redis dependency.

$ pip install 'feast[redis]'
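Before continuing, it may help to confirm that the packages are importable from the Python interpreter you plan to use. The following is a minimal sketch using only the standard library; `missing_modules` is a hypothetical helper, and the assumption that the redis extra provides a module named `redis` should be checked against your installed Feast version.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# Assumption: the Feast SDK is importable as "feast" and the Redis client as "redis".
missing = missing_modules(["feast", "redis"])
print("Missing modules:", ", ".join(missing) if missing else "none")
```

If a module is reported as missing, check that `pip` and `python3` refer to the same Python installation.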
Create a feature repository
1.Using Feast, bootstrap a new feature repository.

$ feast init feast_vultr_redis
Output:

Creating a new Feast repository in <full path to your directory>
2.Switch to the newly added directory.

$ cd feast_vultr_redis/feature_repo
3.Using a text editor such as Nano, edit the feature_store.yaml file in the current directory.

$ nano feature_store.yaml
4.Add the following contents to the file. Replace VULTR_REDIS_HOST, VULTR_REDIS_PORT, and VULTR_REDIS_PASSWORD with your actual database details.

 project: feast_vultr_redis
 registry: data/registry.db
 provider: local
 online_store:
   type: redis
   connection_string: "VULTR_REDIS_HOST:VULTR_REDIS_PORT,ssl=true,password=VULTR_REDIS_PASSWORD"
Save and close the file.
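If you prefer not to type the database details by hand, the connection string can be assembled from environment variables. This is a small sketch, not part of Feast: `build_connection_string` is a hypothetical helper, the environment variable names simply mirror the placeholders above, and the fallback values are dummy examples.

```python
import os


def build_connection_string(host, port, password, ssl=True):
    """Format the values in the same layout as the connection_string above."""
    ssl_flag = "true" if ssl else "false"
    return f"{host}:{int(port)},ssl={ssl_flag},password={password}"


# Hypothetical environment variables; the defaults are placeholders only.
connection_string = build_connection_string(
    host=os.environ.get("VULTR_REDIS_HOST", "redis.example.com"),
    port=os.environ.get("VULTR_REDIS_PORT", "6379"),
    password=os.environ.get("VULTR_REDIS_PASSWORD", "changeme"),
)
print(connection_string)
```

Paste the printed value into the connection_string field of feature_store.yaml, and avoid committing the real password to version control.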

Register feature definitions and deploy a feature store
To register feature definitions, run the following command.

$ feast apply
The apply command scans Python files in the current directory (example_repo.py in this case) for feature view/entity definitions, registers the objects, and deploys infrastructure.

When successful, your output should look like the one below.

....
Created entity driver
Created feature view driver_hourly_stats_fresh
Created feature view driver_hourly_stats
Created on demand feature view transformed_conv_rate
Created on demand feature view transformed_conv_rate_fresh
Created feature service driver_activity_v1
Created feature service driver_activity_v3
Created feature service driver_activity_v2
Generate training data
1.Create a new file generate_training_data.py.

$ nano generate_training_data.py
2.Add the following code to the file.

 from datetime import datetime
 import pandas as pd

 from feast import FeatureStore

 entity_df = pd.DataFrame.from_dict(
     {
         # entity's join key -> entity values
         "driver_id": [1001, 1002, 1003],
         # "event_timestamp" (reserved key) -> timestamps
         "event_timestamp": [
             datetime(2021, 4, 12, 10, 59, 42),
             datetime(2021, 4, 12, 8, 12, 10),
             datetime(2021, 4, 12, 16, 40, 26),
         ],
         # (optional) label name -> label values. Feast does not process these
         "label_driver_reported_satisfaction": [1, 5, 3],
         # values we're using for an on-demand transformation
         "val_to_add": [1, 2, 3],
         "val_to_add_2": [10, 20, 30],
     }
 )

 store = FeatureStore(repo_path=".")

 training_df = store.get_historical_features(
     entity_df=entity_df,
     features=[
         "driver_hourly_stats:conv_rate",
         "driver_hourly_stats:acc_rate",
         "driver_hourly_stats:avg_daily_trips",
         "transformed_conv_rate:conv_rate_plus_val1",
         "transformed_conv_rate:conv_rate_plus_val2",
     ],
 ).to_df()

 print("----- Feature schema -----\n")
 print(training_df.info())

 print()
 print("----- Example features -----\n")
 print(training_df.head())
Save and close the file.

3.Generate training data.

$ python3 generate_training_data.py
Load batch features to your online store
1.Serialize the latest values of features to prepare for serving:

$ CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S") && feast materialize-incremental $CURRENT_TIME
When feature data is stored using Redis as the online store, Feast uses it as a two-level map with the help of Redis Hashes. The first level of the map contains the Feast project name and entity key. The entity key is composed of entity names and values. The second level key (in Redis terminology, this is the "field" in a Redis Hash) contains the feature table name and the feature name, and the Redis Hash value contains the feature value.

2.In a new terminal window, paste your Vultr Managed Database for Redis connection string to establish a connection to the database.

$ redis-cli -u rediss://default:[DATABASE_PASSWORD]@[DATABASE_HOST]:[DATABASE_PORT]
Replace DATABASE_PASSWORD, DATABASE_HOST, and DATABASE_PORT with your actual Vultr Managed Database values.

3.When connected, your shell prompt changes to >. Run the following command to view all stored keys.

keys "*"
Your output should look like the one below:

1) "\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xed\x03\x00\x00feast_vultr_redis"
 2) "\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xec\x03\x00\x00feast_vultr_redis"
 3) "\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xeb\x03\x00\x00feast_vultr_redis"
 4) "\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xe9\x03\x00\x00feast_vultr_redis"
 5) "\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xea\x03\x00\x00feast_vultr_redis"
4.Check the Redis data type:

> type "\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xed\x03\x00\x00feast_vultr_redis"
Output:

hash
5.Verify the contents of the hash.

 > hgetall "\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xed\x03\x00\x00feast_vultr_redis"
Your output should look like the one below.

 1) "_ts:driver_hourly_stats"
 2) "\b\xd0\xa4\xb5\xa5\x06"
 3) "a`\xe3\xda"
 4) "5\xf20Q?"
 5) "\xfa^X\xad"
 6) "5\x83\x7f\xcb>"
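The key names returned by the keys command can be unpacked to recover the entity value and the project name, which together form the first level of the two-level map. The following is a minimal sketch, assuming the byte layout visible in the sample output: a 4-byte little-endian tag, the join key name, two more 4-byte little-endian integers (a tag and a length), the value bytes, and finally the project name. This layout is inferred from the output above and may differ between Feast versions; `decode_key` is a hypothetical helper, not a Feast API.

```python
import struct

# First key from the sample output above, written as a bytes literal.
SAMPLE_KEY = (
    b"\x02\x00\x00\x00driver_id"
    b"\x04\x00\x00\x00\x04\x00\x00\x00\xed\x03\x00\x00"
    b"feast_vultr_redis"
)


def decode_key(raw, join_key, project):
    """Extract the entity value from a key that follows the assumed layout."""
    name = join_key.encode("utf-8")
    suffix = project.encode("utf-8")
    offset = 4  # skip the leading 4-byte tag
    if raw[offset:offset + len(name)] != name:
        raise ValueError("join key name not found where expected")
    offset += len(name)
    # Two little-endian unsigned ints: a tag and the length of the value.
    _value_tag, value_len = struct.unpack_from("<II", raw, offset)
    offset += 8
    value = int.from_bytes(raw[offset:offset + value_len], "little", signed=True)
    offset += value_len
    if raw[offset:] != suffix:
        raise ValueError("project name not found at the end of the key")
    return {"join_key": join_key, "value": value, "project": project}


decoded = decode_key(SAMPLE_KEY, "driver_id", "feast_vultr_redis")
print(decoded["value"])  # 1005
```

The bytes \xed\x03\x00\x00 are the little-endian encoding of 0x03ed, which is 1005, so this key belongs to the driver with driver_id 1005.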
Fetch feature vectors for inference
At inference time, you can read the latest feature values for different drivers from the online feature store using get_online_features(). In this section, fetch feature vectors for inference as described below.

1.Create a new fetch_feature_vectors.py file.

$ nano fetch_feature_vectors.py
2.Add the following code to the file.

 from pprint import pprint
 from feast import FeatureStore

 store = FeatureStore(repo_path=".")

 feature_vector = store.get_online_features(
     features=[
         "driver_hourly_stats:conv_rate",
         "driver_hourly_stats:acc_rate",
         "driver_hourly_stats:avg_daily_trips",
     ],
     entity_rows=[
         # {join_key: entity_value}
         {"driver_id": 1004},
         {"driver_id": 1005},
     ],
 ).to_dict()

 pprint(feature_vector)
Save and close the file.

3.Run the script to fetch the feature vectors.

$ python3 fetch_feature_vectors.py
Your output should look like the one below.

 {
     'acc_rate': [0.1056235060095787, 0.7656288146972656],
     'avg_daily_trips': [521, 45],
     'conv_rate': [0.24400927126407623, 0.48361605405807495],
     'driver_id': [1004, 1005]
 }
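The dictionary above is column-oriented: each key maps to a list with one value per requested entity. If your model code expects one record per driver, the columns can be pivoted into rows. The sketch below uses the sample output shown above as input; `to_rows` is a hypothetical helper, not part of Feast.

```python
# Sample output from above, keyed by feature name with one value per entity.
feature_vector = {
    "acc_rate": [0.1056235060095787, 0.7656288146972656],
    "avg_daily_trips": [521, 45],
    "conv_rate": [0.24400927126407623, 0.48361605405807495],
    "driver_id": [1004, 1005],
}


def to_rows(columns):
    """Turn a dict of equal-length lists into a list of per-entity dicts."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


rows = to_rows(feature_vector)
for row in rows:
    print(row["driver_id"], row["avg_daily_trips"])
```

Each resulting row holds all feature values for a single driver_id, in the same order as the entity_rows passed to get_online_features().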
Conclusion
In this article, you used Feast for feature retrieval with a Vultr Managed Database for Redis as the online store, and saw why Redis is a good fit for serving features at low latency.