Create a feature group

You can create a feature group to register a BigQuery table or view that contains your feature data.

For any BigQuery table or view that you associate with a feature group, you need to ensure the following:

After you create a feature group and associate the BigQuery data source, you can create features and associate them with the columns in the data source. Specifying a data source while creating the feature group is optional. However, you must specify a data source before you create features.

Registering your data source using feature groups and features has the following advantages:

  • You can define a feature view for online serving by using specific feature columns from multiple BigQuery data sources.

  • You can optionally format your data as a time series by specifying a feature timestamp column. Vertex AI Feature Store serves only the latest feature values from the feature data and excludes historical values.

  • You can discover the BigQuery source as the associated feature data source when you search for the feature group resource in Data Catalog.

  • You can set up feature monitoring to retrieve feature statistics and detect feature drift.
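To make the time-series behavior concrete, the following is a minimal in-memory sketch of "serve only the latest feature values". It is a conceptual illustration only, not how Vertex AI Feature Store is implemented. The function name `latest_feature_values` and the sample rows are hypothetical; the column names `entity_id` and `feature_timestamp` are the default names mentioned on this page.

```python
from datetime import datetime
from typing import Any, Dict, List


def latest_feature_values(
    rows: List[Dict[str, Any]],
    entity_id_column: str = "entity_id",
    timestamp_column: str = "feature_timestamp",
) -> Dict[Any, Dict[str, Any]]:
    """Keep, for each entity ID, only the row with the newest timestamp."""
    latest: Dict[Any, Dict[str, Any]] = {}
    for row in rows:
        key = row[entity_id_column]
        current = latest.get(key)
        # Replace the stored row only when this row is strictly newer.
        if current is None or row[timestamp_column] > current[timestamp_column]:
            latest[key] = row
    return latest


# Hypothetical feature data: two historical rows for user_1, one for user_2.
rows = [
    {"entity_id": "user_1", "feature_timestamp": datetime(2023, 9, 1), "clicks": 3},
    {"entity_id": "user_1", "feature_timestamp": datetime(2023, 9, 18), "clicks": 7},
    {"entity_id": "user_2", "feature_timestamp": datetime(2023, 9, 10), "clicks": 1},
]
served = latest_feature_values(rows)
```

In this sketch, the older `user_1` row is dropped, which mirrors the exclusion of historical values described above.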

Before you begin

Authenticate to Vertex AI, unless you've done so already.

Select the tab for how you plan to use the samples on this page:

Console

When you use the Google Cloud console to access Google Cloud services and APIs, you don't need to set up authentication.

Python

To use the Python samples on this page in a local development environment, install and initialize the gcloud CLI, and then set up Application Default Credentials with your user credentials.

  1. Install the Google Cloud CLI.
  2. To initialize the gcloud CLI, run the following command:

    gcloud init
  3. If you're using a local shell, then create local authentication credentials for your user account:

    gcloud auth application-default login

    You don't need to do this if you're using Cloud Shell.

For more information, see Set up authentication for a local development environment.

REST

To use the REST API samples on this page in a local development environment, you use the credentials you provide to the gcloud CLI.

Install the Google Cloud CLI, and then initialize it by running the following command:

    gcloud init

For more information, see Authenticate for using REST in the Google Cloud authentication documentation.

Create a feature group from a BigQuery source

Use the following samples to create a feature group and associate a BigQuery data source.

Console

Use the following instructions to create a feature group using the Google Cloud console.

  1. In the Vertex AI section of the Google Cloud console, go to the Feature Store page.

  2. In the Feature groups section, click Create to open the Basic info pane on the Create Feature Group page.

  3. Specify the Feature group name.

  4. Optional: To add labels, click Add label, and specify the label name and value. You can add multiple labels to a feature group.

  5. In the BigQuery path field, click Browse to select the BigQuery source table or view to associate with the feature group.

  6. In the Entity ID column list, select the entity ID columns from the BigQuery source table or view.

    Note that this is optional if the BigQuery source table or view has a column named entity_id. In that case, if you don't select an entity ID column, the feature group uses the entity_id column as the default entity ID column.

  7. Click Continue.

  8. In the Register pane, click one of the following options to indicate whether you want to add features to the new feature group:

    • Include all columns from the BigQuery table—Create features within the feature group for all the columns in the BigQuery source table or view.

    • Manually enter your features—Create features based on specific columns in the BigQuery source. For each feature, enter a Feature name and click the corresponding BigQuery source column name in the list.

      To add more features, click Add another feature.

    • Create an empty feature group—Create the feature group without adding features to it.

  9. Click Create.

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.


from typing import List

from google.cloud import aiplatform
from vertexai.resources.preview import feature_store


def create_feature_group_sample(
    project: str,
    location: str,
    feature_group_id: str,
    bq_table_uri: str,
    entity_id_columns: List[str],
):
    # Set the project and region used by subsequent SDK calls.
    aiplatform.init(project=project, location=location)
    # Create the feature group and register the BigQuery table or view
    # as its data source, along with the entity ID columns.
    fg = feature_store.FeatureGroup.create(
        name=feature_group_id,
        source=feature_store.utils.FeatureGroupBigQuerySource(
            uri=bq_table_uri, entity_id_columns=entity_id_columns
        ),
    )
    return fg

  • project: Your project ID.
  • location: Region where you want to create the feature group, such as us-central1.
  • feature_group_id: The name of the new feature group that you want to create.
  • bq_table_uri: URI of the BigQuery source table or view that you want to register for the feature group.
  • entity_id_columns: The names of the columns containing the entity IDs. You can specify either one column or multiple columns. Because the sample declares this parameter as List[str], pass a list in both cases.
    • To specify only one entity ID column, specify the column name in the following format:
      ["entity_id_column_name"].
    • To specify multiple entity ID columns, specify the column names in the following format:
      ["entity_id_column_1_name", "entity_id_column_2_name", ...].
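Because the sample declares `entity_id_columns` as `List[str]`, it can help to normalize caller input before passing it on. The following is a minimal sketch of one way to do that. The helper `normalize_entity_id_columns` is hypothetical and not part of the Vertex AI SDK. Falling back to `entity_id` when nothing is supplied mirrors the default described in the console instructions and is an assumption here, not a documented SDK behavior.

```python
from typing import List, Optional, Sequence, Union


def normalize_entity_id_columns(
    columns: Optional[Union[str, Sequence[str]]] = None,
) -> List[str]:
    """Return entity ID column names as a non-empty list of strings."""
    if columns is None:
        # Assumption: mirror the documented default column name.
        return ["entity_id"]
    if isinstance(columns, str):
        # A single column name becomes a one-element list.
        columns = [columns]
    result = list(columns)
    if not result or any(not isinstance(c, str) or not c for c in result):
        raise ValueError("entity ID columns must be non-empty strings")
    return result


single = normalize_entity_id_columns("customer_id")
multiple = normalize_entity_id_columns(["customer_id", "region"])
default = normalize_entity_id_columns()
```

The normalized list could then be passed as the `entity_id_columns` argument of `create_feature_group_sample`.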

REST

To create a FeatureGroup resource, send a POST request by using the featureGroups.create method.

Before using any of the request data, make the following replacements:

  • LOCATION_ID: Region where you want to create the feature group, such as us-central1.
  • ENTITY_ID_COLUMNS: The names of the column(s) containing the entity IDs. You can specify either one column or multiple columns.
    • To specify only one entity ID column, specify the column name in the following format:
      "entity_id_column_name".
    • To specify multiple entity ID columns, specify the column names in the following format:
      ["entity_id_column_1_name", "entity_id_column_2_name", ...].
  • PROJECT_ID: Your project ID.
  • FEATUREGROUP_NAME: The name of the new feature group that you want to create.
  • BIGQUERY_SOURCE_URI: URI of the BigQuery source table or view that you want to register for the feature group.
  • TIMESTAMP_COLUMN: Optional. Specify the name of the column containing the feature timestamps in the BigQuery source table or view.
    You need to specify the timestamp column name only if the data is formatted as a time series and the column containing the feature timestamps isn't named feature_timestamp.
  • STATIC_DATA_SOURCE: Optional. Enter true if the data isn't formatted as a time series. The default setting is false.
  • DENSE: Optional. Indicate how Vertex AI Feature Store handles null values while serving data from feature views associated with the feature group:
    • false—This is the default setting. Vertex AI Feature Store serves only the latest non-null feature values. If the latest value for a feature is null, Vertex AI Feature Store serves the most recent non-null historical value. However, if the current as well as historical values for that feature are null, then Vertex AI Feature Store serves null as the feature value.
    • true—For feature views with scheduled data sync, Vertex AI Feature Store serves only the latest feature values, including null values. For feature views with continuous data sync, Vertex AI Feature Store serves only the latest non-null feature values. However, if the current as well as historical values for the feature are null, then Vertex AI Feature Store serves null as the feature value. For more information about data sync types and how to configure the type of data sync in a feature view, see Sync the data in a feature view.
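The null-handling rules for DENSE can be easier to follow as code. The following is a minimal sketch that models the rules exactly as described above for a single feature of a single entity. It is an illustration only, not the service implementation. The function `served_value` and the `sync` labels `"scheduled"` and `"continuous"` are hypothetical names chosen for this example.

```python
from typing import Any, List, Optional


def served_value(
    history: List[Optional[Any]],
    dense: bool = False,
    sync: str = "scheduled",
) -> Optional[Any]:
    """Model which value is served; `history` is ordered oldest to newest."""
    if sync not in ("scheduled", "continuous"):
        raise ValueError("sync must be 'scheduled' or 'continuous'")
    if not history:
        return None
    if dense and sync == "scheduled":
        # Dense with scheduled sync: serve the latest value, even if it is null.
        return history[-1]
    # Otherwise: serve the most recent non-null value, if any exists.
    for value in reversed(history):
        if value is not None:
            return value
    return None


# The latest value is null, and an older value of 5 exists.
default_result = served_value([5, None])
dense_scheduled = served_value([5, None], dense=True, sync="scheduled")
dense_continuous = served_value([5, None], dense=True, sync="continuous")
all_null = served_value([None, None])
```

In this model, only the combination of `dense=True` and scheduled sync returns the null latest value. Every other combination falls back to the older value of 5.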

HTTP method and URL:

POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/featureGroups?feature_group_id=FEATUREGROUP_NAME

Request JSON body:

{
  "big_query": {
    "entity_id_columns": ENTITY_ID_COLUMNS,
    "big_query_source": {
      "input_uri": "BIGQUERY_SOURCE_URI"
    },
    "time_series": {
      "timestamp_column": "TIMESTAMP_COLUMN"
    },
    "static_data_source": STATIC_DATA_SOURCE,
    "dense": DENSE
  }
}
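If you generate the request from a script, building the body programmatically avoids hand-editing JSON. The following is a minimal sketch that assembles the URL and the request body from the placeholders listed above and omits the optional fields when they aren't set. The helper names `build_create_url` and `build_feature_group_body`, and all the example values, are hypothetical. The sketch doesn't send the request.

```python
import json
from typing import Any, Dict, List, Optional
from urllib.parse import urlencode


def build_create_url(project_id: str, location_id: str, feature_group_name: str) -> str:
    """Build the featureGroups.create URL shown above."""
    base = (
        f"https://{location_id}-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location_id}/featureGroups"
    )
    query = urlencode({"feature_group_id": feature_group_name})
    return f"{base}?{query}"


def build_feature_group_body(
    bigquery_source_uri: str,
    entity_id_columns: List[str],
    timestamp_column: Optional[str] = None,
    static_data_source: bool = False,
    dense: bool = False,
) -> Dict[str, Any]:
    """Build the request body, leaving out optional fields that aren't set."""
    big_query: Dict[str, Any] = {
        "entity_id_columns": list(entity_id_columns),
        "big_query_source": {"input_uri": bigquery_source_uri},
    }
    if timestamp_column:
        big_query["time_series"] = {"timestamp_column": timestamp_column}
    if static_data_source:
        big_query["static_data_source"] = True
    if dense:
        big_query["dense"] = True
    return {"big_query": big_query}


# Hypothetical example values.
url = build_create_url("my-project", "us-central1", "my_feature_group")
body = build_feature_group_body(
    "bq://my-project.my_dataset.my_table",
    ["customer_id"],
    timestamp_column="event_time",
)
request_json = json.dumps(body, indent=2)
```

You could write `request_json` to a file named request.json for use with the commands that follow.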

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/featureGroups?feature_group_id=FEATUREGROUP_NAME"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/featureGroups?feature_group_id=FEATUREGROUP_NAME" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/featureGroups/FEATUREGROUP_NAME/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.CreateFeatureGroupOperationMetadata",
    "genericMetadata": {
      "createTime": "2023-09-18T03:00:13.060636Z",
      "updateTime": "2023-09-18T03:00:13.060636Z"
    }
  }
}
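The response describes a long-running operation whose name embeds the feature group and operation IDs. The following is a minimal sketch of extracting those parts from the response text, assuming the name follows the pattern shown in the sample response. The helper `parse_operation_name` and the example values are hypothetical.

```python
import json
import re
from typing import Dict

# Pattern taken from the "name" field in the sample response above.
OPERATION_NAME_PATTERN = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/featureGroups/(?P<feature_group>[^/]+)/operations/(?P<operation>[^/]+)$"
)


def parse_operation_name(response_text: str) -> Dict[str, str]:
    """Split the operation name in a create response into its components."""
    response = json.loads(response_text)
    match = OPERATION_NAME_PATTERN.match(response.get("name", ""))
    if match is None:
        raise ValueError("response does not contain a recognizable operation name")
    return match.groupdict()


# Hypothetical response text with example IDs filled in.
example_response = json.dumps(
    {
        "name": (
            "projects/123456789/locations/us-central1"
            "/featureGroups/my_feature_group/operations/987654321"
        )
    }
)
parts = parse_operation_name(example_response)
```

The extracted operation ID can be kept for checking the status of the operation later.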

What's next