GCP Connector (Google Cloud Storage)
The GCP Connector enables Replenit to ingest data directly from Google Cloud Storage (GCS). Data is accessed securely via a service account and processed within Replenit's ingestion layer.
The ingestion layer parses, normalizes, and unifies data for downstream processing.
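As a concrete illustration of service-account access, the sketch below reads one object from GCS using the `google-cloud-storage` client library. The bucket name, directory, file name, and key path are placeholders, not values Replenit prescribes.

```python
# Sketch: read one object from GCS with a service-account JSON key.
# Bucket, directory, and key-file names below are placeholders.

def gcs_uri(bucket: str, prefix: str, filename: str) -> str:
    """Build a gs:// URI in the directory layout used later in this guide."""
    return f"gs://{bucket}/{prefix.strip('/')}/{filename}"

def download_object(bucket_name: str, blob_name: str, key_path: str) -> bytes:
    # Imported lazily so the pure helper above works without the library.
    from google.cloud import storage  # pip install google-cloud-storage
    client = storage.Client.from_service_account_json(key_path)
    return client.bucket(bucket_name).blob(blob_name).download_as_bytes()

if __name__ == "__main__":
    print(gcs_uri("your-bucket", "customers", "customers_date=20250101.parquet"))
    # download_object("your-bucket", "customers/...", "key.json") would fetch the bytes
```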
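The difference between the two modes can be sketched in a few lines: a historical load takes every record once, while ongoing ingestion only picks up records newer than the last successful run (a high-water mark). Field names here are illustrative, not a required schema.

```python
# Sketch of the two ingestion modes. "updated_at" is an example field name.
from datetime import datetime

def incremental_batch(records, last_run_iso):
    """Return records whose updated_at is strictly after the last run."""
    cutoff = datetime.fromisoformat(last_run_iso)
    return [r for r in records
            if datetime.fromisoformat(r["updated_at"]) > cutoff]

rows = [
    {"customer_id": "C1", "updated_at": "2025-01-01T00:00:00+00:00"},
    {"customer_id": "C2", "updated_at": "2025-01-03T00:00:00+00:00"},
]
historical = rows  # one-time full load: everything
ongoing = incremental_batch(rows, "2025-01-02T00:00:00+00:00")  # only newer rows
```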
How It Works
Supported Ingestion Modes
Historical Ingestion
- One-time full data load
- Used to initialize datasets
Ongoing Ingestion
- Scheduled ingestion (e.g. daily, hourly)
- Incremental updates
Requirements
Data Entities
Replenit requires three core datasets:
| Entity | Description |
|---|---|
| Customer | User-level data |
| Order | Transaction data |
| Product | Catalog data |
1. Prepare Data
Datasets should represent the full customer lifecycle and be linkable via identifiers.
Data Format
Preferred: Parquet
Replenit recommends Parquet format.
- Efficient for large-scale data
- Schema consistency
- Faster ingestion
Supported Formats
- JSON (including nested payloads)
- CSV
Data Flexibility
Replenit supports any data structure:
- Flat tables or nested JSON
- No strict limit on fields
- Additional attributes are allowed
Replenit performs ETL, normalization, and schema unification on ingested data.
ℹ️ Data can be API-aligned or provided as raw batch data. Replenit handles ETL.
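To make "flexible schema handling" concrete, here is an illustrative flattening pass of the kind an ETL layer applies, turning nested JSON into flat column names. This is a sketch, not Replenit's actual implementation; key names are examples.

```python
# Illustrative flattening of nested JSON into flat, dotted column names.
def flatten(obj, parent=""):
    out = {}
    for key, value in obj.items():
        name = f"{parent}.{key}" if parent else key
        if isinstance(value, dict):
            out.update(flatten(value, name))  # recurse into nested objects
        else:
            out[name] = value
    return out

payload = {"identifiers": {"userId": 3253833},
           "transaction": {"orderId": "12345"}}
flat = flatten(payload)
# flat == {"identifiers.userId": 3253833, "transaction.orderId": "12345"}
```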
Data Structure
gs://your-bucket/customers/
gs://your-bucket/orders/
gs://your-bucket/products/
Suggested Dataset Structure
Customer Dataset (Preferred: Parquet)
Example file:
customers_date=YYYYMMDD.parquet
Required Fields
| Field |
|---|
| customer_id |
Recommended Fields
| Field |
|---|
| created_at |
| updated_at |
| country |
| city |
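The example file name above is date-partitioned. A minimal sketch of deriving that name for a given load date (the helper name is our own, not part of any Replenit API):

```python
# Sketch: build the date-partitioned file name customers_date=YYYYMMDD.parquet.
from datetime import date

def customer_file_name(load_date: date) -> str:
    return f"customers_date={load_date.strftime('%Y%m%d')}.parquet"

print(customer_file_name(date(2025, 1, 1)))  # customers_date=20250101.parquet
```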
Order Dataset (JSON / CSV / Parquet)
Example column:
transaction_data_json_payload
Example Structure
{
"identifiers": {
"userId": 3253833,
"email": "user@example.com"
},
"transaction": {
"orderId": "12345",
"orderDate": "2025-01-01T10:00:00Z",
"totalAmount": 120,
"currency": "EUR"
},
"products": [
{
"productId": "SKU-001",
"price": 60,
"quantity": 2
}
]
}
Required Logical Fields
| Field | Example Path |
|---|---|
| order_id | transaction.orderId |
| customer_id | identifiers.userId |
| order_date | transaction.orderDate |
Recommended Fields
| Field | Example Path |
|---|---|
| total_amount | transaction.totalAmount |
| currency | transaction.currency |
| product_id | products[].productId |
| quantity | products[].quantity |
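Putting the field tables to work, this sketch extracts the logical fields from the example payload above using the listed paths (the target dict is illustrative, not a required output format):

```python
# Sketch: map the example order payload to the logical fields via the
# example paths in the tables above.
import json

payload = json.loads("""{
  "identifiers": {"userId": 3253833, "email": "user@example.com"},
  "transaction": {"orderId": "12345", "orderDate": "2025-01-01T10:00:00Z",
                  "totalAmount": 120, "currency": "EUR"},
  "products": [{"productId": "SKU-001", "price": 60, "quantity": 2}]
}""")

order = {
    "order_id": payload["transaction"]["orderId"],        # transaction.orderId
    "customer_id": payload["identifiers"]["userId"],      # identifiers.userId
    "order_date": payload["transaction"]["orderDate"],    # transaction.orderDate
    "total_amount": payload["transaction"]["totalAmount"],
    "currency": payload["transaction"]["currency"],
    "product_ids": [p["productId"] for p in payload["products"]],  # products[].productId
}
```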
Product Dataset (JSON / CSV / Parquet)
Example column:
product_data_json_payload
Example Structure
{
"productId": "UMT-U180065",
"taxonomy": ["Bath", "Care"],
"brand": "BrandX",
"price": 49.99,
"currency": "EUR"
}
Required Logical Fields
| Field | Example Path |
|---|---|
| product_id | productId |
Recommended Fields
| Field | Example Path |
|---|---|
| category | taxonomy[] |
| brand | brand |
| price | price |
Data Relationships
Key Requirements
- customer_id must be consistent across datasets
- product_id must match between orders and products
- Timestamp format should be ISO 8601
Recommended Data Scope
- Up to 24 months of historical data
- Full product catalog
- Complete order history
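The key requirements above can be checked before upload. This is a stdlib-only sketch with made-up dataset contents; it flags orders whose customer_id or product_id has no match, and non-ISO-8601 timestamps.

```python
# Sketch: pre-upload checks for the cross-dataset key requirements.
from datetime import datetime

customers = [{"customer_id": "C1"}]
orders = [{"customer_id": "C1", "product_id": "SKU-001",
           "order_date": "2025-01-01T10:00:00Z"}]
products = [{"product_id": "SKU-001"}]

known_customers = {c["customer_id"] for c in customers}
known_products = {p["product_id"] for p in products}

orphan_customers = [o for o in orders if o["customer_id"] not in known_customers]
orphan_products = [o for o in orders if o["product_id"] not in known_products]

def is_iso8601(ts: str) -> bool:
    try:
        # fromisoformat on older Pythons doesn't accept a trailing "Z"
        datetime.fromisoformat(ts.replace("Z", "+00:00"))
        return True
    except ValueError:
        return False

bad_timestamps = [o for o in orders if not is_iso8601(o["order_date"])]
```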
2. Create Service Account
1. Go to Google Cloud Console → IAM & Admin → Service Accounts
2. Create a service account
3. Assign the role Storage Object Viewer
4. Create a JSON key
5. Download the key file
3. Grant Access
Grant access to your bucket:
- Role: Viewer
- Scope: Bucket
4. Add Data Source
You can configure your data sources and provide access under the Health & Data Management section in your Replenit panel.
Configuration Fields
| Field | Description |
|---|---|
| Entity | Customer / Order / Product |
| Entity Directory Address | customers / orders / products |
| Bucket Name | GCS bucket |
| Credential | JSON key file |
ℹ️ Repeat the data source configuration for each entity: Customer, Order, and Product.
5. Verify Data Sources
After configuration, verify that:
- All sources are Active
- Correct directories are mapped
6. Historical Data Load
1. Go to Automation Jobs
2. Start a historical job
3. Select the data source
4. Run
7. Ongoing Sync
- Configure daily job
- Enable incremental ingestion
8. Monitoring
| Field | Description |
|---|---|
| Status | Completed / Failed |
| TransferredFileCount | Files processed |
| FailedFileCount | Errors |
| Last Run | Timestamp |
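A simple way to act on these fields is to flag any run that failed or dropped files. The record shape below mirrors the table; the helper and its values are illustrative, not a Replenit API.

```python
# Sketch: interpret a monitoring record using the fields above.
def run_health(run: dict) -> str:
    """Flag a run that failed outright or that reported failed files."""
    if run["Status"] == "Failed" or run["FailedFileCount"] > 0:
        return "needs attention"
    return "healthy"

run = {"Status": "Completed", "TransferredFileCount": 42,
       "FailedFileCount": 0, "Last Run": "2025-01-01T06:00:00Z"}
```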
Expected Timeline
| Task | Time |
|---|---|
| Data preparation | 4–8 hours |
| Access setup | 2–4 hours |
| Configuration | 2–4 hours |
| Historical ingestion | 4–6 hours |
| Validation | 3–4 hours |
| Total | 15–26 hours |
Common Issues
| Issue | Cause |
|---|---|
| Access denied | Missing IAM roles |
| No data | Incorrect directory |
| Job failure | Schema mismatch |
| Missing data | ID mismatch |
Security
- Service Account JSON authentication
- Read-only access supported
- No modification to source data
Summary
- Direct ingestion from GCS
- Parquet preferred, JSON/CSV supported
- Flexible schema handling
- Replenit performs ETL and normalization
Need help or have questions?
Our team is ready to assist you. Reach out to us at support@replen.it
