This is a fork of hhobin/dataiku_factory that adapts it to the Kiro AI IDE.
A comprehensive Model Context Protocol (MCP) tool suite for Dataiku DSS integration. This project gives an AI IDE direct access to Dataiku DSS for managing recipes, datasets, and scenarios.
- Python 3.11+
- Dataiku DSS instance with API access
- Valid DSS API key
```bash
# Clone and setup
git clone <repository-url>
cd dataiku_factory

# Run installation script
./install.sh

# Or install manually:
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -e .
```

- Copy the environment template:

```bash
cp .env.sample .env
```

- Configure your DSS connection in `.env`:

```bash
DSS_HOST=https://your-dss-instance.com:10000
DSS_API_KEY=your-api-key-here
DSS_INSECURE_TLS=true  # Only if using self-signed certificates
```

- Test your connection:

```bash
python scripts/mcp_server.py --help
```
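If you also want to check the DSS connection itself (independently of the MCP server), a minimal sketch using the `dataiku-api-client` package looks like this, assuming the package is available in the virtualenv and the `.env` variables are exported in your shell:

```python
# Minimal connectivity check with dataiku-api-client (pip install dataiku-api-client if missing).
# Assumes DSS_HOST / DSS_API_KEY / DSS_INSECURE_TLS are exported as environment variables.
import os
import dataikuapi

client = dataikuapi.DSSClient(os.environ["DSS_HOST"], os.environ["DSS_API_KEY"])
if os.environ.get("DSS_INSECURE_TLS", "false").lower() == "true":
    client._session.verify = False  # only for self-signed certificates

print(client.list_project_keys())  # should list the projects your API key can see
```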
"dataiku-factory": {
"command": ".venv\\Scripts\\python.exe",
"args": [
"scripts\\mcp_server.py"
],
"env": {
"DSS_HOST": "https://you_dss_host",
"DSS_API_KEY": "you_dss_key",
"DSS_INSECURE_TLS": "false"
},
"disabled": false,
"autoApprove": [
"get_project_flow",
"search_project_objects",
"get_dataset_sample",
"get_dataset_schema",
"get_scenario_logs",
"get_recent_runs",
"get_recipe_code"
]
}
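Note: the `command` and `args` above use Windows-style paths; on macOS/Linux use `.venv/bin/python` and `scripts/mcp_server.py` instead. In Kiro this server entry typically goes under the `mcpServers` key of the workspace MCP configuration (commonly `.kiro/settings/mcp.json`), but check your Kiro version's documentation for the exact location.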
| Tool | Description | Key Parameters |
|---|---|---|
| `create_recipe` | Create new recipe | project_key, recipe_type, recipe_name, inputs, outputs, code |
| `update_recipe` | Update existing recipe | project_key, recipe_name, `**kwargs` |
| `delete_recipe` | Delete recipe | project_key, recipe_name |
| `run_recipe` | Execute recipe | project_key, recipe_name, build_mode |
| Tool | Description | Key Parameters |
|---|---|---|
| `create_dataset` | Create new dataset | project_key, dataset_name, dataset_type, params |
| `update_dataset` | Update dataset settings | project_key, dataset_name, `**kwargs` |
| `delete_dataset` | Delete dataset | project_key, dataset_name, drop_data |
| `build_dataset` | Build dataset | project_key, dataset_name, mode, partition |
| `inspect_dataset_schema` | Get dataset schema | project_key, dataset_name |
| `check_dataset_metrics` | Get dataset metrics | project_key, dataset_name |
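For example, creating a managed dataset could look like the call below. The project key and dataset name are placeholders, and the exact shape of `params` is an assumption that depends on the target connection:

```python
# Illustrative call; the `params` structure is an assumption and varies by connection type
create_dataset(
    project_key="ANALYTICS_PROJECT",
    dataset_name="clean_data",
    dataset_type="managed",
    params={"connection": "filesystem_managed"}
)
```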
| Tool | Description | Key Parameters |
|---|---|---|
| `create_scenario` | Create new scenario | project_key, scenario_name, scenario_type, definition |
| `update_scenario` | Update scenario settings | project_key, scenario_id, `**kwargs` |
| `delete_scenario` | Delete scenario | project_key, scenario_id |
| `add_scenario_trigger` | Add trigger to scenario | project_key, scenario_id, trigger_type, `**params` |
| `remove_scenario_trigger` | Remove scenario trigger | project_key, scenario_id, trigger_idx |
| `run_scenario` | Execute scenario | project_key, scenario_id |
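As a sketch, creating an empty step-based scenario and wiring it up afterwards might look like this; the `definition` payload shown is an assumption (see the supported scenario types further down):

```python
# Illustrative call; the `definition` payload shown is an assumption
create_scenario(
    project_key="DATA_PIPELINE",
    scenario_name="daily_etl",
    scenario_type="step_based",
    definition={"steps": []}  # steps can be added later, e.g. via update_scenario
)
```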
| Tool | Description | Key Parameters |
|---|---|---|
| `get_scenario_logs` | Get detailed run logs and error messages | project_key, scenario_id, run_id |
| `get_scenario_steps` | Get step configuration including Python code | project_key, scenario_id |
| `clone_scenario` | Clone scenario with modifications | project_key, source_scenario_id, new_scenario_name, modifications |
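For instance, cloning a scenario under a new name might look like this; the `modifications` payload shown is an assumption and its exact structure should be checked in the tool's docstring:

```python
# Illustrative call; the `modifications` payload shown is an assumption
clone_scenario(
    project_key="DATA_PIPELINE",
    source_scenario_id="daily_etl",
    new_scenario_name="daily_etl_staging",
    modifications={"active": False}
)
```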
| Tool | Description | Key Parameters |
|---|---|---|
| `get_recipe_code` | Extract actual Python/SQL code from recipes | project_key, recipe_name |
| `validate_recipe_syntax` | Validate Python/SQL syntax before running | project_key, recipe_name, code |
| `test_recipe_dry_run` | Test recipe logic without execution | project_key, recipe_name, sample_rows |
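For example, to sanity-check a recipe on a small sample before a real build (the project key, recipe name, and sample size are illustrative):

```python
# Illustrative call; tests the recipe logic against a 100-row sample without building outputs
test_recipe_dry_run(
    project_key="ML_PROJECT",
    recipe_name="customer_segmentation",
    sample_rows=100
)
```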
| Tool | Description | Key Parameters |
|---|---|---|
| `get_project_flow` | Get complete data flow/pipeline structure | project_key |
| `search_project_objects` | Search datasets, recipes, scenarios by pattern | project_key, search_term, object_types |
| `get_dataset_sample` | Get sample data from datasets | project_key, dataset_name, rows, columns |
| Tool | Description | Key Parameters |
|---|---|---|
| `get_code_environments` | List available Python/R environments | project_key |
| `get_project_variables` | Get project-level variables and secrets | project_key |
| `get_connections` | List available data connections | project_key |
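For example, to see which code environments and data connections a project can use (the project key is illustrative):

```python
# Illustrative calls; list available code environments and data connections
get_code_environments(project_key="ANALYTICS_PROJECT")
get_connections(project_key="ANALYTICS_PROJECT")
```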
| Tool | Description | Key Parameters |
|---|---|---|
| `get_recent_runs` | Get recent run history across scenarios/recipes | project_key, limit, status_filter |
| `get_job_details` | Get detailed job execution information | project_key, job_id |
| `cancel_running_jobs` | Cancel running jobs/scenarios | project_key, job_ids |
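For example, to cancel jobs that are still running (the job IDs are illustrative):

```python
# Illustrative call; cancels the listed jobs if they are still running
cancel_running_jobs(
    project_key="DATA_PIPELINE",
    job_ids=["job_12345", "job_12346"]
)
```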
| Tool | Description | Key Parameters |
|---|---|---|
| `duplicate_project_structure` | Copy project structure to new project | source_project_key, target_project_key, include_data |
| `export_project_config` | Export project configuration as JSON/YAML | project_key, format |
| `batch_update_objects` | Update multiple objects with similar changes | project_key, object_type, pattern, updates |
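For example, tagging every recipe whose name matches a pattern might look like the call below; the structure of `updates` is an assumption, so check the tool's docstring for the exact format:

```python
# Illustrative call; the `updates` payload shown is an assumption
batch_update_objects(
    project_key="ANALYTICS_PROJECT",
    object_type="recipes",
    pattern="etl_*",
    updates={"tags": ["reviewed"]}
)
```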
Total: 34 Tools (16 core + 18 advanced)
```python
# Via Claude Code chat:
"""
Create a python recipe called "data_cleaner" that takes "raw_data" as input
and outputs "clean_data" in project "ANALYTICS_PROJECT"
"""

# This translates to:
create_recipe(
    project_key="ANALYTICS_PROJECT",
    recipe_type="python",
    recipe_name="data_cleaner",
    inputs=["raw_data"],
    outputs=[{"name": "clean_data", "new": True, "connection": "filesystem_managed"}],
    code="""
import dataiku
import pandas as pd

df = dataiku.Dataset("raw_data").get_dataframe()
# Add your cleaning logic here
df_clean = df.dropna()
dataiku.Dataset("clean_data").write_with_schema(df_clean)
"""
)
```

```python
# Via Claude Code chat:
"""
Build the dataset "user_analytics" in project "BI" with recursive build mode
"""

# This translates to:
build_dataset(
    project_key="BI",
    dataset_name="user_analytics",
    mode="RECURSIVE_BUILD"
)
```

```python
# Via Claude Code chat:
"""
Add a daily trigger to scenario "daily_etl" that runs at 6:00 AM UTC
"""

# This translates to:
add_scenario_trigger(
    project_key="DATA_PIPELINE",
    scenario_id="daily_etl",
    trigger_type="daily",
    hour=6,
    minute=0,
    timezone="UTC"
)
```

```python
# Via Claude Code chat:
"""
Show me the logs for the latest failed run of scenario "data_processing"
"""

# This translates to:
get_scenario_logs(
    project_key="ANALYTICS_PROJECT",
    scenario_id="data_processing"
)
```

```python
# Via Claude Code chat:
"""
Extract the code from recipe "customer_segmentation" and validate its syntax
"""

# This translates to:
get_recipe_code(
    project_key="ML_PROJECT",
    recipe_name="customer_segmentation"
)
validate_recipe_syntax(
    project_key="ML_PROJECT",
    recipe_name="customer_segmentation"
)
```

```python
# Via Claude Code chat:
"""
Show me the complete data flow for project "SALES_ANALYTICS" and find all datasets containing "customer"
"""

# This translates to:
get_project_flow(
    project_key="SALES_ANALYTICS"
)
search_project_objects(
    project_key="SALES_ANALYTICS",
    search_term="customer",
    object_types=["datasets", "recipes", "scenarios"]
)
```

```python
# Via Claude Code chat:
"""
Get a sample of 500 rows from dataset "transactions" showing only customer_id and amount columns
"""

# This translates to:
get_dataset_sample(
    project_key="FINANCE_PROJECT",
    dataset_name="transactions",
    rows=500,
    columns=["customer_id", "amount"]
)
```

```python
# Via Claude Code chat:
"""
Show me the recent failed runs in project "DATA_PIPELINE" and get details for any failed jobs
"""

# This translates to:
get_recent_runs(
    project_key="DATA_PIPELINE",
    limit=20,
    status_filter="FAILED"
)
get_job_details(
    project_key="DATA_PIPELINE",
    job_id="job_12345"
)
```

```python
# Via Claude Code chat:
"""
Export the configuration of project "TEMPLATE_PROJECT" as YAML and duplicate its structure to "NEW_PROJECT"
"""

# This translates to:
export_project_config(
    project_key="TEMPLATE_PROJECT",
    format="yaml"
)
duplicate_project_structure(
    source_project_key="TEMPLATE_PROJECT",
    target_project_key="NEW_PROJECT",
    include_data=False
)
```

```
dataiku_factory/
├── dataiku_mcp/
│   ├── __init__.py
│   ├── client.py                  # DSS client wrapper
│   ├── server.py                  # MCP server implementation
│   └── tools/
│       ├── recipes.py             # Recipe management tools
│       ├── datasets.py            # Dataset management tools
│       ├── scenarios.py           # Scenario management tools
│       ├── advanced_scenarios.py  # Advanced scenario tools
│       ├── code_development.py    # Code development tools
│       ├── project_exploration.py # Project exploration tools
│       ├── environment_config.py  # Environment configuration
│       ├── monitoring_debug.py    # Monitoring & debugging
│       └── productivity.py        # Productivity tools
├── scripts/
│   └── mcp_server.py              # MCP server entrypoint
├── install.sh                     # Installation script
├── README.md
├── pyproject.toml
└── .env.sample
```
- API Key Protection: Store API keys in environment variables, never in code
- SSL Configuration: Support for self-signed certificates via `DSS_INSECURE_TLS=true`
- Permission Validation: All operations respect DSS user permissions
- Error Handling: Sensitive information is not exposed in error messages
The MCP server provides logging for monitoring:
```bash
# Run with verbose logging
python scripts/mcp_server.py --verbose

# Check logs for debugging
tail -f dataiku_mcp.log
```

- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Commit changes: `git commit -m 'Add amazing feature'`
- Push to branch: `git push origin feature/amazing-feature`
- Open a Pull Request
```bash
# Install development dependencies
pip install -e .[dev]

# Run code formatting
black dataiku_mcp/ scripts/
ruff check dataiku_mcp/ scripts/
```

- Code recipes: `python`, `r`, `sql`, `pyspark`, `scala`, `shell`
- Visual recipes: `grouping`, `join`, `sync`, `split`, `distinct`, `sort`, `topn`
- Managed datasets: `managed` (default filesystem storage)
- Filesystem datasets: `filesystem` (custom paths)
- SQL datasets: `sql` (database tables)
- Cloud datasets: `s3`, `gcs`, `azure`
- Upload datasets: `uploaded` (CSV uploads)
- Step-based scenarios: `step_based` (visual workflow)
- Custom Python scenarios: `custom_python` (Python code)
- Periodic: `periodic` (every X minutes)
- Hourly: `hourly` (specific minutes past the hour)
- Daily: `daily` (specific time daily)
- Monthly: `monthly` (specific day/time monthly)
- Dataset: `dataset` (on dataset changes)
- Connection refused: Check `DSS_HOST` and ensure DSS is running
- SSL certificate errors: Set `DSS_INSECURE_TLS=true` for self-signed certificates
- API key invalid: Verify the API key in the DSS admin panel
- Permission denied: Ensure the API key has the required project permissions
Enable debug logging:

```bash
python scripts/mcp_server.py --verbose
```

Test the MCP server connection:

```bash
python scripts/mcp_server.py --verbose
```

This project is licensed under the MIT License - see the LICENSE file for details.
- Built for Dataiku DSS
- Uses Model Context Protocol
- Integrated with Claude Code
Ready to enhance your Dataiku workflows with AI assistance!