Datarul - Enterprise Data Governance Platform
Founding engineer of Datarul, a comprehensive enterprise data governance platform serving Fortune 500 clients across the financial services, insurance, energy, healthcare, retail, and transportation industries. Built a microservices-based SaaS platform from scratch that consolidates Business Glossary, Data Dictionary, Report Catalog, Data Lineage, and Data Quality management into a unified solution. The platform processes 10M+ data records daily with 99.9% uptime and features AI-powered data classification, automated metadata discovery, and interactive data lineage visualization. Architected the entire system as event-driven microservices (.NET Core, Python, Go) with a React frontend, implemented RBAC with Active Directory integration, and deployed on both on-premise and cloud infrastructure. Achieved a 28% improvement in data quality scores, a 35% increase in data analysis efficiency, and a 40% reduction in regulatory compliance reporting time for enterprise clients.
Project Details

Key Features
- Business Glossary: AI-powered term standardization with automated synonym detection and corporate knowledge management
- Data Dictionary: Automated metadata discovery across SQL/NoSQL databases with scheduled imports and change tracking
- Report Catalog: Centralized report repository with impact analysis, version control, and cross-environment consistency
- Data Lineage: Interactive graph visualization showing end-to-end data flow from source systems through transformations to reports
- Data Quality: Automated quality rules engine with configurable thresholds, anomaly detection, and trend analysis
- AI-powered data classification automatically tagging sensitive data (PII, PCI, PHI) for compliance
- Natural language search across all metadata using Elasticsearch with relevance ranking
- Role-Based Access Control (RBAC) with Active Directory/LDAP integration and granular permissions
- RESTful and GraphQL APIs for seamless integration with existing enterprise tools
- Scheduled metadata synchronization with configurable import frequencies and conflict resolution
- Historical versioning that tracks all metadata changes, with audit trails and rollback capabilities
- Multi-tenancy support for enterprise clients with data isolation and customizable branding
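
The configurable thresholds in the Data Quality feature can be illustrated with a minimal Python sketch. This is not Datarul's actual rules engine: `QualityRule`, `completeness`, and `evaluate` are hypothetical names, and completeness (the share of non-null values) is only one assumed metric.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class QualityRule:
    """Hypothetical rule: a named metric plus a minimum acceptable score."""
    name: str
    metric: Callable[[Sequence[Optional[object]]], float]
    threshold: float  # scores below this value fail the rule


def completeness(values: Sequence[Optional[object]]) -> float:
    """Share of non-null values; an empty column counts as fully complete."""
    if not values:
        return 1.0
    return sum(v is not None for v in values) / len(values)


def evaluate(rule: QualityRule, values: Sequence[Optional[object]]) -> dict:
    """Score one column against one rule and report pass/fail."""
    score = rule.metric(values)
    return {"rule": rule.name, "score": score, "passed": score >= rule.threshold}


# Usage: one of four values is missing, so the score is 0.75,
# which is below the assumed 0.9 threshold.
rule = QualityRule(name="email_completeness", metric=completeness, threshold=0.9)
result = evaluate(rule, ["a@x.com", None, "b@x.com", "c@x.com"])
```

Keeping the threshold on the rule rather than inside the metric function is one way to let each tenant tune the same metric differently.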
Challenges
- Processing and indexing 10M+ database objects across 100+ heterogeneous data sources in real time
- Building a scalable data lineage parser that handles complex SQL with CTEs, subqueries, and window functions
- Implementing graph algorithms efficiently for impact analysis across millions of data dependencies
- Designing multi-tenant architecture with strict data isolation while maintaining query performance
- Integrating with 20+ different database technologies (Oracle, SQL Server, PostgreSQL, MongoDB, etc.)
- Ensuring sub-second search response times across billions of metadata records
- Building AI models for data classification that handle domain-specific business terminology
- Managing eventual consistency in distributed microservices while maintaining data integrity
- Deploying flexibly on both air-gapped on-premise environments and cloud infrastructure
Solutions
- Architected event-driven microservices using domain-driven design with bounded contexts
- Built a custom SQL parser with ANTLR that generates abstract syntax trees for lineage extraction
- Adopted a graph database (RedisGraph) with optimized traversal algorithms for impact analysis
- Designed a multi-tenant PostgreSQL schema with row-level security and tenant-aware queries
- Created an adapter-pattern framework enabling plug-and-play integration of new data sources
- Deployed an Elasticsearch cluster with custom analyzers, achieving <200ms search latency
- Trained classification models in TensorFlow with active learning, reducing manual labeling by 80%
- Implemented the CQRS pattern with event sourcing, keeping the event log as the consistent source of truth while read models update with eventual consistency
- Containerized all services and orchestrated them with Kubernetes, enabling hybrid on-premise/cloud deployments
- Built comprehensive monitoring with Prometheus/Grafana to support 99.9% SLA compliance
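
The impact analysis mentioned above comes down to finding everything downstream of a changed object in the lineage graph. The following Python sketch shows the idea with a plain breadth-first traversal over an in-memory edge list. It is an illustration under assumptions, not the RedisGraph-based implementation: `downstream_impact` and the table and report names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple


def downstream_impact(edges: Iterable[Tuple[str, str]], start: str) -> List[str]:
    """Return every object reachable from `start`, in breadth-first order.

    Each edge (src, dst) means "dst is derived from src".
    """
    # Build an adjacency list from the lineage edges.
    graph: Dict[str, List[str]] = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)

    seen = {start}          # guards against cycles and repeated visits
    impacted: List[str] = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                impacted.append(nxt)
                queue.append(nxt)
    return impacted


# Usage with hypothetical lineage: a change to crm.customers affects the
# staging table, the dimension table, and the report built on it, but not
# the unrelated orders pipeline.
lineage = [
    ("crm.customers", "stg.customers"),
    ("stg.customers", "dw.dim_customer"),
    ("dw.dim_customer", "report.churn"),
    ("erp.orders", "dw.fact_orders"),
]
affected = downstream_impact(lineage, "crm.customers")
```

The `seen` set keeps the traversal linear in the number of nodes and edges even when the lineage contains cycles, which matters at the scale of millions of dependencies.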