
Datarul - Enterprise Data Governance Platform

Founding engineer of Datarul, an enterprise data governance platform serving Fortune 500 clients in financial services, insurance, energy, healthcare, retail, and transportation. Built a microservices-based SaaS platform from scratch that consolidates Business Glossary, Data Dictionary, Report Catalog, Data Lineage, and Data Quality management into a single product. The platform processes 10M+ data records daily at 99.9% uptime and features AI-powered data classification, automated metadata discovery, and interactive data lineage visualization. Architected the system as event-driven microservices (.NET Core, Python, Go) with a React frontend, implemented RBAC with Active Directory integration, and deployed to both on-premise and cloud infrastructure. For enterprise clients, this delivered a 28% improvement in data quality scores, a 35% increase in data analysis efficiency, and a 40% reduction in regulatory compliance reporting time.

Project Details

Role: Founding Engineer & Technical Lead
Timeline: July 2023 - Present
Tech Stack: .NET Core, Python, Go, Ruby, React, TypeScript, PostgreSQL, MongoDB, Elasticsearch, Redis, RabbitMQ, Kafka, Docker, Kubernetes, TensorFlow, Scikit-Learn, GraphQL, gRPC, Active Directory, OAuth 2.0

Key Features

  • Business Glossary: AI-powered term standardization with automated synonym detection and corporate knowledge management
  • Data Dictionary: Automated metadata discovery across SQL/NoSQL databases with scheduled imports and change tracking
  • Report Catalog: Centralized report repository with impact analysis, version control, and cross-environment consistency
  • Data Lineage: Interactive graph visualization showing end-to-end data flow from source systems through transformations to reports
  • Data Quality: Automated quality rules engine with configurable thresholds, anomaly detection, and trend analysis
  • AI-powered data classification automatically tagging sensitive data (PII, PCI, PHI) for compliance
  • Natural language search across all metadata using Elasticsearch with relevance ranking
  • Role-Based Access Control (RBAC) with Active Directory/LDAP integration and granular permissions
  • RESTful and GraphQL APIs for seamless integration with existing enterprise tools
  • Scheduled metadata synchronization with configurable import frequencies and conflict resolution
  • Historical versioning tracking all metadata changes with audit trails and rollback capabilities
  • Multi-tenancy support for enterprise clients with data isolation and customizable branding
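
As an illustration of the Data Quality feature, a rule with a configurable threshold can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Datarul's actual engine: `QualityRule`, `evaluate`, the `min_pass_rate` field, and the sample rows are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass(frozen=True)
class QualityRule:
    """Hypothetical rule: a per-row check plus the minimum pass rate allowed."""
    name: str
    check: Callable[[dict[str, Any]], bool]
    min_pass_rate: float  # configurable threshold in [0.0, 1.0]


def evaluate(rule: QualityRule, rows: Iterable[dict[str, Any]]) -> dict[str, Any]:
    """Apply the rule to every row and compare the pass rate to the threshold."""
    total = passed = 0
    for row in rows:
        total += 1
        if rule.check(row):
            passed += 1
    # An empty data set passes vacuously here; a real engine might flag it instead.
    pass_rate = passed / total if total else 1.0
    return {
        "rule": rule.name,
        "pass_rate": pass_rate,
        "passed": pass_rate >= rule.min_pass_rate,
    }


# Illustrative rule: at least 95% of customer rows must have an email.
email_not_null = QualityRule(
    name="customer.email is not null",
    check=lambda row: row.get("email") is not None,
    min_pass_rate=0.95,
)

sample = [
    {"email": "a@example.com"},
    {"email": None},
    {"email": "b@example.com"},
    {"email": "c@example.com"},
]
result = evaluate(email_not_null, sample)
print(result)
```

Keeping the threshold on the rule rather than in the evaluator means each tenant or data set could tune its own tolerance without code changes, which is one plausible reading of "configurable thresholds".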

Challenges

  • Processing and indexing 10M+ database objects across 100+ heterogeneous data sources in real time
  • Building a scalable data lineage parser that handles complex SQL with CTEs, subqueries, and window functions
  • Implementing graph algorithms efficiently for impact analysis across millions of data dependencies
  • Designing multi-tenant architecture with strict data isolation while maintaining query performance
  • Integrating with 20+ different database technologies (Oracle, SQL Server, PostgreSQL, MongoDB, etc.)
  • Ensuring sub-second search response times across billions of metadata records
  • Building AI models for data classification that handle domain-specific business terminology
  • Managing eventual consistency in distributed microservices while maintaining data integrity
  • Deploying flexibly on both on-premise air-gapped environments and cloud infrastructure
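
The impact-analysis challenge above comes down to graph reachability: given a changed asset, find everything downstream of it. A breadth-first traversal is one common way to do this. The sketch below assumes lineage is available as (source, target) edges; the asset names and the functions `build_downstream_index` and `impacted_assets` are invented for illustration and are not taken from the platform.

```python
from collections import defaultdict, deque


def build_downstream_index(edges):
    """Map each asset to the assets that read from it directly."""
    downstream = defaultdict(set)
    for source, target in edges:
        downstream[source].add(target)
    return downstream


def impacted_assets(downstream, changed):
    """Return every asset reachable from `changed`, excluding `changed` itself."""
    seen = {changed}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for dependent in downstream.get(node, ()):
            if dependent not in seen:  # the visited set also guards against cycles
                seen.add(dependent)
                queue.append(dependent)
    seen.discard(changed)
    return seen


# Hypothetical lineage: two source systems feeding one report.
edges = [
    ("crm.customers", "staging.customers"),
    ("staging.customers", "dw.dim_customer"),
    ("dw.dim_customer", "report.churn"),
    ("erp.orders", "dw.fact_orders"),
    ("dw.fact_orders", "report.churn"),
]
index = build_downstream_index(edges)
impacted = impacted_assets(index, "staging.customers")
print(sorted(impacted))
```

Each asset and edge is visited at most once, so the traversal is linear in the size of the affected subgraph. At millions of dependencies the harder problems are likely storage and indexing rather than the traversal itself.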

Solutions

  • Architected event-driven microservices using domain-driven design with bounded contexts
  • Built a custom SQL parser with ANTLR that generates abstract syntax trees for lineage extraction
  • Adopted a graph database (RedisGraph) with optimized traversal algorithms for impact analysis
  • Designed multi-tenant PostgreSQL schema with row-level security and tenant-aware queries
  • Created an adapter-pattern framework enabling plug-and-play integration of new data sources
  • Deployed Elasticsearch cluster with custom analyzers achieving <200ms search latency
  • Trained classification models in TensorFlow, using active learning to reduce manual labeling by 80%
  • Implemented the CQRS pattern with event sourcing to preserve data integrity across eventually consistent services
  • Containerized all services with Docker and orchestrated them with Kubernetes, enabling hybrid on-premise/cloud deployments
  • Built comprehensive monitoring with Prometheus/Grafana to track and maintain 99.9% SLA compliance
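
The adapter approach to data-source integration can be illustrated with a small registry. This is a sketch under stated assumptions: it uses Python's built-in `sqlite3` as a stand-in source, and `SourceAdapter`, `register`, `ADAPTERS`, and `list_columns` are hypothetical names, since the framework's actual interface is not described here.

```python
import sqlite3
from abc import ABC, abstractmethod

# Hypothetical registry: maps a source kind to the adapter class that handles it.
ADAPTERS = {}


def register(kind):
    """Class decorator that makes an adapter discoverable by source kind."""
    def decorator(cls):
        ADAPTERS[kind] = cls
        return cls
    return decorator


class SourceAdapter(ABC):
    """Common interface that every data source adapter implements."""

    @abstractmethod
    def list_columns(self):
        """Return (table, column, declared_type) tuples for the source."""


@register("sqlite")
class SQLiteAdapter(SourceAdapter):
    def __init__(self, connection):
        self.connection = connection

    def list_columns(self):
        tables = [row[0] for row in self.connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )]
        columns = []
        for table in tables:
            # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
            for _cid, name, declared_type, *_rest in self.connection.execute(
                f'PRAGMA table_info("{table}")'
            ):
                columns.append((table, name, declared_type))
        return columns


# Usage: the caller only knows the source kind, not the concrete class.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
adapter = ADAPTERS["sqlite"](conn)
columns = adapter.list_columns()
print(columns)
```

Supporting another database in this design would mean adding one more registered class that implements `list_columns`, leaving the discovery pipeline untouched.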

Project Gallery

Business Glossary - Creating corporate memory and standardizing business terms
Data Dictionary - Managing all database assets under one roof
Report Catalog - Monitoring reporting tools and tracking changes
Data Lineage - Visualizing and analyzing data flow with diagrams
Data Quality - Continuously monitoring data accuracy, consistency, and reliability