Welcome to Platform Documentation

Welcome to the internal technical documentation for the Grepsr Platform, a leading provider of web scraping as a service. This documentation is a comprehensive resource for our engineering and development teams, covering our systems, workflows, and business logic.

About Us

At Grepsr, we specialize in delivering high-quality web scraping solutions to our clients. Our services are powered by a robust tech stack, including:

- Web Application Platform: A scalable platform for managing web scraping tasks.
- APIs: Easy-to-use APIs for clients to access scraped data.
- Microservices: A distributed architecture for handling complex workflows.
- Data Pipelines: Built using Flink, Fluentd, and Temporal for efficient data processing.
- Integrations: Seamless integration with tools like Jira, Auth0, SendGrid, and Algolia.
- Databases: MySQL, MongoDB, DynamoDB, and Redis for data storage and retrieval.
- AWS Services: Leveraging Kinesis, CloudWatch, and other AWS tools for scalability and monitoring.

Purpose of This Documentation

This documentation is designed to:
1. Onboard New Team Members: Provide a clear understanding of our systems and workflows.
2. Document Business Logic: Explain the core logic behind our web scraping and data processing pipelines.
3. Facilitate Collaboration: Serve as a single source of truth for all technical details.
4. Improve Efficiency: Help teams troubleshoot issues and optimize processes.

What You'll Find Here

This documentation is organized into the following sections:

1. Overview

- Introduction: A high-level overview of our systems and services.
- Architecture: An explanation of our technical architecture and design principles.

2. Business Logic

- Core Workflows: Details about our web scraping and data processing pipelines.
- Temporal Workflows: How Temporal is used to manage workflows and retries.
- Data Transformation: How raw data is processed and delivered to clients.

3. Technical Architecture

- Microservices: Documentation for each microservice and its role in the system.
- Data Pipelines: How Flink and Fluentd are used for data processing.
- Databases: Schema designs and usage patterns for MySQL, MongoDB, DynamoDB, and Redis.
- AWS Services: How Kinesis, CloudWatch, and other AWS tools are integrated.

4. Integrations

- Third-Party Tools: Guides for integrating with Jira, Auth0, SendGrid, and Algolia.
- Webhooks: How webhooks are implemented for real-time data delivery.

5. Development and Deployment

- Development Workflow: How to set up the development environment and contribute to the codebase.
- CI/CD Pipelines: Our continuous integration and deployment processes.
- Deployment Strategies: How we handle deployments and rollbacks.

6. Monitoring and Logging

- CloudWatch: How we use CloudWatch for monitoring and alerts.
- Logging Best Practices: Guidelines for logging in microservices and data pipelines.
- Troubleshooting: Common issues and how to resolve them.

7. Security and Compliance

- Authentication and Authorization: How Auth0 is used for secure access.
- Data Privacy: How sensitive data is handled and stored.
- Compliance: Information on GDPR, CCPA, and other regulations.

Getting Started

To get started, use the navigation menu on the left to explore the documentation. If you're new to the team, we recommend starting with the Overview and Business Logic sections.

Need Help?

If you have any questions or need further assistance, please reach out to:
- Engineering Team: product@grepsr.com
- Documentation Maintainers: raj.maharjan@grepsr.com

Thank you for using our internal technical documentation. Let’s build great things together!