Glossary

This glossary defines the key terms, acronyms, and concepts used in our documentation and systems. Use it as a reference for the terminology of our web scraping services, technical architecture, and workflows.


A

  • API (Application Programming Interface): A set of protocols and tools that allow different software systems to communicate with each other.
  • Auth0: A third-party service used for authentication and authorization in our applications.
  • AWS (Amazon Web Services): A cloud computing platform used to host and manage our infrastructure.
  • Amazon Athena: A serverless interactive query service provided by AWS that allows you to analyze data directly from Amazon S3 using standard SQL. It is used in our systems for ad-hoc querying and data analysis.

B

  • Business Logic: The core rules and processes that define how data is processed, transformed, and delivered in our systems.
  • Backend: The server-side part of our applications that handles data processing, storage, and business logic.

C

  • CloudWatch: An AWS service used for monitoring and logging our applications and infrastructure.
  • CI/CD (Continuous Integration/Continuous Deployment): A set of practices and tools for automating the building, testing, and deployment of our software.
  • Cache: A temporary storage layer used to speed up data access (e.g., Redis).
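
To make the Cache entry concrete, here is a minimal sketch of the cache-aside pattern with time-based expiry. It is an illustration only and not how our systems or Redis are implemented: the names `TTLCache` and `fetch_with_cache` are hypothetical, and the cache lives in process memory rather than in an external store.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds (hypothetical helper)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (self._clock() + self._ttl, value)


def fetch_with_cache(cache: TTLCache, key: str, load: Callable[[str], Any]) -> Any:
    """Cache-aside: return the cached value, or load it from the slow source and store it."""
    value = cache.get(key)
    if value is None:
        value = load(key)
        cache.set(key, value)
    return value
```

The slow `load` function runs only on a miss or after the entry has expired, which is where the speed-up comes from.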

D

  • Data Pipeline: A series of processes that collect, transform, and deliver data from one system to another.
  • DynamoDB: A fully managed NoSQL database service provided by AWS, used for storing and retrieving structured data.
  • Data Scraping: The process of extracting data from websites or other sources automatically.
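
The collect, transform, and deliver stages named in the Data Pipeline entry can be sketched as three chained generator functions. This is a simplified illustration under assumed inputs (comma-separated `name,price` rows and a plain list as the destination); the function names are hypothetical and do not describe our production pipelines.

```python
from typing import Dict, Iterable, Iterator, List


def collect(raw_rows: Iterable[str]) -> Iterator[str]:
    """Collect stage: pass through non-empty rows from the source."""
    for row in raw_rows:
        if row.strip():
            yield row


def transform(rows: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Transform stage: parse each 'name,price' row into a record."""
    for row in rows:
        name, price = row.split(",", 1)
        yield {"name": name.strip(), "price": price.strip()}


def deliver(records: Iterable[Dict[str, str]], sink: List[Dict[str, str]]) -> int:
    """Deliver stage: append records to the destination and return how many were written."""
    count = 0
    for record in records:
        sink.append(record)
        count += 1
    return count


def run_pipeline(raw_rows: Iterable[str], sink: List[Dict[str, str]]) -> int:
    # Stages are chained lazily, so each row flows through all three in turn.
    return deliver(transform(collect(raw_rows)), sink)
```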

F

  • Flink: An open-source stream processing framework used for real-time data processing in our pipelines.
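  • Fluentd: An open-source data collector that unifies the collection and routing of logs and other event data across our systems.
  • Fluent Bit: A lightweight open-source log processor and forwarder from the same ecosystem as Fluentd, used to collect and ship logs across our systems.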

I

  • Integration: The process of connecting different systems or services to work together (e.g., Jira, Auth0, SendGrid).
  • IoT (Internet of Things): A network of physical devices that collect and exchange data.

J

  • Jira: A project management tool used for issue tracking, task management, and agile development.

K

  • Kinesis: An AWS service for real-time data streaming and processing.
  • Kong: An open-source API gateway used to manage, secure, and scale APIs. It acts as a middleware layer between clients and backend services.
  • API Gateway: A server that acts as an entry point for API requests, handling tasks like authentication, rate limiting, and routing.
  • Plugins: Extensions in Kong that add functionality such as authentication, logging, and rate limiting.
  • Upstream: A backend service or server that Kong forwards requests to.
  • Route: A configuration in Kong that defines how incoming requests are mapped to upstream services.
  • Consumer: An entity (e.g., a user or application) that interacts with APIs through Kong.

M

  • Microservices: An architectural style where applications are built as a collection of small, independent services.
  • MongoDB: A NoSQL database that stores data in flexible, JSON-like documents.
  • MySQL: A relational database management system used for structured data storage.

P

  • PostgreSQL: An open-source relational database management system (RDBMS) known for its robustness, scalability, and support for advanced SQL features. It is used in our systems for structured data storage and complex queries, and it serves as the database for Kong in our platform.

R

  • Redis: An in-memory data structure store used as a database, cache, and message broker.
  • Rate Limiting: A mechanism to control the number of requests a client can make to an API within a specific time frame.
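
As an illustration of the Rate Limiting entry, the sketch below implements one common strategy, a fixed-window counter per client. It is an assumption chosen for simplicity, not a description of how our API gateway enforces limits; the class name `FixedWindowRateLimiter` and its parameters are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per client in each window of `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._limit = limit
        self._window = window_seconds
        self._clock = clock  # injectable so tests can control time
        # client_id -> (window index, requests counted in that window)
        self._counters: Dict[str, Tuple[int, int]] = {}

    def allow(self, client_id: str) -> bool:
        window_index = int(self._clock() // self._window)
        stored_index, count = self._counters.get(client_id, (window_index, 0))
        if stored_index != window_index:
            # A new window has started, so the counter resets.
            count = 0
        if count >= self._limit:
            self._counters[client_id] = (window_index, count)
            return False
        self._counters[client_id] = (window_index, count + 1)
        return True
```

A fixed window is easy to reason about but permits short bursts around window boundaries; sliding-window or token-bucket schemes smooth that out at the cost of more bookkeeping.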

S

  • SendGrid: A third-party email delivery service used for sending transactional and marketing emails.
  • Stream Processing: A method of processing data in real-time as it is generated (e.g., using Flink).

T

  • Temporal: A workflow orchestration engine used to manage and execute complex workflows in our systems.
  • Third-Party Integration: The process of connecting external services (e.g., Algolia, Auth0) to our systems.

W

  • Web Scraping: The process of extracting data from websites automatically.
  • Webhook: A method for real-time communication between systems, where one system sends data to another as events occur.
  • Workflow: A sequence of steps or processes used to accomplish a specific task (e.g., data processing, scraping).
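
The Webhook entry describes one system pushing data to another as events occur. A minimal sketch of the receiving side follows, assuming the sender posts a JSON body of the form `{"type": ..., "data": ...}`. That payload shape, the `WebhookReceiver` class, and the status codes it returns are illustrative assumptions, not the contract of any service named in this glossary.

```python
import json
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], None]


class WebhookReceiver:
    """Dispatch incoming webhook events to handlers registered per event type."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def on(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type] = handler

    def receive(self, body: str) -> int:
        """Process a raw request body and return an HTTP-style status code."""
        try:
            event = json.loads(body)
        except json.JSONDecodeError:
            return 400  # malformed JSON
        if not isinstance(event, dict) or not isinstance(event.get("type"), str):
            return 400  # missing or invalid event type
        handler = self._handlers.get(event["type"])
        if handler is None:
            return 204  # event type we do not subscribe to: acknowledge and ignore
        handler(event.get("data", {}))
        return 200
```

In a real deployment the `receive` method would sit behind an HTTP endpoint, and the sender's signature would normally be verified before the payload is trusted.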

Z

  • Zero Downtime Deployment: A deployment strategy that ensures our services remain available during updates or changes.

How to Use This Glossary

  • If you encounter an unfamiliar term in the documentation, refer to this glossary for its definition.
  • Use the search functionality (if enabled) to quickly find terms.

Contributing to the Glossary

If you notice a term is missing or needs clarification, please contact the documentation team or submit a pull request to update this file.