FireCrawl Deployment


Crawl web pages and save their content in an organized form.

1. Clone the repository

git clone https://github.com/mendableai/firecrawl.git
cd firecrawl

2. Edit the configuration

Copy the file apps/api/.env.example to the repository root as .env:

cp apps/api/.env.example .env
# Then edit .env and set at least the following parameters
PORT=6204
HOST=0.0.0.0
USE_DB_AUTHENTICATION=false
NUM_WORKERS_PER_QUEUE=8
# Access key for the queue admin panel; must be set if Firecrawl is publicly accessible
BULL_AUTH_KEY=@
TEST_API_KEY=fc-f0ReCraWl
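The same values can also be applied to .env without opening an editor. The bash function below is a minimal sketch; set_env_var is a hypothetical helper name, and it assumes plain KEY=VALUE lines. It replaces an existing uncommented KEY= line, or appends one if the key is missing, and leaves commented-out lines untouched.

```shell
# Hypothetical helper: set KEY=VALUE in an env file.
# Lines starting with "KEY=" are replaced; if none exist, the pair is appended.
# Commented lines such as "# KEY=..." are not touched.
set_env_var() {
  local key="$1" value="$2" file="$3" tmp rc
  tmp="$(mktemp)" || return 1
  # Pass key/value through the environment so awk does not escape-process them.
  K="$key" V="$value" awk '
    index($0, ENVIRON["K"] "=") == 1 { print ENVIRON["K"] "=" ENVIRON["V"]; found = 1; next }
    { print }
    END { if (!found) print ENVIRON["K"] "=" ENVIRON["V"] }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps the original file's permissions
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

Usage, after the cp step: set_env_var PORT 6204 .env, then repeat for HOST, USE_DB_AUTHENTICATION, NUM_WORKERS_PER_QUEUE, BULL_AUTH_KEY and TEST_API_KEY (quote '@' when passing it).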

3. Deploy

Method 1: build the images yourself

docker compose build
docker compose up -d

Method 2: use prebuilt images

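Download the zip archive from https://github.com/MitsuhaYuki/firecrawl-docker-image/releases, extract it, and upload the extracted files to the server, together with the .env file from the previous step and docker-compose.yaml (available in the official repository).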

Then load the images and tag them with the names used in docker-compose.yaml:

docker load -i api-image.tar
docker tag api:v1.15.0 firecrawl/api:v1.15.0
docker load -i playwright-image.tar
docker tag playwright:v1.15.0 firecrawl/playwright:v1.15.0
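The four commands above follow one pattern per image, so they can be wrapped in a loop. This is a sketch that assumes the archive names (api-image.tar, playwright-image.tar) and the name:version tags shown above; load_and_tag is a hypothetical function name.

```shell
# Hypothetical helper: load each image archive from the current directory and
# re-tag it under the firecrawl/ namespace referenced by docker-compose.yaml.
load_and_tag() {
  local version="$1" name
  for name in api playwright; do
    docker load -i "${name}-image.tar" || return 1
    docker tag "${name}:${version}" "firecrawl/${name}:${version}" || return 1
  done
}
```

Run it as load_and_tag v1.15.0 in the directory holding the .tar files; docker images firecrawl/api should then list the re-tagged image.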

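Edit docker-compose.yaml as follows: point the services at the loaded images and delete the x-common-service block, keeping the x-common-env and services sections (plus the networks block at the end, which the services attach to).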

x-common-env: &common-env
  REDIS_URL: ${REDIS_URL:-redis://redis:6379}
  REDIS_RATE_LIMIT_URL: ${REDIS_URL:-redis://redis:6379}
  PLAYWRIGHT_MICROSERVICE_URL: ${PLAYWRIGHT_MICROSERVICE_URL:-http://playwright-service:3000/scrape}
  USE_DB_AUTHENTICATION: ${USE_DB_AUTHENTICATION}
  OPENAI_API_KEY: ${OPENAI_API_KEY}
  OPENAI_BASE_URL: ${OPENAI_BASE_URL}
  MODEL_NAME: ${MODEL_NAME}
  MODEL_EMBEDDING_NAME: ${MODEL_EMBEDDING_NAME}
  OLLAMA_BASE_URL: ${OLLAMA_BASE_URL}
  SLACK_WEBHOOK_URL: ${SLACK_WEBHOOK_URL}
  BULL_AUTH_KEY: ${BULL_AUTH_KEY}
  TEST_API_KEY: ${TEST_API_KEY}
  POSTHOG_API_KEY: ${POSTHOG_API_KEY}
  POSTHOG_HOST: ${POSTHOG_HOST}
  SUPABASE_ANON_TOKEN: ${SUPABASE_ANON_TOKEN}
  SUPABASE_URL: ${SUPABASE_URL}
  SUPABASE_SERVICE_TOKEN: ${SUPABASE_SERVICE_TOKEN}
  SELF_HOSTED_WEBHOOK_URL: ${SELF_HOSTED_WEBHOOK_URL}
  SERPER_API_KEY: ${SERPER_API_KEY}
  SEARCHAPI_API_KEY: ${SEARCHAPI_API_KEY}
  LOGGING_LEVEL: ${LOGGING_LEVEL}
  PROXY_SERVER: ${PROXY_SERVER}
  PROXY_USERNAME: ${PROXY_USERNAME}
  PROXY_PASSWORD: ${PROXY_PASSWORD}
  SEARXNG_ENDPOINT: ${SEARXNG_ENDPOINT}
  SEARXNG_ENGINES: ${SEARXNG_ENGINES}
  SEARXNG_CATEGORIES: ${SEARXNG_CATEGORIES}

services:
  playwright-service:
    # NOTE: This uses the prebuilt image loaded above. To build the service
    # locally instead, uncomment the build: line and comment out the image: line.
    # image: ghcr.io/mendableai/playwright-service:latest
    # build: apps/playwright-service-ts
    image: firecrawl/playwright:v1.15.0
    environment:
      PORT: 3000
      PROXY_SERVER: ${PROXY_SERVER}
      PROXY_USERNAME: ${PROXY_USERNAME}
      PROXY_PASSWORD: ${PROXY_PASSWORD}
      BLOCK_MEDIA: ${BLOCK_MEDIA}
    networks:
      - backend

  api:
    environment:
      <<: *common-env
      HOST: "0.0.0.0"
      PORT: ${INTERNAL_PORT:-3002}
      FLY_PROCESS_GROUP: app
      ENV: local
    image: firecrawl/api:v1.15.0
    depends_on:
      - redis
      - playwright-service
    ports:
      - "${PORT:-3002}:${INTERNAL_PORT:-3002}"
    command: [ "pnpm", "run", "start:production" ]
    networks:
      - backend
      - default

  worker:
    environment:
      <<: *common-env
      FLY_PROCESS_GROUP: worker
      ENV: local
    depends_on:
      - redis
      - playwright-service
      - api
    command: [ "pnpm", "run", "workers" ]
    image: firecrawl/api:v1.15.0
    networks:
      - backend

  redis:
    # NOTE: If you want to use Valkey (open source) instead of Redis (source available),
    # uncomment the Valkey statement and comment out the Redis statement.
    # Using Valkey with Firecrawl is untested and not guaranteed to work. Use with caution.
    image: redis:alpine
    # image: valkey/valkey:alpine

    networks:
      - backend
    command: redis-server --bind 0.0.0.0

networks:
  backend:
    driver: bridge
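Before starting, it can help to validate the edited file. If a <<: *common-service merge key survives after the x-common-service block is deleted, the alias points at an anchor that no longer exists and the YAML fails to parse. The sketch below (check_compose is a hypothetical name) first looks for such leftover lines, then lets Docker Compose parse the file with docker compose config --quiet, which validates without printing the resolved configuration.

```shell
# Hypothetical helper: sanity-check the edited compose file before "up".
check_compose() {
  local file="${1:-docker-compose.yaml}"
  # An uncommented "<<: *common-service" line would reference a deleted anchor.
  if grep -Eq '^[[:space:]]*<<:[[:space:]]*\*common-service' "$file"; then
    echo "leftover '<<: *common-service' in $file" >&2
    return 1
  fi
  # Parse and validate only; nothing is printed on success.
  docker compose -f "$file" config --quiet
}
```

Example: check_compose docker-compose.yaml && docker compose -p firecrawl up -d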

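Before starting, make sure every <<: *common-service line has been removed from the api and worker services (the listing above already reflects this). Then start the stack:

docker compose -p firecrawl up -d

Once it is running, open http://<Firecrawl-IP>:<port>/admin/@/queues to reach the crawl queue dashboard, where @ is the value of BULL_AUTH_KEY and <port> is the PORT value from .env (6204 above).

To use it from n8n, set the base URL to http://<Firecrawl-IP>:<port>/v1 and use the value of BULL_AUTH_KEY as the API key.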
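A quick smoke test outside n8n can confirm the v1 endpoint responds. This is a sketch, assuming the self-hosted API accepts a POST to /v1/scrape with a JSON body containing url and formats, and a Bearer token set to the key described in the n8n note; firecrawl_scrape and queue_dashboard_url are hypothetical helper names, and the target URL must not contain double quotes because the JSON body is built by string interpolation.

```shell
# Hypothetical helper: scrape one page through the self-hosted v1 API.
# Args: base URL (e.g. http://<Firecrawl-IP>:6204), API key, target page URL.
firecrawl_scrape() {
  local base="$1" key="$2" target="$3"
  curl -sS -X POST "${base%/}/v1/scrape" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${key}" \
    -d "{\"url\": \"${target}\", \"formats\": [\"markdown\"]}"
}

# Hypothetical helper: build the queue dashboard URL; the path segment is BULL_AUTH_KEY.
queue_dashboard_url() {
  printf '%s/admin/%s/queues\n' "${1%/}" "$2"
}
```

For example, firecrawl_scrape http://<Firecrawl-IP>:6204 '@' https://example.com should print a JSON response whose data field carries the page content as markdown; an HTTP error or connection refusal usually means the api container is not up yet.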