Full Guide to BOM Cleaning in PHP

Summary
In modern PHP development, a hidden BOM (Byte Order Mark) at the start of a UTF-8 file can cause fatal errors in scripts that use namespace or declare(strict_types=1), because both constructs must come first in a script. A BOM often goes unnoticed: editors and automated conversions insert it silently, which leads to hard-to-diagnose failures and higher debugging costs. Solutions include manual BOM removal in editors (Notepad++, VS Code), PHP scripts, command-line tools (sed, awk, iconv), and the clean-bom-senior.sh tool, which provides recursive cleanup, atomic operations, backups, and integration into CI/CD pipelines and pre-commit hooks. Consistent IDE configuration, pre-commit checks, and encoding standardization keep PHP applications stable and predictable.

Table of Contents
  1. Introduction
  2. Understanding BOM and Character Encodings
  3. BOM Detection and Diagnosis
  4. BOM Cleaning Methods
  5. DevOps Integration Strategies
  6. BOM Prevention Best Practices
  7. Practical Recipes and Workflows
  8. Troubleshooting Common Issues
  9. Enterprise Deployment Considerations
  10. Glossary
  11. Sources

1 · Introduction

1.1 Why This Guide Matters

A Byte Order Mark (BOM), which in UTF-8 is the invisible three-byte sequence EF BB BF, frequently causes fatal PHP errors in files that use modern language constructs such as namespace declarations and declare(strict_types=1) statements[^bom]. This guide provides solutions for:

  • Detecting and removing BOM from codebases of any scale;
  • Preventing BOM reintroduction through proper tooling and workflows;
  • Automating cleanup processes in IDE environments, CI/CD pipelines, and Git workflows;
  • Maintaining code integrity during cleanup operations.

1.2 What’s New in the 2025 Edition

This updated edition introduces significant enhancements based on real-world enterprise deployments:

  • clean-bom-senior.sh v2.06.4 with new features:
      • Complete file attribute preservation: owner, group, permissions, timestamps
      • Atomic operations with automatic rollback on failure
      • Fixed statistics reporting using process substitution instead of pipes
      • Global command access via bom alias
      • Enterprise error handling with comprehensive logging
  • Extended DevOps coverage: Docker containers, Kubernetes deployments, CI/CD templates
  • Security considerations: permission handling, backup strategies, audit trails
  • Performance optimization: parallel processing, large file handling, memory management

1.3 Target Audience

  • PHP Developers working with modern namespace-based applications
  • DevOps Engineers implementing automated code quality checks
  • System Administrators managing multi-developer environments
  • Technical Team Leads establishing coding standards and workflows

2 · Understanding BOM and Character Encodings

2.1 What is Byte Order Mark (BOM)

The Byte Order Mark (BOM) is a special Unicode character (U+FEFF) that appears at the beginning of a text file to indicate:

  • Byte order (endianness) for UTF-16 and UTF-32 encodings
  • Encoding type identification for text processors
  • Unicode presence in the file stream

In UTF-8 files, BOM appears as three bytes: EF BB BF (hexadecimal).

2.2 Why BOM Breaks Modern PHP

PHP's parser requires that, when a script uses them, the following constructs come before any other statement or output:

  • A namespace declaration (only declare() statements may precede it)
  • A declare(strict_types=1) statement (must be the very first statement in the script)

A BOM sits before the opening <?php tag, so PHP treats the three invisible bytes as inline output that precedes the namespace/declare statement, resulting in:

[EF BB BF]<?php             // ❌ Invisible BOM bytes precede the opening tag
namespace App\Controllers;  // Fatal error: Namespace declaration statement has to be the very first statement

Error example:

Fatal error: Namespace declaration statement has to be the very first statement or after any declare call in the script
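
A quick way to confirm this failure mode is to look at the first bytes of the file, since the editor will not show them. The following is a minimal sketch rather than part of any tool described here; the helper names starts_with_php_tag and first_bytes_hex are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the file begins with "<?php" exactly,
# i.e. nothing (BOM, whitespace, stray text) precedes the opening tag.
starts_with_php_tag() {
    [ "$(head -c 5 "$1")" = "<?php" ]
}

# Print the first three bytes as hex so invisible characters become visible.
first_bytes_hex() {
    head -c 3 "$1" | od -An -tx1 | tr -d ' \n'
}
```

A file that fails starts_with_php_tag and reports efbbbf from first_bytes_hex has a UTF-8 BOM in front of the opening tag.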

2.3 Cross-Platform Line Ending Issues

CRLF vs LF conflicts arise when:

  • Windows editors save files with \r\n (CRLF) line endings
  • Unix/Linux systems expect \n (LF) only
  • Mixed environments create inconsistent file states

Consequences include:

  • Git diff pollution with "whitespace changes"
  • Failed CI/CD builds due to checksum mismatches
  • Inconsistent behavior across development environments
  • Code review complications with phantom changes
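
To see whether a file is affected, it can help to count the lines that end in CR LF. This is a small sketch assuming a standard grep; the function name count_crlf_lines is hypothetical.

```shell
#!/usr/bin/env bash
# Count lines that end in CR LF. grep -c prints the number of matching lines;
# the pattern is a literal carriage return anchored at the end of the line.
count_crlf_lines() {
    local cr
    cr=$(printf '\r')
    # grep exits with status 1 when the count is 0; that is not an error here.
    grep -c "${cr}\$" "$1" || true
}
```

A result of 0 means the file uses LF only; a result lower than the total line count points to mixed line endings.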

2.4 File Encoding Detection

Understanding how to identify encoding issues is crucial for effective cleanup:

Encoding Type   BOM Signature (hex)  Decimal Bytes
--------------  -------------------  -------------
UTF-8 with BOM  EF BB BF             239 187 191
UTF-16 LE       FF FE                255 254
UTF-16 BE       FE FF                254 255
UTF-32 LE       FF FE 00 00          255 254 0 0
UTF-32 BE       00 00 FE FF          0 0 254 255
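
The signatures in the table can be turned into a simple classifier. The sketch below is illustrative only, and bom_type is a hypothetical name. It inspects the first four bytes and has one known limitation: a UTF-16 LE file whose first character is U+0000 would be reported as UTF-32 LE.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a file by the BOM signature in its first bytes.
# UTF-32 LE is tested before UTF-16 LE because both start with FF FE.
bom_type() {
    local sig
    sig=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    case "$sig" in
        efbbbf*)   echo "UTF-8" ;;
        fffe0000)  echo "UTF-32 LE" ;;
        0000feff)  echo "UTF-32 BE" ;;
        fffe*)     echo "UTF-16 LE" ;;
        feff*)     echo "UTF-16 BE" ;;
        *)         echo "none" ;;
    esac
}
```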

3 · BOM Detection and Diagnosis

3.1 Command-Line Detection Methods

Essential diagnostic commands for BOM identification:

Tool     Command                        Purpose                Output Example
-------  -----------------------------  ---------------------  ----------------------------------
hexdump  hexdump -C file.php | head -1  Visual hex inspection  00000000  ef bb bf 3c 3f 70 68 70
od       od -An -tx1 -N3 file.php       Byte-level analysis    ef bb bf
file     file -bi file.php              MIME type detection    text/x-php; charset=utf-8
xxd      xxd -l 16 file.php             Hex dump with ASCII    00000000: efbb bf3c 3f70 6870

3.2 Advanced Detection Techniques

Bulk scanning for BOM across entire projects:

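# Find all files that START with a UTF-8 BOM
# (grep -l would also match the byte sequence anywhere inside a file)
find . -type f \( -name "*.php" -o -name "*.js" -o -name "*.css" \) \
  -exec sh -c '[ "$(head -c 3 "$1" | od -An -tx1 | tr -d " \n")" = "efbbbf" ]' _ {} \; -print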

# Statistical analysis of encoding issues
find . -name "*.php" -exec file -bi {} \; | sort | uniq -c

Automated detection script:

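#!/bin/bash
# detect-bom.sh - Enterprise BOM detection
# Read NUL-delimited paths so file names with spaces or newlines are handled
while IFS= read -r -d '' file; do
    if [[ $(hexdump -n 3 -e '3/1 "%02x"' "$file") == "efbbbf" ]]; then
        echo "BOM detected: $file"
    fi
done < <(find . -name "*.php" -type f -print0)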

3.3 IDE-Based Detection

Visual Studio Code:

  • Status bar shows encoding (look for "UTF-8 with BOM")
  • Extensions: "Fix UTF-8 BOM", "BOM detector"

PhpStorm/IntelliJ:

  • File → File Properties → File Encoding (the same menu offers Remove BOM for files that have one)
  • Settings → Editor → File Encodings → "Create UTF-8 files" (controls whether new files get a BOM)

Sublime Text:

  • View → Show Console → view.encoding()
  • Packages: "EncodingHelper", "BOM detector"

4 · BOM Cleaning Methods

4.1 Manual Methods

4.1.1 Text Editor Solutions

Notepad++ (Windows):

  1. Open file → Encoding menu
  2. Select "Convert to UTF-8 without BOM" (labelled "Convert to UTF-8" in recent versions)
  3. Save file (Ctrl+S)

Visual Studio Code:

  1. Click encoding indicator in status bar
  2. Select "Save with Encoding"
  3. Choose "UTF-8" (not "UTF-8 with BOM")

Sublime Text:

  1. File → Save with Encoding
  2. Select "UTF-8" option
  3. Verify in View → Show Console: view.encoding()

4.1.2 When Manual Methods Are Appropriate

  • Small codebases (< 50 files)
  • One-time cleanup operations
  • Learning and understanding BOM issues
  • Precision editing of specific problematic files

4.2 PHP-based Solutions

4.2.1 Simple BOM Removal Script

<?php
/**
 * Basic BOM removal for single files
 * Usage: php remove-bom.php filename.php
 */
function removeBOM($filepath) {
    $bom = "\xEF\xBB\xBF";
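    $content = file_get_contents($filepath);
    if ($content === false) {
        // file_get_contents() returns false on failure; never write that back
        fwrite(STDERR, "Cannot read: $filepath\n");
        return false;
    }

    if (strncmp($content, $bom, 3) === 0) {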
        $cleaned = substr($content, 3);
        file_put_contents($filepath, $cleaned);
        echo "BOM removed from: $filepath\n";
        return true;
    }

    echo "No BOM found in: $filepath\n";
    return false;
}

// Command line usage
if ($argc > 1) {
    removeBOM($argv[1]);
}
?>

4.2.2 Enterprise-Grade PHP Solution

<?php
/**
 * Enterprise BOM Cleaner v2.0
 * Features: Backup, logging, batch processing, error handling
 */
class BOMCleaner {
    private $logFile;
    private $backupDir;
    private $processedCount = 0;
    private $cleanedCount = 0;

    public function __construct($logFile = 'bom-cleaner.log', $backupDir = 'backups') {
        $this->logFile = $logFile;
        $this->backupDir = $backupDir;

        if (!is_dir($this->backupDir)) {
            mkdir($this->backupDir, 0755, true);
        }
    }

    public function processDirectory($directory, $extensions = ['php', 'js', 'css', 'html']) {
        $iterator = new RecursiveIteratorIterator(
            new RecursiveDirectoryIterator($directory)
        );

        foreach ($iterator as $file) {
            if ($file->isFile()) {
                $extension = strtolower($file->getExtension());
                if (in_array($extension, $extensions)) {
                    $this->processFile($file->getPathname());
                }
            }
        }

        $this->logMessage("Processing complete. Files processed: {$this->processedCount}, BOM removed: {$this->cleanedCount}");
    }

    private function processFile($filepath) {
        $this->processedCount++;

        if (!is_readable($filepath) || !is_writable($filepath)) {
            $this->logMessage("Permission denied: $filepath", 'ERROR');
            return false;
        }

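        $content = file_get_contents($filepath);
        if ($content === false) {
            $this->logMessage("Failed to read: $filepath", 'ERROR');
            return false;
        }
        $bom = "\xEF\xBB\xBF";

        if (strncmp($content, $bom, 3) === 0) {
            // Create backup; the path hash keeps same-named files from different directories apart
            $backupPath = $this->backupDir . '/' . basename($filepath) . '.' . md5($filepath) . '.' . time() . '.bak';
            if (!copy($filepath, $backupPath)) {
                $this->logMessage("Backup failed, skipping: $filepath", 'ERROR');
                return false;
            }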

            // Remove BOM
            $cleaned = substr($content, 3);

            if (file_put_contents($filepath, $cleaned) !== false) {
                $this->cleanedCount++;
                $this->logMessage("BOM removed: $filepath (backup: $backupPath)");
            } else {
                $this->logMessage("Failed to write: $filepath", 'ERROR');
            }
        }
    }

    private function logMessage($message, $level = 'INFO') {
        $timestamp = date('Y-m-d H:i:s');
        $logEntry = "[$timestamp] [$level] $message\n";
        file_put_contents($this->logFile, $logEntry, FILE_APPEND | LOCK_EX);
        echo $logEntry;
    }
}

// Usage example
$cleaner = new BOMCleaner();
$cleaner->processDirectory('./src', ['php', 'js', 'css']);
?>

4.3 Command Line Utilities

4.3.1 sed-based Solutions

Basic BOM removal:

# Remove BOM from single file (GNU sed; BSD/macOS sed handles -i and \x escapes differently)
sed -i '1s/^\xEF\xBB\xBF//' file.php

# Process multiple files
find . -name "*.php" -exec sed -i '1s/^\xEF\xBB\xBF//' {} \;

# Create backup copies
find . -name "*.php" -exec sed -i.bak '1s/^\xEF\xBB\xBF//' {} \;

Advanced sed with validation:

#!/bin/bash
# sed-bom-cleaner.sh - Production-ready sed solution
# Read NUL-delimited paths so file names with spaces or newlines are handled
while IFS= read -r -d '' file; do
    if hexdump -n 3 -e '3/1 "%02x"' "$file" | grep -q "efbbbf"; then
        echo "Cleaning BOM from: $file"
        sed -i.bom-backup '1s/^\xEF\xBB\xBF//' "$file"
        echo "Backup created: ${file}.bom-backup"
    fi
done < <(find . -name "*.php" -type f -print0)

4.3.2 awk-based Solutions

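# Remove BOM and CRLF in single pass
# (needs an awk with regex RS support, such as gawk; LC_ALL=C forces byte-wise matching)
LC_ALL=C awk 'BEGIN{RS="\r?\n"; ORS="\n"} NR==1{gsub(/^\xEF\xBB\xBF/, "")} {print}' file.php > file.php.clean
mv file.php.clean file.php

# Batch processing with awk
# (the redirection has to happen per file, so loop over the find results)
find . -name "*.php" -type f -print0 | while IFS= read -r -d '' f; do
    LC_ALL=C awk 'BEGIN{RS="\r?\n"; ORS="\n"} NR==1{gsub(/^\xEF\xBB\xBF/, "")} {print}' "$f" > "$f.clean" && mv "$f.clean" "$f"
done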

4.3.3 iconv and dos2unix Solutions

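# Fix line endings; recent dos2unix versions also drop a leading BOM when converting to Unix format
dos2unix file.php
dos2unix -r file.php  # -r / --remove-bom makes the BOM removal explicit

# Note: iconv -f utf-8 -t utf-8 -c only discards invalid byte sequences.
# U+FEFF is a valid character, so the BOM generally survives. Verify the result:
iconv -f utf-8 -t utf-8 -c input.php -o output.php
od -An -tx1 -N3 output.php  # "ef bb bf" here means the BOM is still present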
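
Where neither GNU sed nor dos2unix is available, the BOM can be removed with basic utilities alone. The following is a minimal sketch, not taken from any tool in this guide; strip_bom is a hypothetical name. It writes the cleaned content back into the existing file, so the file keeps its mode and owner, at the cost of not being atomic.

```shell
#!/usr/bin/env bash
# Remove a leading UTF-8 BOM using only head, od, tr, tail, cat and mktemp.
# Returns 0 if a BOM was removed, 1 if the file had none.
strip_bom() {
    local file=$1 tmp
    [ "$(head -c 3 "$file" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ] || return 1
    tmp=$(mktemp "${file}.XXXXXX") || return 2
    # tail -c +4 starts output at byte 4, skipping the three BOM bytes.
    tail -c +4 "$file" > "$tmp" && cat "$tmp" > "$file"
    local rc=$?
    rm -f "$tmp"
    return "$rc"
}
```

Usage: strip_bom path/to/file.php, for example inside a find loop.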

4.4 clean-bom-senior.sh Enterprise Solution

4.4.1 Installation and Setup

Quick installation (global access):

# Download and install globally
sudo curl -Lo /usr/local/bin/bom https://github.com/paulmann/Clean_BOM_Senior/raw/main/clean-bom-senior.sh
sudo chmod +x /usr/local/bin/bom

# Verify installation
bom --version

Alternative installation methods:

# User-specific installation
mkdir -p ~/.local/bin
curl -Lo ~/.local/bin/bom https://github.com/paulmann/Clean_BOM_Senior/raw/main/clean-bom-senior.sh
chmod +x ~/.local/bin/bom

# Project-specific installation
mkdir -p tools
cd tools
wget https://github.com/paulmann/Clean_BOM_Senior/raw/main/clean-bom-senior.sh
chmod +x clean-bom-senior.sh

Creating system-wide alias:

# Option 1: Symbolic link (recommended)
sudo ln -s /path/to/clean-bom-senior.sh /usr/local/bin/bom

# Option 2: Shell alias (user-specific)
echo 'alias bom="/path/to/clean-bom-senior.sh"' >> ~/.bashrc
source ~/.bashrc

# Option 3: Function wrapper (advanced)
echo 'bom() { /path/to/clean-bom-senior.sh "$@"; }' >> ~/.bashrc

4.4.2 Basic Usage Patterns

Command                 Purpose            Use Case
----------------------  -----------------  --------------------------
bom                     Recursive cleanup  Production deployment prep
bom --dry-run           Preview changes    Pre-commit validation
bom --verbose           Detailed logging   Development/debugging
bom file1.php file2.js  Specific files     Targeted cleanup

4.4.3 Advanced Features in v2.06.4

Complete File Attribute Preservation:

  • Owner/Group (UID/GID): Original ownership maintained
  • Permissions: File modes (755, 644, etc.) preserved
  • Timestamps: Modification times unchanged for clean files
  • Extended attributes: SELinux labels, ACLs preserved
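
How clean-bom-senior.sh implements this internally is not shown here. As an illustration of the general idea only, the sketch below copies mode, ownership, and timestamps from an original file onto its replacement. It assumes GNU coreutils (for --reference), the name copy_attributes is hypothetical, and it does not cover SELinux labels or ACLs.

```shell
#!/usr/bin/env bash
# Copy mode, ownership and timestamps from one file onto another.
# --reference is a GNU coreutils extension; changing the owner may need root.
copy_attributes() {
    local src=$1 dst=$2
    chmod --reference="$src" "$dst" || return 1
    chown --reference="$src" "$dst" 2>/dev/null || true   # best effort without root
    touch -r "$src" "$dst"
}
```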

Atomic Operations with Rollback:

# Automatic backup creation: filename.bak.PID
# Atomic file replacement or complete rollback
# Zero data loss guarantee
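
The pattern behind such guarantees can be sketched as follows. This is an assumption about the general technique, not the tool's actual code, and atomic_strip_bom is a hypothetical name. The function expects a file already known to start with a UTF-8 BOM, and the replacement file gets the temp file's mode (0600), so attributes would still need to be restored afterwards.

```shell
#!/usr/bin/env bash
# Sketch: back up, write the cleaned copy to a temp file in the same directory,
# then rename it over the original. A rename within one filesystem is atomic,
# so readers see either the old file or the new one, never a partial write.
atomic_strip_bom() {
    local file=$1 backup="$1.bak.$$" tmp
    cp -p "$file" "$backup" || return 1
    tmp=$(mktemp "${file}.tmp.XXXXXX") || { rm -f "$backup"; return 1; }
    if tail -c +4 "$file" > "$tmp" && mv "$tmp" "$file"; then
        return 0
    fi
    # Rollback: discard the partial temp file and restore the backup.
    rm -f "$tmp"
    mv "$backup" "$file"
    return 1
}
```

On success the backup (filename.bak.PID) is left in place so the change can be reviewed or reverted.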

Enhanced Statistics and Reporting:

=== PROCESSING SUMMARY ===
Execution time: 3 seconds
Files processed: 247
Files skipped (clean): 203
Errors encountered: 0

--- Issues Fixed ---
BOM signatures removed: 23
CRLF line endings fixed: 21

--- File Type Distribution ---
.php files: 156
.js files: 67
.css files: 24

Process Substitution Fix (Critical v2.06.4 improvement):

  • Previous versions: Statistics lost due to pipe subshells
  • v2.06.4: Correct statistics using while ... < <(find)
  • Impact: Accurate reporting for CI/CD integration
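
The underlying bash behavior is easy to reproduce in isolation. The sketch below is an illustration, not the tool's code; the function names are hypothetical and a fixed three-line input stands in for the find output.

```shell
#!/usr/bin/env bash
# In bash, each side of a pipe runs in a subshell, so a counter incremented
# inside "cmd | while ..." is lost when the loop ends.
count_with_pipe() {
    local n=0
    printf 'a.php\nb.php\nc.php\n' | while IFS= read -r line; do n=$((n + 1)); done
    echo "$n"
}

# With process substitution the loop runs in the current shell and the
# counter survives.
count_with_process_substitution() {
    local n=0
    while IFS= read -r line; do n=$((n + 1)); done < <(printf 'a.php\nb.php\nc.php\n')
    echo "$n"
}

count_with_pipe                     # prints 0 with default bash settings
count_with_process_substitution     # prints 3
```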

4.4.4 Configuration and Customization

Environment variables:

# Temporary directory override
export TMPDIR="/custom/temp/path"
bom --verbose

# Maximum file size limit (default: 100MB)
# Modify MAX_FILE_SIZE in script for larger files

Supported file extensions (default):

  • .php — PHP scripts
  • .css — Stylesheets
  • .js — JavaScript files
  • .txt — Text documents
  • .xml — XML files
  • .htm/.html — Web pages

5 · DevOps Integration Strategies

5.1 CI/CD Pipeline Integration

5.1.1 GitHub Actions Implementation

Basic workflow (.github/workflows/bom-check.yml):

name: BOM Validation
on: 
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  bom-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install BOM Cleaner
        run: |
          sudo curl -Lo /usr/local/bin/bom https://github.com/paulmann/Clean_BOM_Senior/raw/main/clean-bom-senior.sh
          sudo chmod +x /usr/local/bin/bom

      - name: Check for BOM issues
        run: |
          if ! bom --dry-run --verbose; then
            echo "❌ BOM or CRLF issues detected"
            echo "Run 'bom --verbose' locally to fix"
            exit 1
          fi
          echo "✅ No encoding issues found"

Advanced workflow with auto-fix:

name: BOM Auto-Fix
on:
  push:
    branches: [feature/*]

jobs:
  auto-fix-bom:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          token: ${{ secrets.GITHUB_TOKEN }}

      - name: Clean BOM issues
        run: |
          sudo curl -Lo /usr/local/bin/bom https://github.com/paulmann/Clean_BOM_Senior/raw/main/clean-bom-senior.sh
          sudo chmod +x /usr/local/bin/bom
          bom --verbose

      - name: Commit fixes
        run: |
          git config --local user.email "action@github.com"
          git config --local user.name "GitHub Action"
          git add -A
          git diff --staged --quiet || git commit -m "🧹 Auto-fix: Remove BOM and normalize line endings"
          git push

5.1.2 GitLab CI Configuration

.gitlab-ci.yml example:

stages:
  - validate
  - cleanup

variables:
  BOM_VERSION: "v2.06.4"

bom_check:
  stage: validate
  image: alpine:latest
  before_script:
    - apk add --no-cache curl bash findutils coreutils
    - curl -Lo /usr/local/bin/bom https://github.com/paulmann/Clean_BOM_Senior/raw/main/clean-bom-senior.sh
    - chmod +x /usr/local/bin/bom
  script:
    - bom --dry-run --verbose
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
    - if: '$CI_COMMIT_BRANCH == "main"'

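bom_cleanup:
  stage: cleanup
  image: alpine:latest
  # Each job starts in a fresh container, so the tool must be installed here too
  before_script:
    - apk add --no-cache curl bash findutils coreutils
    - curl -Lo /usr/local/bin/bom https://github.com/paulmann/Clean_BOM_Senior/raw/main/clean-bom-senior.sh
    - chmod +x /usr/local/bin/bom
  script:
    - bom --verbose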
  artifacts:
    reports:
      junit: bom-report.xml
    paths:
      - "*.bak.*"
    expire_in: 1 week
  only:
    - develop
    - /^release\/.*$/

5.1.3 Jenkins Pipeline Integration

pipeline {
    agent any

    stages {
        stage('Setup') {
            steps {
                sh '''
                    curl -Lo /tmp/bom https://github.com/paulmann/Clean_BOM_Senior/raw/main/clean-bom-senior.sh
                    chmod +x /tmp/bom
                '''
            }
        }

        stage('BOM Validation') {
            steps {
                script {
                    def bomCheck = sh(
                        script: '/tmp/bom --dry-run --verbose',
                        returnStatus: true
                    )

                    if (bomCheck != 0) {
                        error "BOM or encoding issues detected. Please run BOM cleaner locally."
                    }
                }
            }
        }

        stage('Deploy') {
            when { branch 'main' }
            steps {
                sh '/tmp/bom --verbose'  // Clean before deployment
                // Deploy steps here
            }
        }
    }

    post {
        always {
            archiveArtifacts artifacts: '*.bak.*', allowEmptyArchive: true
            cleanWs()
        }
    }
}

5.2 Git Hooks Implementation

5.2.1 Pre-commit Hook

.git/hooks/pre-commit (executable):

#!/bin/bash
# Pre-commit hook: Prevent commits with BOM/CRLF issues

echo "🔍 Checking for BOM and encoding issues..."

# Check if bom command exists
if ! command -v bom &> /dev/null; then
    echo "⚠️  BOM cleaner not installed. Installing..."
    curl -Lo /tmp/bom https://github.com/paulmann/Clean_BOM_Senior/raw/main/clean-bom-senior.sh
    chmod +x /tmp/bom
    BOM_CMD="/tmp/bom"
else
    BOM_CMD="bom"
fi

# Get staged files
STAGED_FILES=$(git diff --cached --name-only --diff-filter=ACM | grep -E '\.(php|js|css|html|xml)$')

if [ -z "$STAGED_FILES" ]; then
    echo "✅ No relevant files to check"
    exit 0
fi

# Check staged files only (read line by line so paths with spaces survive)
ISSUES_FOUND=false
while IFS= read -r file; do
    if [ -f "$file" ]; then
        if ! $BOM_CMD "$file" --dry-run &> /dev/null; then
            echo "❌ Issues found in: $file"
            ISSUES_FOUND=true
        fi
    fi
done <<< "$STAGED_FILES"

if [ "$ISSUES_FOUND" = true ]; then
    echo ""
    echo "🚨 BOM or CRLF issues detected in staged files!"
    echo "💡 Run the following command to fix:"
    echo "   $BOM_CMD --verbose"
    echo ""
    echo "Then stage and commit your changes again."
    exit 1
fi

echo "✅ No encoding issues detected"
exit 0

5.2.2 Pre-push Hook

.git/hooks/pre-push (executable):

#!/bin/bash
# Pre-push hook: Auto-clean before pushing

protected_branch='main'
current_branch=$(git symbolic-ref --short -q HEAD)

# Quote both sides so an empty value (detached HEAD) does not break the test
if [ "$protected_branch" = "$current_branch" ]; then
    echo "🧹 Cleaning BOM issues before pushing to main..."

    if command -v bom &> /dev/null; then
        bom --verbose

        # Check if any files were modified
        if [ -n "$(git diff --name-only)" ]; then
            echo "📝 Files were cleaned. Please review and commit:"
            git diff --name-only
            echo ""
            echo "Run: git add . && git commit -m 'Clean BOM and line endings'"
            exit 1
        fi

        echo "✅ Repository is clean"
    else
        echo "⚠️  BOM cleaner not available. Please install clean-bom-senior.sh"
    fi
fi

exit 0

5.2.3 Husky Integration (Node.js projects)

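package.json (Husky v4-style configuration; Husky v5 and later read hooks from files in the .husky/ directory instead):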

{
  "husky": {
    "hooks": {
      "pre-commit": "bom --dry-run && lint-staged",
      "pre-push": "bom --verbose"
    }
  },
  "lint-staged": {
    "*.{php,js,css}": [
      "bom --verbose",
      "git add"
    ]
  }
}

5.3 Docker Integration

5.3.1 Multi-stage Dockerfile

# Multi-stage build with BOM cleaning
FROM alpine:latest AS cleaner
RUN apk add --no-cache curl bash findutils coreutils
RUN curl -Lo /usr/local/bin/bom https://github.com/paulmann/Clean_BOM_Senior/raw/main/clean-bom-senior.sh
RUN chmod +x /usr/local/bin/bom

# Clean in the stage that has bash installed: the cleaner is a bash script,
# and php:8.2-fpm-alpine ships without bash
COPY . /src
WORKDIR /src
RUN bom --verbose

FROM php:8.2-fpm-alpine

# Copy the cleaned application code; the cleaner tool never reaches the final image
COPY --from=cleaner /src /var/www/html
WORKDIR /var/www/html

# Set proper ownership and permissions
RUN chown -R www-data:www-data /var/www/html
RUN find /var/www/html -type d -exec chmod 755 {} \;
RUN find /var/www/html -type f -exec chmod 644 {} \;

EXPOSE 9000
CMD ["php-fpm"]

5.3.2 Docker Compose Integration

docker-compose.yml:

version: '3.8'

services:
  app:
    build: 
      context: .
      dockerfile: Dockerfile
    volumes:
      - ./src:/var/www/html/src
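    depends_on:
      bom-cleaner:
        # Wait for the cleaner to finish, not merely start (long syntax; requires Docker Compose v2)
        condition: service_completed_successfully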

  bom-cleaner:
    image: alpine:latest
    volumes:
      - ./src:/workspace
    working_dir: /workspace
    command: sh -c "
      apk add --no-cache curl bash findutils coreutils &&
      curl -Lo /usr/local/bin/bom https://github.com/paulmann/Clean_BOM_Senior/raw/main/clean-bom-senior.sh &&
      chmod +x /usr/local/bin/bom &&
      bom --verbose
    "

5.3.3 Kubernetes Job

apiVersion: batch/v1
kind: Job
metadata:
  name: bom-cleaner
spec:
  template:
    spec:
      containers:
      - name: cleaner
        image: alpine:latest
        command: ["/bin/sh"]
        args:
          - -c
          - |
            apk add --no-cache curl bash findutils coreutils
            curl -Lo /usr/local/bin/bom https://github.com/paulmann/Clean_BOM_Senior/raw/main/clean-bom-senior.sh
            chmod +x /usr/local/bin/bom
            bom --verbose /workspace
        volumeMounts:
        - name: source-code
          mountPath: /workspace
      volumes:
      - name: source-code
        persistentVolumeClaim:
          claimName: source-code-pvc
      restartPolicy: Never
  backoffLimit: 3

6 · BOM Prevention Best Practices

6.1 IDE Configuration Standards

6.1.1 Visual Studio Code

Settings configuration (.vscode/settings.json):

{
  "files.encoding": "utf8",
  "files.autoGuessEncoding": false,
  "files.insertFinalNewline": true,
  "files.trimFinalNewlines": true,
  "files.trimTrailingWhitespace": true,
  "files.eol": "\n",
  "editor.renderWhitespace": "boundary"
}

Recommended extensions:

  • EncodingHelper: Visual encoding indicators
  • BOM Detector: Automatic BOM detection and removal
  • EditorConfig: Consistent encoding across team members

6.1.2 PhpStorm/IntelliJ Configuration

File encoding settings:

  1. Settings → Editor → File Encodings
  2. Set "Global Encoding" to UTF-8
  3. Set "Project Encoding" to UTF-8
  4. Disable "Transparent native-to-ascii conversion"
  5. Set "Create UTF-8 files" to "with NO BOM"

Project encoding file (.idea/encodings.xml):

<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
  <component name="Encoding" defaultCharsetForPropertiesFiles="UTF-8">
    <file url="PROJECT" charset="UTF-8" />
  </component>
</project>

6.1.3 Sublime Text Configuration

User settings (Preferences → Settings):

{
  "default_encoding": "UTF-8",
  "fallback_encoding": "UTF-8",
  "show_encoding": true,
  "default_line_ending": "unix"
}

6.2 EditorConfig Implementation

.editorconfig file (project root):

# EditorConfig: https://editorconfig.org/

root = true

[*]
indent_style = space
indent_size = 4
end_of_line = lf
charset = utf-8
trim_trailing_whitespace = true
insert_final_newline = true

[*.{js,css,json,yml,yaml}]
indent_size = 2

[*.md]
trim_trailing_whitespace = false

[*.php]
indent_size = 4
max_line_length = 120

6.3 Git Configuration

6.3.1 Repository-wide Settings

.gitattributes file:

# Handle line endings automatically for files detected as text
* text=auto

# Explicitly set text files
*.php text eol=lf
*.js text eol=lf
*.css text eol=lf
*.html text eol=lf
*.xml text eol=lf
*.json text eol=lf
*.md text eol=lf
*.yml text eol=lf
*.yaml text eol=lf

# Explicitly set binary files
*.png binary
*.jpg binary
*.gif binary
*.ico binary
*.pdf binary

6.3.2 Global Git Configuration

# Set line ending handling globally. Run only ONE of the two core.autocrlf lines:
git config --global core.autocrlf false  # Unix/Linux/macOS
# git config --global core.autocrlf true # Windows (if needed)
git config --global core.eol lf

# Enable whitespace detection
git config --global core.whitespace trailing-space,space-before-tab

6.4 Team Development Standards

6.4.1 CONTRIBUTING.md Template

# Contribution Guidelines

## Code Standards

### File Encoding
- All source files MUST use UTF-8 encoding **without BOM**
- Line endings MUST be Unix-style (LF, not CRLF)
- Files MUST end with a single newline character

### Pre-commit Checklist
1. Run `bom --dry-run` to check for encoding issues
2. Ensure your IDE is configured for UTF-8 without BOM
3. Verify line endings are consistent (LF only)

### Automated Checks
Our CI/CD pipeline automatically checks for:
- BOM markers in source files
- Inconsistent line endings
- Trailing whitespace

### Quick Fix
If you encounter BOM issues:

    # Install cleaner tool
    sudo curl -Lo /usr/local/bin/bom https://github.com/paulmann/Clean_BOM_Senior/raw/main/clean-bom-senior.sh
    sudo chmod +x /usr/local/bin/bom

    # Clean your code
    bom --verbose

### IDE Setup
Please configure your IDE according to our [IDE Configuration Guide](docs/IDE-SETUP.md).

6.4.2 Onboarding Checklist

# Developer Onboarding - Encoding Setup

## ✅ Required Setup Steps

1. **Install BOM cleaner tool**

        sudo curl -Lo /usr/local/bin/bom https://github.com/paulmann/Clean_BOM_Senior/raw/main/clean-bom-senior.sh
        sudo chmod +x /usr/local/bin/bom

2. **Configure Git settings**

        git config --global core.autocrlf false
        git config --global core.eol lf

3. **IDE Configuration** (choose your primary IDE)
   - [ ] VS Code: Install EditorConfig extension + apply team settings
   - [ ] PhpStorm: Configure File Encodings (UTF-8, no BOM)
   - [ ] Sublime Text: Set default encoding to UTF-8

4. **Verify setup**

        # This should show no issues
        bom --dry-run

## 🔧 Troubleshooting

**Problem**: Git shows file changes but no actual content changes
- **Solution**: Line ending mismatch - run `bom --verbose` to normalize

**Problem**: PHP namespace errors in new files
- **Solution**: BOM in file - your IDE is set to UTF-8 with BOM

**Problem**: Build failures in CI/CD
- **Solution**: Encoding issues - run `bom --dry-run` locally first

7 · Practical Recipes and Workflows

7.1 Language-Specific Workflows

7.1.1 PHP Project Cleanup

Complete PHP project sanitization:

#!/bin/bash
# php-project-clean.sh - Comprehensive PHP cleanup

echo "🧹 Starting PHP project cleanup..."

# 1. Clean BOM and line endings
echo "Step 1: Cleaning BOM and line endings..."
bom --verbose

# 2. Fix PHP-specific issues
echo "Step 2: PHP-specific fixes..."
find . -name "*.php" -type f | while read -r file; do
    # Remove trailing PHP closing tags (PSR-2 compliance)
    if tail -1 "$file" | grep -q "?>"; then
        sed -i '$ { /^?>$/ d }' "$file"
        echo "Removed closing tag: $file"
    fi

    # Ensure single newline at EOF
    if [ -s "$file" ]; then
        if [ "$(tail -c1 "$file")" != "" ]; then
            echo "" >> "$file"
            echo "Added EOF newline: $file"
        fi
    fi
done

# 3. Composer file cleanup
if [ -f "composer.json" ]; then
    echo "Step 3: Cleaning composer.json..."
    # Remove BOM from composer files
    bom composer.json composer.lock 2>/dev/null || true
fi

echo "✅ PHP project cleanup complete!"

7.1.2 JavaScript/Node.js Cleanup

#!/bin/bash
# js-project-clean.sh - JavaScript project cleanup

echo "🧹 JavaScript project cleanup..."

# Clean source files
bom --verbose

# Clean package.json files
find . -name "package*.json" -exec bom {} \;

# Clean configuration files
for config in .eslintrc .prettierrc tsconfig.json webpack.config.js; do
    [ -f "$config" ] && bom "$config"
done

# Node.js specific: clean JavaScript and TypeScript files, pruning node_modules
# so find never descends into it
find . -name node_modules -prune -o -type f \
    \( -name "*.js" -o -name "*.ts" -o -name "*.jsx" -o -name "*.tsx" \) -print0 |
while IFS= read -r -d '' file; do
    echo "Processing: $file"
    bom "$file"
done

echo "✅ JavaScript cleanup complete!"

7.2 Automated Maintenance Scripts

7.2.1 Weekly Maintenance Cron Job

/etc/cron.d/bom-cleanup:

# Weekly BOM cleanup for all project directories
# Runs every Sunday at 3 AM

SHELL=/bin/bash
PATH=/usr/local/bin:/usr/bin:/bin
MAILTO=devops@company.com

# Cleanup main projects
0 3 * * 0 root /usr/local/bin/bom /var/www/project1 --verbose >> /var/log/bom-cleanup.log 2>&1
5 3 * * 0 root /usr/local/bin/bom /var/www/project2 --verbose >> /var/log/bom-cleanup.log 2>&1

# Cleanup user directories (if needed)
10 3 * * 0 root find /home -name "*.php" -path "*/public_html/*" -exec /usr/local/bin/bom {} \; >> /var/log/bom-cleanup.log 2>&1

7.2.2 Monitoring and Alerting Script

#!/bin/bash
# bom-monitor.sh - Monitor for BOM issues and alert

LOG_FILE="/var/log/bom-monitor.log"
ALERT_EMAIL="devops@company.com"
PROJECTS_DIR="/var/www"

# Function to send alerts
send_alert() {
    local subject="$1"
    local message="$2"

    echo "$(date): $message" >> "$LOG_FILE"

    # Send email alert (requires mail command)
    if command -v mail &> /dev/null; then
        printf '%b\n' "$message" | mail -s "$subject" "$ALERT_EMAIL"  # %b expands the \n escapes in the message
    fi

    # Send Slack notification (if webhook configured)
    if [ -n "$SLACK_WEBHOOK" ]; then
        curl -X POST -H 'Content-type: application/json' \
            --data "{\"text\":\"🚨 $subject\n$message\"}" \
            "$SLACK_WEBHOOK"
    fi
}

# Scan for BOM issues
echo "$(date): Starting BOM monitoring scan..." >> "$LOG_FILE"

ISSUES_FOUND=false
for project in "$PROJECTS_DIR"/*; do
    if [ -d "$project" ]; then
        project_name=$(basename "$project")

        # Check for BOM issues
        if ! bom --dry-run "$project" &> /dev/null; then
            ISSUES_FOUND=true
            send_alert "BOM Issues Detected in $project_name" \
                "BOM or encoding issues found in project: $project_name\nPath: $project\nPlease run 'bom --verbose $project' to fix."
        fi
    fi
done

if [ "$ISSUES_FOUND" = false ]; then
    echo "$(date): No BOM issues detected across all projects" >> "$LOG_FILE"
fi

7.3 Integration with Build Tools

7.3.1 Makefile Integration

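# Makefile with BOM cleaning targets
# Note: recipe lines must be indented with a tab character, not spaces

.PHONY: help install-bom check-bom clean-bom test-clean pre-commit build deploy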

# Default target
help:
    @echo "Available targets:"
    @echo "  install-bom  - Install BOM cleaner tool"
    @echo "  check-bom    - Check for BOM/encoding issues"
    @echo "  clean-bom    - Clean BOM/encoding issues"
    @echo "  test-clean   - Run tests after cleaning"

# Install BOM cleaner
install-bom:
    @if ! command -v bom > /dev/null 2>&1; then \
        echo "Installing BOM cleaner..."; \
        sudo curl -Lo /usr/local/bin/bom https://github.com/paulmann/Clean_BOM_Senior/raw/main/clean-bom-senior.sh; \
        sudo chmod +x /usr/local/bin/bom; \
        echo "✅ BOM cleaner installed"; \
    else \
        echo "✅ BOM cleaner already installed"; \
    fi

# Check for issues without fixing
check-bom: install-bom
    @echo "🔍 Checking for BOM/encoding issues..."
    @bom --dry-run --verbose

# Clean BOM issues
clean-bom: install-bom
    @echo "🧹 Cleaning BOM/encoding issues..."
    @bom --verbose

# Run tests after cleaning
test-clean: clean-bom
    @echo "🧪 Running tests after cleanup..."
    @composer test

# Pre-commit hook simulation
pre-commit: check-bom
    @if ! bom --dry-run > /dev/null 2>&1; then \
        echo "❌ BOM issues detected. Run 'make clean-bom' first."; \
        exit 1; \
    fi
    @echo "✅ No BOM issues detected"

# Build target with cleaning
build: clean-bom
    @echo "🏗️ Building project..."
    @composer install --no-dev --optimize-autoloader
    @npm run build

# Deploy target
deploy: build test-clean
    @echo "🚀 Deploying..."
    # Add deployment commands here

7.3.2 npm Scripts Integration

package.json:

{
  "scripts": {
    "prebuild": "bom --verbose",
    "build": "webpack --mode production",
    "pretest": "bom --dry-run",
    "test": "jest",
    "clean:bom": "bom --verbose",
    "check:bom": "bom --dry-run",
    "lint": "eslint . && bom --dry-run",
    "prepare": "husky install",
    "postinstall": "bom --dry-run || echo 'Warning: BOM issues detected'"
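  }
}

Husky 5 and later (the versions that use "husky install") ignore a "husky" key in package.json and read hooks from the .husky/ directory instead. Put the hook command in .husky/pre-commit:

npm run check:bom && npx lint-staged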

7.3.3 Composer Scripts (PHP)

composer.json:

{
  "scripts": {
    "clean-bom": "bom --verbose",
    "check-bom": "bom --dry-run",
    "pre-autoload-dump": "@check-bom",
    "post-install-cmd": [
      "@check-bom"
    ],
    "post-update-cmd": [
      "@clean-bom"
    ],
    "test": [
      "@check-bom",
      "phpunit"
    ]
  },
  "scripts-descriptions": {
    "clean-bom": "Clean BOM and line ending issues",
    "check-bom": "Check for BOM and encoding issues without fixing"
  }
}

8 · Troubleshooting Common Issues

8.1 Permission and Access Problems

8.1.1 Permission Denied Errors

Problem: Cannot write to file: protected.php

Diagnosis:

# Check file permissions
ls -la protected.php

# Check directory permissions (-d shows the directory itself, not its contents)
ls -ld "$(dirname protected.php)"

# Check file ownership
stat protected.php

Solutions:

# Solution 1: Fix file permissions
chmod 644 protected.php

# Solution 2: Fix ownership (changing the owner requires root)
sudo chown "$USER:$(id -gn)" protected.php

# Solution 3: Run with sudo (use carefully)
sudo bom --verbose

# Solution 4: Fix directory permissions
chmod 755 $(dirname protected.php)
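
The checks above can be combined into a small pre-flight helper that runs before any cleanup. This is a minimal sketch: the function name can_rewrite is hypothetical and is not part of the bom tool. It only tests classic permissions as seen by the current user, so SELinux denials (next section) can still occur.

```shell
#!/usr/bin/env bash
# can_rewrite FILE
# Hypothetical pre-flight helper (not part of the bom tool).
# Prints a one-line verdict and returns:
#   0 - file and its directory are writable
#   1 - file or directory is not writable by the current user
#   2 - file does not exist
can_rewrite() {
    local file="$1"
    local dir
    dir=$(dirname "$file")

    if [ ! -f "$file" ]; then
        echo "missing: $file"
        return 2
    fi
    if [ ! -w "$file" ]; then
        echo "file not writable: $file"
        return 1
    fi
    # Replacing a file via a temporary copy needs write + search on the directory
    if [ ! -w "$dir" ] || [ ! -x "$dir" ]; then
        echo "directory not writable: $dir"
        return 1
    fi
    echo "ok: $file"
}
```

Usage: can_rewrite protected.php && bom --verbose protected.php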

8.1.2 SELinux and ACL Issues

Problem: Permission denied despite correct ownership/permissions

Diagnosis:

# Check SELinux context
ls -Z file.php

# Check ACLs
getfacl file.php

# Check SELinux denials
ausearch -m AVC -ts recent

Solutions:

# Fix SELinux context
restorecon -v file.php

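# Or set an appropriate context (httpd_sys_content_t is the usual type for web content;
# httpd_exec_t is meant for the web server binary itself)
chcon -t httpd_sys_content_t file.php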

# Fix ACLs if needed
setfacl -m u:$USER:rw file.php

8.2 Large File Handling

8.2.1 File Size Limitations

Problem: Files larger than 100MB are skipped

Diagnosis:

# Find large files
find . -name "*.php" -size +100M -exec ls -lh {} \;

# Check memory usage during processing
# (-f matches the full command line; adjust the pattern to the name the tool runs under)
top -p "$(pgrep -d, -f clean-bom-senior)"

Solutions:

# Solution 1: Process large files individually
bom large-file.php

# Solution 2: Modify MAX_FILE_SIZE in script
# Edit clean-bom-senior.sh:
# MAX_FILE_SIZE=$((500 * 1024 * 1024))  # 500MB

# Solution 3: Split large files if possible
split -l 10000 huge-file.php huge-file-part-

# Solution 4: Use streaming approach for very large files (\xHH escapes require GNU sed)
sed '1s/^\xEF\xBB\xBF//' huge-file.php > huge-file-clean.php

8.2.2 Memory Management

Problem: Out of memory errors on large files

Solution with streaming:

#!/bin/bash
# stream-bom-cleaner.sh - Memory-efficient BOM removal

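process_large_file() {
    local file="$1"

    # Check if file starts with BOM (EF BB BF)
    if [ "$(od -An -tx1 -N3 "$file" | tr -d ' \n')" = "efbbbf" ]; then
        echo "Processing large file: $file"

        # Create the temp file next to the original so mv stays on one filesystem
        local temp
        temp=$(mktemp "${file}.XXXXXX") || return 1

        # Stream everything after the first 3 bytes
        # (dd with bs=1 would issue one read/write per byte)
        if tail -c +4 "$file" > "$temp"; then
            # mktemp creates files with mode 0600; copy the original mode (GNU chmod)
            chmod --reference="$file" "$temp"

            # Replace original file
            mv "$temp" "$file"
            echo "BOM removed from: $file"
        else
            rm -f "$temp"
            return 1
        fi
    else
        echo "No BOM in: $file"
    fi
}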

# Usage
process_large_file "$1"

8.3 Character Encoding Issues

8.3.1 Mixed Encoding Detection

Problem: Files with mixed encodings causing corruption

Diagnosis:

# Detect file encoding
file -bi file.php

# Check for non-UTF-8 characters
iconv -f utf-8 -t utf-8 file.php > /dev/null || echo "Encoding issues detected"

# Visual inspection with cat -v
cat -v file.php | head -20

Solutions:

# Solution 1: Convert to UTF-8
iconv -f iso-8859-1 -t utf-8 file.php -o file.php.utf8
mv file.php.utf8 file.php

# Solution 2: Auto-detect and convert
chardetect file.php  # Command installed by: pip install chardet
# Then convert based on detected encoding

# Solution 3: Force UTF-8 conversion (lossy)
iconv -f utf-8 -t utf-8 -c file.php -o file.php.clean
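
Before converting anything, it helps to know which files actually fail UTF-8 validation. The sketch below wraps the iconv round-trip check from the diagnosis step in two helpers. The names is_valid_utf8 and find_invalid_utf8 are hypothetical, and file names containing newlines are not handled.

```shell
#!/usr/bin/env bash
# is_valid_utf8 FILE
# Succeeds when FILE decodes cleanly as UTF-8 (iconv round-trip check).
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 < "$1" > /dev/null 2>&1
}

# find_invalid_utf8 DIR
# Prints the path of every *.php file under DIR that is not valid UTF-8.
# Assumption: file names do not contain newline characters.
find_invalid_utf8() {
    find "$1" -type f -name '*.php' -print | while IFS= read -r path; do
        is_valid_utf8 "$path" || printf '%s\n' "$path"
    done
}
```

Usage: find_invalid_utf8 src/ lists the candidates for Solution 1 or Solution 2 above.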

8.3.2 Invisible Character Problems

Problem: Files appear clean but still cause issues

Diagnosis:

# Show all bytes including invisible ones
hexdump -C file.php | head -5

# Check for various Unicode BOMs
od -An -tx1 -N10 file.php

# Look for other invisible characters
cat -A file.php | head -10

Solutions:

# Remove everything except tab, LF, CR and printable ASCII
# (aggressive: this also deletes valid non-ASCII UTF-8 text such as accented letters)
tr -cd '\11\12\15\40-\176' < file.php > file.php.clean

# Remove specific problematic characters
sed 's/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]//g' file.php > file.php.clean

# Use bom tool with verbose output to see what's found
bom --verbose file.php
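
The od output from the diagnosis step can be mapped to a BOM type automatically. The following is a sketch only: the helper name detect_bom is hypothetical, and it looks at the standard UTF-8, UTF-16 and UTF-32 signatures in the first four bytes. A UTF-16LE file whose first character after the BOM is U+0000 is indistinguishable from UTF-32LE by this test.

```shell
#!/usr/bin/env bash
# detect_bom FILE
# Prints which Unicode BOM (if any) FILE starts with:
# utf-8, utf-16le, utf-16be, utf-32le, utf-32be or none.
detect_bom() {
    local hex
    # First four bytes as lowercase hex without separators, e.g. "efbbbf3c"
    hex=$(od -An -tx1 -N4 < "$1" | tr -d ' \n')
    case "$hex" in
        efbbbf*)  echo "utf-8" ;;
        fffe0000) echo "utf-32le" ;;  # must be tested before utf-16le
        0000feff) echo "utf-32be" ;;
        fffe*)    echo "utf-16le" ;;
        feff*)    echo "utf-16be" ;;
        *)        echo "none" ;;
    esac
}
```

Usage: detect_bom file.php prints utf-8 for a file that begins with EF BB BF.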

8.4 Git Integration Issues

8.4.1 Git Hook Failures

Problem: Pre-commit hooks failing silently

Diagnosis:

# Test hook manually
.git/hooks/pre-commit

# Check hook permissions
ls -la .git/hooks/pre-commit

# Debug with set -x
sed -i '1a set -x' .git/hooks/pre-commit

Solutions:

# Fix permissions
chmod +x .git/hooks/pre-commit

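# Add error handling at the top of the hook script (.git/hooks/pre-commit)
#!/bin/bash
set -euo pipefail  # Exit on errors, unset variables, and failed pipeline stages

# Add logging (also inside the hook script)
exec > >(tee -a /tmp/pre-commit.log) 2>&1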
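
For reference, a hook body that fails loudly rather than silently might look like the sketch below. It does not call the bom tool: it inspects the staged content of each PHP file directly, so it makes no assumption about the tool's exit codes. The function names stdin_has_bom and check_staged_php are hypothetical.

```shell
#!/usr/bin/env bash
# stdin_has_bom
# Succeeds when standard input starts with the UTF-8 BOM (EF BB BF).
stdin_has_bom() {
    [ "$(od -An -tx1 -N3 | tr -d ' \n')" = "efbbbf" ]
}

# check_staged_php
# Returns 1 and names the offenders on stderr if any staged *.php file
# (added, copied or modified) starts with a BOM. Must run inside a Git repository.
check_staged_php() {
    local status=0 path
    while IFS= read -r path; do
        [ -n "$path" ] || continue
        # ":path" is the version in the index, i.e. what would be committed
        if git show ":$path" | stdin_has_bom; then
            echo "BOM found in staged file: $path" >&2
            status=1
        fi
    done <<EOF
$(git diff --cached --name-only --diff-filter=ACM -- '*.php')
EOF
    return "$status"
}
```

In .git/hooks/pre-commit, finish with: check_staged_php || exit 1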

8.4.2 Line Ending Confusion

Problem: Git reports changes but files look identical

Diagnosis:

# Check line endings
file file.php

# Show line endings visually
cat -e file.php

# Check Git's line ending handling
git config --get core.autocrlf
git config --get core.eol

Solutions:

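# Fix Git configuration
git config core.autocrlf false
git config core.eol lf

# Update .gitattributes (append rather than overwrite; text=auto leaves binary files untouched)
echo "* text=auto eol=lf" >> .gitattributes

# Normalize line endings according to the new settings
git add --renormalize .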
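
To see how many lines in a file are affected before renormalizing, the carriage returns can be counted directly. A minimal sketch follows; the helper name count_crlf_lines is hypothetical.

```shell
#!/usr/bin/env bash
# count_crlf_lines FILE
# Prints the number of lines in FILE that end with CR LF.
count_crlf_lines() {
    local cr
    cr=$(printf '\r')
    # grep -c prints 0 and exits 1 when nothing matches; keep the count, drop the status
    grep -c "${cr}\$" < "$1" || true
}
```

A result of 0 means the file already uses LF-only endings. A result between 0 and the total line count indicates mixed endings.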

8.5 Performance Optimization

8.5.1 Slow Processing on Large Codebases

Problem: BOM cleaning takes too long on large projects

Optimization strategies:

# Strategy 1: Parallel processing
find . -name "*.php" -type f -print0 | xargs -0 -P 4 -I {} bom {}

# Strategy 2: Skip clean files faster
#!/bin/bash
# fast-bom-check.sh - Skip obviously clean files
# (NUL-delimited names keep paths with spaces intact)
find . -name "*.php" -type f -print0 | while IFS= read -r -d '' file; do
    # Quick check - only process if BOM detected
    if od -An -tx1 -N3 "$file" 2>/dev/null | tr -d ' ' | grep -q '^efbbbf$'; then
        bom "$file" < /dev/null  # keep bom from consuming the file list on stdin
    fi
done

# Strategy 3: Process only recently modified files
find . -name "*.php" -mtime -7 -exec bom {} \;

# Strategy 4: Use exclude patterns
bom --verbose --exclude="vendor/*" --exclude="node_modules/*"

8.5.2 I/O Optimization

Problem: Disk I/O bottlenecks

Solutions:

# Use RAM disk for temporary files (the mount point must exist first)
sudo mkdir -p /tmp/bom-work
sudo mount -t tmpfs -o size=1G,mode=1777 tmpfs /tmp/bom-work
export TMPDIR=/tmp/bom-work

# Process files in batches
find . -name "*.php" -print0 | xargs -0 -n 50 bom

# Use SSD for temporary operations
export TMPDIR=/fast/ssd/temp

9 · Enterprise Deployment Considerations

9.1 Security and Compliance

9.1.1 Security Best Practices

Principle of Least Privilege:

# Create dedicated user for BOM operations
sudo useradd -r -s /bin/false bom-cleaner

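# Set up sudo rules for limited access
# (written through sudo tee, because a plain ">>" redirect runs without root privileges)
echo "devops ALL=(bom-cleaner) NOPASSWD: /usr/local/bin/bom" | sudo tee /etc/sudoers.d/bom-cleaner > /dev/null
sudo chmod 440 /etc/sudoers.d/bom-cleaner
sudo visudo -cf /etc/sudoers.d/bom-cleaner  # validate syntax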

# Use in scripts
sudo -u bom-cleaner bom --verbose /var/www/project

Audit Trail:

# Enable detailed logging
export BOM_LOG_LEVEL=DEBUG
export BOM_LOG_FILE="/var/log/bom-operations.log"

# Log all operations with user context
logger -t bom-cleaner "User $USER executed BOM cleanup on $(pwd)"

# Integrate with centralized logging (syslog)
bom --verbose 2>&1 | logger -t bom-cleaner

9.1.2 Compliance Requirements

File Integrity Monitoring:

#!/bin/bash
# fim-bom-integration.sh - AIDE/Tripwire integration

# Create checksums before cleaning (sorted by path so both lists line up)
find /var/www -type f -name "*.php" -exec sha256sum {} + | sort -k2 > /tmp/checksums-before

# Perform BOM cleaning
bom --verbose /var/www

# Create checksums after cleaning
find /var/www -type f -name "*.php" -exec sha256sum {} + | sort -k2 > /tmp/checksums-after

# Report changes
diff /tmp/checksums-before /tmp/checksums-after > /var/log/bom-changes.log

# Update FIM database
aide --update

SOX/PCI Compliance:

# Segregation of duties - require approval for production
if [[ "$ENVIRONMENT" == "production" ]]; then
    echo "Production BOM cleaning requires approval ticket"
    read -r -p "Enter approval ticket number: " ticket
    if [[ -z "$ticket" ]]; then
        echo "No ticket provided, aborting" >&2
        exit 1
    fi
    logger -t bom-cleaner "Production cleanup authorized by ticket: $ticket"
fi

# Change management integration
curl -X POST "$CHANGE_MGMT_API/bom-cleanup" \
    -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json" \
    -d "{\"environment\": \"$ENVIRONMENT\", \"files_processed\": $COUNT}"

9.2 Monitoring and Alerting

9.2.1 Health Checks and Monitoring

Nagios/Icinga Plugin:

#!/bin/bash
# check_bom_issues.sh - Monitoring plugin

STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

PROJECT_DIR="/var/www/html"
TEMP_LOG=$(mktemp)

# Run BOM check
if bom --dry-run "$PROJECT_DIR" > "$TEMP_LOG" 2>&1; then
    echo "OK - No BOM issues detected"
    rm -f "$TEMP_LOG"
    exit $STATE_OK
else
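    # grep -c prints 0 itself when nothing matches (but exits 1), so fall back only on its exit status
    ISSUE_COUNT=$(grep -c "Would process:" "$TEMP_LOG" 2>/dev/null) || ISSUE_COUNT=0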

    if [ "$ISSUE_COUNT" -gt 0 ]; then
        echo "WARNING - $ISSUE_COUNT files with BOM issues detected"
        rm -f "$TEMP_LOG"
        exit $STATE_WARNING
    else
        echo "CRITICAL - BOM checker failed to run"
        cat "$TEMP_LOG"
        rm -f "$TEMP_LOG"
        exit $STATE_CRITICAL
    fi
fi

Prometheus Metrics:

#!/bin/bash
# bom-metrics-exporter.sh - Export metrics for Prometheus

METRICS_FILE="/var/lib/prometheus/node-exporter/bom-status.prom"

# Run BOM check and capture metrics
BOM_OUTPUT=$(bom --dry-run --verbose 2>&1)
# grep -c already prints 0 when nothing matches, so only its exit status is ignored
FILES_WITH_BOM=$(echo "$BOM_OUTPUT" | grep -c "Would process:" || true)
TOTAL_FILES=$(echo "$BOM_OUTPUT" | grep -o "Files processed: [0-9]*" | awk '{print $3}')
TOTAL_FILES=${TOTAL_FILES:-0}

# Export metrics (write to a temporary file, then rename, so the collector never reads a partial file)
cat > "$METRICS_FILE.tmp" << EOF
# HELP bom_files_with_issues Number of files with BOM issues
# TYPE bom_files_with_issues gauge
bom_files_with_issues $FILES_WITH_BOM

# HELP bom_total_files_checked Total number of files checked
# TYPE bom_total_files_checked gauge
bom_total_files_checked $TOTAL_FILES

# HELP bom_last_check_timestamp Unix timestamp of last check
# TYPE bom_last_check_timestamp gauge
bom_last_check_timestamp $(date +%s)
EOF
mv "$METRICS_FILE.tmp" "$METRICS_FILE"

9.2.2 Integration with Monitoring Systems

Grafana Dashboard Configuration:

{
  "dashboard": {
    "title": "BOM Cleanup Monitoring",
    "panels": [
      {
        "title": "Files with BOM Issues",
        "type": "stat",
        "targets": [
          {
            "expr": "bom_files_with_issues",
            "legendFormat": "Files with Issues"
          }
        ]
      },
      {
        "title": "BOM Issues Over Time",
        "type": "graph",
        "targets": [
          {
            "expr": "bom_files_with_issues",
            "legendFormat": "BOM Issues"
          }
        ]
      }
    ]
  }
}

ELK Stack Integration:

# Filebeat configuration for BOM logs
# /etc/filebeat/filebeat.yml (inputs and outputs together belong in the main config file)
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/bom-operations.log
  fields:
    logtype: bom-cleaner
  multiline.pattern: '^\[\d{4}-\d{2}-\d{2}'
  multiline.negate: true
  multiline.match: after

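# Filebeat supports one output at a time: to apply the Logstash filter below,
# replace this block with output.logstash pointing at your Logstash host
output.elasticsearch:
  hosts: ["elasticsearch:9200"]
  index: "bom-cleaner-%{+yyyy.MM.dd}"

# A custom index name also requires template settings
setup.template.name: "bom-cleaner"
setup.template.pattern: "bom-cleaner-*"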

# Logstash filter
filter {
  if [fields][logtype] == "bom-cleaner" {
    grok {
      match => { "message" => "\[%{TIMESTAMP_ISO8601:timestamp}\] \[%{WORD:level}\] %{GREEDYDATA:message_text}" }
    }
  }
}

9.3 Disaster Recovery and Backup

9.3.1 Backup Strategy

Pre-processing Backup:

#!/bin/bash
# enterprise-bom-cleanup.sh - With enterprise backup
set -o pipefail  # make 'bom ... | tee' fail when bom itself fails

PROJECT_DIR="${1%/}"
[ -d "$PROJECT_DIR" ] || { echo "Usage: $0 <project_directory>" >&2; exit 1; }

# Archive relative to the parent directory, so that restoring with
# -C "$PARENT_DIR" puts files back where they came from
PARENT_DIR=$(dirname "$PROJECT_DIR")
PROJECT_NAME=$(basename "$PROJECT_DIR")

BACKUP_ROOT="/backups/bom-cleanup"
BACKUP_DIR="$BACKUP_ROOT/$(date +%Y%m%d-%H%M%S)"
LOG_FILE="/var/log/bom-enterprise.log"

# Create backup directory
mkdir -p "$BACKUP_DIR"

# Function to log with timestamp
log_message() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') $*" | tee -a "$LOG_FILE"
}

# Create full backup before processing
log_message "Creating backup before BOM cleanup..."
tar -czf "$BACKUP_DIR/pre-bom-cleanup.tar.gz" -C "$PARENT_DIR" "$PROJECT_NAME"

# Verify backup
if tar -tzf "$BACKUP_DIR/pre-bom-cleanup.tar.gz" > /dev/null 2>&1; then
    log_message "Backup verified successfully"
else
    log_message "ERROR: Backup verification failed"
    exit 1
fi

# Perform BOM cleanup with detailed logging
log_message "Starting BOM cleanup process..."
if bom --verbose "$PROJECT_DIR" 2>&1 | tee -a "$LOG_FILE"; then
    log_message "BOM cleanup completed successfully"

    # Create post-cleanup snapshot
    tar -czf "$BACKUP_DIR/post-bom-cleanup.tar.gz" -C "$PARENT_DIR" "$PROJECT_NAME"

    # Calculate differences against an extracted copy of the backup
    log_message "Calculating cleanup differences..."
    SCRATCH=$(mktemp -d)
    tar -xzf "$BACKUP_DIR/pre-bom-cleanup.tar.gz" -C "$SCRATCH"
    diff -r "$SCRATCH/$PROJECT_NAME" "$PROJECT_DIR" > "$BACKUP_DIR/changes.diff"
    rm -rf "$SCRATCH"

else
    log_message "ERROR: BOM cleanup failed, restoring from backup..."
    tar -xzf "$BACKUP_DIR/pre-bom-cleanup.tar.gz" -C "$PARENT_DIR"
    exit 1
fi

# Cleanup old backups (keep last 30 days); only top-level backup directories are considered
find "$BACKUP_ROOT" -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +

log_message "Enterprise BOM cleanup process completed"
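
With a pre-cleanup copy of a file at hand, it is possible to confirm that the cleanup removed nothing but the leading BOM. The check below is a sketch under that assumption. The name only_bom_removed is hypothetical, and the check fails by design if the cleanup also changed line endings or other bytes.

```shell
#!/usr/bin/env bash
# only_bom_removed BEFORE AFTER
# Succeeds when BEFORE starts with a UTF-8 BOM and AFTER is byte-for-byte
# identical to BEFORE minus its first three bytes.
only_bom_removed() {
    local before="$1" after="$2"

    # BEFORE must actually carry a BOM
    [ "$(od -An -tx1 -N3 < "$before" | tr -d ' \n')" = "efbbbf" ] || return 1

    # Compare everything from byte 4 onward with the cleaned file
    tail -c +4 "$before" | cmp -s - "$after"
}
```

Usage: only_bom_removed /path/to/extracted-backup/index.php /var/www/html/index.php || echo "unexpected change"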

9.3.2 Recovery Procedures

Automated Recovery Script:

#!/bin/bash
# bom-recovery.sh - Disaster recovery for BOM cleanup

BACKUP_DIR="/backups/bom-cleanup"
PROJECT_DIR="$1"
RECOVERY_DATE="$2"  # Format: YYYYMMDD-HHMMSS

usage() {
    echo "Usage: $0 <project_directory> [recovery_date]"
    echo "If recovery_date not specified, uses latest backup"
    exit 1
}

[ -z "$PROJECT_DIR" ] && usage

# Find backup to restore
if [ -n "$RECOVERY_DATE" ]; then
    BACKUP_FILE="$BACKUP_DIR/$RECOVERY_DATE/pre-bom-cleanup.tar.gz"
else
    BACKUP_FILE=$(find "$BACKUP_DIR" -name "pre-bom-cleanup.tar.gz" | sort | tail -1)
fi

if [ ! -f "$BACKUP_FILE" ]; then
    echo "ERROR: Backup file not found: $BACKUP_FILE"
    exit 1
fi

echo "Recovering from backup: $BACKUP_FILE"

# Verify backup integrity
if ! tar -tzf "$BACKUP_FILE" > /dev/null 2>&1; then
    echo "ERROR: Backup file is corrupted"
    exit 1
fi

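# Create current state backup before recovery
# (archived relative to the parent directory, matching the restore step below)
EMERGENCY_BACKUP="$BACKUP_DIR/emergency-$(date +%Y%m%d-%H%M%S).tar.gz"
echo "Creating emergency backup: $EMERGENCY_BACKUP"
tar -czf "$EMERGENCY_BACKUP" -C "$(dirname "$PROJECT_DIR")" "$(basename "$PROJECT_DIR")"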

# Perform recovery
echo "Restoring from backup..."
tar -xzf "$BACKUP_FILE" -C "$(dirname "$PROJECT_DIR")"

echo "Recovery completed. Emergency backup saved to: $EMERGENCY_BACKUP"

9.4 Multi-Environment Management

9.4.1 Environment-Specific Configurations

Configuration Management:

# /etc/bom-cleaner/config.env
# Environment-specific BOM cleaner configuration

case "$ENVIRONMENT" in
    "production")
        BOM_REQUIRE_APPROVAL=true
        BOM_BACKUP_RETENTION=90  # days
        BOM_LOG_LEVEL=INFO
        BOM_NOTIFICATION_WEBHOOK="$PROD_SLACK_WEBHOOK"
        ;;
    "staging")
        BOM_REQUIRE_APPROVAL=false
        BOM_BACKUP_RETENTION=30
        BOM_LOG_LEVEL=DEBUG
        BOM_NOTIFICATION_WEBHOOK="$STAGING_SLACK_WEBHOOK"
        ;;
    "development")
        BOM_REQUIRE_APPROVAL=false
        BOM_BACKUP_RETENTION=7
        BOM_LOG_LEVEL=DEBUG
        BOM_NOTIFICATION_WEBHOOK=""
        ;;
esac

export BOM_REQUIRE_APPROVAL BOM_BACKUP_RETENTION BOM_LOG_LEVEL BOM_NOTIFICATION_WEBHOOK

9.4.2 Deployment Pipeline Integration

Ansible Playbook:

---
- name: Deploy BOM Cleaner Enterprise
  hosts: web_servers
  become: yes
  vars:
    bom_version: "v2.06.4"
    bom_install_path: "/usr/local/bin/bom"

  tasks:
    - name: Install BOM cleaner
      get_url:
        url: "https://github.com/paulmann/Clean_BOM_Senior/raw/main/clean-bom-senior.sh"
        dest: "{{ bom_install_path }}"
        mode: '0755'
        owner: root
        group: root

    - name: Create BOM cleaner config directory
      file:
        path: /etc/bom-cleaner
        state: directory
        mode: '0755'

    - name: Deploy environment-specific configuration
      template:
        src: config.env.j2
        dest: /etc/bom-cleaner/config.env
        mode: '0644'

    - name: Install cron job for regular cleanup
      cron:
        name: "Weekly BOM cleanup"
        minute: "0"
        hour: "3"
        weekday: "0"
        job: ". /etc/bom-cleaner/config.env && {{ bom_install_path }} --verbose /var/www/html"
        user: root

    - name: Create log rotation config
      template:
        src: bom-cleaner.logrotate.j2
        dest: /etc/logrotate.d/bom-cleaner
        mode: '0644'

Terraform Infrastructure:

# BOM cleaner infrastructure as code
resource "aws_instance" "bom_cleaner" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"

  user_data = <<-EOF
    #!/bin/bash
    apt-get update
    curl -Lo /usr/local/bin/bom https://github.com/paulmann/Clean_BOM_Senior/raw/main/clean-bom-senior.sh
    chmod +x /usr/local/bin/bom

    # Install monitoring agent (Ubuntu AMI, so use the .deb package rather than the .rpm)
    wget https://s3.amazonaws.com/amazoncloudwatch-agent/ubuntu/amd64/latest/amazon-cloudwatch-agent.deb
    dpkg -i -E ./amazon-cloudwatch-agent.deb
  EOF

  tags = {
    Name = "bom-cleaner-${var.environment}"
    Environment = var.environment
  }
}

# CloudWatch monitoring for BOM operations
resource "aws_cloudwatch_log_group" "bom_cleaner" {
  name              = "/aws/ec2/bom-cleaner"
  retention_in_days = 30
}

10 · Glossary

Atomic Operation[^glossary-atomic]
: A file operation that either completes entirely or fails completely, ensuring data integrity. In the context of BOM cleaning, this means files are either successfully cleaned or left unchanged, with no partial modifications.

BOM (Byte Order Mark)[^glossary-bom]
: The Unicode character U+FEFF placed at the beginning of a text file to signal byte order and, by convention, encoding. In UTF-8 files the BOM appears as the three bytes EF BB BF (hexadecimal). It is optional in UTF-8, and its presence can cause issues in PHP and other languages.

Character Encoding[^glossary-encoding]
: The method used to represent characters as bytes in computer files. Common encodings include UTF-8, UTF-16, ASCII, and ISO-8859-1. UTF-8 is the standard for web development and modern applications.

CI/CD (Continuous Integration/Continuous Deployment)[^glossary-cicd]
: Development practices that involve frequent code integration and automated deployment pipelines. BOM cleaning is often integrated into these pipelines to ensure code quality.

CRLF (Carriage Return + Line Feed)[^glossary-crlf]
: The Windows-style line ending sequence (\r\n), contrasted with Unix-style LF-only endings (\n). Mixed line endings can cause version control and deployment issues.

DevOps[^glossary-devops]
: A set of practices that combines software development and IT operations, emphasizing automation, collaboration, and continuous improvement. BOM cleaning automation is a common DevOps practice.

Dry Run Mode[^glossary-dryrun]
: An execution mode where operations are simulated without making actual changes. Useful for previewing what would be modified before committing to changes.

File Attributes[^glossary-attributes]
: Metadata associated with files including ownership (UID/GID), permissions (mode), timestamps, and extended attributes like SELinux labels.

Git Hooks[^glossary-githooks]
: Scripts that run automatically at certain points in the Git workflow (e.g., pre-commit, pre-push). Often used to enforce coding standards including BOM checking.

Hexadecimal[^glossary-hex]
: A base-16 number system using digits 0-9 and letters A-F. Commonly used to represent byte values in files. The UTF-8 BOM appears as "EF BB BF" in hexadecimal.

IDE (Integrated Development Environment)[^glossary-ide]
: Software applications that provide comprehensive facilities for software development, including code editors, debuggers, and build tools. Examples include VS Code, PhpStorm, and Sublime Text.

Process Substitution[^glossary-process-sub]
: A bash feature that allows the output of a command to be treated as a file. Uses syntax like < <(command). Critical for preserving variable changes in loops, unlike pipes which create subshells.

Rollback[^glossary-rollback]
: The process of reverting changes when an operation fails. BOM cleaning tools often create backups and can restore original files if errors occur.

Subshell[^glossary-subshell]
: A separate instance of the shell created when using pipes or certain other constructs. Variable changes in subshells don’t affect the parent shell, which was a key issue fixed in clean-bom-senior.sh v2.06.4.

UTF-8[^glossary-utf8]
: A variable-width character encoding for Unicode. The standard encoding for web content and modern applications. Can optionally include a BOM, though this is not recommended for most use cases.
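
The Process Substitution and Subshell entries can be illustrated with a short bash experiment. This is an illustrative sketch, not code taken from clean-bom-senior.sh, and both function names are hypothetical. It relies on bash's default behaviour, with the lastpipe option off.

```shell
#!/usr/bin/env bash
# Counting lines with a pipe: the while loop runs in a subshell,
# so the increments are lost when the loop ends.
count_with_pipe() {
    local count=0 line
    printf 'a\nb\nc\n' | while IFS= read -r line; do
        count=$((count + 1))
    done
    echo "$count"
}

# Counting lines with process substitution: the loop runs in the
# current shell, so the variable keeps its final value.
count_with_process_substitution() {
    local count=0 line
    while IFS= read -r line; do
        count=$((count + 1))
    done < <(printf 'a\nb\nc\n')
    echo "$count"
}
```

In bash with default settings, count_with_pipe prints 0 and count_with_process_substitution prints 3.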


This comprehensive guide provides enterprise-grade solutions for BOM cleaning in PHP and other web development contexts. For the latest updates and community contributions, visit the Clean BOM Senior project on GitHub.

Document Version: 2.0
Last Updated: September 28, 2025
Status: Production Ready

Sources

  1. https://www.w3.org/International/questions/qa-byte-order-mark
  2. https://alastaira.wordpress.com/2011/06/07/php-and-utf-8-bom-or-why-do-my-webpages-start-with/
  3. https://www.php.net/manual/en/function.mb-detect-encoding.php
  4. https://www.honeybadger.io/blog/php-character-encoding-unicode-utf8-ascii/
  5. https://www.php.net/manual/en/language.namespaces.definition.php
  6. https://bugs.php.net/74339
  7. https://www.php-fig.org/psr/psr-1/
  8. https://stackoverflow.com/questions/53376444/getting-fatal-error-while-i-am-trying-to-use-the-declarestrict-types-1-on-my
  9. https://php-errors.readthedocs.io/en/latest/messages/strict_types-declaration-must-be-the-very-first-statement-in-the-script.html
  10. https://laracasts.com/discuss/channels/laravel/namespace-declaration-statement-has-to-be-the-very-first-statement-or-after-any-declare-call-in-the-script-2
  11. https://stackoverflow.com/questions/5601904/encoding-a-string-as-utf-8-with-bom-in-php
  12. https://stackoverflow.com/questions/2558172/utf-8-bom-signature-in-php-files
  13. https://blog.somewhatabstract.com/2014/10/06/drop-the-bom-a-case-study-of-json-corruption-in-wordpress/
  14. https://anupamsaha.wordpress.com/2011/08/02/detecting-utf-byte-order-mark-using-php/
  15. https://community.dynamics.com/forums/thread/details/?threadid=28cd1f8f-2696-4f34-b098-e7d1d773cc2d
  16. https://www.ghisler.ch/board/viewtopic.php?t=73747
  17. https://stackoverflow.com/questions/8432584/how-can-i-make-notepad-to-save-text-in-utf-8-without-the-bom
  18. https://www.hesk.com/knowledgebase/?article=87
  19. https://forum.farmanager.com/viewtopic.php?t=12647
  20. https://stackoverflow.com/questions/10290849/how-to-remove-multiple-utf-8-bom-sequences
  21. https://csv.thephpleague.com/9.0/connections/bom/
  22. https://stackoverflow.com/questions/31983528/php-remove-bom-when-requiring-a-php-file
  23. https://github.com/emrahgunduz/bom-cleaner
  24. https://topic.alibabacloud.com/a/php-realizes-automatic-detection-and-font-classtopic-s-color00c1deremovalfont-of-bom-of-utf-8-file_1_34_33114522.html
  25. https://www.php-fig.org/psr/psr-12/
  26. https://dcblog.dev/php-strict-types-vs-weak-types-when-and-how-to-use-declarestrict-types1
  27. https://stackoverflow.com/questions/21433086/fatal-error-namespace-declaration-statement-has-to-be-the-very-first-statement
  28. https://www.php.net/manual/en/control-structures.declare.php
  29. https://github.com/kaitai-io/kaitai_struct/issues/637
  30. https://inspector.dev/why-use-declarestrict_types1-in-php-fast-tips/
  31. https://bugs.php.net/bug.php?id=48127
  32. https://bugs.php.net/bug.php?id=78043&edit=1
  33. https://cs.symfony.com/doc/rules/index.html
  34. https://github.com/PHP-CS-Fixer/PHP-CS-Fixer/issues/6872
  35. https://forum.codeigniter.com/showthread.php?tid=81750
  36. https://flowframework.readthedocs.io/en/8.3/TheDefinitiveGuide/PartV/CodingGuideLines/PHP.html
  37. https://neos.readthedocs.io/en/8.3/References/CodingGuideLines/PHP.html
  38. https://webmonkeyuk.wordpress.com/2011/04/23/how-to-avoid-character-encoding-problems-in-php/
  39. https://www.sitepoint.com/community/t/text-file-encoding-problem/39413
  40. https://www.sphinx-solution.com/blog/php-development-tools/
  41. https://www.sitepoint.com/community/t/detect-and-remove-bom/3183
  42. https://stackoverflow.com/questions/825939/php-character-encoding-problems
  43. https://www.admin-magazine.com/Articles/Automation-Scripting-with-PHP
  44. https://github.com/wp-cli/wp-cli/issues/5578
  45. https://www.reddit.com/r/PHPhelp/comments/t2ia34/using_php_to_fix_some_character_encoding_issues/
  46. https://engineering.teknasyon.com/the-what-why-how-guide-of-php-code-quality-tools-6eaa6406859
  47. https://www.php.net/manual/en/ref.mbstring.php
  48. https://groups.google.com/g/php-fig/c/fGDuNFKy390
  49. https://ashallendesign.co.uk/blog/using-declare-strict_types-1-for-more-robust-php-code
  50. https://inspector.dev/declarestricttypes1-in-laravel/
  51. https://dev.to/inspector/why-use-declarestricttypes1-in-php-fast-tips-3c1
  52. https://bdthemes.com/5-most-common-wordpress-errors-and-how-to-fix-them/
