Pattern Development Guide

This guide is for experienced contributors who want to create advanced patterns or improve existing ones. It covers best practices, advanced techniques, and common pitfalls to avoid.

Advanced Pattern Design

Understanding Capture Groups

Capture groups are crucial for extracting version information. The version_group field specifies which capture group contains the version string. Make sure this is correctly set and tested.

Complex Pattern Structures

For complex software that reports versions in multiple ways, consider creating multiple patterns in a single file:

{
  "vendor": "Example Vendor",
  "product": "Example Product",
  "all_versions": [
    {
      "name": "Header Pattern",
      "pattern": "Server: Example/([\\d.]+)",
      "version_group": 1,
      "priority": 150,
      "confidence": 0.9
    },
    {
      "name": "Body Pattern",
      "pattern": "Version: ([\\d.]+)",
      "version_group": 1,
      "priority": 100,
      "confidence": 0.8
    }
  ]
}

Version-Specific Patterns

For software with different version reporting mechanisms, use the versions field to organize patterns by version range:

{
  "vendor": "Example Vendor",
  "product": "Example Product",
  "versions": {
    "2.x": [
      {
        "name": "Version 2 Pattern",
        "pattern": "Example v2\\.([\\d]+)",
        "version_group": 1,
        "priority": 180,
        "confidence": 0.95
      }
    ],
    "3.x": [
      {
        "name": "Version 3 Pattern",
        "pattern": "Example v3\\.([\\d]+)",
        "version_group": 1,
        "priority": 180,
        "confidence": 0.95
      }
    ]
  }
}

Priority and Confidence Scoring

Priority
0-200
Higher = More reliable
Confidence
0.0-1.0
Accuracy level

Priority Guidelines

Confidence Guidelines

Advanced Regex Techniques

Non-Capturing Groups

Use non-capturing groups (?:...) for grouping without affecting version_group:

"pattern": "Server: (?:Apache|httpd)/([\\d.]+)"

Lookaheads and Lookbehinds

Use lookaheads (?=...) and lookbehinds (?<=...) for context-sensitive matching:

"pattern": "(?<=Server: )Apache/([\\d.]+)"

Escaping Special Characters

Remember to escape special regex characters in JSON strings:

// Correct - escaped backslashes
"pattern": "Server: Apache/([\\d.]+)"

// Incorrect - unescaped backslashes
"pattern": "Server: Apache/([\d.]+)"

Test Case Best Practices

Comprehensive Coverage

Include test cases for:

  • Positive matches with expected versions
  • Negative matches that should not match
  • Edge cases and unusual version formats
  • Different software deployment scenarios

Example Test Cases

"test_cases": [
  {
    "input": "Server: Apache/2.4.41 (Ubuntu)",
    "expected_version": "2.4.41"
  },
  {
    "input": "Server: nginx/1.18.0",
    "expected_version": "1.18.0"
  },
  {
    "input": "Server: IIS/10.0",
    "expected_version": "10.0"
  },
  {
    "input": "Server: Apache",  // No version
    "expected_version": null
  }
]

Performance Considerations

Avoid Catastrophic Backtracking

Write efficient regex patterns to avoid performance issues:

  • Use specific patterns instead of generic ones
  • Avoid nested quantifiers when possible
  • Use atomic groups (?>...) for performance-critical patterns
  • Test patterns against large datasets to ensure performance

Validation and Testing

Using the Validation Tool

Always validate your patterns with our tool:

python tools/validate-new-pattern.py patterns/by-vendor/vendor/product.json

Manual Testing

Test your patterns manually with the pattern matcher:

python tools/pattern-matcher.py "Server: Apache/2.4.41" vendor product

Common Pitfalls to Avoid

False Positives

  • Make patterns specific enough to avoid matching unrelated software
  • Test against diverse datasets to identify potential false positives
  • Use context clues to increase specificity

Version Extraction Issues

  • Ensure version_group points to the correct capture group
  • Handle different version formats (semver, date-based, etc.)
  • Consider partial version strings vs. full versions