The REPLACE function appears to be nearly 20x slower with an explicit Windows collation than with an explicit SQL collation, and nearly 40x slower than with a binary collation. See the repro script. The fact that there is a difference isn't surprising, but the magnitude of it is ;c)
The query plan produced using the Windows collation is subtly different from all other cases. The Compute Scalar includes an extra CONVERT (not a CONVERT_IMPLICIT) in every case except the Windows collation.
The scale of the difference makes me wonder whether the plan produced for the Windows collation is failing to take advantage of some optimization that is available when one of the other collations is specified. It seems slow enough to speculate that a complete copy of the processed string is taken for every row.
While I can certainly work around this, for example by including an explicit COLLATE Latin1_General_BIN or COLLATE Latin1_General_BIN2, I'd be interested to know a little more about the cause of the difference - the knowledge might come in useful in other situations.
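For reference, the workaround I have in mind looks roughly like the sketch below. The variable name and the strings are made up for illustration and are not taken from the repro script.

```sql
-- Hypothetical input: @s and the search/replacement values are illustrative only.
DECLARE @s nvarchar(max) = N'the quick brown fox';

-- Apply an explicit binary collation to the input expression so that
-- REPLACE matches by code point instead of using Windows collation rules.
SELECT REPLACE(@s COLLATE Latin1_General_BIN2, N'quick', N'slow') AS replaced;
```

Note that a binary collation also makes the match case-sensitive and accent-sensitive, so this is only a drop-in replacement where that change in matching behaviour is acceptable.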