Chrome and Safari browsers wrongly insert NBSPs when content copied from View is pasted into VisualEditor
Closed, Resolved · Public · 8 Estimated Story Points

Description

Some non-breaking spaces are converted to &nbsp; by seemingly unrelated edits performed by WikEd (example).

Discussed here:

Summary:
It seems that when content is copied from a page in "Read" mode and then pasted into "Edit" mode, nbsp characters are added to the clipboard, and these sneak into the wikicode version as people copy the content into a wikicode editor. When people later edit the page with WikEd, these characters get converted to their entity representation, which makes them more noticeable.

Steps by Doc James:
Chrome v62, windows 10

@Doc_James wrote:
  1. Take this section of the article Septic_arthritis#Signs_and_symptoms. Hit edit using VE.
  2. Copied it here within VE editing mode User:Doc_James/HardspaceTest
  3. Copied it from User:Doc_James/HardspaceTest but did not hit edit in VE first (just copied in reading mode).
  4. Pasted it back https://en.wikipedia.org/w/index.php?title=Septic_arthritis&type=revision&diff=817270489&oldid=816873518 (with VE: version contains UTF8 encoded nbsp: c2 a0, see attachment)
  5. They appear. https://en.wikipedia.org/w/index.php?title=Septic_arthritis&diff=next&oldid=817270489 (wikEd converts them to nbsp entities).

In this version I cannot separate "fever" and "is" so even when not visible they are there. These are NOT being added by WikEd but by VE.

This is the wikicode of the revision that introduced the nbsp, which according to the tags was made by VE.

000012c0  68 65 20 6d 6f 73 74 20  63 6f 6d 6d 6f 6e 20 6a  |he most common j|
000012d0  6f 69 6e 74 20 61 66 66  65 63 74 65 64 20 69 73  |oint affected is|
000012e0  20 74 68 65 20 6b 6e 65  65 2e c2 a0 48 69 70 2c  | the knee...Hip,|
000012f0  20 73 68 6f 75 6c 64 65  72 2c 20 77 72 69 73 74  | shoulder, wrist|
00001300  2c 20 6f 72 20 65 6c 62  6f 77 20 6a 6f 69 6e 74  |, or elbow joint|
00001310  73 20 61 72 65 20 6c 65  73 73 20 63 6f 6d 6d 6f  |s are less commo|
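
The c2 a0 pair at offset 0x12ea in the dump above is the UTF-8 encoding of U+00A0 (non-breaking space). A minimal sketch of how such byte offsets could be located in a page's raw text follows. The function name findNbspByteOffsets is hypothetical, and the sketch assumes an environment that provides TextEncoder (modern browsers, Node.js).

```javascript
// Return the byte offset of every UTF-8 encoded non-breaking space
// (U+00A0, bytes 0xC2 0xA0) in the given text.
function findNbspByteOffsets(text) {
  const bytes = new TextEncoder().encode(text);
  const offsets = [];
  for (let i = 0; i + 1 < bytes.length; i++) {
    // 0xC2 is only ever a lead byte in UTF-8, so this pair is always U+00A0.
    if (bytes[i] === 0xc2 && bytes[i + 1] === 0xa0) {
      offsets.push(i);
    }
  }
  return offsets;
}

// Sample modelled on the dump: "knee." + U+00A0 + "Hip,"
console.log(findNbspByteOffsets('knee.\u00a0Hip,')); // one offset: 5
```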

Unanswered questions:

  1. Does this happen when copying from the preview surface?

Event Timeline

Tgr updated the task description.
Tgr updated the task description.

Just to confirm what's already stated in the linked discussions, the characters being converted are non-breaking spaces (so the content of the article is not changed, the nbsp's are just made more explicit by encoding them as entities):

fetch('https://en.wikipedia.org/w/index.php?title=Parkinson%27s_disease&action=raw&oldid=791791493')
  .then(r => r.text())
  .then(t => console.log(encodeURIComponent(t).match(/dopa.{1,9}pumps.{1,9}can.{1,9}be/)[0]));
// dopa%20pumps%C2%A0can%20be

The first edit adding a non-breaking space to that article seems to be this one.

WikEd issues are not tracked in Phabricator (cf. T85433#3297818) so there is not much to do here. If enwiki does not have one yet, I'd suggest setting up an edit filter that warns about accidentally inserted non-breaking spaces.
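
The check such a filter would perform can be sketched as follows. This is plain JavaScript illustrating the logic only, not AbuseFilter rule syntax; the function names and the default threshold are assumptions.

```javascript
// Count raw non-breaking spaces (U+00A0) in a string.
function countNbsp(text) {
  return (text.match(/\u00a0/g) || []).length;
}

// Return true when an edit increases the number of raw non-breaking
// spaces by at least `threshold` (hypothetical parameter).
function shouldWarnAboutNbsp(oldText, newText, threshold = 1) {
  return countNbsp(newText) - countNbsp(oldText) >= threshold;
}

console.log(shouldWarnAboutNbsp('dopa pumps can be', 'dopa pumps\u00a0can be')); // true
console.log(shouldWarnAboutNbsp('dopa pumps can be', 'dopa pumps can be'));      // false
```

Comparing counts rather than testing for presence means that an edit to a page which already contains intentional non-breaking spaces does not trigger the warning unless it adds more.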

Okay so an edit filter to block hidden non-breaking spaces? It could also come from a copy and paste issue involving visual editor. I am not really sure of the cause. Students often use VE to work in their sandbox and then copy stuff into main space.

Why can we not simply replace hidden non-breaking spaces with normal spaces with WikEd?

I believe you can; it's a content decision (A. in the cases where non-breaking spaces are used intentionally, do you prefer raw ones over &nbsp;? cf. T96701; B. how would you deal with the diffs being very confusing, given it would include paragraphs you haven't touched, and spaces would be replaced by spaces, so no change visually).
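
The two options in (A) amount to different string replacements. A minimal sketch, not WikEd's actual code; both function names are hypothetical.

```javascript
const NBSP = /\u00a0/g;

// Option 1: keep the non-breaking space but make it visible as an entity.
function nbspToEntity(wikitext) {
  return wikitext.replace(NBSP, '&nbsp;');
}

// Option 2: drop the non-breaking behaviour and use a plain space.
function nbspToSpace(wikitext) {
  return wikitext.replace(NBSP, ' ');
}

const sample = 'dopa pumps\u00a0can be';
console.log(nbspToEntity(sample)); // dopa pumps&nbsp;can be
console.log(nbspToSpace(sample));  // dopa pumps can be
```

Either replacement touches every paragraph that contains a raw non-breaking space, which is why the resulting diff can cover text the editor never changed.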

In any case, Phabricator is not a good place for building community consensus for that decision, and you probably won't reach the WikEd maintainers here either, so there is not too much point in having a discussion about it here. Unless you want VisualEditor to do something about it, in which case T96666: Make non-breaking spaces (nbsp) visible in VisualEditor seems like the relevant task (or you can ask for VisualEditor to automatically convert non-breaking spaces, but as you can see from the other task some people would consider that a bug).

Aklapper renamed this task from Hard Spaces Being Added to Wikipedia to Hard Spaces added by seemingly unrelated edits performed by WikEd. Dec 24 2017, 9:30 AM

In most wikis the use of non-visible non-breaking spaces is discouraged. I think the real issue here is the spike in non-breaking spaces being added in the first place. All WikEd is doing is highlighting the root issue; WikEd itself is not the problem.

Yup, agree with Betacommand. The big question is why did 450 non-visible non-breaking spaces get added in this edit? https://en.wikipedia.org/w/index.php?title=Septic_arthritis&type=revision&diff=816686853&oldid=812190379

Pine subscribed.

I appreciate WMF staff working on this during what I think is supposed to be their winter break. :) Unless I'm misunderstanding something, this ticket is about a technical problem and not a community policy issue, and the source of the technical problem is an open question, so I am changing the status of this ticket back to "open". If someone is eventually able to determine that the source of the problem is a tool that is currently not tracked on WMF Phabricator, then this ticket may again be closed for that reason.

@Doc_James perhaps it would be good to make the title of this ticket more general to reflect that the source of the problem is currently unknown and might not be WikEd. What do you think?

Thanks Pine. Yes, it is not really related to WikEd. WikEd just reveals a problem created by something else. I suspect it is VE. What do you think it should be renamed to?

You can type non-breaking spaces on Mac just by pressing Option+Space. You can't insert non-breaking spaces in VE, even if you wanted to (T96701: VE silently alters non-breaking spaces into normal spaces), so this is not a VE problem.

The real bug here is that WikEd converts normal non-breaking spaces to their HTML entities. This behavior has been known for years. I'm actually surprised no one has gotten WikEd removed from en.wp for "breaking" thousands of pages like this over the years.

Okay I have just verified that it is a problem with VE. VE is adding these non breaking spaces per here https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Non_breaking_spaces

WikEd isn't breaking pages; it's actually doing exactly what it should be doing: making invisible nbsp's visible. Most wikis discourage direct usage of the raw NBSP and suggest using the HTML entity instead.

@Doc_James Please document the browser and the version of the browser you are using.

TheDJ renamed this task from Hard Spaces added by seemingly unrelated edits performed by WikEd to Read/View version of VE adds nbsp's when copying from it. Dec 27 2017, 10:01 AM
TheDJ updated the task description.
TheDJ updated the task description.

I was using Chrome version 62 when I duplicated the issue.

Based upon Doc James' description at VPT, this is another "don't copy from the read mode" problem. So:

  • Read [[Example]], copy sentence, paste into the visual editor: get invisible nbsps.
  • Click the 'Edit' button on [[Example]] (to open in the visual mode), copy the same sentence, paste into the visual editor: no nbsps.

It would be helpful to get clear steps to reproduce this problem before we start deciding where the blame lies. That's the first step.

@Doc_James could you tell us what operating system you're using? People are having trouble reproducing this.

In these two edits I was able to add 34 hard spaces. This was with windows 10 using the most recent version of google chrome.

https://en.wikipedia.org/w/index.php?title=Gout&diff=818389589&oldid=815625911

Was also able to create the same thing on a chromebook running chromium and using google chrome. Steps are described here

https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Non_breaking_spaces

It does not appear to happen with Firefox. Firefox does, however, appear to take the liberty of capitalizing all the terms that are linked.

Lemme copy that into the ticket...

@Doc_James did you refresh between your step 2 and step 3? Or did you choose "Publish" and make a copy straight after the new page showed?
And did you use multiple tabs when copying and pasting between those pages?

TheDJ updated the task description.

If you want to google hangout I can walk you through exactly what I did. It can be done in two steps:

  1. Go here https://en.wikipedia.org/wiki/User:Doc_James/sandbox/VE and select the text and hit "control C"
  2. Go here and hit edit in VE mode https://en.wikipedia.org/wiki/Gout#Signs_and_symptoms and hit "control V"

Here are three screenshots:

  1. Copied text (Step1.jpg)
  2. Pasted text in VE (Step2.jpg)
  3. Looked at all the hard spaces (Step3.jpg)

I've spent hours investigating this now. Every time I thought I'd figured it out, and it turned out I was wrong. I think I've finally figured it out, but I've been wrong so many times in the past few hours that I now doubt everything.

When copying text, the visual editor sometimes adds non-breaking spaces to the text. This problem seems to be heavily dependent on your browser and operating system.

Here's my test case:

  1. Go to Woodrow, Hampshire and Morgan Counties, West Virginia on English Wikipedia.
  2. Whilst in read mode, copy the first sentence up to and including the full stop, but not the reference.
  3. Go to a sandbox, open the visual editor, and paste the text in.
  4. Save the page.

I did this on Mac with Chrome, Firefox, and Safari, and on Windows with Chrome, Firefox, and Internet Explorer. After doing this, I inspected the HTML of the page using my browser.

Windows:

  • Chrome: 13 counts of &nbsp; (non-breaking spaces) were added
  • Firefox: article was pasted correctly
  • Internet Explorer: article was pasted correctly

Mac:

  • Chrome: 12 counts of &#160;, with one <nowiki/> present in the wikitext source
  • Firefox: article was pasted correctly
  • Safari: 12 counts of &#160;, with one <nowiki/> present in the wikitext source

Here's the pages I generated during this:

I also wrote you all a song.

To the tune of Smells Like Teen Spirit by Nirvana.

[Verse 1]
I spent so much time on this
And my findings were a miss
My conclusion was stale
All I did was fail

[Pre-Chorus]
Destroy, destroy, destroy, destroy [x4]

[Chorus]
Destroy all of the browsers
Bug analysis took hours
My time has been wasted
Only pain have I tasted
I am angry and raging
Browser deaths I am staging

[Verse 2]
This bug is beyond lame
So all browsers I will maim
Chrome, Firefox, Safari, IE
They are infuriating me

[Pre-Chorus]
Destroy, destroy, destroy, destroy [x4]

[Chorus]
Destroy all of the browsers
Bug analysis took hours
My time has been wasted
Only pain have I tasted
I am angry and raging
Browser deaths I am staging

[Guitar solo]

[Verse 3]
So I spent time writing this song
And you know it didn't take long
How long it took you might ask
Less time than I spent on this task

[Pre-Chorus]
Destroy, destroy, destroy, destroy [x4]

[Chorus]
Destroy all of the browsers
Bug analysis took hours
My time has been wasted
Only pain have I tasted
I am angry and raging
Browser deaths I am staging

This is bad content editable behaviour on those browsers. The same thing happens here https://edg2s.github.io/content-editable-sandbox/ with no VE code.

The easiest fix would be to strip all nbsp on external paste, or only when it is a single nbsp surrounded by non-whitespace. This would be annoying in the rare cases it was intentional, but probably not as annoying as this bug.
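
The narrower of the two options could look roughly like this. It is a sketch of the idea only, not the code from any actual patch; the function name is hypothetical, and JavaScript's `\S` is assumed as the definition of "non-whitespace" (in JavaScript, `\s` already includes U+00A0, so `\S` never matches a neighbouring nbsp).

```javascript
// Replace a non-breaking space with a plain space only when it stands
// alone between two non-whitespace characters. Runs of nbsp's, and
// nbsp's next to ordinary whitespace, are left untouched.
function convertSingleNbsp(text) {
  // The lookahead does not consume the following character, so
  // consecutive matches such as "a\u00a0b\u00a0c" are all handled.
  return text.replace(/(\S)\u00a0(?=\S)/g, '$1 ');
}

console.log(convertSingleNbsp('fever\u00a0is') === 'fever is');          // true
console.log(convertSingleNbsp('a\u00a0\u00a0b') === 'a\u00a0\u00a0b');   // true (run kept)
```

Leaving runs and edge positions alone keeps the cases most likely to be intentional, at the cost of also converting a deliberate single nbsp between two words.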

Change 401751 had a related patch set uploaded (by Esanders; owner: Esanders):
[VisualEditor/VisualEditor@master] Convert single nbsp's to plain spaces on paste

https://gerrit.wikimedia.org/r/401751

Jdforrester-WMF renamed this task from Read/View version of VE adds nbsp's when copying from it to Chrome and Safari browsers wrongly insert NBSPs when content copied from View is pasted into VisualEditor. Jan 3 2018, 6:47 PM
Jdforrester-WMF assigned this task to Esanders.
Jdforrester-WMF set the point value for this task to 8.

Change 401751 merged by jenkins-bot:
[VisualEditor/VisualEditor@master] Convert single nbsp's to plain spaces on paste

https://gerrit.wikimedia.org/r/401751

Change 401787 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[mediawiki/extensions/VisualEditor@master] Update VE core submodule to master (9d7035436)

https://gerrit.wikimedia.org/r/401787

Possibly this is related to:
https://bugs.webkit.org/show_bug.cgi?id=123163 and
https://bugs.chromium.org/p/chromium/issues/detail?id=310149

Basically, it seems that the rebalancing of space-like characters when converting from rich text formats to plain text formats causes browsers to add nbsp's under certain conditions.

If you google a bit, then CKEditor also has had (and maybe still has?) major problems with nbsp additions.

Change 401787 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] Update VE core submodule to master (9d7035436)

https://gerrit.wikimedia.org/r/401787

Cool, thanks to everyone who helped fix this.

Question:
Would this bug have affected edits made with the 2017 wikitext editor in the same way as the visual editor? (That seems likely, since the 2017 wikitext editor is a mode within Extension:VisualEditor.)

I ask because I have encountered a case where this is a likely explanation for unwanted NBSPs, and it would be nice to be able to point to this bug as the cause.

I think so, but I'm not certain.

The difficulty is that these are being inserted by bugs and inconsistencies in browsers. There is a limit to how much we can do in the visual editor to work around browser bugs. Short of completely stripping out all nonbreaking spaces, which would create as many problems as it would solve, we mostly just have to hope that the people that make the browsers fix these issues.

Have we reached out to Google and Safari about this? I heard that Google has someone working on WP issues.

I have not, and am not aware of them working on Wikimedia-related issues. Based on that, it sounds like you know much more than I do, so please feel free to reach out to them.

Have emailed Lisa who mentioned the existence of such a person at one point in time.