Adding section describing marking of capital letters attribute.#28
kalaspuffar wants to merge 6 commits into master
src/obfl-specification.html
Outdated
| implementing application, for example per job or in a system setting. A user's | ||
| preference should be respected, unless doing so would render the text difficult | ||
| to read or interpret for some reason. The hyphenate attribute can be used when | ||
| detailed control over hyphenation is required, e.g. to ensure proper rendering of |
hyphenate attribute [...] detailed control over hyphenation
Leftovers from copy-paste.
src/obfl-specification.html
Outdated
| preference for marking capital letters should be applied.</p> | ||
|
|
||
| <p>The value of the <code>mark-capital-letters</code> attribute can either be 'true' or 'false'. | ||
| The value 'true' indicates that a text algorithm add capital markings to the |
src/obfl-specification.html
Outdated
| <p>The following table illustrates the output for different combinations of | ||
| input text and capital letters marking attribute value:</p> |
I don't know about this table. The attribute just enables or disables indication of capitals, there's not much to understand about that. So to me, this example basically just boils down to an example of how capitals could be indicated in braille. Especially rows 3 to 6 are beside the point IMO. They just go into more detail about how this particular braille system chosen in the example works, which is irrelevant for OBFL.
|
At its core this is a good change, but I think the text has become a bit awkward because you tried to copy the "hyphenate" section too much. Another thing is that the schema needs to be updated. But before we go further, I think we should go back one step and think about the direction we're going with this. (And I also refer again to my comments in #27, where I touch on various related issues.) We've added a new attribute that is similar to My point is that we need to be careful with recognizing "patterns" (like mark-capital-letters vs. hyphenate). A certain variable could fit into more than one concept. Also we need to think about whether we don't have too many of these concepts in the first place. We have to consider that there will surely be more variables like mark-capital-letters in the future that we will need to find a place for. Also important to bear in mind is that the attributes such as |
|
Hi @bertfrees Yes, you are right, there could be other similar parameters that we need to handle, and perhaps they could be handled differently. After working with text extraction for multiple years, I don't know of many more similar parameters that impact all text. Capitals, hyphenation, abbreviations, spacing or line endings are pretty much all the text attributes you can have. Other things are styling or placement. Maybe I'm missing something here, but I don't see that we will invent too many similar attributes that can impact everything in the document. Best regards |
src/obfl-specification.html
Outdated
|
|
||
| <h3 id="L2762">mark-capital-letters</h3> | ||
|
|
||
| <p>The overall policy for marking capital letters should be determined by the |
Can you please be more to the point? Please state the purpose of this element clearly, that is, to produce text without capitals.
src/obfl-specification.html
Outdated
| <code>mark-capital-letters</code> is used on an element B to override a | ||
| specification of <code>mark-capital-letters</code> on an enclosing element A, | ||
| without specifying another marking policy. Within B, the user's | ||
| preference for marking capital letters should be applied.</p> |
After reading this paragraph several times, I still don't get it. Can you give an example?
| <p>The value of the <code>mark-capital-letters</code> attribute can either be 'true' or 'false'. | ||
| The value 'true' indicates that we add capital markings to the | ||
| text contents. The value 'false' indicates we should | ||
| <strong>NOT</strong> add capital markings text contents.</p> |
The default is that there is no value, meaning the document does not override the overall policy. In other words, whether or not capitals are indicated is determined by the user's
preference and/or the braille system in use. For example, for German it defaults to "false". For Dutch to "true".
(This is what the paragraph above that you didn't understand tries to explain.)
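For illustration, a minimal sketch of the three cases described above. It is a fragment, not a complete OBFL document. It assumes that `block` accepts the attribute in the same way it accepts `hyphenate` and that the value applies to text descendants; the `master` name is hypothetical.

```xml
<!-- Fragment only. The master name "body" is hypothetical, and the
     assumption is that block accepts mark-capital-letters. -->
<sequence master="body">
  <!-- Explicit 'false': no capital markings are added,
       whatever the overall policy says. -->
  <block mark-capital-letters="false">
    Text Without Capital Markings.
    <!-- Explicit 'true' on a nested element overrides the enclosing 'false'. -->
    <block mark-capital-letters="true">
      Text With Capital Markings.
    </block>
  </block>
  <!-- No attribute: the document does not override anything, so the
       user's preference and/or the braille system in use decides. -->
  <block>
    Text That Follows The Overall Policy.
  </block>
</sequence>
```

With the defaults mentioned above, the last block would be rendered without capital markings for German and with them for Dutch, unless the user's preference says otherwise.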
src/obfl-specification.html
Outdated
| The value 'true' indicates that we add capital markings to the | ||
| text contents. The value 'false' indicates we should |
The "we"-form isn't used anywhere in the specification. Sorry, I think it needs to stay a bit uniform.
Who is "we" anyway? The info in this attribute should go straight to the braille translator (this is how it's currently done in Dotify and I think it's the right way). So I'd replace "we" with "the braille translator". The issue however is that OBFL doesn't have the notion of a braille translator. At least it is not explained anywhere what this term means and how it interacts with everything (the only mention of it is in the "pre-translated" section). This may have been intentional because in the very first paragraph it says "[OBFL] is designed for braille, but can also be used in other fixed character width contexts".
src/obfl-specification.html
Outdated
| href="#ref-xml">XML</a>].</dd> | ||
| <dt>hyphenate [optional]</dt> | ||
| <dd>Whether or not to hyphenate text descendants. </dd> | ||
| <dd>Whether or not to |
src/obfl-specification.html
Outdated
| to read or interpret for some reason. The hyphenate attribute can be used when | ||
| detailed control over hyphenation is required, e.g. to ensure proper rendering of | ||
| content that may be difficult to interpret if capital markings are used.</p> | ||
| preference should be respected.</p> |
Saying that there is an "overall policy" isn't really useful unless you also say the default policy can be overridden, I think.
cd4bbb2 to
a9ae92d
|
Hi @bertfrees and @PaulRambags I'm not the one to write specifications. :) But this is a new try to get something more readable. I thought I had to follow the formula of the document. But I will make this section easier, as this function is not that complicated. First, I have a paragraph with the definition of true and false values. Then I inform on how it applies, and lastly, I specify how this attribute can be used to override the default behavior. Hope this is clearer; if not, then please give some examples :) And @PaulRambags, if you want an example on how it works in braille, one of the first commits had one, but we removed it as this is an OBFL specification and braille-specific information might not be applicable. Best regards |
|
Thanks, much better. Easier to understand too, do you agree @PaulRambags? Regarding my general remark from Friday: I was mainly just trying to draw attention to the fact that there are all these different ways of representing "parameters" and that we need to think carefully about it, now, before we add more. I'm not so sure that we're using the right approach for indication of capitals. The various ways of representing the parameters do correspond with different types of parameters:
however what these mean is not super clear and the differences are very subtle IMO. At least, they are, without a better explanation than we have now. The specification is currently very very brief about the You said:
"Abbreviations": what do you mean by this? Spacing is currently controlled through special characters in the text, and
What exactly does "styling" mean to you? What is "placement"?
What do you mean by this? |
src/obfl-specification.html
Outdated
| <p>The overall policy for marking capital letters should be determined by the | ||
| implementing application, for example per job or in a system setting.</p> |
Would it be worth adding that the selected braille system has an influence too? See the example I gave of the German braille system, which normally never indicates capital letters at the beginning of nouns.
Or, should the mark-capital-letters attribute not even have an effect on that aspect?
I think this is more of an override feature so perhaps examples of language-specific rules don't add to the specification?
We don't need to give examples. But saying that it is determined by the implementing application is only partly true.
src/validation/obfl.rng
Outdated
| </optional> | ||
| <optional> | ||
| <attribute name="mark-capital-letters"> | ||
| <choice> |
Maybe the time has come to extract the attributes translate, hyphenate and mark-capital-letters (but not xml:lang) into a separate group, to get rid of this duplication.
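A rough sketch of what such a group could look like in RELAX NG, for discussion only. The pattern name `text-processing-attributes` is hypothetical, and the content models shown for `translate` and `hyphenate` are placeholders, not copied from the current `obfl.rng`; only the `mark-capital-letters` choice follows the diff above.

```xml
<!-- Hypothetical named pattern collecting the three attributes. -->
<define name="text-processing-attributes">
  <optional>
    <!-- Placeholder content model; the real one is in obfl.rng. -->
    <attribute name="translate"><text/></attribute>
  </optional>
  <optional>
    <!-- Placeholder content model; the real one is in obfl.rng. -->
    <attribute name="hyphenate">
      <choice>
        <value>true</value>
        <value>false</value>
      </choice>
    </attribute>
  </optional>
  <optional>
    <attribute name="mark-capital-letters">
      <choice>
        <value>true</value>
        <value>false</value>
      </choice>
    </attribute>
  </optional>
</define>
```

Each element definition that currently repeats these attributes would then contain a single `<ref name="text-processing-attributes"/>`, while `xml:lang` stays where it is.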
Contractions
Emphasize or strong, for example.
If you should have caps or not, it could be for all elements of the document. The whole document will not be emphasized. |
|
Right, emphasis. That's the most obvious example. (And it's also the only explanation the specification has.) But how does it really differ from a "text property" and from a "property of the braille translator"? What is so fundamentally different about it that it requires a different approach? Whether or not a property could also be applied to the whole document could indeed be an important factor to consider. But could it also be the only difference? If it is the deciding factor then maybe that is how we should explain it in the specification. By the way, while it doesn't make sense to emphasize a complete document, it does happen sometimes that several paragraphs are emphasized as a whole. You could argue whether this is a true form of emphasis, but at least in braille it is indicated the same way. What about the difference between "text properties" and "properties of the braille translator"? For me this is the trickiest part. For some things, it is clear. For example, word-spacing and letter-spacing can hardly be considered properties of a braille system. Same for hyphenation. For mark-capital-letters, on the other hand, it is not that clear I think. This could actually be considered a feature of the braille system. It's about how much detail you want to convey in your braille code. Braille codes tend to be as short as possible while still being decipherable. Things that carry little or no information, like a lot of the capitalization, can be left out. A user might prefer to have them or not, based on how short he wants the braille to be. So in this regard it can be considered something that is similar to a contraction grade. |
|
Another example I want to give, just to show that things are not always black and white, is that certain text properties that are not exclusively used for emphasis could nevertheless be used as emphasis. For example letter spacing could apply for the whole document, notably when the audience are children, or users that are just starting to read braille. But I can imagine it could possibly also be used to emphasize a word. At least in normal print this is a method sometimes used. |
|
Hi @bertfrees You have many good points, and there might be value in adding differing options in the future and reworking this decision. But if we do, hyphenation should be moved as well and have the same kind of impact on the document. Maybe I could add an example that just removes capitals, because I could see, for instance, a case where we create a text file, not PEF, and remove the capitals in the output. So if we want more examples of outputs that could be created with the dotify library as the extension, then marking capitals might not be the right term. In braille, it's clear what to do, but what we want to accomplish is removing the complexity of a differing character set (lowercase vs. uppercase) for readers. Not sure that this discussion adds to this PR, though. Best regards |
4001915 to
5d5ef85
What do you mean?
Are you saying we should maybe reconsider the name "mark-capital-letters"? Is "include-capital-letters"/"include-capitals"/"include-capitalisation" better? Regarding your example: maybe a bit far-fetched, but you're right, it could be a possible use, for instance for simplifying the output for people with reading difficulties (although in that case I guess it's more of a font thing than a transcription thing). But anyway, it doesn't change much about the dilemma we are faced with. The (Completely aside and not relevant for the discussion: I never liked that Joel wanted to make OBFL something more general than braille. OBFL has "Braille" in its name after all, and I have a hard time understanding what practical use it could have in other contexts than braille. But anyway...)
It absolutely does. Sorting out what the different mechanisms are and documenting them properly is not only filling an important gap in the specification, it is also required to make a deliberate decision on this new feature (and any future additions). |
|
Did you mean adding a non-braille example to the specification? |
In my mind, OBFL is a document for the layout and creation of PEF files only.
I think "include-capital-letters" would describe this feature in the most concise way.
Just as a curiosity, do you use the OBFL format to create any other output than PEF in the pipeline today?
Yes, we need to decide if we don't want to change it in the future.
Not if we don't plan on creating this output format. |
Right. Well let's say "embossed braille".
No. |
Hi @bertfrees and @PaulRambags
This pull request describes the functionality of a new attribute for text processing.
mark-capital-letters could be used to inform the formatter if the text should have capital letter marks.
Please review and comment.
Best regards