Adding section describing marking of capital letters attribute. #28

Open
kalaspuffar wants to merge 6 commits into master from mark_capital_letters

@kalaspuffar
Collaborator

Hi @bertfrees and @PaulRambags

This pull request describes the functionality of a new attribute for text processing. The mark-capital-letters attribute can be used to tell the formatter whether capital letters in the text should be marked.

Please review and comment.

Best regards

implementing application, for example per job or in a system setting. A user's
preference should be respected, unless doing so would render the text difficult
to read or interpret for some reason. The hyphenate attribute can be used when
detailed control over hyphenation is required, e.g. to ensure proper rendering of

hyphenate attribute [...] detailed control over hyphenation

Leftovers from copy-paste.

preference for marking capital letters should be applied.</p>

<p>The value of the <code>mark-capital-letters</code> attribute can either be 'true' or 'false'.
The value 'true' indicates that a text algorithm add capital markings to the

Text algorithm?

Comment on lines 2786 to 2787
<p>The following table illustrates the output for different combinations of
input text and capital letters marking attribute value:</p>

I don't know about this table. The attribute just enables or disables indication of capitals, there's not much to understand about that. So to me, this example basically just boils down to an example of how capitals could be indicated in braille. Especially rows 3 to 6 are beside the point IMO. They just go into more detail about how this particular braille system chosen in the example works, which is irrelevant for OBFL.

@bertfrees

At its core this is a good change, but I think the text has become a bit awkward because you tried to copy the "hyphenate" section too closely.

Another thing is that the schema needs to be updated.

But before we go further, I think we should go back one step and think about the direction we're going with this. (And I also refer again to my comments in #27, where I touch on various related issues.) We've added a new attribute that is similar to xml:lang, translate and hyphenate, and that represents a new "text property" (in Dotify terms). However there is also the style element. The line between a text property and a style is a thin one. What's more, I think we're starting to create an overlap between "text properties" and what could be considered "properties of the braille system", which is what is called the "translation mode" in Dotify, and is captured in the translate attribute in OBFL.

My point is that we need to be careful with recognizing "patterns" (like mark-capital-letters vs. hyphenate). A certain variable could fit into more than one concept. Also we need to think about whether we don't have too many of these concepts in the first place. We have to consider that there will surely be more variables like mark-capital-letters in the future that we will need to find a place for.

Also important to bear in mind is that the attributes such as translate can be specified on obfl, sequence, block, span, but currently not on elements of the page box (header, footer, field, etc.). Styles (style element, text-style attribute) on the other hand can occur on field, marker-reference, string, current-page, evaluate and anywhere within block type elements. In other words, there is quite a big asymmetry between two concepts that do not lie that far apart.

@kalaspuffar
Collaborator Author

Hi @bertfrees

Yes, you are right, there could be other similar parameters that we need to handle, and perhaps they could be handled differently. After working with text extraction for multiple years I don't know many more similar parameters that will impact all text.

Capitals, hyphenation, abbreviations, spacing or line endings are pretty much all the text attributes you can have. Other things are styling or placement. Maybe I'm missing something here, but I don't see that we will invent too many similar attributes that can impact everything in the document.

Best regards
Daniel


<h3 id="L2762">mark-capital-letters</h3>

<p>The overall policy for marking capital letters should be determined by the

Can you please be more to the point? Please mention the purpose of this element clearly, that is to produce text without capitals.

<code>mark-capital-letters</code> is used on an element B to override a
specification of <code>mark-capital-letters</code> on an enclosing element A,
without specifying another marking policy. Within B, the user's
preference for marking capital letters should be applied.</p>

After reading this paragraph several times, I still don't get it. Can you give an example?

<p>The value of the <code>mark-capital-letters</code> attribute can either be 'true' or 'false'.
The value 'true' indicates that we add capital markings to the
text contents. The value 'false' indicates we should
<strong>NOT</strong> add capital markings text contents.</p>

What is the default?

The default is that there is no value, meaning the document does not override the overall policy. In other words, whether or not capitals are indicated is determined by the user's
preference and/or the braille system in use. For example, for German it defaults to "false". For Dutch to "true".

(This is what the paragraph above that you didn't understand tries to explain.)
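
A minimal sketch of the default-and-override behaviour described in this thread, not taken from Dotify or the specification. It assumes that the nearest element carrying `mark-capital-letters` (the element itself or an enclosing one) decides the setting, and that when no element specifies it, the overall policy applies. The function name `effective_values`, the `default_policy` parameter (standing in for the user's preference and/or the braille system) and the `id` attributes in the sample fragment are hypothetical.

```python
import xml.etree.ElementTree as ET

ATTR = "mark-capital-letters"

# Hypothetical OBFL-like fragment: element names follow the discussion,
# the ids and text content are invented for illustration only.
FRAGMENT = """
<sequence mark-capital-letters="false">
  <block id="a">No capital marks here.</block>
  <block id="b" mark-capital-letters="true">Capital marks here.
    <span id="c">Inherited from block b.</span>
  </block>
</sequence>
""".strip()


def effective_values(root, default_policy):
    """Map each element id to its effective capital-marking setting.

    Assumption: the nearest element (self or ancestor) that carries the
    attribute wins; if none does, default_policy is used unchanged.
    """
    result = {}

    def walk(elem, inherited):
        raw = elem.get(ATTR)
        # An absent attribute means "do not override": keep the inherited value.
        current = inherited if raw is None else (raw == "true")
        if elem.get("id") is not None:
            result[elem.get("id")] = current
        for child in elem:
            walk(child, current)

    walk(root, default_policy)
    return result


if __name__ == "__main__":
    print(effective_values(ET.fromstring(FRAGMENT), default_policy=True))
```

In this sketch a document without any `mark-capital-letters` attribute simply yields `default_policy` for every element, which is the "no value" default discussed above.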

Comment on lines 2778 to 2779
The value 'true' indicates that we add capital markings to the
text contents. The value 'false' indicates we should

@bertfrees commented Oct 5, 2020

The "we"-form isn't used anywhere in the specification. Sorry, I think it needs to stay a bit uniform.

Who is "we" anyway? The info in this attribute should go straight to the braille translator (this is how it's currently done in Dotify and I think it's the right way). So I'd replace "we" with "the braille translator". The issue however is that OBFL doesn't have the notion of a braille translator. At least it is not explained anywhere what this term means and how it interacts with everything (the only mention of it is in the "pre-translated" section). This may have been intentional because in the very first paragraph it says "[OBFL] is designed for braille, but can also be used in other fixed character width contexts".

href="#ref-xml">XML</a>].</dd>
<dt>hyphenate [optional]</dt>
<dd>Whether or not to hyphenate text descendants. </dd>
<dd>Whether or not to

Something went wrong here?

to read or interpret for some reason. The hyphenate attribute can be used when
detailed control over hyphenation is required, e.g. to ensure proper rendering of
content that may be difficult to interpret if capital markings are used.</p>
preference should be respected.</p>

Saying that there is an "overall policy" isn't really useful unless you also say the default policy can be overridden, I think.

@kalaspuffar force-pushed the mark_capital_letters branch from cd4bbb2 to a9ae92d on October 5, 2020 11:12
@kalaspuffar
Collaborator Author

Hi @bertfrees and @PaulRambags

I'm not the one to write specifications. :)

But this is a new try to get something more readable. I thought I had to follow the formula of the document. But I will make this section easier, as this function is not that complicated.

First, I have a paragraph with the definition of true and false values. Then I inform on how it applies, and lastly, I specify how this attribute can be used to override the default behavior.

Hope this is clearer; if not, then please give some examples :)

And @PaulRambags, if you want an example on how it works in braille, one of the first commits had one, but we removed it as this is an OBFL specification and braille-specific information might not be applicable.

Best regards
Daniel

@bertfrees

Thanks, much better. Easier to understand too, do you agree @PaulRambags?

Regarding my general remark from Friday: I was mainly just trying to draw attention to the fact that there are all these different ways of representing "parameters" and that we need to think carefully about it, now, before we add more. I'm not so sure that we're using the right approach for indication of capitals.

The various ways of representing the parameters do correspond with different types of parameters:

  • properties of text (text attributes)
  • properties of text processor (braille translator)
  • text "styles"

however what these mean is not super clear and the differences are very subtle IMO. At least, they are, without a better explanation than we have now. The specification is currently very very brief about the style element (and text-style attribute) and translate attribute. (The good side about it is that we still have some room to shape them according to our own interpretation.)

You said:

Capitals, hyphenation, abbreviations, spacing or line endings.

"Abbreviations": what do you mean by this?

Spacing is currently controlled through special characters in the text, and br elements. There are no parameters related to spacing yet. I think there are at least two that could be useful in OBFL: word-spacing and letter-spacing. Both of these are also properties in braille CSS. In addition I have one other CSS property called white-space.

Other things are styling or placement.

What exactly does "styling" mean to you? What is "placement"?

that will impact all text
[...]
that can impact everything in the document

What do you mean by this?

Comment on lines 2769 to 2770
<p>The overall policy for marking capital letters should be determined by the
implementing application, for example per job or in a system setting.</p>

Would it be worth adding that the selected braille system has an influence too? See the example I gave of the German braille system, which normally never indicates capital letters at the beginning of nouns.

Or, should the mark-capital-letters attribute not even have an effect on that aspect?

Collaborator Author

I think this is more of an override feature so perhaps examples of language-specific rules don't add to the specification?

We don't need to give examples. But saying that it is determined by the implementing application is only partly true.

</optional>
<optional>
<attribute name="mark-capital-letters">
<choice>

Maybe the time has come to extract the attributes translate, hyphenate and mark-capital-letters (but not xml:lang) into a separate group, to get rid of this duplication.

@kalaspuffar
Collaborator Author

kalaspuffar commented Oct 5, 2020

"Abbreviations": what do you mean by this?

Contractions

What exactly does "styling" mean to you? What is "placement"?

Emphasize or strong, for example.

Maybe I'm missing something here, but I don't see that we will invent many similar attributes that can impact everything in the document.

What do you mean by this?

If you should have caps or not, it could be for all elements of the document. The whole document will not be emphasized.

@bertfrees

bertfrees commented Oct 5, 2020

Right, emphasis. That's the most obvious example. (And it's also the only explanation the specification has.) But how does it really differ from a "text property" and from a "property of the braille translator"? What is so fundamentally different about it that it requires a different approach?

Whether or not a property could also be applied to the whole document could indeed be an important factor to consider. But could it also be the only difference? If it is the deciding factor then maybe that is how we should explain it in the specification.

By the way, while it doesn't make sense to emphasize a complete document, it does happen sometimes that several paragraphs are emphasized as a whole. You could argue whether this is a true form of emphasis, but at least in braille it is indicated the same way.

What about the difference between "text properties" and "properties of the braille translator"? For me this is the trickiest part. For some things, it is clear. For example word-spacing and letter-spacing can hardly be considered properties of a braille system. Same for hyphenation. For mark-capital-letters on the other hand it is not that clear I think. This could actually be considered a feature of the braille system. It's about how much detail you want to convey in your braille code. Braille codes tend to be as short as possible while still being decipherable. Things that carry no or little information, like a lot of the capitalization, can be left out. A user might prefer to have them or not, based on how short he wants the braille to be. So in this regard it can be considered something that is similar to a contraction grade.

@bertfrees

Another example I want to give, just to show that things are not always black and white, is that certain text properties that are not exclusively used for emphasis could nevertheless be used as emphasis. For example letter spacing could apply for the whole document, notably when the audience are children, or users that are just starting to read braille. But I can imagine it could possibly also be used to emphasize a word. At least in normal print this is a method sometimes used.

@kalaspuffar
Collaborator Author

Hi @bertfrees

You have many good points, and there might be value in adding differing options in the future and reworking this decision. But if we do, hyphenation should be moved and have the same kind of impact on the document.

Maybe I could add an example with just removing capitals because I could see, for instance, an example where we create a text file, not PEF, where you remove the capitals in the output. So if we want more examples of outputs that could be created with the Dotify library as the extension, then marking capitals might not be the right term. In braille, it's clear what to do, but what we want to accomplish is removing the complexity of a differing character set (lowercase vs. uppercase) for readers.

Not sure that this discussion adds to this PR, though.

Best regards
Daniel

@kalaspuffar force-pushed the mark_capital_letters branch from 4001915 to 5d5ef85 on October 6, 2020 07:37
@bertfrees

bertfrees commented Oct 6, 2020

But if we do, hyphenation should be moved and have the same kind of impact on the document.

What do you mean?

I could see, for instance, an example where we create a text file, not PEF, where you remove the capitals in the output ...

Are you saying we should maybe reconsider the name "mark-capital-letters"? Is "include-capital-letters"/"include-capitals"/"include-capitalisation" better?

Regarding your example: maybe a bit far-fetched, but you're right, it could be a possible use, for instance for simplifying the output for people with reading difficulties (although in that case I guess it's more of a font thing than a transcription thing).

But anyway, it doesn't change much about the dilemma we are faced with. The translate attribute applies to non-braille contexts too. It can be about any transcription/translation, not necessarily a braille transcription. And I'm still drawn towards using the translate attribute for this new feature.

(Completely aside and not relevant for the discussion: I never liked that Joel wanted to make OBFL something more general than braille. OBFL has "Braille" in its name after all, and I have a hard time understanding what practical use it could have in other contexts than braille. But anyway...)

Not sure that this discussion adds to this PR

It absolutely does. Sorting out what the different mechanisms are and documenting them properly is not only filling an important gap in the specification, it is also required to make a deliberate decision on this new feature (and any future additions).

@bertfrees

Did you mean adding a non-braille example to the specification?

@kalaspuffar
Collaborator Author

(Completely aside and not relevant for the discussion: I never liked that Joel wanted to make OBFL something more general than braille. OBFL has "Braille" in its name after all, and I have a hard time understanding what practical use it could have in other contexts than braille. But anyway...)

In my mind, OBFL is a document for the layout and creation of PEF files only.

Are you saying we should maybe reconsider the name "mark-capital-letters"? Is "include-capital-letters"/"include-capitals"/"include-capitalisation" better?

I think "include-capital-letters" would describe this feature in the most concise way.

But anyway, it doesn't change much about the dilemma we are faced with. The translate attribute applies to non-braille contexts too.

Just as a curiosity, do you use the OBFL format to create any other output than PEF in the pipeline today?

It absolutely does. Sorting out what the different mechanisms are and documenting them properly is not only filling an important gap in the specification, it is also required to make a deliberate decision on this new feature (and any future additions).

Yes, we need to decide if we don't want to change it in the future.

Did you mean adding a non-braille example to the specification?

Not if we don't plan on creating this output format.

@bertfrees

bertfrees commented Oct 8, 2020

In my mind, OBFL is a document for the layout and creation of PEF files only.

Right. Well let's say "embossed braille".

Just as a curiosity, do you use the OBFL format to create any other output than PEF in the pipeline today?

No.

3 participants