How to convert accented characters to unaccented in R or Power BI

Sometimes it’s necessary to convert accented characters to non accented in R or Power BI for text analysis purposes. Accented characters might be a problem in texts with spelling mistakes and spoil the analysis.

Convert accented characters in R

Luckily in R is a base function iconv that makes this task easy. You only have to adjust the target encoding.

iconv("schön, natürlich", to = 'ASCII//TRANSLIT')

iconv("tā ir dzīve", to = 'Latin1')

 

If you want to know how useful RStudio tips and tricks, take a look at this post.

Convert accented characters in Power BI (Power Query)

First of all, R should be installed on your workstation.
Open Power Query editor, go to the Transform tab and choose Run R script.

Use the iconv function to convert the dataset column with accented characters. It worked only if the current encoding parameter (from) was defined.

# 'dataset' holds the input data for this script

dataset$text <- iconv(dataset$text, from = 'UTF-8', to = 'ASCII//TRANSLIT')

output <- dataset

Here is the result.

Here is another example of how to run R script in Power Query.


Posted

in

, ,

Comments

12 responses to “How to convert accented characters to unaccented in R or Power BI”

  1. Anahí

    Just amazing! thanks

  2. Joao

    That’s incredible. It’s crazy to realize we can use M, Python and R in the same environment and we get to choose which is better for a especific situation.

  3. Zak

    Hello,

    I tried the script (my column name is “fonction”)

    # ‘dataset’ contient les données d’entrée pour ce script
    dataset$fonction <- iconv(dataset$fonction, from = 'WINDOWS-1252', to = 'ASCII//TRANSLIT')
    output <- dataset

    But I have this error :
    DataSource.Error : ADO.NET : À ρřøБļém øċćűѓřëδ щђϊℓє þŕθćęѕšϊⁿġ ŷöŭŗ Ґ šсѓϊрŧ.
    Ĥзřë åŕė ŧђз ţéćђńΐ¢âľ ðęţªΐļś: [DataFormat.Error] Désolé… Nous n'avons pas pu analyser l'entrée fournie à une valeur Date.
    Détails :
    DataSourceKind=R
    DataSourcePath=R
    Message=À ρřøБļém øċćűѓřëδ щђϊℓє þŕθćęѕšϊⁿġ ŷöŭŗ Ґ šсѓϊрŧ.
    Ĥзřë åŕė ŧђз ţéćђńΐ¢âľ ðęţªΐļś: [DataFormat.Error] Désolé… Nous n'avons pas pu analyser l'entrée fournie à une valeur Date.
    ErrorCode=-2147467259
    ExceptionType=Microsoft.PowerBI.Scripting.R.Exceptions.RUnexpectedException

    Please do you have an idea on how to solve it ?

    Thank you

    1. Janis Sturis

      Hello

      I don’t have an exact idea what might cause this problem.
      My first guess is that you should try to change the type for the column “fonction” in Power Query. For example to Text.
      Try that R script with other columns – that will ensure that technically R scripts work in your Power Query.

  4. Zak

    Hello,

    The column “fonction” is already in type text.
    With other colums I have exactly the same problem,
    I tried also in an other file (much simpler) => same problem.

    In power bi option I defined the R directory end IDE
    First I was in microsoft r open 4, I removed it and installed R 3.6.1 like you => same problem.

    When i type in r studio :
    > iconv(“schön, natürlich”, from = ‘WINDOWS-1252’, to = ‘ASCII//TRANSLIT’)
    [1] “schon, naturlich”
    It’s OK

    It’s the first time I run a R script in power bi, maybe something is “missing” ?

      1. Zak

        So I resolved my first problem, I my query result there was an incorrect date in the column date_begin => 2020-03-00
        I replace it with 2020-03-01 and now it’s OK

        Now I have 2 new problems :
        1/ In the “fonction” column
        “Télécom” become “TAClACcom” but I want “Telecom”
        2/ In the column date_begin I have “Microsoft.OleDb.Date” in place of the date on the file

        I don’t understand why the script modify also the date_begin column.

        1. Janis Sturis

          It is great that you are not giving up! That is a good recipe for success.

          1. I tried to change some of the iconv parameters and this worked for me:
          iconv(“Télécom”, to = ‘ASCII//TRANSLIT’)
          [1] “Telecom”

          2. Looks like there is a solution to the date problem: https://stackoverflow.com/questions/41823277/r-script-in-power-bi-returns-date-as-microsoft-oledb-date

          1. Zak

            Thank you very much for your help

            1/ When I try in Rstudio, both are working correctly :
            > iconv(“Télécom”, from = ‘WINDOWS-1252’, to = ‘ASCII//TRANSLIT’)
            [1] “Telecom”
            > iconv(“Télécom”, to = ‘ASCII//TRANSLIT’)
            [1] “Telecom”

            But in power BI, The 2 commands return “TAClACcom” :
            dataset$fonction<- iconv(dataset$fonction, from = 'WINDOWS-1252', to = 'ASCII//TRANSLIT')
            output <- dataset

            dataset$fonction<- iconv(dataset$fonction, to = 'ASCII//TRANSLIT')
            output <- dataset

            I think I have found a solution here :
            https://stackoverflow.com/questions/39148759/remove-accents-from-a-dataframe-column-in-r

            But I don't know how to adapt the code to R in power BI

            2/ The solution in your link solved the problem Thank you.

          2. Zak

            It’s also OK for 1/ with this code

            library(stringi)

            dataset$fonction <- stri_trans_general(dataset$fonction,"Latin-ASCII")

            output <- dataset

  5. Danka

    Hi, I installed R Open and tried to run R script above.
    I get this error:
    Formula.Firewall: Query ‘Employees’ (step ‘Run R script’) references other queries or steps, so it may not directly access a data source. Please rebuild this data combination.

    Do you know why and what can I do about it?

    Thanks

    1. Janis Sturis

      Hi
      I think this error is related to Privacy Level settings.
      This might be helpful: https://aforanalytic.com/power-bi-query-references-other-queries-or-steps-so-it-may-not-directly-access-a-data-source/

Leave a Reply

Your email address will not be published. Required fields are marked *