[Merge] lp:~grishkin/chm2pdf/chm2pdf_branch into lp:~reto-knaak/chm2pdf/chm2pdf_branch

Fri Oct 19 20:42:27 UTC 2012

Hello Reto, and thanks for the constructive response!

I have been watching the activity in the chm2pdf Google Group and on
Launchpad for a while, and I understand that you are an ordinary user of
chm2pdf, not a maintainer or author of the software. But chm2pdf was
published quite a long time ago and has had no bugfixes or improvements
since, so the project may be considered abandoned. Some time ago you were
the most active participant in the project discussion, and you own a branch
of it on LP. When I found this branch, I decided to upload my chm2pdf
version to Launchpad too, just to make it public. My point is that our
branches should be synchronized, and both should have the latest chm2pdf
version with all available fixes.

Concerning the chm file from the Google Group topic: it is certainly very
dirty, and someone may argue that chm2pdf should not be expected to process
such files correctly. But it was created just to demonstrate the type of
file on which chm2pdf failed before and for which it now generates a pdf. I
simply put two completely random Wikipedia articles into one chm file. They
do not even link to each other, which is why some software shows one page
and other software shows a completely different one. Anyway, the resulting
pdf contains both pages.

I downloaded your patch for spaces in names, but I could not work out which
version to apply it to: the one from the distribution, the one from
code.google.com, or the one from your branch? I tried to merge it into
different versions by hand, but the resulting files still failed on the chm
file from your Demo_CMH.zip. So I merged your patch into my branch and fixed
the rest until the conversion started to work for me, mainly by adding
quotes around filenames where needed. It seems that the
no-table-of-contents and spaces-in-filenames fixes work perfectly together!
I have not tried to solve the problem with '%20' symbols yet, but I will
think about it shortly; it does not seem to be a difficult problem.
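
For illustration only, here is a minimal sketch of the two ideas mentioned
above: quoting filenames before they reach the shell, and decoding '%20'
escapes. It is not the code from either branch. It assumes Python 3 (the
script itself is Python 2), and the helper names build_enum_command and
decode_internal_path are hypothetical; only enum_chmLib and urlslist.txt
come from the quoted script further down.

```python
import shlex
from urllib.parse import unquote


def build_enum_command(chm_path, work_dir):
    """Build a shell command line whose file arguments are safely quoted.

    Hypothetical helper: 'enum_chmLib' and 'urlslist.txt' are the names used
    in the quoted script; everything else is an assumption for illustration.
    """
    out_path = work_dir + "/urlslist.txt"
    # shlex.quote wraps an argument in single quotes when it contains
    # spaces or other characters that the shell would interpret.
    return "enum_chmLib {} > {}".format(
        shlex.quote(chm_path), shlex.quote(out_path)
    )


def decode_internal_path(path):
    """Turn percent-escapes such as '%20' back into literal characters."""
    return unquote(path)


cmd = build_enum_command("My Book.chm", "/tmp/chm2pdf work")
print(cmd)
print(decode_internal_path("/doc%20space/index.html"))
```

Passing an argument list to subprocess instead of a command string would
avoid the shell entirely; the quoting approach is shown here only because it
is closer to the os.system call already used in the script.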

Reto, please also note that since you reply through @code.launchpad.net,
your reply will be publicly available at
https://code.launchpad.net/~grishkin/chm2pdf/chm2pdf_branch/+merge/128385.
That is definitely not a problem; I am just pointing it out in case you had
not noticed.
-----
Best regards,
Grishkin Maxim

2012/10/16 Reto Knaak <reto.knaak at gmail.com>:
> Hi Grishkin (Max?) !
>
> Thank you for the files... I'm not a real programmer (I just tried to fix
> some issues that were preventing me from using the script), so I don't
> know if I am the right person to do the code review.
> It has been many months since I last booted up my VirtualBox with Ubuntu,
> and I have the feeling I forgot most of what I learned trying to fix the
> script.
>
> Anyway, this evening I had some time and began to download the files,
> just to see what's going on.
>
> My operating system is Win7, and if I open the CHM file from Windows it
> won't open, probably due to the missing TOC!
> Then I tried to import it into calibre, and if I open the CHM there,
> something is displayed, but it is the "Liberty Bay" article from the
> online(!) Wikipedia.
> If I convert the file to some other format (mobi), I get a page with "1951
> Chicago Bears season", which seems to me the right output.
> So I'm not sure the demo chm file has a valid output, but I agree that
> it's a good idea to try to extract what's there.
>
> I'm not familiar with the code review process, and I am asking myself
> whether I/we should open a bug under Ubuntu (that is what I did with the
> bugs I found previously)?
>
> I took a quick glance at the diff. Most of the differences are not
> really there (probably just whitespace), and the only true differences are
> in def get_html_list(cfile) and def get_objective_urls_list(filename).
>
> For the first one, it's the first time I have seen "lambda" (so again,
> I'm probably not the right one to review...).
> I think I understood what it's meant for, but I can't say I understand how
> it works.... (if the first way to retrieve the html files fails, use the
> second one, using all files found, or something similar?)
>
> For the second one, "my local chm2pdf" is like this:
>
> def get_objective_urls_list(filename):
>     '''
>     takes the list of files inside the chm archive, with the correct urls
>     of each one.
>     '''
>     os.system('enum_chmLib '+filename+' > "'+CHM2PDF_WORK_DIR+'/urlslist.txt"')
>     flist=open(CHM2PDF_WORK_DIR+'/urlslist.txt','rU')
>     urls_list=[]
>     for line in flist.readlines()[3:]:
>         #print 'line',line
>         #This won't work if internal paths of CHM contains spaces: e.g.
>         #/doc space/ will only become /doc
>         #spline=line.split()
>         #urls_list.append(spline[5])
>         #this should work better:
>         spline= re.sub(r".*?normal file\s*(.*?)\n$", "\\1", line)
>         if spline[0]=="/":
>           #print "got spline="+spline
>           urls_list.append( spline)
>     flist.close()
>     # os.remove(CHM2PDF_WORK_DIR+'/urlslist.txt')
>     return urls_list
>
> Does your solution work with chm paths containing spaces? (If you need a
> sample file, see
> https://bugs.launchpad.net/ubuntu/+source/chm2pdf/+bug/894193 )
> I have the feeling (I have not really run any scripts this evening, and I
> am forgetting Python) that using urls_list.append(spline[5]) will fail in
> case of paths with spaces!
> I also have the feeling that my solution is not really state of the art,
> so maybe you can suggest something that solves both problems?
>
> Hope to hear from you soon, and kind regards from the Italian part of
> Switzerland!
>
> Ciao
> Reto Knaak
>
>
> On Sun, Oct 7, 2012 at 5:23 PM, Grishkin <MGrishkin at gmail.com> wrote:
>
>> Grishkin has proposed merging lp:~grishkin/chm2pdf/chm2pdf_branch into
>> lp:~reto-knaak/chm2pdf/chm2pdf_branch.
>>
>> Requested reviews:
>>   Reto Knaak (reto-knaak)
>>
>> For more details, see:
>> https://code.launchpad.net/~grishkin/chm2pdf/chm2pdf_branch/+merge/128385
>>
>> Mainly fix from
>> https://groups.google.com/d/topic/chm2pdf/SeOGMcMFsBw/discussion.
>> --
>> https://code.launchpad.net/~grishkin/chm2pdf/chm2pdf_branch/+merge/128385
>> You are requested to review the proposed merge of
>> lp:~grishkin/chm2pdf/chm2pdf_branch into
>> lp:~reto-knaak/chm2pdf/chm2pdf_branch.
>>
>
> --
> https://code.launchpad.net/~grishkin/chm2pdf/chm2pdf_branch/+merge/128385
> You are the owner of lp:~grishkin/chm2pdf/chm2pdf_branch.

-- 
https://code.launchpad.net/~grishkin/chm2pdf/chm2pdf_branch/+merge/128385
Your team Ubuntu Sponsors Team is subscribed to branch lp:~reto-knaak/chm2pdf/chm2pdf_branch.